kernel-hardening.lists.openwall.com archive mirror
* [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
@ 2024-07-04 19:01 Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
                   ` (6 more replies)
  0 siblings, 7 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC
  To: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o
  Cc: Mickaël Salaün, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

Hi,

The ultimate goal of this patch series is to be able to ensure that
direct file execution (e.g. ./script.sh) and indirect file execution
(e.g. sh script.sh) lead to the same result, especially from a security
point of view.

Overview
--------

This patch series is a new approach to the initial O_MAYEXEC feature,
and a revamp of the previous patch series.  Taking into account the last
reviews [1], we now stick to the kernel semantics for file executability.
One major change is the clear split between access check and policy
management.

The first patch brings the AT_CHECK flag to execveat(2).  The goal is to
enable user space to check if a file could be executed (by the kernel).
Unlike stat(2), which only checks file permissions, execveat(2) +
AT_CHECK takes into account the full context, including mount points
(noexec), caller's limits, and all potential LSM extra checks (e.g.
argv, envp, credentials).
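
As a minimal sketch of the check described above, user space could test
an already-opened file as follows.  This assumes a kernel with this RFC
applied: AT_CHECK and its value 0x10000 are taken from patch 1/5, and
check_exec_fd() is a hypothetical helper name.  A kernel without the
patch rejects the unknown flag with EINVAL, so in both cases nothing is
executed by this call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for builds where the GNU extensions are not visible. */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/* Value proposed by this RFC; not provided by released kernel headers. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/*
 * Hypothetical helper: ask the kernel whether the file behind fd would be
 * allowed to execute.  Returns 0 if allowed, or a negative errno value
 * (e.g. -EACCES if denied, -EINVAL if the kernel does not know AT_CHECK).
 */
int check_exec_fd(int fd)
{
	char *const argv[] = { "", NULL };
	char *const envp[] = { NULL };

	/* The empty path plus AT_EMPTY_PATH makes the check apply to fd. */
	if (syscall(SYS_execveat, fd, "", argv, envp,
		    AT_EMPTY_PATH | AT_CHECK) == 0)
		return 0;
	return -errno;
}
```

An interpreter would open the script, call check_exec_fd(fd), and then
read the code from that same descriptor, so that the checked file and
the interpreted file cannot differ.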

The second patch brings two new securebits used to set or get a security
policy for a set of processes.  For this to be meaningful, all
executable code needs to be trusted.  In practice, this means that
(malicious) users can be restricted to only run scripts provided (and
trusted) by the system.

[1] https://lore.kernel.org/r/CAHk-=wjPGNLyzeBMWdQu+kUdQLHQugznwY7CvWjmvNW47D5sog@mail.gmail.com

Script execution
----------------

One important thing to keep in mind is that the goal of this patch
series is to get the same security restrictions with these commands:
* ./script.py
* python script.py
* python < script.py
* python -m script.py

However, on secure systems, we should be able to forbid these commands
because there is no way to reliably identify the origin of the script:
* xargs -a script.py -d '\r' -- python -c
* cat script.py | python
* python

Background
----------

Compared to the previous patch series, there is no longer a dedicated
syscall or sysctl configuration.  This new patch series only adds new
flags: one for execveat(2) and four for prctl(2).

This kind of script interpreter restriction may already be used in
hardened systems, which may need to fork interpreters and install
different versions of the binaries.  This mechanism should make it
possible to avoid duplicate binaries (and potentially forked source
code) for secure interpreters (e.g. secure Python [2]) by letting them
enforce restrictions dynamically, or not at all.

The ability to control script execution is also required to close a
major IMA measurement/appraisal interpreter integrity gap [3].

This new execveat + AT_CHECK should not be confused with the O_EXEC flag
(for open), which is intended for execute-only access and therefore
doesn't work for scripts, which must be read to be interpreted.

I gave a talk about controlling script execution where I explain the
previous approaches [4].  The design of the WIP RFC I talked about has
changed quite a bit since then.

[2] https://github.com/zooba/spython
[3] https://lore.kernel.org/lkml/20211014130125.6991-1-zohar@linux.ibm.com/
[4] https://lssna2023.sched.com/event/1K7bO

Execution policy
----------------

The "execution" usage means that the content of the file descriptor is
trusted according to the system policy to be executed by user space,
which means that user space interprets the content or tries to map it
as executable memory.

It is important to note that this can only extend the access control
managed by the kernel.  Hence it enables current access control
mechanisms to be extended and to become a superset of what they can
currently control.  Indeed, the security policy could also be delegated
to an LSM, either a MAC system or an integrity system.

Complementary W^X protections can be brought by SELinux or IPE [5].

Being able to restrict execution also helps to protect the kernel by
restricting arbitrary syscalls that an attacker could perform with a
crafted binary or certain script languages.  It also improves multilevel
isolation by reducing the ability of an attacker to use side channels
with specific code.  These restrictions can natively be enforced for ELF
binaries (with the noexec mount option) but require this kernel
extension to properly handle scripts (e.g. Python, Perl).  To get a
consistent execution policy, additional memory restrictions should also
be enforced (e.g. thanks to SELinux).

[5] https://lore.kernel.org/lkml/1716583609-21790-1-git-send-email-wufan@linux.microsoft.com/

Prerequisite for security use
-----------------------------

Because scripts might not currently have the executable permission and
still run well as is, or because we might want specific users to be
allowed to run arbitrary scripts, we also need a configuration
mechanism.

According to the threat model, to get a secure execution environment on
top of these changes, it might be required to configure and enable
existing security mechanisms such as secure boot, restrictive mount
points (e.g. with rw AND noexec), correct file permissions (including
executable libraries), IMA/EVM, SELinux policy...

The first thing to patch is the libc, to check loaded libraries (e.g.
see chromeOS changes).  The second thing to patch is the script
interpreters, which must check the executability of directly passed
scripts and of their own libraries (e.g. Python's imported files or
argument-passed modules).  For instance, PEP 578 [6] (Runtime Audit Hooks) enables
Python 3.8 to be extended with policy enforcement points related to code
interpretation, which can be used to align with the PowerShell audit
features.  Additional Python security improvements (e.g. a limited
interpreter without -c, stdin piping of code) are developed [2] [7].

[6] https://www.python.org/dev/peps/pep-0578/
[7] https://lore.kernel.org/lkml/0c70debd-e79e-d514-06c6-4cd1e021fa8b@python.org/

libc patch
----------

Dynamic linking still needs to check libraries the same way
interpreters need to check scripts.

chromeOS patches glibc with a fstatvfs check [8] [9].  This makes it
possible to check against noexec mount points, which is OK but doesn't
fit with execve semantics.  Moreover, the kernel is not aware of such a
check, so not all access control checks are performed (e.g. file
permission, LSM security policies, integrity and authenticity checks),
it is not handled by audit, and more importantly this would not work on
generic distributions because of the strict requirements and
chromeOS-specific assumptions.

[8] https://issuetracker.google.com/issues/40054993
[9] https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/6abfc9e327241a5f684b8b941c899b7ca8b6dbc1/sys-libs/glibc/files/local/glibc-2.37/0007-Deny-LD_PRELOAD-of-files-in-NOEXEC-mount.patch
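
For comparison, a mount-level check of the kind described above can be
sketched with fstatvfs(3).  This is not the chromeOS patch itself, only
an illustration of the technique; fd_on_noexec_mount() is a hypothetical
name, and the fallback value for ST_NOEXEC is the Linux one.

```c
#define _GNU_SOURCE
#include <sys/statvfs.h>

/* glibc only exposes ST_NOEXEC with _GNU_SOURCE; Linux value as fallback. */
#ifndef ST_NOEXEC
#define ST_NOEXEC 8
#endif

/*
 * Hypothetical helper: returns 1 if fd lives on a noexec mount, 0 if it
 * does not, and -1 if fstatvfs(3) fails (errno is set by fstatvfs).
 */
int fd_on_noexec_mount(int fd)
{
	struct statvfs buf;

	if (fstatvfs(fd, &buf) != 0)
		return -1;
	return (buf.f_flag & ST_NOEXEC) ? 1 : 0;
}
```

Such a check only sees the mount flags.  It says nothing about file
permissions or LSM policy, which is the limitation that execveat(2) +
AT_CHECK is meant to address.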

Examples
--------

The initial idea comes from CLIP OS 4 and the original implementation
has been used for more than a decade:
https://github.com/clipos-archive/clipos4_doc
Chrome OS has a similar approach:
https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/

User space patches can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
These go beyond the O_MAYEXEC changes (which match this search),
e.g. to prevent Python interactive execution.  There are patches for
Bash, Wine, Java (Icedtea), Busybox's ash, Perl and Python. There are
also some related patches which do not directly rely on O_MAYEXEC but
which restrict the use of browser plugins and extensions, which may be
seen as scripts too:
https://github.com/clipos-archive/clipos4_portage-overlay/tree/master/www-client

Past talks and articles
-----------------------

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s
See also a first LWN article about O_MAYEXEC and a new one about
trusted_for(2) and its background:
* https://lwn.net/Articles/820000/
* https://lwn.net/Articles/832959/

Previous versions:
v18: https://lore.kernel.org/r/20220104155024.48023-1-mic@digikod.net
v17: https://lore.kernel.org/r/20211115185304.198460-1-mic@digikod.net
v16: https://lore.kernel.org/r/20211110190626.257017-1-mic@digikod.net
v15: https://lore.kernel.org/r/20211012192410.2356090-1-mic@digikod.net
v14: https://lore.kernel.org/r/20211008104840.1733385-1-mic@digikod.net
v13: https://lore.kernel.org/r/20211007182321.872075-1-mic@digikod.net
v12: https://lore.kernel.org/r/20201203173118.379271-1-mic@digikod.net
v11: https://lore.kernel.org/r/20201019164932.1430614-1-mic@digikod.net
v10: https://lore.kernel.org/r/20200924153228.387737-1-mic@digikod.net
v9: https://lore.kernel.org/r/20200910164612.114215-1-mic@digikod.net
v8: https://lore.kernel.org/r/20200908075956.1069018-1-mic@digikod.net
v7: https://lore.kernel.org/r/20200723171227.446711-1-mic@digikod.net
v6: https://lore.kernel.org/r/20200714181638.45751-1-mic@digikod.net
v5: https://lore.kernel.org/r/20200505153156.925111-1-mic@digikod.net
v4: https://lore.kernel.org/r/20200430132320.699508-1-mic@digikod.net
v3: https://lore.kernel.org/r/20200428175129.634352-1-mic@digikod.net
v2: https://lore.kernel.org/r/20190906152455.22757-1-mic@digikod.net
v1: https://lore.kernel.org/r/20181212081712.32347-1-mic@digikod.net

Regards,

Mickaël Salaün (5):
  exec: Add a new AT_CHECK flag to execveat(2)
  security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT
    securebits
  selftests/exec: Add tests for AT_CHECK and related securebits
  selftests/landlock: Add tests for execveat + AT_CHECK
  samples/should-exec: Add set-should-exec

 fs/exec.c                                  |   5 +-
 include/linux/binfmts.h                    |   7 +-
 include/uapi/linux/fcntl.h                 |  30 ++
 include/uapi/linux/securebits.h            |  56 ++-
 kernel/audit.h                             |   1 +
 kernel/auditsc.c                           |   1 +
 samples/Kconfig                            |   7 +
 samples/Makefile                           |   1 +
 samples/should-exec/.gitignore             |   1 +
 samples/should-exec/Makefile               |  13 +
 samples/should-exec/set-should-exec.c      |  88 ++++
 security/commoncap.c                       |  63 ++-
 tools/testing/selftests/exec/.gitignore    |   2 +
 tools/testing/selftests/exec/Makefile      |   8 +
 tools/testing/selftests/exec/config        |   2 +
 tools/testing/selftests/exec/false.c       |   5 +
 tools/testing/selftests/exec/should-exec.c | 449 +++++++++++++++++++++
 tools/testing/selftests/landlock/fs_test.c |  26 ++
 18 files changed, 753 insertions(+), 12 deletions(-)
 create mode 100644 samples/should-exec/.gitignore
 create mode 100644 samples/should-exec/Makefile
 create mode 100644 samples/should-exec/set-should-exec.c
 create mode 100644 tools/testing/selftests/exec/config
 create mode 100644 tools/testing/selftests/exec/false.c
 create mode 100644 tools/testing/selftests/exec/should-exec.c


base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
-- 
2.45.2



* [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
@ 2024-07-04 19:01 ` Mickaël Salaün
  2024-07-05  0:04   ` Kees Cook
                     ` (3 more replies)
  2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
                   ` (5 subsequent siblings)
  6 siblings, 4 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC

Add a new AT_CHECK flag to execveat(2) to check if a file would be
allowed for execution.  The main use case is for script interpreters and
dynamic linkers to check execution permission according to the kernel's
security policy.  Another use case is to add context to access logs,
e.g. which script (instead of which interpreter) accessed a file.  Like
any executable code, scripts could also use this check [1].

This differs from faccessat(2), which only checks file access rights
and not the full context, e.g. the mount point's noexec, stack limit,
and all potential LSM extra checks (e.g. argv, envp, credentials).
Since the use of AT_CHECK follows exactly the same kernel semantics as a
real execution, user space gets the same error codes.

With the information that a script interpreter is about to interpret a
script, an LSM security policy can adjust the caller's access rights or
log the execution request as for native script execution (e.g. role
transition).
This is possible thanks to the call to security_bprm_creds_for_exec().

Because LSMs may only change bprm's credentials, use of AT_CHECK with
current kernel code should not be a security issue (e.g. unexpected role
transition).  LSMs willing to update the caller's credential could now
do so when bprm->is_check is set.  Of course, such policy change should
be in line with the new user space code.

Because AT_CHECK is dedicated to user space interpreters, it doesn't
make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_CHECK flag) because this could lead to
unexpected behaviors e.g., `python script.sh` could lead to Bash being
executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because script or library files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
SECBIT_SHOULD_EXEC_RESTRICT securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
---

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 fs/exec.c                  |  5 +++--
 include/linux/binfmts.h    |  7 ++++++-
 include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
 kernel/audit.h             |  1 +
 kernel/auditsc.c           |  1 +
 5 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 40073142288f..ea2a1867afdc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
 		return ERR_PTR(-EINVAL);
 	if (flags & AT_SYMLINK_NOFOLLOW)
 		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
@@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
 		bprm->filename = bprm->fdpath;
 	}
 	bprm->interp = bprm->filename;
+	bprm->is_check = !!(flags & AT_CHECK);
 
 	retval = bprm_mm_init(bprm);
 	if (!retval)
@@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
 
 	/* Set the unchanging part of bprm->cred */
 	retval = security_bprm_creds_for_exec(bprm);
-	if (retval)
+	if (retval || bprm->is_check)
 		goto out;
 
 	retval = exec_binprm(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 70f97f685bff..8ff9c9e33aed 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -42,7 +42,12 @@ struct linux_binprm {
 		 * Set when errors can no longer be returned to the
 		 * original userspace.
 		 */
-		point_of_no_return:1;
+		point_of_no_return:1,
+		/*
+		 * Set by user space to check executability according to the
+		 * caller's environment.
+		 */
+		is_check:1;
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
 	struct file *file;
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index c0bcc185fa48..bcd05c59b7df 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -118,6 +118,36 @@
 #define AT_HANDLE_FID		AT_REMOVEDIR	/* file handle is needed to
 					compare object identity and may not
 					be usable to open_by_handle_at(2) */
+
+/*
+ * AT_CHECK only performs a check on a regular file and returns 0 if execution
+ * of this file would be allowed, ignoring the file format and then the related
+ * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
+ * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
+ * thread.  See securebits.h documentation.
+ *
+ * Programs should use this check to apply kernel-level checks against files
+ * that are not directly executed by the kernel but directly passed to a user
+ * space interpreter instead.  All files that contain executable code, from the
+ * point of view of the interpreter, should be checked.  The main purpose of
+ * this flag is to improve the security and consistency of an execution
+ * environment to ensure that direct file execution (e.g. ./script.sh) and
+ * indirect file execution (e.g. sh script.sh) lead to the same result.  For
+ * instance, this can be used to check if a file is trustworthy according to
+ * the caller's environment.
+ *
+ * In a secure environment, libraries and any executable dependencies should
+ * also be checked.  For instance dynamic linking should make sure that all
+ * libraries are allowed for execution to avoid trivial bypass (e.g. using
+ * LD_PRELOAD).  For such secure execution environment to make sense, only
+ * trusted code should be executable, which also requires integrity guarantees.
+ *
+ * To avoid race conditions leading to time-of-check to time-of-use issues,
+ * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
+ * descriptor instead of a path.
+ */
+#define AT_CHECK		0x10000
+
 #if defined(__KERNEL__)
 #define AT_GETATTR_NOSEC	0x80000000
 #endif
diff --git a/kernel/audit.h b/kernel/audit.h
index a60d2840559e..8ebdabd2ab81 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -197,6 +197,7 @@ struct audit_context {
 		struct open_how openat2;
 		struct {
 			int			argc;
+			bool			is_check;
 		} execve;
 		struct {
 			char			*name;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 6f0d6fb6523f..b6316e284342 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
 
 	context->type = AUDIT_EXECVE;
 	context->execve.argc = bprm->argc;
+	context->execve.is_check = bprm->is_check;
 }
 
 
-- 
2.45.2



* [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
@ 2024-07-04 19:01 ` Mickaël Salaün
  2024-07-05  0:18   ` Kees Cook
                     ` (2 more replies)
  2024-07-04 19:01 ` [RFC PATCH v19 3/5] selftests/exec: Add tests for AT_CHECK and related securebits Mickaël Salaün
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC

These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
their *_LOCKED counterparts are designed to be set by processes setting
up an execution environment, such as a user session, a container, or a
security sandbox.  Like seccomp filters or Landlock domains, the
securebits are inherited across processes.

When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
check executable resources with execveat(2) + AT_CHECK (see previous
patch).

When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
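
A rough sketch of how an interpreter might turn these two bits into a
start-up decision, following the securebits.h comments in this patch.
The bit positions (8 and 10) are the ones proposed here; the two helper
names and the has_unchecked_code parameter (true when code arrives as a
command-line snippet or on stdin) are hypothetical.

```c
#include <stdbool.h>

/* Bit values proposed by this RFC (include/uapi/linux/securebits.h). */
#define SECBIT_SHOULD_EXEC_CHECK	(1UL << 8)
#define SECBIT_SHOULD_EXEC_RESTRICT	(1UL << 10)

/*
 * Should each script file be passed to execveat(2) + AT_CHECK?  Only when
 * the CHECK bit is set and all to-be-executed code comes from regular
 * files; otherwise no file should be checked at all.
 */
bool should_check_files(unsigned long secbits, bool has_unchecked_code)
{
	return (secbits & SECBIT_SHOULD_EXEC_CHECK) != 0 &&
	       !has_unchecked_code;
}

/* Should code that cannot be checked (e.g. a -c snippet) be refused? */
bool should_deny_unchecked_code(unsigned long secbits)
{
	return (secbits & SECBIT_SHOULD_EXEC_RESTRICT) != 0;
}
```

The current value of secbits can be read with prctl(PR_GET_SECUREBITS).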

For a secure environment, we might also want
SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
to be set.  For a test environment (e.g. testing on a fleet to identify
potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
still be able to identify potential issues (e.g. with interpreters logs
or LSMs audit entries).
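
The two configurations above can be sketched as follows.  The bit
positions are the ones proposed in this patch and are not defined by
released kernel headers; secure_env_bits(), test_env_bits() and
add_securebits() are hypothetical helper names.

```c
#include <sys/prctl.h>

/* Bit values proposed by this RFC (include/uapi/linux/securebits.h). */
#define SECBIT_SHOULD_EXEC_CHECK		(1UL << 8)
#define SECBIT_SHOULD_EXEC_CHECK_LOCKED		(1UL << 9)
#define SECBIT_SHOULD_EXEC_RESTRICT		(1UL << 10)
#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED	(1UL << 11)

/* Secure environment: check and restrict, with both bits locked. */
unsigned long secure_env_bits(void)
{
	return SECBIT_SHOULD_EXEC_CHECK | SECBIT_SHOULD_EXEC_CHECK_LOCKED |
	       SECBIT_SHOULD_EXEC_RESTRICT | SECBIT_SHOULD_EXEC_RESTRICT_LOCKED;
}

/* Test environment: check only, left unlocked in this sketch. */
unsigned long test_env_bits(void)
{
	return SECBIT_SHOULD_EXEC_CHECK;
}

/*
 * Add bits to the calling thread's securebits, keeping those already set.
 * Returns 0 on success and -1 on error (errno set by prctl(2)).
 */
int add_securebits(unsigned long bits)
{
	int cur = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

	if (cur < 0)
		return -1;
	return prctl(PR_SET_SECUREBITS, (unsigned long)cur | bits, 0, 0, 0);
}
```

A session manager or container runtime would call
add_securebits(secure_env_bits()) before executing the workload.  On a
kernel without this patch the call fails because the bits are unknown.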

It should be noted that unlike other security bits, the
SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
dedicated to user space willing to restrict itself.  Because of that,
they only make sense in the context of a trusted environment (e.g.
sandbox, container, user session, full system) where the process
changing its behavior (according to these bits) and all its parent
processes are trusted.  Otherwise, any parent process could just execute
its own malicious code (interpreting a script or not), or even enforce a
seccomp filter to mask these bits.

Such a secure environment can be achieved with an appropriate access
control policy (e.g. mount's noexec option, file access rights, LSM
configuration) and an enlightened ld.so checking that libraries are
allowed for execution e.g., to protect against illegitimate use of
LD_PRELOAD.

Scripts may need some changes to deal with untrusted data (e.g. stdin,
environment variables), but that is outside the scope of the kernel.

The only restriction enforced by the kernel is the right to ptrace
another process.  Processes are not allowed to ptrace less restricted
ones, unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard
to avoid trivial privilege escalations, e.g. by a debugging process
being abused with a confused deputy attack.
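
The securebits part of this rule can be modelled in user space to see
which tracer/tracee combinations pass.  This mirrors only the bit
comparison added to security/commoncap.c in this patch; the real check
also compares user namespaces and capabilities, and CAP_SYS_PTRACE
overrides it.  The function name is hypothetical.

```c
#include <stdbool.h>

/* Bit values proposed by this RFC (include/uapi/linux/securebits.h). */
#define SECBIT_SHOULD_EXEC_CHECK	(1UL << 8)
#define SECBIT_SHOULD_EXEC_RESTRICT	(1UL << 10)
#define SECURE_ALL_UNPRIVILEGED \
	(SECBIT_SHOULD_EXEC_CHECK | SECBIT_SHOULD_EXEC_RESTRICT)

/* User space model of the securebits comparison done for ptrace. */
bool ptrace_secbits_allowed_model(unsigned long tracer, unsigned long tracee)
{
	const unsigned long tracer_bits = tracer & SECURE_ALL_UNPRIVILEGED;
	const unsigned long tracee_bits = tracee & SECURE_ALL_UNPRIVILEGED;
	/* A lock only counts when the bit it locks is set. */
	const unsigned long tracer_locked = (tracer_bits << 1) & tracer;
	const unsigned long tracee_locked = (tracee_bits << 1) & tracee;

	/* The tracee must carry every restriction the tracer carries. */
	if ((tracer_bits | tracee_bits) != tracee_bits)
		return false;

	/* The tracee must also carry every lock the tracer carries. */
	if ((tracer_locked | tracee_locked) != tracee_locked)
		return false;

	return true;
}
```

For example, a tracer with only the CHECK bit (0x100) cannot trace a
process with no bits set, while the reverse direction is allowed.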

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
---

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++-
 security/commoncap.c            | 63 ++++++++++++++++++++++++++++-----
 2 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
index d6d98877ff1a..3fdb0382718b 100644
--- a/include/uapi/linux/securebits.h
+++ b/include/uapi/linux/securebits.h
@@ -52,10 +52,64 @@
 #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
 			(issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
 
+/*
+ * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
+ * files with execveat(2) + AT_CHECK.  However, such check should only be
+ * performed if all to-be-executed code only comes from regular files.  For
+ * instance, if a script interpreter is called with both a script snippet as
+ * argument and a regular file, the interpreter should not check any file.
+ * Doing otherwise would mislead the kernel to think that only the script file
+ * is being executed, which could for instance lead to unexpected permission
+ * change and break current use cases.
+ *
+ * This secure bit may be set by user session managers, service managers,
+ * container runtimes, sandboxer tools...  Except for test environments, the
+ * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set.
+ *
+ * Ptracing another process is denied if the tracer has SECBIT_SHOULD_EXEC_CHECK
+ * but not the tracee.  SECBIT_SHOULD_EXEC_CHECK_LOCKED is also checked.
+ */
+#define SECURE_SHOULD_EXEC_CHECK		8
+#define SECURE_SHOULD_EXEC_CHECK_LOCKED		9  /* make bit-8 immutable */
+
+#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
+#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \
+			(issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED))
+
+/*
+ * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
+ * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
+ * instance, script interpreters called with a script snippet as argument
+ * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
+ * However, if a script interpreter is called with both
+ * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
+ * interpret the provided script files if no unchecked code is also provided
+ * (e.g. directly as argument).
+ *
+ * This secure bit may be set by user session managers, service managers,
+ * container runtimes, sandboxer tools...  Except for test environments, the
+ * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set.
+ *
+ * Ptracing another process is denied if the tracer has
+ * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee.
+ * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked.
+ */
+#define SECURE_SHOULD_EXEC_RESTRICT		10
+#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED	11  /* make bit-10 immutable */
+
+#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
+#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \
+			(issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED))
+
 #define SECURE_ALL_BITS		(issecure_mask(SECURE_NOROOT) | \
 				 issecure_mask(SECURE_NO_SETUID_FIXUP) | \
 				 issecure_mask(SECURE_KEEP_CAPS) | \
-				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
+				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
+				 issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
+				 issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
 #define SECURE_ALL_LOCKS	(SECURE_ALL_BITS << 1)
 
+#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
+				 issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
+
 #endif /* _UAPI_LINUX_SECUREBITS_H */
diff --git a/security/commoncap.c b/security/commoncap.c
index 162d96b3a676..34b4493e2a69 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
 	return 0;
 }
 
+static bool ptrace_secbits_allowed(const struct cred *tracer,
+				   const struct cred *tracee)
+{
+	const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED &
+					     tracer->securebits;
+	const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED &
+					     tracee->securebits;
+	/* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */
+	const unsigned long tracer_locked = (tracer_secbits << 1) &
+					    tracer->securebits;
+	const unsigned long tracee_locked = (tracee_secbits << 1) &
+					    tracee->securebits;
+
+	/* The tracee must not have fewer constraints than the tracer. */
+	if ((tracer_secbits | tracee_secbits) != tracee_secbits)
+		return false;
+
+	/*
+	 * Makes sure that the tracer's locks for restrictions are the same for
+	 * the tracee.
+	 */
+	if ((tracer_locked | tracee_locked) != tracee_locked)
+		return false;
+
+	return true;
+}
+
 /**
  * cap_ptrace_access_check - Determine whether the current process may access
  *			   another
@@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
 	else
 		caller_caps = &cred->cap_permitted;
 	if (cred->user_ns == child_cred->user_ns &&
-	    cap_issubset(child_cred->cap_permitted, *caller_caps))
+	    cap_issubset(child_cred->cap_permitted, *caller_caps) &&
+	    ptrace_secbits_allowed(cred, child_cred))
 		goto out;
 	if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
 		goto out;
@@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent)
 	cred = __task_cred(parent);
 	child_cred = current_cred();
 	if (cred->user_ns == child_cred->user_ns &&
-	    cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
+	    cap_issubset(child_cred->cap_permitted, cred->cap_permitted) &&
+	    ptrace_secbits_allowed(cred, child_cred))
 		goto out;
 	if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
 		goto out;
@@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 		     & (old->securebits ^ arg2))			/*[1]*/
 		    || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))	/*[2]*/
 		    || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))	/*[3]*/
-		    || (cap_capable(current_cred(),
-				    current_cred()->user_ns,
-				    CAP_SETPCAP,
-				    CAP_OPT_NONE) != 0)			/*[4]*/
 			/*
 			 * [1] no changing of bits that are locked
 			 * [2] no unlocking of locks
 			 * [3] no setting of unsupported bits
-			 * [4] doing anything requires privilege (go read about
-			 *     the "sendmail capabilities bug")
 			 */
 		    )
 			/* cannot change a locked bit */
 			return -EPERM;
 
+		/*
+		 * Doing anything requires privilege (go read about the
+		 * "sendmail capabilities bug"), except for unprivileged bits.
+		 * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
+		 * restrictions enforced by the kernel but by user space on
+		 * itself.  The kernel is only in charge of protecting against
+		 * privilege escalation with ptrace protections.
+		 */
+		if (cap_capable(current_cred(), current_cred()->user_ns,
+				CAP_SETPCAP, CAP_OPT_NONE) != 0) {
+			const unsigned long unpriv_and_locks =
+				SECURE_ALL_UNPRIVILEGED |
+				SECURE_ALL_UNPRIVILEGED << 1;
+			const unsigned long changed = old->securebits ^ arg2;
+
+			/* For legacy reasons, denies non-change. */
+			if (!changed)
+				return -EPERM;
+
+			/* Denies privileged changes. */
+			if (changed & ~unpriv_and_locks)
+				return -EPERM;
+		}
+
 		new = prepare_creds();
 		if (!new)
 			return -ENOMEM;
-- 
2.45.2

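The two bit-mask checks in the patch above can be sketched in plain user-space C.  This is only an illustration under stated assumptions: the DEMO_* bit positions and the demo_* function names are hypothetical and not taken from the patch, and the second helper mirrors only the branch taken when CAP_SETPCAP is missing (the lock checks [1]-[3] are left out).

```c
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_SECBIT_CHECK		(1UL << 8)
#define DEMO_SECBIT_CHECK_LOCKED	(DEMO_SECBIT_CHECK << 1)
#define DEMO_SECBIT_RESTRICT		(1UL << 10)
#define DEMO_SECBIT_RESTRICT_LOCKED	(DEMO_SECBIT_RESTRICT << 1)

#define DEMO_ALL_UNPRIVILEGED	(DEMO_SECBIT_CHECK | DEMO_SECBIT_RESTRICT)
#define DEMO_UNPRIV_AND_LOCKS \
	(DEMO_ALL_UNPRIVILEGED | (DEMO_ALL_UNPRIVILEGED << 1))

/*
 * Same shape as the ptrace test: true if every bit set in @tracer is
 * also set in @tracee, i.e. the tracee is at least as constrained.
 */
bool demo_is_subset(unsigned long tracer, unsigned long tracee)
{
	return (tracer | tracee) == tracee;
}

/*
 * Same shape as the unprivileged prctl branch: a request is accepted only
 * if it changes something and touches nothing but the unprivileged bits
 * and their locks.
 */
bool demo_unpriv_change_allowed(unsigned long old_bits,
				unsigned long new_bits)
{
	const unsigned long changed = old_bits ^ new_bits;

	/* A request that changes nothing is denied. */
	if (!changed)
		return false;

	/* Any change outside the unprivileged mask is denied. */
	if (changed & ~DEMO_UNPRIV_AND_LOCKS)
		return false;

	return true;
}
```

With these assumed values, demo_unpriv_change_allowed(0, DEMO_SECBIT_CHECK) is accepted, while a request that flips bit 0 (outside the demo mask) is refused.
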


* [RFC PATCH v19 3/5] selftests/exec: Add tests for AT_CHECK and related securebits
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
@ 2024-07-04 19:01 ` Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 4/5] selftests/landlock: Add tests for execveat + AT_CHECK Mickaël Salaün
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o
  Cc: Mickaël Salaün, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

Test that checks performed by execveat(..., AT_CHECK) are consistent
with noexec mount points and file execute permissions.

Test that SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT are
inherited by child processes and that they can be pinned with the
appropriate SECBIT_SHOULD_EXEC_CHECK_LOCKED and
SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bits.
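
The inheritance property can be observed from user space with existing interfaces only.  The following is a minimal sketch, not the selftest itself: demo_child_inherits_securebits() is a hypothetical helper name, and it only compares whatever securebits the caller already has rather than setting the new bits.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a forked child observes the same securebits as its parent,
 * 0 if the values differ, and -1 if the check could not be performed.
 */
int demo_child_inherits_securebits(void)
{
	const int parent_bits = prctl(PR_GET_SECUREBITS);
	int status;
	pid_t child;

	if (parent_bits < 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) {
		/* The child reports the comparison through its exit status. */
		_exit(prctl(PR_GET_SECUREBITS) == parent_bits ? 0 : 1);
	}

	if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

On a kernel carrying this series, the same comparison could be run after setting SECBIT_SHOULD_EXEC_CHECK with PR_SET_SECUREBITS.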

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-4-mic@digikod.net
---

Changes since v18:
* Rewrite tests with the new design: execveat/AT_CHECK and securebits.
* Simplify the capability dropping and improve it with the NOROOT
  securebits.
* Replace most ASSERT with EXPECT.
* Fix NULL execve's argv to avoid kernel warning.
* Move tests to exec/
* Build a "false" static binary to test full execution path.

Changes since v14:
* Add Reviewed-by Kees Cook.

Changes since v13:
* Move -I to CFLAGS (suggested by Kees Cook).
* Update sysctl name.

Changes since v12:
* Fix Makefile's license.

Changes since v10:
* Update selftest Makefile.

Changes since v9:
* Rename the syscall and the sysctl.
* Update tests for enum trusted_for_usage

Changes since v8:
* Update with the dedicated syscall introspect_access(2) and the renamed
  fs.introspection_policy sysctl.
* Remove check symlink which can't be used as is anymore.
* Use socketpair(2) to test UNIX socket.

Changes since v7:
* Update tests with faccessat2/AT_INTERPRETED, including new ones to
  check that setting R_OK or W_OK returns EINVAL.
* Add tests for memfd, pipefs and nsfs.
* Rename and move back tests to a standalone directory.

Changes since v6:
* Add full combination tests for all file types, including block
  devices, character devices, fifos, sockets and symlinks.
* Properly save and restore initial sysctl value for all tests.

Changes since v5:
* Refactor with FIXTURE_VARIANT, which makes the tests much easier to
  read and maintain.
* Save and restore initial sysctl value (suggested by Kees Cook).
* Test with a sysctl value of 0.
* Check errno in sysctl_access_write test.
* Update tests for the CAP_SYS_ADMIN switch.
* Update tests to check -EISDIR (replacing -EACCES).
* Replace FIXTURE_DATA() with FIXTURE() (spotted by Kees Cook).
* Use global const strings.

Changes since v3:
* Replace RESOLVE_MAYEXEC with O_MAYEXEC.
* Add tests to check that O_MAYEXEC is ignored by open(2) and openat(2).

Changes since v2:
* Move tests from exec/ to openat2/ .
* Replace O_MAYEXEC with RESOLVE_MAYEXEC from openat2(2).
* Cleanup tests.

Changes since v1:
* Move tests from yama/ to exec/ .
* Fix _GNU_SOURCE in kselftest_harness.h .
* Add a new test sysctl_access_write to check if CAP_MAC_ADMIN is taken
  into account.
* Test directory execution which is always forbidden since commit
  73601ea5b7b1 ("fs/open.c: allow opening only regular files during
  execve()"), and also check that even the root user can not bypass file
  execution checks.
* Make sure delete_workspace() always has enough rights to succeed.
* Cosmetic cleanup.
---
 tools/testing/selftests/exec/.gitignore    |   2 +
 tools/testing/selftests/exec/Makefile      |   8 +
 tools/testing/selftests/exec/config        |   2 +
 tools/testing/selftests/exec/false.c       |   5 +
 tools/testing/selftests/exec/should-exec.c | 449 +++++++++++++++++++++
 5 files changed, 466 insertions(+)
 create mode 100644 tools/testing/selftests/exec/config
 create mode 100644 tools/testing/selftests/exec/false.c
 create mode 100644 tools/testing/selftests/exec/should-exec.c

diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
index 90c238ba6a4b..20e965dcc98e 100644
--- a/tools/testing/selftests/exec/.gitignore
+++ b/tools/testing/selftests/exec/.gitignore
@@ -9,8 +9,10 @@ execveat.ephemeral
 execveat.denatured
 non-regular
 null-argv
+/false
 /load_address_*
 /recursion-depth
+/should-exec
 xxxxxxxx*
 pipe
 S_I*.test
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index fb4472ddffd8..fc0cb8925b02 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -2,15 +2,20 @@
 CFLAGS = -Wall
 CFLAGS += -Wno-nonnull
 CFLAGS += -D_GNU_SOURCE
+CFLAGS += $(KHDR_INCLUDES)
+
+LDLIBS += -lcap
 
 TEST_PROGS := binfmt_script.py
 TEST_GEN_PROGS := execveat load_address_4096 load_address_2097152 load_address_16777216 non-regular
+TEST_GEN_PROGS_EXTENDED := false
 TEST_GEN_FILES := execveat.symlink execveat.denatured script subdir
 # Makefile is a run-time dependency, since it's accessed by the execveat test
 TEST_FILES := Makefile
 
 TEST_GEN_PROGS += recursion-depth
 TEST_GEN_PROGS += null-argv
+TEST_GEN_PROGS += should-exec
 
 EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*	\
 	       $(OUTPUT)/S_I*.test
@@ -34,3 +39,6 @@ $(OUTPUT)/load_address_2097152: load_address.c
 	$(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=0x200000 -pie -static $< -o $@
 $(OUTPUT)/load_address_16777216: load_address.c
 	$(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=0x1000000 -pie -static $< -o $@
+
+$(OUTPUT)/false: false.c
+	$(CC) $(CFLAGS) $(LDFLAGS) -static $< -o $@
diff --git a/tools/testing/selftests/exec/config b/tools/testing/selftests/exec/config
new file mode 100644
index 000000000000..c308079867b3
--- /dev/null
+++ b/tools/testing/selftests/exec/config
@@ -0,0 +1,2 @@
+CONFIG_BLK_DEV=y
+CONFIG_BLK_DEV_LOOP=y
diff --git a/tools/testing/selftests/exec/false.c b/tools/testing/selftests/exec/false.c
new file mode 100644
index 000000000000..104383ec3a79
--- /dev/null
+++ b/tools/testing/selftests/exec/false.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+int main(void)
+{
+	return 1;
+}
diff --git a/tools/testing/selftests/exec/should-exec.c b/tools/testing/selftests/exec/should-exec.c
new file mode 100644
index 000000000000..166276a39b4e
--- /dev/null
+++ b/tools/testing/selftests/exec/should-exec.c
@@ -0,0 +1,449 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test execveat(2) with AT_CHECK, and prctl(2) with SECBIT_SHOULD_EXEC_CHECK,
+ * SECBIT_SHOULD_EXEC_RESTRICT, and their locked counterparts.
+ *
+ * Copyright © 2018-2020 ANSSI
+ * Copyright © 2024 Microsoft Corporation
+ *
+ * Author: Mickaël Salaün <mic@digikod.net>
+ */
+
+#include <asm-generic/unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/prctl.h>
+#include <linux/securebits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/capability.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+
+/* Defines AT_CHECK without type conflicts. */
+#define _ASM_GENERIC_FCNTL_H
+#include <linux/fcntl.h>
+
+#include "../kselftest_harness.h"
+
+static void drop_privileges(struct __test_metadata *const _metadata)
+{
+	const unsigned int noroot = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;
+	cap_t cap_p;
+
+	if ((cap_get_secbits() & noroot) != noroot)
+		EXPECT_EQ(0, cap_set_secbits(noroot));
+
+	cap_p = cap_get_proc();
+	EXPECT_NE(NULL, cap_p);
+	EXPECT_NE(-1, cap_clear(cap_p));
+
+	/*
+	 * Drops everything, especially CAP_SETPCAP, CAP_DAC_OVERRIDE, and
+	 * CAP_DAC_READ_SEARCH.
+	 */
+	EXPECT_NE(-1, cap_set_proc(cap_p));
+	EXPECT_NE(-1, cap_free(cap_p));
+}
+
+static int test_secbits_set(const unsigned int secbits)
+{
+	int err;
+
+	err = prctl(PR_SET_SECUREBITS, secbits);
+	if (err)
+		return errno;
+	return 0;
+}
+
+FIXTURE(access)
+{
+	int memfd, pipefd;
+	int pipe_fds[2], socket_fds[2];
+};
+
+FIXTURE_VARIANT(access)
+{
+	const bool mount_exec;
+	const bool file_exec;
+};
+
+FIXTURE_VARIANT_ADD(access, mount_exec_file_exec){
+	.mount_exec = true,
+	.file_exec = true,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_exec_file_noexec){
+	.mount_exec = true,
+	.file_exec = false,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_noexec_file_exec){
+	.mount_exec = false,
+	.file_exec = true,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_noexec_file_noexec){
+	.mount_exec = false,
+	.file_exec = false,
+};
+
+static const char binary_path[] = "./false";
+static const char workdir_path[] = "./test-mount";
+static const char reg_file_path[] = "./test-mount/regular_file";
+static const char dir_path[] = "./test-mount/directory";
+static const char block_dev_path[] = "./test-mount/block_device";
+static const char char_dev_path[] = "./test-mount/character_device";
+static const char fifo_path[] = "./test-mount/fifo";
+
+FIXTURE_SETUP(access)
+{
+	int procfd_path_size;
+	static const char path_template[] = "/proc/self/fd/%d";
+	char procfd_path[sizeof(path_template) + 10];
+
+	/* Makes sure we are not already restricted nor locked. */
+	EXPECT_EQ(0, test_secbits_set(0));
+
+	/*
+	 * Cleans previous workspace if any error previously happened (don't
+	 * check errors).
+	 */
+	umount(workdir_path);
+	rmdir(workdir_path);
+
+	/* Creates a clean mount point. */
+	ASSERT_EQ(0, mkdir(workdir_path, 00700));
+	ASSERT_EQ(0, mount("test", workdir_path, "tmpfs",
+			   MS_MGC_VAL | (variant->mount_exec ? 0 : MS_NOEXEC),
+			   "mode=0700,size=9m"));
+
+	/* Creates a regular file. */
+	ASSERT_EQ(0, mknod(reg_file_path,
+			   S_IFREG | (variant->file_exec ? 0700 : 0600), 0));
+	/* Creates a directory. */
+	ASSERT_EQ(0, mkdir(dir_path, variant->file_exec ? 0700 : 0600));
+	/* Creates a character device: /dev/null. */
+	ASSERT_EQ(0, mknod(char_dev_path, S_IFCHR | 0400, makedev(1, 3)));
+	/* Creates a block device: /dev/loop0 */
+	ASSERT_EQ(0, mknod(block_dev_path, S_IFBLK | 0400, makedev(7, 0)));
+	/* Creates a fifo. */
+	ASSERT_EQ(0, mknod(fifo_path, S_IFIFO | 0600, 0));
+
+	/* Creates a regular file without user mount point. */
+	self->memfd = memfd_create("test-exec-probe", MFD_CLOEXEC);
+	ASSERT_LE(0, self->memfd);
+	/* Sets mode, which must be ignored by the exec check. */
+	ASSERT_EQ(0, fchmod(self->memfd, variant->file_exec ? 0700 : 0600));
+
+	/* Creates a pipefs file descriptor. */
+	ASSERT_EQ(0, pipe(self->pipe_fds));
+	procfd_path_size = snprintf(procfd_path, sizeof(procfd_path),
+				    path_template, self->pipe_fds[0]);
+	ASSERT_LT(procfd_path_size, sizeof(procfd_path));
+	self->pipefd = open(procfd_path, O_RDWR | O_CLOEXEC);
+	ASSERT_LE(0, self->pipefd);
+	ASSERT_EQ(0, fchmod(self->pipefd, variant->file_exec ? 0700 : 0600));
+
+	/* Creates a socket file descriptor. */
+	ASSERT_EQ(0, socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0,
+				self->socket_fds));
+}
+
+FIXTURE_TEARDOWN_PARENT(access)
+{
+	/* There is no need to unlink the test files. */
+	EXPECT_EQ(0, umount(workdir_path));
+	EXPECT_EQ(0, rmdir(workdir_path));
+}
+
+static void fill_exec_fd(struct __test_metadata *_metadata, const int fd_out)
+{
+	char buf[1024];
+	ssize_t len;
+	int fd_in;
+
+	fd_in = open(binary_path, O_CLOEXEC | O_RDONLY);
+	ASSERT_LE(0, fd_in);
+	/* Cannot use copy_file_range(2) because of EXDEV. */
+	len = read(fd_in, buf, sizeof(buf));
+	EXPECT_LE(0, len);
+	while (len > 0) {
+		EXPECT_EQ(len, write(fd_out, buf, len))
+		{
+			TH_LOG("Failed to write: %s (%d)", strerror(errno),
+			       errno);
+		}
+		len = read(fd_in, buf, sizeof(buf));
+		EXPECT_LE(0, len);
+	}
+	EXPECT_EQ(0, close(fd_in));
+}
+
+static void fill_exec_path(struct __test_metadata *_metadata,
+			   const char *const path)
+{
+	int fd_out;
+
+	fd_out = open(path, O_CLOEXEC | O_WRONLY);
+	ASSERT_LE(0, fd_out)
+	{
+		TH_LOG("Failed to open %s: %s", path, strerror(errno));
+	}
+	fill_exec_fd(_metadata, fd_out);
+	EXPECT_EQ(0, close(fd_out));
+}
+
+static void test_exec_fd(struct __test_metadata *_metadata, const int fd,
+			 const int err_code)
+{
+	char *const argv[] = { "", NULL };
+	int access_ret, access_errno;
+
+	/*
+	 * If we really execute fd, filled with the "false" binary, the current
+	 * thread will exit with an error, which will be interpreted by the
+	 * test framework as an error.  With AT_CHECK, we only check a
+	 * potential successful execution.
+	 */
+	access_ret = execveat(fd, "", argv, NULL, AT_EMPTY_PATH | AT_CHECK);
+	access_errno = errno;
+	if (err_code) {
+		EXPECT_EQ(-1, access_ret);
+		EXPECT_EQ(err_code, access_errno)
+		{
+			TH_LOG("Wrong error for execveat(2): %s (%d)",
+			       strerror(access_errno), access_errno);
+		}
+	} else {
+		EXPECT_EQ(0, access_ret)
+		{
+			TH_LOG("Access denied: %s", strerror(access_errno));
+		}
+	}
+}
+
+static void test_exec_path(struct __test_metadata *_metadata,
+			   const char *const path, const int err_code)
+{
+	int flags = O_CLOEXEC;
+	int fd;
+
+	/* Do not block on pipes. */
+	if (path == fifo_path)
+		flags |= O_NONBLOCK;
+
+	fd = open(path, flags | O_RDONLY);
+	ASSERT_LE(0, fd)
+	{
+		TH_LOG("Failed to open %s: %s", path, strerror(errno));
+	}
+	test_exec_fd(_metadata, fd, err_code);
+	EXPECT_EQ(0, close(fd));
+}
+
+/* Tests that we don't get ENOEXEC. */
+TEST_F(access, regular_file_empty)
+{
+	const int exec = variant->mount_exec && variant->file_exec;
+
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+}
+
+TEST_F(access, regular_file_elf)
+{
+	const int exec = variant->mount_exec && variant->file_exec;
+
+	fill_exec_path(_metadata, reg_file_path);
+
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+}
+
+/* Tests that we don't get ENOEXEC. */
+TEST_F(access, memfd_empty)
+{
+	const int exec = variant->file_exec;
+
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+}
+
+TEST_F(access, memfd_elf)
+{
+	const int exec = variant->file_exec;
+
+	fill_exec_fd(_metadata, self->memfd);
+
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+}
+
+TEST_F(access, non_regular_files)
+{
+	test_exec_path(_metadata, dir_path, EACCES);
+	test_exec_path(_metadata, block_dev_path, EACCES);
+	test_exec_path(_metadata, char_dev_path, EACCES);
+	test_exec_path(_metadata, fifo_path, EACCES);
+	test_exec_fd(_metadata, self->socket_fds[0], EACCES);
+	test_exec_fd(_metadata, self->pipefd, EACCES);
+}
+
+
+/* clang-format off */
+FIXTURE(secbits) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(secbits)
+{
+	const bool is_privileged;
+	const int error;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(secbits, priv) {
+	/* clang-format on */
+	.is_privileged = true,
+	.error = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(secbits, unpriv) {
+	/* clang-format on */
+	.is_privileged = false,
+	.error = EPERM,
+};
+
+FIXTURE_SETUP(secbits)
+{
+	/* Makes sure no should-exec bits are set. */
+	EXPECT_EQ(0, test_secbits_set(0));
+	EXPECT_EQ(0, prctl(PR_GET_SECUREBITS));
+
+	if (!variant->is_privileged)
+		drop_privileges(_metadata);
+}
+
+FIXTURE_TEARDOWN(secbits)
+{
+}
+
+TEST_F(secbits, legacy)
+{
+	EXPECT_EQ(variant->error, test_secbits_set(0));
+}
+
+#define CHILD(...)                     \
+	do {                           \
+		pid_t child = vfork(); \
+		EXPECT_LE(0, child);   \
+		if (child == 0) {      \
+			__VA_ARGS__;   \
+			_exit(0);      \
+		}                      \
+	} while (0)
+
+TEST_F(secbits, should_exec)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_SHOULD_EXEC_CHECK;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+
+	secbits |= SECBIT_SHOULD_EXEC_RESTRICT;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+
+	secbits &= ~(SECBIT_SHOULD_EXEC_CHECK | SECBIT_SHOULD_EXEC_RESTRICT);
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+}
+
+TEST_F(secbits, check_locked_set)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_SHOULD_EXEC_CHECK;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	secbits |= SECBIT_SHOULD_EXEC_CHECK_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock set but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_SHOULD_EXEC_CHECK;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, check_locked_unset)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_SHOULD_EXEC_CHECK_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock unset but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_SHOULD_EXEC_CHECK;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, restrict_locked_set)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_SHOULD_EXEC_RESTRICT;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	secbits |= SECBIT_SHOULD_EXEC_RESTRICT_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock set but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_SHOULD_EXEC_RESTRICT;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, restrict_locked_unset)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_SHOULD_EXEC_RESTRICT_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock unset but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_SHOULD_EXEC_RESTRICT;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+/* TODO: Add ptrace tests */
+
+TEST_HARNESS_MAIN
-- 
2.45.2



* [RFC PATCH v19 4/5] selftests/landlock: Add tests for execveat + AT_CHECK
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (2 preceding siblings ...)
  2024-07-04 19:01 ` [RFC PATCH v19 3/5] selftests/exec: Add tests for AT_CHECK and related securebits Mickaël Salaün
@ 2024-07-04 19:01 ` Mickaël Salaün
  2024-07-04 19:01 ` [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec Mickaël Salaün
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o
  Cc: Mickaël Salaün, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Günther Noack

Extend layout1.execute with the new AT_CHECK flag.  The semantics with
AT_CHECK are the same as with a plain execve(2):
LANDLOCK_ACCESS_FS_EXECUTE is enforced the same way.

Cc: Günther Noack <gnoack@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-5-mic@digikod.net
---
 tools/testing/selftests/landlock/fs_test.c | 26 ++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 7d063c652be1..85ef36b09a37 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -37,6 +37,10 @@
 #include <linux/fs.h>
 #include <linux/mount.h>
 
+/* Defines AT_CHECK without type conflicts. */
+#define _ASM_GENERIC_FCNTL_H
+#include <linux/fcntl.h>
+
 #include "common.h"
 
 #ifndef renameat2
@@ -2009,6 +2013,21 @@ static void test_execute(struct __test_metadata *const _metadata, const int err,
 	};
 }
 
+static void test_check_exec(struct __test_metadata *const _metadata,
+			    const int err, const char *const path)
+{
+	int ret;
+	char *const argv[] = { (char *)path, NULL };
+
+	ret = execveat(AT_FDCWD, path, argv, NULL, AT_EMPTY_PATH | AT_CHECK);
+	if (err) {
+		EXPECT_EQ(-1, ret);
+		EXPECT_EQ(errno, err);
+	} else {
+		EXPECT_EQ(0, ret);
+	}
+}
+
 TEST_F_FORK(layout1, execute)
 {
 	const struct rule rules[] = {
@@ -2026,20 +2045,27 @@ TEST_F_FORK(layout1, execute)
 	copy_binary(_metadata, file1_s1d2);
 	copy_binary(_metadata, file1_s1d3);
 
+	/* Checks before file1_s1d1 is denied. */
+	test_execute(_metadata, 0, file1_s1d1);
+	test_check_exec(_metadata, 0, file1_s1d1);
+
 	enforce_ruleset(_metadata, ruleset_fd);
 	ASSERT_EQ(0, close(ruleset_fd));
 
 	ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d1, O_RDONLY));
 	test_execute(_metadata, EACCES, file1_s1d1);
+	test_check_exec(_metadata, EACCES, file1_s1d1);
 
 	ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
 	test_execute(_metadata, 0, file1_s1d2);
+	test_check_exec(_metadata, 0, file1_s1d2);
 
 	ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
 	test_execute(_metadata, 0, file1_s1d3);
+	test_check_exec(_metadata, 0, file1_s1d3);
 }
 
 TEST_F_FORK(layout1, link)
-- 
2.45.2



* [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (3 preceding siblings ...)
  2024-07-04 19:01 ` [RFC PATCH v19 4/5] selftests/landlock: Add tests for execveat + AT_CHECK Mickaël Salaün
@ 2024-07-04 19:01 ` Mickaël Salaün
  2024-07-08 19:40   ` Mimi Zohar
  2024-07-08 20:35 ` [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mimi Zohar
  2024-07-15 20:16 ` Jonathan Corbet
  6 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-04 19:01 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o
  Cc: Mickaël Salaün, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

Add a simple tool to set SECBIT_SHOULD_EXEC_CHECK,
SECBIT_SHOULD_EXEC_RESTRICT, and their lock counterparts before
executing a command.  This should be useful to easily test against
script interpreters.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-6-mic@digikod.net
---
 samples/Kconfig                       |  7 +++
 samples/Makefile                      |  1 +
 samples/should-exec/.gitignore        |  1 +
 samples/should-exec/Makefile          | 13 ++++
 samples/should-exec/set-should-exec.c | 88 +++++++++++++++++++++++++++
 5 files changed, 110 insertions(+)
 create mode 100644 samples/should-exec/.gitignore
 create mode 100644 samples/should-exec/Makefile
 create mode 100644 samples/should-exec/set-should-exec.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..d8f2639bc830 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -180,6 +180,13 @@ config SAMPLE_SECCOMP
 	  Build samples of seccomp filters using various methods of
 	  BPF filter construction.
 
+config SAMPLE_SHOULD_EXEC
+	bool "Should-exec secure bits examples"
+	depends on CC_CAN_LINK && HEADERS_INSTALL
+	help
+	  Build a tool to easily configure SECBIT_SHOULD_EXEC_CHECK,
+	  SECBIT_SHOULD_EXEC_RESTRICT and their lock counterparts.
+
 config SAMPLE_TIMER
 	bool "Timer sample"
 	depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..0e7a97fb222d 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -19,6 +19,7 @@ subdir-$(CONFIG_SAMPLE_PIDFD)		+= pidfd
 obj-$(CONFIG_SAMPLE_QMI_CLIENT)		+= qmi/
 obj-$(CONFIG_SAMPLE_RPMSG_CLIENT)	+= rpmsg/
 subdir-$(CONFIG_SAMPLE_SECCOMP)		+= seccomp
+subdir-$(CONFIG_SAMPLE_SHOULD_EXEC)	+= should-exec
 subdir-$(CONFIG_SAMPLE_TIMER)		+= timers
 obj-$(CONFIG_SAMPLE_TRACE_EVENTS)	+= trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS) += trace_events/
diff --git a/samples/should-exec/.gitignore b/samples/should-exec/.gitignore
new file mode 100644
index 000000000000..ac46c614ec80
--- /dev/null
+++ b/samples/should-exec/.gitignore
@@ -0,0 +1 @@
+/set-should-exec
diff --git a/samples/should-exec/Makefile b/samples/should-exec/Makefile
new file mode 100644
index 000000000000..c4294278dd07
--- /dev/null
+++ b/samples/should-exec/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+userprogs-always-y := set-should-exec
+
+userccflags += -I usr/include
+
+.PHONY: all clean
+
+all:
+	$(MAKE) -C ../.. samples/should-exec/
+
+clean:
+	$(MAKE) -C ../.. M=samples/should-exec/ clean
diff --git a/samples/should-exec/set-should-exec.c b/samples/should-exec/set-should-exec.c
new file mode 100644
index 000000000000..b3c31106d916
--- /dev/null
+++ b/samples/should-exec/set-should-exec.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple tool to set SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT,
+ * and their lock counterparts before executing a command.
+ *
+ * Copyright © 2024 Microsoft Corporation
+ */
+
+#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__
+#include <errno.h>
+#include <linux/prctl.h>
+#include <linux/securebits.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+static void print_usage(const char *argv0)
+{
+	fprintf(stderr, "usage: %s -c|-r [-l] -- <cmd> [args]...\n\n", argv0);
+	fprintf(stderr, "Execute a command with\n");
+	fprintf(stderr, "- SECBIT_SHOULD_EXEC_CHECK set: -c\n");
+	fprintf(stderr, "- SECBIT_SHOULD_EXEC_RESTRICT set: -r\n");
+	fprintf(stderr, "- SECBIT_SHOULD_EXEC_*_LOCKED set: -l\n");
+}
+
+int main(const int argc, char *const argv[], char *const *const envp)
+{
+	const char *cmd_path;
+	char *const *cmd_argv;
+	int opt, secbits, err;
+	bool has_policy = false;
+
+	secbits = prctl(PR_GET_SECUREBITS);
+
+	while ((opt = getopt(argc, argv, "crl")) != -1) {
+		switch (opt) {
+		case 'c':
+			secbits |= SECBIT_SHOULD_EXEC_CHECK;
+			has_policy = true;
+			break;
+		case 'r':
+			secbits |= SECBIT_SHOULD_EXEC_RESTRICT;
+			has_policy = true;
+			break;
+		case 'l':
+			secbits |= SECBIT_SHOULD_EXEC_CHECK_LOCKED;
+			secbits |= SECBIT_SHOULD_EXEC_RESTRICT_LOCKED;
+			break;
+		default:
+			print_usage(argv[0]);
+			return 1;
+		}
+	}
+
+	if (!argv[optind] || !has_policy) {
+		print_usage(argv[0]);
+		return 1;
+	}
+
+	err = prctl(PR_SET_SECUREBITS, secbits);
+	if (err) {
+		perror("Failed to set secure bit(s)");
+		fprintf(stderr,
+			"Hint: The running kernel may not support this feature.\n");
+		return 1;
+	}
+
+	fprintf(stderr, "SECBIT_SHOULD_EXEC_CHECK: %d\n",
+		!!(secbits & SECBIT_SHOULD_EXEC_CHECK));
+	fprintf(stderr, "SECBIT_SHOULD_EXEC_CHECK_LOCKED: %d\n",
+		!!(secbits & SECBIT_SHOULD_EXEC_CHECK_LOCKED));
+	fprintf(stderr, "SECBIT_SHOULD_EXEC_RESTRICT: %d\n",
+		!!(secbits & SECBIT_SHOULD_EXEC_RESTRICT));
+	fprintf(stderr, "SECBIT_SHOULD_EXEC_RESTRICT_LOCKED: %d\n",
+		!!(secbits & SECBIT_SHOULD_EXEC_RESTRICT_LOCKED));
+
+	cmd_path = argv[optind];
+	cmd_argv = argv + optind;
+	fprintf(stderr, "Executing command...\n");
+	execvpe(cmd_path, cmd_argv, envp);
+	fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
+		strerror(errno));
+	return 1;
+}
-- 
2.45.2



* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
@ 2024-07-05  0:04   ` Kees Cook
  2024-07-05 17:53     ` Mickaël Salaün
  2024-07-05 18:03   ` Florian Weimer
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 103+ messages in thread
From: Kees Cook @ 2024-07-05  0:04 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
> 
> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.

Nice! I much prefer this method of going through the exec machinery so
we always have a single code path for these kinds of checks.

> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.

I'd like some additional comments in the code that reminds us that
access control checks have finished past a certain point.

[...]
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..ea2a1867afdc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>  		.lookup_flags = LOOKUP_FOLLOW,
>  	};
>  
> -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
>  		return ERR_PTR(-EINVAL);
>  	if (flags & AT_SYMLINK_NOFOLLOW)
>  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
[...]
> + * To avoid race conditions leading to time-of-check to time-of-use issues,
> + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> + * descriptor instead of a path.

I want this enforced by the kernel. Let's not leave trivial ToCToU
foot-guns around. i.e.:

	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
		return ERR_PTR(-EBADF);
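
For illustration only, the effect of this rule can be modeled in user
space.  The sketch below is not kernel code: model_check_flags() is a
hypothetical helper, and the AT_CHECK value is a placeholder assumed
for the example (the final UAPI value may differ).

```c
#include <errno.h>
#include <fcntl.h>

/* Placeholder values, assumed for illustration only. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
#ifndef AT_SYMLINK_NOFOLLOW
#define AT_SYMLINK_NOFOLLOW 0x100
#endif

/*
 * Hypothetical user-space model of the flag validation in
 * do_open_execat(), extended with the rule suggested above.
 * Returns 0 when the flags are accepted, or a negative errno.
 */
static int model_check_flags(int flags)
{
	/* Unknown flags are rejected, as in the patched check. */
	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
		return -EINVAL;
	/* Suggested rule: AT_CHECK is only accepted with AT_EMPTY_PATH. */
	if ((flags & AT_CHECK) && !(flags & AT_EMPTY_PATH))
		return -EBADF;
	return 0;
}
```

With this rule, a path-based check such as model_check_flags(AT_CHECK)
fails with -EBADF, while AT_CHECK | AT_EMPTY_PATH is accepted.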

-- 
Kees Cook


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
@ 2024-07-05  0:18   ` Kees Cook
  2024-07-05 17:54     ` Mickaël Salaün
  2024-07-08 16:17   ` Jeff Xu
  2024-07-20  2:06   ` Andy Lutomirski
  2 siblings, 1 reply; 103+ messages in thread
From: Kees Cook @ 2024-07-05  0:18 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> Such a secure environment can be achieved with an appropriate access
> control policy (e.g. mount's noexec option, file access rights, LSM
> configuration) and an enlightened ld.so checking that libraries are
> allowed for execution e.g., to protect against illegitimate use of
> LD_PRELOAD.
> 
> Scripts may need some changes to deal with untrusted data (e.g. stdin,
> environment variables), but that is outside the scope of the kernel.

If the threat model includes an attacker sitting at a shell prompt, we
need to be very careful about how processes perform enforcement. E.g. even
on a locked down system, if an attacker has access to LD_PRELOAD or a
seccomp wrapper (which you both mention here), it would be possible to
run commands where the resulting process is tricked into thinking it
doesn't have the bits set.

But this would be exactly true for calling execveat(): LD_PRELOAD or
seccomp policy could have it just return 0.

While I like AT_CHECK, I do wonder if it's better to do the checks via
open(), as was originally designed with O_MAYEXEC. Because then
enforcement is gated by the kernel -- the process does not get a file
descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
doing.

And this thinking also applies to faccessat() too: if a process can be
tricked into thinking the access check passed, it'll happily interpret
whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
is being checked seems substantially safer to me...

-- 
Kees Cook


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-05  0:04   ` Kees Cook
@ 2024-07-05 17:53     ` Mickaël Salaün
  2024-07-08 19:38       ` Kees Cook
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-05 17:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote:
> On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> > 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> 
> Nice! I much prefer this method of going through the exec machinery so
> we always have a single code path for these kinds of checks.
> 
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> 
> I'd like some additional comments in the code that reminds us that
> access control checks have finished past a certain point.

Where in the code? Just before the bprm->is_check assignment?

> 
> [...]
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..ea2a1867afdc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >  		.lookup_flags = LOOKUP_FOLLOW,
> >  	};
> >  
> > -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >  		return ERR_PTR(-EINVAL);
> >  	if (flags & AT_SYMLINK_NOFOLLOW)
> >  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> [...]
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> 
> I want this enforced by the kernel. Let's not leave trivial ToCToU
> foot-guns around. i.e.:
> 
> 	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
>   		return ERR_PTR(-EBADF);

There are valid use cases relying on pathnames. See Linus's comment:
https://lore.kernel.org/r/CAHk-=whb=XuU=LGKnJWaa7LOYQz9VwHs8SLfgLbT5sf2VAbX1A@mail.gmail.com


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-05  0:18   ` Kees Cook
@ 2024-07-05 17:54     ` Mickaël Salaün
  2024-07-05 21:44       ` Kees Cook
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-05 17:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote:
> On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> > Such a secure environment can be achieved with an appropriate access
> > control policy (e.g. mount's noexec option, file access rights, LSM
> > configuration) and an enlightened ld.so checking that libraries are
> > allowed for execution e.g., to protect against illegitimate use of
> > LD_PRELOAD.
> > 
> > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > environment variables), but that is outside the scope of the kernel.
> 
> If the threat model includes an attacker sitting at a shell prompt, we
> need to be very careful about how processes perform enforcement. E.g. even
> on a locked down system, if an attacker has access to LD_PRELOAD or a

LD_PRELOAD should be OK once ld.so is patched to check the
libraries.  We can still imagine a debug library used to bypass security
checks, but in this case the issue would be that this library is
executable in the first place.

> seccomp wrapper (which you both mention here), it would be possible to
> run commands where the resulting process is tricked into thinking it
> doesn't have the bits set.

As explained in the UAPI comments, all parent processes need to be
trusted.  This means that their code is trusted, their seccomp filters
are trusted, and that they are patched, if needed, to check file
executability.

> 
> But this would be exactly true for calling execveat(): LD_PRELOAD or
> seccomp policy could have it just return 0.

If an attacker is allowed/able to load an arbitrary seccomp filter on a
process, we cannot trust this process.

> 
> While I like AT_CHECK, I do wonder if it's better to do the checks via
> open(), as was originally designed with O_MAYEXEC. Because then
> enforcement is gated by the kernel -- the process does not get a file
> descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
> doing.

Being able to check a path name or a file descriptor (with the same
syscall) is more flexible and covers more use cases.  The execveat(2)
interface, including current and future flags, is dedicated to file
execution.  I then think that using execveat(2) for this kind of check
makes more sense, and will easily evolve with this syscall.

> 
> And this thinking also applies to faccessat() too: if a process can be
> tricked into thinking the access check passed, it'll happily interpret
> whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
> is being checked seems substantially safer to me...

If attackers can filter execveat(2), they can also filter open(2) and
any other syscalls.  In all cases, that would mean an issue in the
security policy.


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
  2024-07-05  0:04   ` Kees Cook
@ 2024-07-05 18:03   ` Florian Weimer
  2024-07-06 14:55     ` Mickaël Salaün
  2024-07-08 16:08     ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Jeff Xu
  2024-07-06  8:52   ` Andy Lutomirski
  2024-07-17  6:33   ` Jeff Xu
  3 siblings, 2 replies; 103+ messages in thread
From: Florian Weimer @ 2024-07-05 18:03 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

* Mickaël Salaün:

> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].

Some distributions no longer set executable bits on most shared objects,
which I assume would interfere with AT_CHECK probing for shared objects.
Removing the executable bit is attractive because of a combination of
two bugs: a binutils wart which until recently always set the entry
point address in the ELF header to zero, and the kernel not checking for
a zero entry point (maybe in combination with an absent program
interpreter) and failing the execve with ELIBEXEC, instead of doing the
execve and then faulting at virtual address zero.  Removing the
executable bit is currently the only way to avoid these confusing
crashes, so I understand the temptation.

Thanks,
Florian



* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-05 17:54     ` Mickaël Salaün
@ 2024-07-05 21:44       ` Kees Cook
  2024-07-05 22:22         ` Jarkko Sakkinen
  2024-07-06 14:56         ` Mickaël Salaün
  0 siblings, 2 replies; 103+ messages in thread
From: Kees Cook @ 2024-07-05 21:44 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote:
> On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote:
> > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> > > Such a secure environment can be achieved with an appropriate access
> > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > configuration) and an enlightened ld.so checking that libraries are
> > > allowed for execution e.g., to protect against illegitimate use of
> > > LD_PRELOAD.
> > > 
> > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > environment variables), but that is outside the scope of the kernel.
> > 
> > If the threat model includes an attacker sitting at a shell prompt, we
> > > need to be very careful about how processes perform enforcement. E.g. even
> > on a locked down system, if an attacker has access to LD_PRELOAD or a
> 
> LD_PRELOAD should be OK once ld.so is patched to check the
> libraries.  We can still imagine a debug library used to bypass security
> checks, but in this case the issue would be that this library is
> executable in the first place.

Ah yes, that's fair: the shell would discover the malicious library
while using AT_CHECK during resolution of the LD_PRELOAD.

> > seccomp wrapper (which you both mention here), it would be possible to
> > run commands where the resulting process is tricked into thinking it
> > doesn't have the bits set.
> 
> As explained in the UAPI comments, all parent processes need to be
> trusted.  This means that their code is trusted, their seccomp filters
> are trusted, and that they are patched, if needed, to check file
> executability.

But we have launchers that apply arbitrary seccomp policy, e.g. minijail
on Chrome OS, or even systemd on regular distros. In theory, this should
be handled via other ACLs.

> > But this would be exactly true for calling execveat(): LD_PRELOAD or
> > seccomp policy could have it just return 0.
> 
> If an attacker is allowed/able to load an arbitrary seccomp filter on a
> process, we cannot trust this process.
> 
> > 
> > While I like AT_CHECK, I do wonder if it's better to do the checks via
> > open(), as was originally designed with O_MAYEXEC. Because then
> > enforcement is gated by the kernel -- the process does not get a file
> > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
> > doing.
> 
> Being able to check a path name or a file descriptor (with the same
> syscall) is more flexible and covers more use cases.

If flexibility costs us reliability, I think that flexibility is not
a benefit.

> The execveat(2)
> interface, including current and future flags, is dedicated to file
> execution.  I then think that using execveat(2) for this kind of check
> makes more sense, and will easily evolve with this syscall.

Yeah, I do recognize that it feels much more natural, but I remain
unhappy about how difficult it will become to audit a system for safety
when the check is strictly per-process opt-in, and not enforced by the
kernel for a given process tree. But, I think this may have always been
a fiction in my mind. :)

> > And this thinking also applies to faccessat() too: if a process can be
> > tricked into thinking the access check passed, it'll happily interpret
> > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
> > is being checked seems substantially safer to me...
> 
> If attackers can filter execveat(2), they can also filter open(2) and
> any other syscalls.  In all cases, that would mean an issue in the
> security policy.

Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass
that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true.

I guess it does become morally equivalent.

Okay. Well, let me ask about usability. Right now, a process will need
to do:

- should I use AT_CHECK? (check secbit)
- if yes: perform execveat(AT_CHECK)

Why not leave the secbit test up to the kernel, and then the program can
just unconditionally call execveat(AT_CHECK)?

Though perhaps the issue here is that an execveat() EINVAL doesn't
tell the program if AT_CHECK is unimplemented or if something else
went wrong, and the secbit prctl() will give the correct signal about
AT_CHECK availability?
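
As a rough sketch of the two-step flow above, the interpreter-side
decision could look like the following.  may_interpret() is a
hypothetical helper, and the SECBIT_SHOULD_EXEC_CHECK value is a
placeholder assumed for illustration; secbits would come from
prctl(PR_GET_SECUREBITS) and check_ret from the execveat(AT_CHECK)
call.

```c
#include <stdbool.h>

/* Placeholder bit value, assumed for illustration only. */
#ifndef SECBIT_SHOULD_EXEC_CHECK
#define SECBIT_SHOULD_EXEC_CHECK (1 << 8)
#endif

/*
 * Hypothetical interpreter-side decision.
 * secbits:   value returned by prctl(PR_GET_SECUREBITS).
 * check_ret: return value of execveat(..., AT_CHECK), 0 on success.
 * Returns true if the interpreter may go on and interpret the file.
 */
static bool may_interpret(int secbits, int check_ret)
{
	/* Secbit not set: keep the legacy behavior, no enforcement. */
	if (!(secbits & SECBIT_SHOULD_EXEC_CHECK))
		return true;
	/* Secbit set: only interpret files the kernel would execute. */
	return check_ret == 0;
}
```

In this model the secbit only selects whether a failed check is
enforced; whether the kernel could fold that test into the syscall
itself is the open question here.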

-- 
Kees Cook


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-05 21:44       ` Kees Cook
@ 2024-07-05 22:22         ` Jarkko Sakkinen
  2024-07-06 14:56           ` Mickaël Salaün
  2024-07-06 14:56         ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Jarkko Sakkinen @ 2024-07-05 22:22 UTC (permalink / raw)
  To: Kees Cook, Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote:
> > As explained in the UAPI comments, all parent processes need to be
> > trusted.  This means that their code is trusted, their seccomp filters
> > are trusted, and that they are patched, if needed, to check file
> > executability.
>
> But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> on Chrome OS, or even systemd on regular distros. In theory, this should
> be handled via other ACLs.

Or a regular web browser? AFAIK seccomp filtering was the tool to make
secure browser tabs in the first place.

BR, Jarkko


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
  2024-07-05  0:04   ` Kees Cook
  2024-07-05 18:03   ` Florian Weimer
@ 2024-07-06  8:52   ` Andy Lutomirski
  2024-07-07  9:01     ` Mickaël Salaün
  2024-07-17  6:33   ` Jeff Xu
  3 siblings, 1 reply; 103+ messages in thread
From: Andy Lutomirski @ 2024-07-06  8:52 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
>

Can you give a worked-out example of how this is useful?

I assume the idea is that a program could open a file, then pass the
fd to execveat() to get the kernel's idea of whether it's permissible
to execute it.  And then the program would interpret the file, which
is morally like executing it.  And there would be a big warning in the
manpage that passing a *path* is subject to a TOCTOU race.
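
A minimal sketch of that fd-based usage, under stated assumptions:
fd_exec_allowed() is a hypothetical helper, the AT_CHECK value is a
placeholder (the final UAPI value may differ), and a kernel that does
not know the flag is assumed to reject it with EINVAL, as the flag
check in do_open_execat() does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Placeholder values, assumed for illustration only. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/*
 * Ask the kernel whether the file behind fd would be allowed to
 * execute, without executing it.
 * Returns 1 if allowed, 0 if denied (or on any other error), and
 * -1 if the kernel rejected the flags (EINVAL, e.g. no AT_CHECK).
 */
static int fd_exec_allowed(int fd)
{
	char *const empty[] = { NULL };

	/* The raw syscall avoids depending on a libc execveat() wrapper. */
	if (syscall(SYS_execveat, fd, "", empty, empty,
		    AT_EMPTY_PATH | AT_CHECK) == 0)
		return 1;
	return errno == EINVAL ? -1 : 0;
}
```

To avoid the race, the interpreter would then read the script from the
same fd it checked rather than reopening the path.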

This type of usage will do the wrong thing if LSM policy intends to
lock down the task if the task were to actually exec the file.  I
personally think this is a mis-design (let the program doing the
exec-ing lock itself down, possibly by querying a policy, but having
magic happen on exec seems likely to do the wrong thing more often
than it does the right thing), but that ship sailed a long time ago.

So maybe what's actually needed is a rather different API: a way to
check *and perform* the security transition for an exec without
actually execing.  This would need to be done NO_NEW_PRIVS style for
reasons that are hopefully obvious, but it would permit:

fd = open(some script);
if (do_exec_transition_without_exec(fd) != 0)
  return;  // don't actually do it

// OK, we may have just lost privileges.  But that's okay, because we
// meant to do that.
// Make sure we've munmapped anything sensitive and erased any secrets
// from memory, and then interpret the script!

I think this would actually be straightforward to implement in the
kernel -- one would need to make sure that all the relevant
no_new_privs checks are looking in the right place (as the task might
not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
still be set), but I don't see any reason this would be
insurmountable, nor do I expect there would be any fundamental
problems.


> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.
>
> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
>
> Because LSMs may only change bprm's credentials, use of AT_CHECK with
> current kernel code should not be a security issue (e.g. unexpected role
> transition).  LSMs willing to update the caller's credential could now
> do so when bprm->is_check is set.  Of course, such policy change should
> be in line with the new user space code.
>
> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_CHECK flag) because this could lead to
> unexpected behaviors e.g., `python script.sh` could lead to Bash being
> executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> SECBIT_SHOULD_EXEC_RESTRICT securebits.

Can you explain what those bits do?  And why they're useful?

>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

This one at least returns an fd, so it looks less likely to get
misused in a way that adds a TOCTOU race.


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-05 18:03   ` Florian Weimer
@ 2024-07-06 14:55     ` Mickaël Salaün
  2024-07-06 15:32       ` Florian Weimer
  2024-07-08 16:08     ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-06 14:55 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> 
> Some distributions no longer set executable bits on most shared objects,
> which I assume would interfere with AT_CHECK probing for shared objects.

A file without the execute permission is not considered as executable by
the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
note that this is just a check, not a restriction.  See the next patch
for the optional policy enforcement.

Anyway, we need to define the policy, and for Linux this is done with
the file permission bits.  So for systems willing to have a consistent
execution policy, we need to rely on the same bits.

> Removing the executable bit is attractive because of a combination of
> two bugs: a binutils wart which until recently always set the entry
> point address in the ELF header to zero, and the kernel not checking for
> a zero entry point (maybe in combination with an absent program
> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> execve and then faulting at virtual address zero.  Removing the
> executable bit is currently the only way to avoid these confusing
> crashes, so I understand the temptation.

Interesting.  Can you please point to the bug report and the fix?  I
don't see any ELIBEXEC in the kernel.

FYI, AT_CHECK doesn't check the content of the file (unlike a full
execve call).

Anyway, I think we should not design a new kernel interface to work
around a current user space bug.

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-05 21:44       ` Kees Cook
  2024-07-05 22:22         ` Jarkko Sakkinen
@ 2024-07-06 14:56         ` Mickaël Salaün
  2024-07-18 14:16           ` Roberto Sassu
  1 sibling, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-06 14:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote:
> On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote:
> > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote:
> > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> > > > Such a secure environment can be achieved with an appropriate access
> > > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > > configuration) and an enlightened ld.so checking that libraries are
> > > > allowed for execution e.g., to protect against illegitimate use of
> > > > LD_PRELOAD.
> > > > 
> > > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > > environment variables), but that is outside the scope of the kernel.
> > > 
> > > If the threat model includes an attacker sitting at a shell prompt, we
> > > need to be very careful about how processes perform enforcement. E.g. even
> > > on a locked down system, if an attacker has access to LD_PRELOAD or a
> > 
> > LD_PRELOAD should be OK once ld.so is patched to check the
> > libraries.  We can still imagine a debug library used to bypass security
> > checks, but in this case the issue would be that this library is
> > executable in the first place.
> 
> Ah yes, that's fair: the shell would discover the malicious library
> while using AT_CHECK during resolution of the LD_PRELOAD.

That's the idea, but it would be checked by ld.so, not the shell.

> 
> > > seccomp wrapper (which you both mention here), it would be possible to
> > > run commands where the resulting process is tricked into thinking it
> > > doesn't have the bits set.
> > 
> > As explained in the UAPI comments, all parent processes need to be
> > trusted.  This means that their code is trusted, their seccomp filters
> > are trusted, and that they are patched, if needed, to check file
> > executability.
> 
> But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> on Chrome OS, or even systemd on regular distros. In theory, this should
> be handled via other ACLs.

Processes running with an untrusted seccomp filter should be considered
untrusted.  It would then make sense for these seccomp filters/programs
to be considered executable code, and then for minijail and systemd to
check them with AT_CHECK (according to the securebits policy).

> 
> > > But this would be exactly true for calling execveat(): LD_PRELOAD or
> > > seccomp policy could have it just return 0.
> > 
> > If an attacker is allowed/able to load an arbitrary seccomp filter on a
> > process, we cannot trust this process.
> > 
> > > 
> > > While I like AT_CHECK, I do wonder if it's better to do the checks via
> > > open(), as was originally designed with O_MAYEXEC. Because then
> > > enforcement is gated by the kernel -- the process does not get a file
> > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
> > > doing.
> > 
> > Being able to check a path name or a file descriptor (with the same
> > syscall) is more flexible and covers more use cases.
> 
> If flexibility costs us reliability, I think that flexibility is not
> a benefit.

Well, it's a matter of letting user space do what they think is best,
and I think there are legitimate and safe uses of path names, even if I
agree that this should not be used in most use cases.  Would we want
faccessat2(2) to only take a file descriptor as argument and not a file
path?  I don't think so, but I'd defer to the VFS maintainers.

Christian, Al, Linus?

Steve, could you share a use case with file paths?

> 
> > The execveat(2)
> > interface, including current and future flags, is dedicated to file
> > execution.  I then think that using execveat(2) for this kind of check
> > makes more sense, and will easily evolve with this syscall.
> 
> Yeah, I do recognize that it feels much more natural, but I remain
> unhappy about how difficult it will become to audit a system for safety
> when the check is strictly per-process opt-in, and not enforced by the
> kernel for a given process tree. But, I think this may have always been
> a fiction in my mind. :)

Hmm, I'm not sure I follow.  Securebits are inherited, so they do apply
to the whole process tree.  And we need the parent processes to be
trusted anyway.

> 
> > > And this thinking also applies to faccessat() too: if a process can be
> > > tricked into thinking the access check passed, it'll happily interpret
> > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
> > > is being checked seems substantially safer to me...
> > 
> > If attackers can filter execveat(2), they can also filter open(2) and
> > any other syscalls.  In all cases, that would mean an issue in the
> > security policy.
> 
> Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass
> that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true.
> 
> I guess it does become morally equivalent.
> 
> Okay. Well, let me ask about usability. Right now, a process will need
> to do:
> 
> - should I use AT_CHECK? (check secbit)
> - if yes: perform execveat(AT_CHECK)
> 
> Why not leave the secbit test up to the kernel, and then the program can
> just unconditionally call execveat(AT_CHECK)?

That was kind of the approach of the previous patch series and Linus
wanted the new interface to follow the kernel semantic.  Enforcing this
kind of restriction will always be the duty of user space anyway, so I
think it's simpler (i.e. no mix of policy definition, access check, and
policy enforcement, but a standalone execveat feature), more flexible,
and it fully delegates the policy enforcement to user space instead of
trying to enforce some part in the kernel which would only give the
illusion of security/policy enforcement.

> 
> Though perhaps the issue here is that an execveat() EINVAL doesn't
> tell the program if AT_CHECK is unimplemented or if something else
> went wrong, and the secbit prctl() will give the correct signal about
> AT_CHECK availability?

This kind of check could indeed help to identify the issue.
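
The first step of the flow Kees lists above (test the securebit before
calling execveat with AT_CHECK) could look roughly like the sketch
below.  This is only an illustration under stated assumptions: the bit
value used for SECBIT_SHOULD_EXEC_CHECK is a placeholder (the real
definition comes from the patched securebits UAPI header), and both
function names are hypothetical.  prctl(PR_GET_SECUREBITS) is the
existing interface for reading the calling thread's securebits.

```c
#include <stdbool.h>
#include <sys/prctl.h>

/*
 * Assumption: placeholder value for the securebit proposed in this RFC.
 * Take the real definition from the patched UAPI headers.
 */
#ifndef SECBIT_SHOULD_EXEC_CHECK
#define SECBIT_SHOULD_EXEC_CHECK (1U << 8)
#endif

/*
 * Hypothetical helper: decide from a securebits value whether the
 * interpreter is asked to check files before interpreting them.
 * A negative value (prctl() error) is treated as "no request".
 */
bool secbits_request_check(int secbits)
{
	return secbits >= 0 &&
	       ((unsigned int)secbits & SECBIT_SHOULD_EXEC_CHECK) != 0;
}

/* Query the securebits of the calling thread and apply the helper. */
bool should_check_exec(void)
{
	int secbits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

	return secbits_request_check(secbits);
}
```

An interpreter would call should_check_exec() once at startup and only
then issue the execveat(2) + AT_CHECK call for each file it is about to
interpret.  What to do when the check fails is a user space policy
decision and is deliberately left out of this sketch.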

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-05 22:22         ` Jarkko Sakkinen
@ 2024-07-06 14:56           ` Mickaël Salaün
  2024-07-06 17:28             ` Jarkko Sakkinen
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-06 14:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Kees Cook, Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat, Jul 06, 2024 at 01:22:06AM +0300, Jarkko Sakkinen wrote:
> On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote:
> > > As explained in the UAPI comments, all parent processes need to be
> > > trusted.  This means that their code is trusted, their seccomp filters
> > > are trusted, and that they are patched, if needed, to check file
> > > executability.
> >
> > But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> > on Chrome OS, or even systemd on regular distros. In theory, this should
> > be handled via other ACLs.
> 
> Or a regular web browser? AFAIK seccomp filtering was the tool to make
> secure browser tabs in the first place.

Yes, and that's OK.  Web browsers embed their own seccomp filters and
they are then as trusted as the browser code.

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-06 14:55     ` Mickaël Salaün
@ 2024-07-06 15:32       ` Florian Weimer
  2024-07-08  8:56         ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Florian Weimer @ 2024-07-06 15:32 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

* Mickaël Salaün:

> On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
>> * Mickaël Salaün:
>> 
>> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
>> > allowed for execution.  The main use case is for script interpreters and
>> > dynamic linkers to check execution permission according to the kernel's
>> > security policy. Another use case is to add context to access logs e.g.,
>> > which script (instead of interpreter) accessed a file.  As any
>> > executable code, scripts could also use this check [1].
>> 
>> Some distributions no longer set executable bits on most shared objects,
>> which I assume would interfere with AT_CHECK probing for shared objects.
>
> A file without the execute permission is not considered as executable by
> the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> note that this is just a check, not a restriction.  See the next patch
> for the optional policy enforcement.
>
> Anyway, we need to define the policy, and for Linux this is done with
> the file permission bits.  So for systems willing to have a consistent
> execution policy, we need to rely on the same bits.

Yes, that makes complete sense.  I just wanted to point out the odd
interaction with the old binutils bug and the (sadly still current)
kernel bug.

>> Removing the executable bit is attractive because of a combination of
>> two bugs: a binutils wart which until recently always set the entry
>> point address in the ELF header to zero, and the kernel not checking for
>> a zero entry point (maybe in combination with an absent program
>> interpreter) and failing the execve with ELIBEXEC, instead of doing the
>> execve and then faulting at virtual address zero.  Removing the
>> executable bit is currently the only way to avoid these confusing
>> crashes, so I understand the temptation.
>
> Interesting.  Can you please point to the bug report and the fix?  I
> don't see any ELIBEXEC in the kernel.

The kernel hasn't been fixed yet.  I do think this should be fixed, so
that distributions can bring back the executable bit.

Thanks,
Florian
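
The condition described above (a shared object whose ELF header records
an entry point of zero) can be detected from user space by looking at
the header.  The sketch below is only an illustration of that condition,
not the kernel fix being discussed; it handles ELF64 only, and the
function name is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the header belongs to an ELF64
 * shared object (ET_DYN) whose recorded entry point is zero.  Executing
 * such a file would start at virtual address zero and fault.
 */
bool is_dso_with_zero_entry(const Elf64_Ehdr *ehdr)
{
	/* Reject anything that does not start with the ELF magic. */
	if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return false;

	return ehdr->e_type == ET_DYN && ehdr->e_entry == 0;
}
```

Note that position-independent executables are also ET_DYN, but they
carry a non-zero entry point, so they are not matched by this test.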


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-06 14:56           ` Mickaël Salaün
@ 2024-07-06 17:28             ` Jarkko Sakkinen
  0 siblings, 0 replies; 103+ messages in thread
From: Jarkko Sakkinen @ 2024-07-06 17:28 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Kees Cook, Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat Jul 6, 2024 at 5:56 PM EEST, Mickaël Salaün wrote:
> On Sat, Jul 06, 2024 at 01:22:06AM +0300, Jarkko Sakkinen wrote:
> > On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote:
> > > > As explained in the UAPI comments, all parent processes need to be
> > > > trusted.  This means that their code is trusted, their seccomp filters
> > > > are trusted, and that they are patched, if needed, to check file
> > > > executability.
> > >
> > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> > > on Chrome OS, or even systemd on regular distros. In theory, this should
> > > be handled via other ACLs.
> > 
> > Or a regular web browser? AFAIK seccomp filtering was the tool to make
> > secure browser tabs in the first place.
>
> Yes, and that's OK.  Web browsers embed their own seccomp filters and
> they are then as trusted as the browser code.

I'd recommend slicing the technical detail out of the cover letter, as
long as those details are in the commit messages.

Then, in the cover letter I'd go through maybe two familiar scenarios,
showing how they interact with this functionality.

A desktop web browser could be perhaps one of them...

BR, Jarkko

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-06  8:52   ` Andy Lutomirski
@ 2024-07-07  9:01     ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-07  9:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat, Jul 06, 2024 at 04:52:42PM +0800, Andy Lutomirski wrote:
> On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> 
> Can you give a worked-out example of how this is useful?

Which part?  Please take a look at the CLIP OS, ChromeOS, and PEP 578 use
cases and related code (see cover letter).

> 
> I assume the idea is that a program could open a file, then pass the
> fd to execveat() to get the kernel's idea of whether it's permissible
> to execute it.  And then the program would interpret the file, which
> is morally like executing it.  And there would be a big warning in the
> manpage that passing a *path* is subject to a TOCTOU race.

yes

> 
> This type of usage will do the wrong thing if LSM policy intends to
> lock down the task if the task were to actually exec the file.  I

Why?  LSMs should currently only change the bprm's credentials, not the
current task's credentials.  If needed, we can extend the current patch
series with LSM specific patches for them to check bprm->is_check.

> personally think this is a mis-design (let the program doing the
> exec-ing lock itself down, possibly by querying a policy, but having
> magic happen on exec seems likely to do the wrong thing more often
> than it does the right thing), but that ship sailed a long time ago.

The execveat+AT_CHECK is only a check that doesn't impact the caller.
Maybe you're talking about process transition with future LSM changes?
In this case, we could add another flag, but I'm convinced it would be
confusing for users.  Anyway, let LSMs experiment with that and we'll
come up with a new flag if needed.  The current approach is a good and
useful piece to fill a gap in Linux access control systems.

> 
> So maybe what's actually needed is a rather different API: a way to
> check *and perform* the security transition for an exec without
> actually execing.  This would need to be done NO_NEW_PRIVS style for
> reasons that are hopefully obvious, but it would permit:

NO_NEW_PRIVS is not that obvious in this case because the restrictions
are enforced by user space, not the kernel.  NO_NEW_PRIVS makes sense to
prevent a malicious/unprivileged process from requesting kernel
restrictions that change the behavior of a (child) privileged/trusted
process.  We are not in this configuration here.  The only change would
be for ptrace, which is a good thing either way: it should not harm SUID
processes, and it avoids confused deputy attacks against them too.

If this is about an LSM changing the caller's credentials, then yes it
might want to set additional flags, but that would be specific to their
implementation, not part of this patch.

> 
> fd = open(some script);
> if (do_exec_transition_without_exec(fd) != 0)
>   return;  // don't actually do it
> 
> // OK, we may have just lost privileges.  But that's okay, because we
> // meant to do that.
> // Make sure we've munmapped anything sensitive and erased any secrets
> // from memory, and then interpret the script!
> 
> I think this would actually be straightforward to implement in the
> kernel -- one would need to make sure that all the relevant
> no_new_privs checks are looking in the right place (as the task might
> not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
> still be set), but I don't see any reason this would be
> insurmountable, nor do I expect there would be any fundamental
> problems.

OK, that's what is described below with security_bprm_creds_for_exec().
Each LSM can implement this change with the current patch series, but
that should be part of a dedicated patch series per LSM, for those
willing to leverage this new feature.

> 
> 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > current kernel code should not be a security issue (e.g. unexpected role
> > transition).  LSMs willing to update the caller's credential could now
> > do so when bprm->is_check is set.  Of course, such policy change should
> > be in line with the new user space code.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> 
> Can you explain what those bits do?  And why they're useful?

I didn't want to duplicate the comments above their definition
explaining their usage.  Please let me know if it's not enough.

> 
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> 
> This one at least returns an fd, so it looks less likely to get
> misused in a way that adds a TOCTOU race.

We can use either an FD or a path name with execveat(2).  See discussion
with Kees and comment from Linus.
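
The fd-based usage Andy outlines above (open the file, ask the kernel
about that fd, then interpret from the same fd) can be sketched as
follows.  This is a minimal illustration under stated assumptions, not
code from the patch series: the AT_CHECK value is a placeholder for the
flag proposed here, and open_checked_script() is a hypothetical name.
On a kernel without the patch, execveat(2) rejects the unknown flag with
EINVAL, so nothing is ever executed by this call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: placeholder value for the flag proposed in this RFC.
 * Take the real definition from the patched UAPI headers.
 */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/*
 * Hypothetical helper: open a script and ask the kernel whether it
 * would be allowed to execute it.  Returns an fd to interpret from on
 * success, or -1 with errno set.
 */
int open_checked_script(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/*
	 * Check the opened file itself (empty path + AT_EMPTY_PATH)
	 * rather than resolving the path a second time, which would be
	 * subject to a TOCTOU race.
	 */
	if (syscall(SYS_execveat, fd, "", NULL, NULL,
		    AT_EMPTY_PATH | AT_CHECK) != 0) {
		int saved_errno = errno;

		close(fd);
		errno = saved_errno;
		return -1;
	}

	return fd;
}
```

The interpreter must then read the script from the returned fd and never
reopen the path, otherwise the file that was checked and the file that
is interpreted may differ.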

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-06 15:32       ` Florian Weimer
@ 2024-07-08  8:56         ` Mickaël Salaün
  2024-07-08 16:37           ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Florian Weimer
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-08  8:56 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> >> * Mickaël Salaün:
> >> 
> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> >> > allowed for execution.  The main use case is for script interpreters and
> >> > dynamic linkers to check execution permission according to the kernel's
> >> > security policy. Another use case is to add context to access logs e.g.,
> >> > which script (instead of interpreter) accessed a file.  As any
> >> > executable code, scripts could also use this check [1].
> >> 
> >> Some distributions no longer set executable bits on most shared objects,
> >> which I assume would interfere with AT_CHECK probing for shared objects.
> >
> > A file without the execute permission is not considered as executable by
> > the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> > note that this is just a check, not a restriction.  See the next patch
> > for the optional policy enforcement.
> >
> > Anyway, we need to define the policy, and for Linux this is done with
> > the file permission bits.  So for systems willing to have a consistent
> > execution policy, we need to rely on the same bits.
> 
> Yes, that makes complete sense.  I just wanted to point out the odd
> interaction with the old binutils bug and the (sadly still current)
> kernel bug.
> 
> >> Removing the executable bit is attractive because of a combination of
> >> two bugs: a binutils wart which until recently always set the entry
> >> point address in the ELF header to zero, and the kernel not checking for
> >> a zero entry point (maybe in combination with an absent program
> >> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> >> execve and then faulting at virtual address zero.  Removing the
> >> executable bit is currently the only way to avoid these confusing
> >> crashes, so I understand the temptation.
> >
> > Interesting.  Can you please point to the bug report and the fix?  I
> > don't see any ELIBEXEC in the kernel.
> 
> The kernel hasn't been fixed yet.  I do think this should be fixed, so
> that distributions can bring back the executable bit.

Can you please point to the mailing list discussion or the bug report?

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-05 18:03   ` Florian Weimer
  2024-07-06 14:55     ` Mickaël Salaün
@ 2024-07-08 16:08     ` Jeff Xu
  2024-07-08 16:25       ` Florian Weimer
  1 sibling, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 16:08 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

Hi

On Fri, Jul 5, 2024 at 11:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Mickaël Salaün:
>
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
>
> Some distributions no longer set executable bits on most shared objects,
> which I assume would interfere with AT_CHECK probing for shared objects.
> Removing the executable bit is attractive because of a combination of
> two bugs: a binutils wart which until recently always set the entry
> point address in the ELF header to zero, and the kernel not checking for
> a zero entry point (maybe in combination with an absent program
> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> execve and then faulting at virtual address zero.  Removing the
> executable bit is currently the only way to avoid these confusing
> crashes, so I understand the temptation.
>
Will dynamic linkers use execveat(AT_CHECK) to check shared libraries
too, or just the main executable itself?

Thanks.
-Jeff


> Thanks,
> Florian
>

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
  2024-07-05  0:18   ` Kees Cook
@ 2024-07-08 16:17   ` Jeff Xu
  2024-07-08 17:53     ` Jeff Xu
  2024-07-20  2:06   ` Andy Lutomirski
  2 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 16:17 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Hi

On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> their *_LOCKED counterparts are designed to be set by processes setting
> up an execution environment, such as a user session, a container, or a
> security sandbox.  Like seccomp filters or Landlock domains, the
> securebits are inherited across processes.
>
> When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> check executable resources with execveat(2) + AT_CHECK (see previous
> patch).
>
> When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
>
Do we need both bits?
When CHECK is set and RESTRICT is not, an executable that fails the check
will still get executed, so is CHECK only for logging?
Does RESTRICT imply that CHECK is set?  E.g., what if CHECK=0 and RESTRICT=1?

> For a secure environment, we might also want
> SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> to be set.  For a test environment (e.g. testing on a fleet to identify
> potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> still be able to identify potential issues (e.g. with interpreters logs
> or LSMs audit entries).
>
> It should be noted that unlike other security bits, the
> SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> dedicated to user space willing to restrict itself.  Because of that,
> they only make sense in the context of a trusted environment (e.g.
> sandbox, container, user session, full system) where the process
> changing its behavior (according to these bits) and all its parent
> processes are trusted.  Otherwise, any parent process could just execute
> its own malicious code (interpreting a script or not), or even enforce a
> seccomp filter to mask these bits.
>
> Such a secure environment can be achieved with an appropriate access
> control policy (e.g. mount's noexec option, file access rights, LSM
> configuration) and an enlightened ld.so checking that libraries are
> allowed for execution e.g., to protect against illegitimate use of
> LD_PRELOAD.
>
> Scripts may need some changes to deal with untrusted data (e.g. stdin,
> environment variables), but that is outside the scope of the kernel.
>
> The only restriction enforced by the kernel is the right to ptrace
> another process.  Processes are denied to ptrace less restricted ones,
> unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> avoid trivial privilege escalations e.g., by a debugging process being
> abused with a confused deputy attack.
>
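
To make the rule above concrete, here is a minimal user-space sketch of
the comparison that ptrace_secbits_allowed() performs in the patch quoted
further down.  It is an illustration only, not kernel code: the bit
numbers (8 and 10) and the SECURE_ALL_UNPRIVILEGED mask are copied from
this RFC, while the function name secbits_allow_ptrace() and its plain
unsigned long arguments are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define issecure_mask(X) (1UL << (X))

/* Bit numbers as proposed in this RFC (not a stable ABI). */
#define SECURE_SHOULD_EXEC_CHECK     8
#define SECURE_SHOULD_EXEC_RESTRICT 10

#define SECBIT_SHOULD_EXEC_CHECK    (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))

#define SECURE_ALL_UNPRIVILEGED \
	(SECBIT_SHOULD_EXEC_CHECK | SECBIT_SHOULD_EXEC_RESTRICT)

static_assert(SECURE_ALL_UNPRIVILEGED == 0x500UL,
	      "bits 8 and 10 are expected");

/*
 * Hypothetical helper mirroring ptrace_secbits_allowed(): takes raw
 * securebits values instead of struct cred pointers.
 */
static bool secbits_allow_ptrace(unsigned long tracer, unsigned long tracee)
{
	const unsigned long tracer_bits = SECURE_ALL_UNPRIVILEGED & tracer;
	const unsigned long tracee_bits = SECURE_ALL_UNPRIVILEGED & tracee;
	/* A lock only counts if the bit it protects is set. */
	const unsigned long tracer_locked = (tracer_bits << 1) & tracer;
	const unsigned long tracee_locked = (tracee_bits << 1) & tracee;

	/* Every restriction of the tracer must also apply to the tracee. */
	if ((tracer_bits | tracee_bits) != tracee_bits)
		return false;

	/* Every effective lock of the tracer must also apply to the tracee. */
	if ((tracer_locked | tracee_locked) != tracee_locked)
		return false;

	return true;
}
```

With this model, secbits_allow_ptrace(SECBIT_SHOULD_EXEC_CHECK, 0) is
false (a restricted tracer, an unrestricted tracee), while the reverse
call is true.  In the patch, a failed comparison only removes the
shortcut; a tracer with CAP_SYS_PTRACE is still allowed.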
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
> ---
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++-
>  security/commoncap.c            | 63 ++++++++++++++++++++++++++++-----
>  2 files changed, 110 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> index d6d98877ff1a..3fdb0382718b 100644
> --- a/include/uapi/linux/securebits.h
> +++ b/include/uapi/linux/securebits.h
> @@ -52,10 +52,64 @@
>  #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
>                         (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
>
> +/*
> + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> + * files with execveat(2) + AT_CHECK.  However, such check should only be
> + * performed if all to-be-executed code only comes from regular files.  For
> + * instance, if a script interpreter is called with both a script snippet as
> + * argument and a regular file, the interpreter should not check any file.
> + * Doing otherwise would mislead the kernel to think that only the script file
> + * is being executed, which could for instance lead to unexpected permission
> + * change and break current use cases.
> + *
> + * This secure bit may be set by user session managers, service managers,
> + * container runtimes, sandboxer tools...  Except for test environments, the
> + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set.
> + *
> + * Ptracing another process is denied if the tracer has SECBIT_SHOULD_EXEC_CHECK
> + * but not the tracee.  SECBIT_SHOULD_EXEC_CHECK_LOCKED is also checked.
> + */
> +#define SECURE_SHOULD_EXEC_CHECK               8
> +#define SECURE_SHOULD_EXEC_CHECK_LOCKED                9  /* make bit-8 immutable */
> +
> +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
> +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \
> +                       (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED))
> +
> +/*
> + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
> + * instance, script interpreters called with a script snippet as argument
> + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> + * However, if a script interpreter is called with both
> + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> + * interpret the provided script files if no unchecked code is also provided
> + * (e.g. directly as argument).
> + *
> + * This secure bit may be set by user session managers, service managers,
> + * container runtimes, sandboxer tools...  Except for test environments, the
> + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set.
> + *
> + * Ptracing another process is denied if the tracer has
> + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee.
> + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked.
> + */
> +#define SECURE_SHOULD_EXEC_RESTRICT            10
> +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED     11  /* make bit-10 immutable */
> +
> +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \
> +                       (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED))
> +
>  #define SECURE_ALL_BITS                (issecure_mask(SECURE_NOROOT) | \
>                                  issecure_mask(SECURE_NO_SETUID_FIXUP) | \
>                                  issecure_mask(SECURE_KEEP_CAPS) | \
> -                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> +                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
>  #define SECURE_ALL_LOCKS       (SECURE_ALL_BITS << 1)
>
> +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> +
>  #endif /* _UAPI_LINUX_SECUREBITS_H */
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 162d96b3a676..34b4493e2a69 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
>         return 0;
>  }
>
> +static bool ptrace_secbits_allowed(const struct cred *tracer,
> +                                  const struct cred *tracee)
> +{
> +       const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED &
> +                                            tracer->securebits;
> +       const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED &
> +                                            tracee->securebits;
> +       /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */
> +       const unsigned long tracer_locked = (tracer_secbits << 1) &
> +                                           tracer->securebits;
> +       const unsigned long tracee_locked = (tracee_secbits << 1) &
> +                                           tracee->securebits;
> +
> +       /* The tracee must not have less constraints than the tracer. */
> +       if ((tracer_secbits | tracee_secbits) != tracee_secbits)
> +               return false;
> +
> +       /*
> +        * Makes sure that the tracer's locks for restrictions are the same for
> +        * the tracee.
> +        */
> +       if ((tracer_locked | tracee_locked) != tracee_locked)
> +               return false;
> +
> +       return true;
> +}
> +
>  /**
>   * cap_ptrace_access_check - Determine whether the current process may access
>   *                        another
> @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
>         else
>                 caller_caps = &cred->cap_permitted;
>         if (cred->user_ns == child_cred->user_ns &&
> -           cap_issubset(child_cred->cap_permitted, *caller_caps))
> +           cap_issubset(child_cred->cap_permitted, *caller_caps) &&
> +           ptrace_secbits_allowed(cred, child_cred))
>                 goto out;
>         if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
>                 goto out;
> @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent)
>         cred = __task_cred(parent);
>         child_cred = current_cred();
>         if (cred->user_ns == child_cred->user_ns &&
> -           cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
> +           cap_issubset(child_cred->cap_permitted, cred->cap_permitted) &&
> +           ptrace_secbits_allowed(cred, child_cred))
>                 goto out;
>         if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
>                 goto out;
> @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>                      & (old->securebits ^ arg2))                        /*[1]*/
>                     || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))   /*[2]*/
>                     || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))   /*[3]*/
> -                   || (cap_capable(current_cred(),
> -                                   current_cred()->user_ns,
> -                                   CAP_SETPCAP,
> -                                   CAP_OPT_NONE) != 0)                 /*[4]*/
>                         /*
>                          * [1] no changing of bits that are locked
>                          * [2] no unlocking of locks
>                          * [3] no setting of unsupported bits
> -                        * [4] doing anything requires privilege (go read about
> -                        *     the "sendmail capabilities bug")
>                          */
>                     )
>                         /* cannot change a locked bit */
>                         return -EPERM;
>
> +               /*
> +                * Doing anything requires privilege (go read about the
> +                * "sendmail capabilities bug"), except for unprivileged bits.
> +                * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> +                * restrictions enforced by the kernel but by user space on
> +                * itself.  The kernel is only in charge of protecting against
> +                * privilege escalation with ptrace protections.
> +                */
> +               if (cap_capable(current_cred(), current_cred()->user_ns,
> +                               CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> +                       const unsigned long unpriv_and_locks =
> +                               SECURE_ALL_UNPRIVILEGED |
> +                               SECURE_ALL_UNPRIVILEGED << 1;
> +                       const unsigned long changed = old->securebits ^ arg2;
> +
> +                       /* For legacy reason, denies non-change. */
> +                       if (!changed)
> +                               return -EPERM;
> +
> +                       /* Denies privileged changes. */
> +                       if (changed & ~unpriv_and_locks)
> +                               return -EPERM;
> +               }
> +
>                 new = prepare_creds();
>                 if (!new)
>                         return -ENOMEM;
> --
> 2.45.2
>
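
For readers following the PR_SET_SECUREBITS hunk above, the resulting
permission logic can be modelled in user space roughly as follows.  This
is a sketch under assumptions: the function name
securebits_change_allowed() and the has_setpcap flag (standing in for the
cap_capable() call) are hypothetical, the bit layout is the one from this
RFC, and the first line of condition [1] lies outside the quoted hunk, so
it is reproduced here from the surrounding kernel code as I understand
it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define issecure_mask(X) (1UL << (X))

/* Existing securebits (0, 2, 4, 6) plus the two proposed here (8, 10). */
#define SECURE_ALL_BITS (issecure_mask(0) | issecure_mask(2) | \
			 issecure_mask(4) | issecure_mask(6) | \
			 issecure_mask(8) | issecure_mask(10))
#define SECURE_ALL_LOCKS        (SECURE_ALL_BITS << 1)
#define SECURE_ALL_UNPRIVILEGED (issecure_mask(8) | issecure_mask(10))

static_assert(SECURE_ALL_BITS == 0x555UL, "unexpected securebits layout");

/* Returns 0 if changing securebits from old to arg2 is allowed, else -EPERM. */
static int securebits_change_allowed(unsigned long old, unsigned long arg2,
				     bool has_setpcap)
{
	if ((((old & SECURE_ALL_LOCKS) >> 1) & (old ^ arg2))	/* [1] locked bit changed */
	    || (old & SECURE_ALL_LOCKS & ~arg2)			/* [2] lock removed */
	    || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)))	/* [3] unsupported bit */
		return -EPERM;

	if (!has_setpcap) {
		const unsigned long unpriv_and_locks =
			SECURE_ALL_UNPRIVILEGED | SECURE_ALL_UNPRIVILEGED << 1;
		const unsigned long changed = old ^ arg2;

		/* For legacy reasons, a no-op request is still denied. */
		if (!changed)
			return -EPERM;

		/* Without CAP_SETPCAP, only bits 8..11 may change. */
		if (changed & ~unpriv_and_locks)
			return -EPERM;
	}

	return 0;
}
```

So an unprivileged caller may set bit 8 (old = 0, arg2 = 0x100) and later
lock it (arg2 = 0x300), but cannot touch SECBIT_NOROOT (arg2 = 0x1) and
cannot clear bit 8 once it is locked.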


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 16:08     ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Jeff Xu
@ 2024-07-08 16:25       ` Florian Weimer
  2024-07-08 16:40         ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Florian Weimer @ 2024-07-08 16:25 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

* Jeff Xu:

> Will dynamic linkers use execveat(AT_CHECK) to check shared libraries
> too, or just the main executable itself?

I expect that dynamic linkers will have to do this for everything they
map.  Usually, that does not include the main program, but this can
happen with explicit loader invocations (“ld.so /bin/true”).
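
As a rough illustration of what such a per-object check could look like
in a loader, here is a minimal sketch that probes an already opened file
descriptor before mapping it.  Several things are assumptions: the helper
name exec_check_fd() is hypothetical, the AT_CHECK value below is only a
placeholder (the real one comes from the uapi header added by patch 1/5),
and the call only behaves as a check on a kernel carrying this series.
On other kernels the unknown flag should be rejected with an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

#ifndef AT_CHECK
#define AT_CHECK 0x10000 /* ASSUMPTION: placeholder, see patch 1/5 */
#endif

static_assert((AT_CHECK & AT_EMPTY_PATH) == 0, "flags must not overlap");

/*
 * Hypothetical helper: asks the kernel whether the file behind fd would
 * be allowed for execution, without executing it.  Returns 0 if allowed,
 * or a negative errno value otherwise.
 */
static int exec_check_fd(int fd)
{
	static char *const empty[] = { NULL };

	if (syscall(SYS_execveat, fd, "", empty, empty,
		    AT_EMPTY_PATH | AT_CHECK) == 0)
		return 0;

	return -errno;
}
```

A loader would call exec_check_fd(fd) right after opening a library and
before mmap(), and then decide what to do with a failure according to
the securebits discussed in patch 2/5.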

Thanks,
Florian



* [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2))
  2024-07-08  8:56         ` Mickaël Salaün
@ 2024-07-08 16:37           ` Florian Weimer
  2024-07-08 17:34             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC Eric W. Biederman
  2024-07-10 10:05             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Mickaël Salaün
  0 siblings, 2 replies; 103+ messages in thread
From: Florian Weimer @ 2024-07-08 16:37 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Eric Biederman, linux-mm

* Mickaël Salaün:

> On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote:
>> * Mickaël Salaün:
>> 
>> > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
>> >> * Mickaël Salaün:
>> >> 
>> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
>> >> > allowed for execution.  The main use case is for script interpreters and
>> >> > dynamic linkers to check execution permission according to the kernel's
>> >> > security policy. Another use case is to add context to access logs e.g.,
>> >> > which script (instead of interpreter) accessed a file.  As any
>> >> > executable code, scripts could also use this check [1].
>> >> 
>> >> Some distributions no longer set executable bits on most shared objects,
>> >> which I assume would interfere with AT_CHECK probing for shared objects.
>> >
>> > A file without the execute permission is not considered as executable by
>> > the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
>> > note that this is just a check, not a restriction.  See the next patch
>> > for the optional policy enforcement.
>> >
>> > Anyway, we need to define the policy, and for Linux this is done with
>> > the file permission bits.  So for systems willing to have a consistent
>> > execution policy, we need to rely on the same bits.
>> 
>> Yes, that makes complete sense.  I just wanted to point out the odd
>> interaction with the old binutils bug and the (sadly still current)
>> kernel bug.
>> 
>> >> Removing the executable bit is attractive because of a combination of
>> >> two bugs: a binutils wart which until recently always set the entry
>> >> point address in the ELF header to zero, and the kernel not checking for
>> >> a zero entry point (maybe in combination with an absent program
>> >> interpreter) and failing the execve with ELIBEXEC, instead of doing the
>> >> execve and then faulting at virtual address zero.  Removing the
>> >> executable bit is currently the only way to avoid these confusing
>> >> crashes, so I understand the temptation.
>> >
>> > Interesting.  Can you please point to the bug report and the fix?  I
>> > don't see any ELIBEXEC in the kernel.
>> 
>> The kernel hasn't been fixed yet.  I do think this should be fixed, so
>> that distributions can bring back the executable bit.
>
> Can you please point to the mailing list discussion or the bug report?

I'm not sure if this was ever reported upstream as an RFE to fail with
ELIBEXEC.  We have a downstream bug report:

  Prevent executed .so files with e_entry == 0 from attempting to become
  a process.
  <https://bugzilla.redhat.com/show_bug.cgi?id=2004942>

I've put together a patch which seems to work, see below.

I don't think there's any impact on AT_CHECK with execveat because that
mode will never get to this point.

Thanks,
Florian

---8<-----------------------------------------------------------------
Subject: binfmt_elf: Fail execution of shared objects with ELIBEXEC
    
Historically, binutils has used the start of the text segment as the
entry point if _start was not defined.  Executing such files results
in crashes with random effects, depending on what code resides there.
However, starting with binutils 2.38, BFD ld uses a zero entry point,
due to commit 5226a6a892f922ea672e5775c61776830aaf27b7 ("Change the
linker's heuristic for computing the entry point for binaries so that
shared libraries default to an entry point of 0.").  This means
that shared objects with zero entry points are becoming more common,
and it makes sense for the kernel to recognize them and refuse
to execute them.

For backwards compatibility, if a load segment does not map the ELF
header at file offset zero, the kernel still proceeds as before, in
case the file is very non-standard and can actually start executing
at virtual offset zero.

Signed-off-by: Florian Weimer <fweimer@redhat.com>

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a43897b03ce9..ebd7052eb616 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -830,6 +830,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	unsigned long e_entry;
 	unsigned long interp_load_addr = 0;
 	unsigned long start_code, end_code, start_data, end_data;
+	bool elf_header_mapped = false;
 	unsigned long reloc_func_desc __maybe_unused = 0;
 	int executable_stack = EXSTACK_DEFAULT;
 	struct elfhdr *elf_ex = (struct elfhdr *)bprm->buf;
@@ -865,6 +866,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			continue;
 		}
 
+		if (elf_ppnt->p_type == PT_LOAD && !elf_ppnt->p_offset)
+			elf_header_mapped = true;
+
 		if (elf_ppnt->p_type != PT_INTERP)
 			continue;
 
@@ -921,6 +925,20 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out_free_ph;
 	}
 
+	/*
+	 * A zero value for e_entry means that the ELF file has no
+	 * entry point.  If the ELF header is mapped, this is
+	 * guaranteed to crash (often even on the first instruction),
+	 * so fail the execve system call instead.  (This is most
+	 * likely to happen for a shared object.)  If the object has a
+	 * program interpreter, dealing with the situation is its
+	 * responsibility.
+	 */
+	if (elf_header_mapped && !elf_ex->e_entry && !interpreter) {
+		retval = -ELIBEXEC;
+		goto out_free_dentry;
+	}
+
 	elf_ppnt = elf_phdata;
 	for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++)
 		switch (elf_ppnt->p_type) {
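
For reference, the condition added by the patch above can be reproduced
outside the kernel on a parsed ELF header.  The sketch below is a
user-space model under assumptions: it uses the Elf64_* types from
<elf.h>, the helper name fails_with_elibexec() is hypothetical, and the
presence of a PT_INTERP entry stands in for the kernel's "interpreter"
pointer.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the proposed check: true if execve() of this
 * object would be refused with ELIBEXEC by the patch.
 */
static bool fails_with_elibexec(const Elf64_Ehdr *ehdr,
				const Elf64_Phdr *phdrs)
{
	bool elf_header_mapped = false;
	bool has_interp = false;

	assert(ehdr != NULL);

	for (size_t i = 0; i < ehdr->e_phnum; i++) {
		/* A load segment starting at file offset 0 maps the ELF header. */
		if (phdrs[i].p_type == PT_LOAD && phdrs[i].p_offset == 0)
			elf_header_mapped = true;
		if (phdrs[i].p_type == PT_INTERP)
			has_interp = true;
	}

	/* Zero entry point, header mapped, and nobody else to take over. */
	return elf_header_mapped && ehdr->e_entry == 0 && !has_interp;
}
```

A typical shared object linked with binutils 2.38 or later (e_entry == 0,
first PT_LOAD at offset 0, no PT_INTERP) makes this return true; the same
object with a PT_INTERP entry or a non-zero e_entry does not.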



* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 16:25       ` Florian Weimer
@ 2024-07-08 16:40         ` Jeff Xu
  2024-07-08 17:05           ` Mickaël Salaün
  2024-07-08 17:33           ` Florian Weimer
  0 siblings, 2 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 16:40 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jeff Xu:
>
> > Will dynamic linkers use execveat(AT_CHECK) to check shared libraries
> > too, or just the main executable itself?
>
> I expect that dynamic linkers will have to do this for everything they
> map.
Then all the objects (.so, .sh, etc.) will go through the checks between
execveat's entry and security_bprm_creds_for_exec(); might some of those
checks be specific to the main executable?
e.g. ChromeOS uses security_bprm_creds_for_exec to block executable
memfd [1], applying this means automatically extending the block to
the .so object.

I'm not sure whether other LSMs need to be updated.  E.g., will SELinux
check .so files against its process transition policy?

[1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992

-Jeff


> Usually, that does not include the main program, but this can
> happen with explicit loader invocations (“ld.so /bin/true”).
>
> Thanks,
> Florian
>


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 16:40         ` Jeff Xu
@ 2024-07-08 17:05           ` Mickaël Salaün
  2024-07-08 17:33           ` Florian Weimer
  1 sibling, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-08 17:05 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Florian Weimer, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 08, 2024 at 09:40:45AM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Jeff Xu:
> >
> > > Will dynamic linkers use execveat(AT_CHECK) to check shared libraries
> > > too, or just the main executable itself?
> >
> > I expect that dynamic linkers will have to do this for everything they
> > map.

Correct, that would make it possible to safely handle LD_PRELOAD, for instance.

> Then all the objects (.so, .sh, etc.) will go through the checks between
> execveat's entry and security_bprm_creds_for_exec(); might some of those
> checks be specific to the main executable?
> e.g. ChromeOS uses security_bprm_creds_for_exec to block executable
> memfd [1], applying this means automatically extending the block to
> the .so object.

That's a good example of how this AT_CHECK check makes sense.

Landlock will probably get a similar (optional) restriction too:
https://github.com/landlock-lsm/linux/issues/37

> 
> I'm not sure whether other LSMs need to be updated.  E.g., will SELinux
> check .so files against its process transition policy?

LSMs should not need to be updated with this patch series.  However,
systems/components/containers enabling this new check should make sure
it works with their current policy.

> 
> [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992
> 
> -Jeff
> 
> 
> > Usually, that does not include the main program, but this can
> > happen with explicit loader invocations (“ld.so /bin/true”).
> >
> > Thanks,
> > Florian
> >


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 16:40         ` Jeff Xu
  2024-07-08 17:05           ` Mickaël Salaün
@ 2024-07-08 17:33           ` Florian Weimer
  2024-07-08 17:52             ` Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Florian Weimer @ 2024-07-08 17:33 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

* Jeff Xu:

> On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Jeff Xu:
>>
>> > Will dynamic linkers use execveat(AT_CHECK) to check shared libraries
>> > too, or just the main executable itself?
>>
>> I expect that dynamic linkers will have to do this for everything they
>> map.
> Then all the objects (.so, .sh, etc.) will go through the checks between
> execveat's entry and security_bprm_creds_for_exec(); might some of those
> checks be specific to the main executable?

If we want to avoid that, we could have an agreed-upon error code with
which the LSM can signal that it'll never fail AT_CHECK checks, so we only
have to perform the extra system call once.

Thanks,
Florian



* Re: [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC
  2024-07-08 16:37           ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Florian Weimer
@ 2024-07-08 17:34             ` Eric W. Biederman
  2024-07-08 17:59               ` Florian Weimer
  2024-07-10 10:05             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Eric W. Biederman @ 2024-07-08 17:34 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, linux-mm


Florian Weimer <fweimer@redhat.com> writes:

> * Mickaël Salaün:
>
>> On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote:
>>> * Mickaël Salaün:
>>> 
>>> > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
>>> >> * Mickaël Salaün:
>>> >> 
>>> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
>>> >> > allowed for execution.  The main use case is for script interpreters and
>>> >> > dynamic linkers to check execution permission according to the kernel's
>>> >> > security policy. Another use case is to add context to access logs e.g.,
>>> >> > which script (instead of interpreter) accessed a file.  As any
>>> >> > executable code, scripts could also use this check [1].
>>> >> 
>>> >> Some distributions no longer set executable bits on most shared objects,
>>> >> which I assume would interfere with AT_CHECK probing for shared objects.
>>> >
>>> > A file without the execute permission is not considered as executable by
>>> > the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
>>> > note that this is just a check, not a restriction.  See the next patch
>>> > for the optional policy enforcement.
>>> >
>>> > Anyway, we need to define the policy, and for Linux this is done with
>>> > the file permission bits.  So for systems willing to have a consistent
>>> > execution policy, we need to rely on the same bits.
>>> 
>>> Yes, that makes complete sense.  I just wanted to point out the odd
>>> interaction with the old binutils bug and the (sadly still current)
>>> kernel bug.
>>> 
>>> >> Removing the executable bit is attractive because of a combination of
>>> >> two bugs: a binutils wart which until recently always set the entry
>>> >> point address in the ELF header to zero, and the kernel not checking for
>>> >> a zero entry point (maybe in combination with an absent program
>>> >> interpreter) and failing the execve with ELIBEXEC, instead of doing the
>>> >> execve and then faulting at virtual address zero.  Removing the
>>> >> executable bit is currently the only way to avoid these confusing
>>> >> crashes, so I understand the temptation.
>>> >
>>> > Interesting.  Can you please point to the bug report and the fix?  I
>>> > don't see any ELIBEXEC in the kernel.
>>> 
>>> The kernel hasn't been fixed yet.  I do think this should be fixed, so
>>> that distributions can bring back the executable bit.
>>
>> Can you please point to the mailing list discussion or the bug report?
>
> I'm not sure if this was ever reported upstream as an RFE to fail with
> ELIBEXEC.  We have a downstream bug report:
>
>   Prevent executed .so files with e_entry == 0 from attempting to become
>   a process.
>   <https://bugzilla.redhat.com/show_bug.cgi?id=2004942>
>
> I've put together a patch which seems to work, see below.
>
> I don't think there's any impact on AT_CHECK with execveat because that
> mode will never get to this point.
>
> Thanks,
> Florian
>
> ---8<-----------------------------------------------------------------
> Subject: binfmt_elf: Fail execution of shared objects with ELIBEXEC
>     
> Historically, binutils has used the start of the text segment as the
> entry point if _start was not defined.  Executing such files results
> in crashes with random effects, depending on what code resides there.
> However, starting with binutils 2.38, BFD ld uses a zero entry point,
> due to commit 5226a6a892f922ea672e5775c61776830aaf27b7 ("Change the
> linker's heuristic for computing the entry point for binaries so that
> shared libraries default to an entry point of 0.").  This means
> that shared objects with zero entry points are becoming more common,
> and it makes sense for the kernel to recognize them and refuse
> to execute them.
>
> For backwards compatibility, if a load segment does not map the ELF
> header at file offset zero, the kernel still proceeds as before, in
> case the file is very non-standard and can actually start executing
> at virtual offset zero.


As written I find the logic of the patch confusing, and slightly wrong.

The program header value e_entry is a virtual address, possibly adjusted
by load_bias.  Which makes testing it against the file offset of a
PT_LOAD segment wrong.  It needs to test against elf_ppnt->p_vaddr.

I think performing an early sanity check to avoid very confusing crashes
seems sensible (as long as it is inexpensive).  This appears inexpensive
enough that we don't care.  This code is also before begin_new_exec
so it is early enough to be meaningful.

I think the check should simply test if e_entry is mapped.  So a range
check please to see if e_entry falls in a PT_LOAD segment.

Having code start at virtual address 0 is perfectly fine semantically
and might happen in embedded scenarios.

The program header is not required to be mapped or be first, (AKA
p_offset and p_vaddr can have a somewhat arbitrary relationship) so any
mention of the program header in your logic seems confusing to me.

I think your basic structure will work.  Just the first check needs to
check whether e_entry lands inside the virtual address range of a
PT_LOAD segment.  The second check should just be checking a variable to
see if e_entry was inside any PT_LOAD segment, and there is no interpreter.

Does that make sense?

Eric
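
The range check suggested above could look roughly like the following
user-space sketch.  It is only an illustration under stated assumptions:
entry_in_load_segment() is a hypothetical helper name that is not part of
the posted patch, it uses the 64-bit types from <elf.h>, and it ignores
load_bias entirely.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): returns true if e_entry falls
 * inside [p_vaddr, p_vaddr + p_memsz) of at least one PT_LOAD segment.
 */
static bool entry_in_load_segment(const Elf64_Phdr *phdr, size_t phnum,
				  Elf64_Addr e_entry)
{
	for (size_t i = 0; i < phnum; i++) {
		if (phdr[i].p_type != PT_LOAD)
			continue;
		/* Subtract first so that p_vaddr + p_memsz cannot overflow. */
		if (e_entry >= phdr[i].p_vaddr &&
		    e_entry - phdr[i].p_vaddr < phdr[i].p_memsz)
			return true;
	}
	return false;
}
```

With such a helper, the second check would reduce to testing the returned
flag together with the absence of an interpreter.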


>
> Signed-off-by: Florian Weimer <fweimer@redhat.com>
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a43897b03ce9..ebd7052eb616 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -830,6 +830,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  	unsigned long e_entry;
>  	unsigned long interp_load_addr = 0;
>  	unsigned long start_code, end_code, start_data, end_data;
> +	bool elf_header_mapped = false;
>  	unsigned long reloc_func_desc __maybe_unused = 0;
>  	int executable_stack = EXSTACK_DEFAULT;
>  	struct elfhdr *elf_ex = (struct elfhdr *)bprm->buf;
> @@ -865,6 +866,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  			continue;
>  		}
>  
> +		if (elf_ppnt->p_type == PT_LOAD && !elf_ppnt->p_offset)
> +			elf_header_mapped = true;
> +
>  		if (elf_ppnt->p_type != PT_INTERP)
>  			continue;
>  
> @@ -921,6 +925,20 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  		goto out_free_ph;
>  	}
>  
> +	/*
> +	 * A zero value for e_entry means that the ELF file has no
> +	 * entry point.  If the ELF header is mapped, this is
> +	 * guaranteed to crash (often even on the first instruction),
> +	 * so fail the execve system call instead.  (This is most
> +	 * likely to happen for a shared object.)  If the object has a
> +	 * program interpreter, dealing with the situation is its
> +	 * responsibility.
> +	 */
> +	if (elf_header_mapped && !elf_ex->e_entry && !interpreter) {
> +		retval = -ELIBEXEC;
> +		goto out_free_dentry;
> +	}
> +
>  	elf_ppnt = elf_phdata;
>  	for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++)
>  		switch (elf_ppnt->p_type) {


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 17:33           ` Florian Weimer
@ 2024-07-08 17:52             ` Jeff Xu
  2024-07-09  9:18               ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 17:52 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jeff Xu:
>
> > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Jeff Xu:
> >>
> >> > Will dynamic linkers use execveat(AT_CHECK) to check shared
> >> > libraries too, or just the main executable itself?
> >>
> >> I expect that dynamic linkers will have to do this for everything they
> >> map.
> > Then all the objects (.so, .sh, etc.) will go through the check from
> > execveat's main to security_bprm_creds_for_exec().  Might some of these
> > checks be specific to the main executable?
>
> If we want to avoid that, we could have an agreed-upon error code with
> which the LSM can signal that it'll never fail AT_CHECK checks, so we
> only have to perform the extra system call once.
>
Right, something like that.
As an initial goal, I would prefer not having AT_CHECK-specific code in
the LSM code.  If that works, great.

-Jeff

> Thanks,
> Florian
>


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 16:17   ` Jeff Xu
@ 2024-07-08 17:53     ` Jeff Xu
  2024-07-08 18:48       ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 17:53 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote:
>
> Hi
>
> On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> > their *_LOCKED counterparts are designed to be set by processes setting
> > up an execution environment, such as a user session, a container, or a
> > security sandbox.  Like seccomp filters or Landlock domains, the
> > securebits are inherited across processes.
> >
> > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> > check executable resources with execveat(2) + AT_CHECK (see previous
> > patch).
> >
> > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
> >
> Do we need both bits?
> When CHECK is set and RESTRICT is not, the "check fail" executable
> will still get executed, so is CHECK for logging?
> Does RESTRICT imply that CHECK is set?  E.g., what if CHECK=0 and RESTRICT=1?
>
The intention might be "permissive mode"?  If so, consider reusing the
existing SELinux concept, still with 2 bits:
SECBIT_SHOULD_EXEC_RESTRICT
SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE


-Jeff




> > For a secure environment, we might also want
> > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> > to be set.  For a test environment (e.g. testing on a fleet to identify
> > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> > still be able to identify potential issues (e.g. with interpreters logs
> > or LSMs audit entries).
> >
> > It should be noted that unlike other security bits, the
> > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> > dedicated to user space willing to restrict itself.  Because of that,
> > they only make sense in the context of a trusted environment (e.g.
> > sandbox, container, user session, full system) where the process
> > changing its behavior (according to these bits) and all its parent
> > processes are trusted.  Otherwise, any parent process could just execute
> > its own malicious code (interpreting a script or not), or even enforce a
> > seccomp filter to mask these bits.
> >
> > Such a secure environment can be achieved with an appropriate access
> > control policy (e.g. mount's noexec option, file access rights, LSM
> > configuration) and an enlightened ld.so checking that libraries are
> > allowed for execution e.g., to protect against illegitimate use of
> > LD_PRELOAD.
> >
> > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > environment variables), but that is outside the scope of the kernel.
> >
> > The only restriction enforced by the kernel is the right to ptrace
> > another process.  Processes may not ptrace less restricted ones,
> > unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> > avoid trivial privilege escalations e.g., by a debugging process being
> > abused with a confused deputy attack.
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
> > ---
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++-
> >  security/commoncap.c            | 63 ++++++++++++++++++++++++++++-----
> >  2 files changed, 110 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> > index d6d98877ff1a..3fdb0382718b 100644
> > --- a/include/uapi/linux/securebits.h
> > +++ b/include/uapi/linux/securebits.h
> > @@ -52,10 +52,64 @@
> >  #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
> >                         (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
> >
> > +/*
> > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> > + * files with execveat(2) + AT_CHECK.  However, such check should only be
> > + * performed if all to-be-executed code only comes from regular files.  For
> > + * instance, if a script interpreter is called with both a script snippet as
> > + * argument and a regular file, the interpreter should not check any file.
> > + * Doing otherwise would mislead the kernel to think that only the script file
> > + * is being executed, which could for instance lead to unexpected permission
> > + * change and break current use cases.
> > + *
> > + * This secure bit may be set by user session managers, service managers,
> > + * container runtimes, sandboxer tools...  Except for test environments, the
> > + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set.
> > + *
> > + * Ptracing another process is denied if the tracer has SECBIT_SHOULD_EXEC_CHECK
> > + * but not the tracee.  SECBIT_SHOULD_EXEC_CHECK_LOCKED is also checked.
> > + */
> > +#define SECURE_SHOULD_EXEC_CHECK               8
> > +#define SECURE_SHOULD_EXEC_CHECK_LOCKED                9  /* make bit-8 immutable */
> > +
> > +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
> > +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \
> > +                       (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED))
> > +
> > +/*
> > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
> > + * instance, script interpreters called with a script snippet as argument
> > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> > + * However, if a script interpreter is called with both
> > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> > + * interpret the provided script files if no unchecked code is also provided
> > + * (e.g. directly as argument).
> > + *
> > + * This secure bit may be set by user session managers, service managers,
> > + * container runtimes, sandboxer tools...  Except for test environments, the
> > + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set.
> > + *
> > + * Ptracing another process is denied if the tracer has
> > + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee.
> > + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked.
> > + */
> > +#define SECURE_SHOULD_EXEC_RESTRICT            10
> > +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED     11  /* make bit-10 immutable */
> > +
> > +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> > +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \
> > +                       (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED))
> > +
> >  #define SECURE_ALL_BITS                (issecure_mask(SECURE_NOROOT) | \
> >                                  issecure_mask(SECURE_NO_SETUID_FIXUP) | \
> >                                  issecure_mask(SECURE_KEEP_CAPS) | \
> > -                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> > +                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> > +                                issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> > +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> >  #define SECURE_ALL_LOCKS       (SECURE_ALL_BITS << 1)
> >
> > +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> > +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> > +
> >  #endif /* _UAPI_LINUX_SECUREBITS_H */
> > diff --git a/security/commoncap.c b/security/commoncap.c
> > index 162d96b3a676..34b4493e2a69 100644
> > --- a/security/commoncap.c
> > +++ b/security/commoncap.c
> > @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
> >         return 0;
> >  }
> >
> > +static bool ptrace_secbits_allowed(const struct cred *tracer,
> > +                                  const struct cred *tracee)
> > +{
> > +       const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED &
> > +                                            tracer->securebits;
> > +       const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED &
> > +                                            tracee->securebits;
> > +       /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */
> > +       const unsigned long tracer_locked = (tracer_secbits << 1) &
> > +                                           tracer->securebits;
> > +       const unsigned long tracee_locked = (tracee_secbits << 1) &
> > +                                           tracee->securebits;
> > +
> > +       /* The tracee must not have fewer constraints than the tracer. */
> > +       if ((tracer_secbits | tracee_secbits) != tracee_secbits)
> > +               return false;
> > +
> > +       /*
> > +        * Makes sure that the tracer's locks for restrictions are the same for
> > +        * the tracee.
> > +        */
> > +       if ((tracer_locked | tracee_locked) != tracee_locked)
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> >  /**
> >   * cap_ptrace_access_check - Determine whether the current process may access
> >   *                        another
> > @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> >         else
> >                 caller_caps = &cred->cap_permitted;
> >         if (cred->user_ns == child_cred->user_ns &&
> > -           cap_issubset(child_cred->cap_permitted, *caller_caps))
> > +           cap_issubset(child_cred->cap_permitted, *caller_caps) &&
> > +           ptrace_secbits_allowed(cred, child_cred))
> >                 goto out;
> >         if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
> >                 goto out;
> > @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent)
> >         cred = __task_cred(parent);
> >         child_cred = current_cred();
> >         if (cred->user_ns == child_cred->user_ns &&
> > -           cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
> > +           cap_issubset(child_cred->cap_permitted, cred->cap_permitted) &&
> > +           ptrace_secbits_allowed(cred, child_cred))
> >                 goto out;
> >         if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
> >                 goto out;
> > @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> >                      & (old->securebits ^ arg2))                        /*[1]*/
> >                     || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))   /*[2]*/
> >                     || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))   /*[3]*/
> > -                   || (cap_capable(current_cred(),
> > -                                   current_cred()->user_ns,
> > -                                   CAP_SETPCAP,
> > -                                   CAP_OPT_NONE) != 0)                 /*[4]*/
> >                         /*
> >                          * [1] no changing of bits that are locked
> >                          * [2] no unlocking of locks
> >                          * [3] no setting of unsupported bits
> > -                        * [4] doing anything requires privilege (go read about
> > -                        *     the "sendmail capabilities bug")
> >                          */
> >                     )
> >                         /* cannot change a locked bit */
> >                         return -EPERM;
> >
> > +               /*
> > +                * Doing anything requires privilege (go read about the
> > +                * "sendmail capabilities bug"), except for unprivileged bits.
> > +                * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> > +                * restrictions enforced by the kernel but by user space on
> > +                * itself.  The kernel is only in charge of protecting against
> > +                * privilege escalation with ptrace protections.
> > +                */
> > +               if (cap_capable(current_cred(), current_cred()->user_ns,
> > +                               CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> > +                       const unsigned long unpriv_and_locks =
> > +                               SECURE_ALL_UNPRIVILEGED |
> > +                               SECURE_ALL_UNPRIVILEGED << 1;
> > +                       const unsigned long changed = old->securebits ^ arg2;
> > +
> > +                       /* For legacy reasons, denies non-change. */
> > +                       if (!changed)
> > +                               return -EPERM;
> > +
> > +                       /* Denies privileged changes. */
> > +                       if (changed & ~unpriv_and_locks)
> > +                               return -EPERM;
> > +               }
> > +
> >                 new = prepare_creds();
> >                 if (!new)
> >                         return -ENOMEM;
> > --
> > 2.45.2
> >
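
The subset logic of ptrace_secbits_allowed() in the quoted patch can be
tried out in user space with plain integers.  The sketch below mirrors
the quoted function under a few assumptions: it takes raw securebits
values instead of struct cred, the bit numbers 8 and 10 are copied from
the quoted header, issecure_mask() is redefined locally with 1UL, and
secbits_allow_ptrace() is a hypothetical name.

```c
#include <stdbool.h>

/* Bit numbers as defined in the quoted securebits.h hunk. */
#define SECURE_SHOULD_EXEC_CHECK	8
#define SECURE_SHOULD_EXEC_RESTRICT	10

/* Local stand-in for the kernel macro of the same name. */
#define issecure_mask(X)	(1UL << (X))

#define SECURE_ALL_UNPRIVILEGED	(issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
				 issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))

/* Hypothetical user-space mirror of ptrace_secbits_allowed(). */
static bool secbits_allow_ptrace(unsigned long tracer, unsigned long tracee)
{
	const unsigned long tracer_bits = tracer & SECURE_ALL_UNPRIVILEGED;
	const unsigned long tracee_bits = tracee & SECURE_ALL_UNPRIVILEGED;
	/* A lock only counts when the bit it protects is set. */
	const unsigned long tracer_locked = (tracer_bits << 1) & tracer;
	const unsigned long tracee_locked = (tracee_bits << 1) & tracee;

	/* Every restriction of the tracer must also apply to the tracee. */
	if ((tracer_bits | tracee_bits) != tracee_bits)
		return false;

	/* Every effective lock of the tracer must also apply to the tracee. */
	if ((tracer_locked | tracee_locked) != tracee_locked)
		return false;

	return true;
}
```

For example, a tracer with only bit 8 set (0x100) may trace a process
with bits 8 and 10 set (0x500), but not a process with no bits set.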


* Re: [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC
  2024-07-08 17:34             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC Eric W. Biederman
@ 2024-07-08 17:59               ` Florian Weimer
  0 siblings, 0 replies; 103+ messages in thread
From: Florian Weimer @ 2024-07-08 17:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jeff Xu, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, linux-mm

* Eric W. Biederman:

> As written I find the logic of the patch confusing, and slightly wrong.
>
> The program header value e_entry is a virtual address, possibly adjusted
> by load_bias.  Which makes testing it against the file offset of a
> PT_LOAD segment wrong.  It needs to test against elf_ppnt->p_vaddr.

I think we need to test both against zero, or maybe invert the logic: if
something is mapped at virtual address zero that doesn't come from a
zero file offset, we disable the ELIBEXEC check.

> I think performing an early sanity check to avoid very confusing crashes
> seems sensible (as long as it is inexpensive).  This appears inexpensive
> enough that we don't care.  This code is also before begin_new_exec
> so it is early enough to be meaningful.

Yeah, it was quite confusing when it was after begin_new_exec because
the ELIBEXEC error is visible under strace, and then the SIGSEGV comes …

> I think the check should simply test if e_entry is mapped.  So a range
> check please to see if e_entry falls in a PT_LOAD segment.

It's usually mapped even with e_entry == 0 because the ELF header is
loaded at virtual address zero for ET_DYN using the default linker flags
(and this is the case we care about).  With -z noseparate-code, it is
even mapped executable.

> Having code start at virtual address 0 is perfectly fine semantically
> and might happen in embedded scenarios.

To keep supporting this case, we need to check that the ELF header is at
address zero.  We make a leap of faith and assume that it's not really
executable even if it is mapped as such: due to its role in the file
format, it does not contain executable instructions.  That's why the
patch is focused on the ELF header.

I could remove all these checks and just return ELIBEXEC for a zero
entry point.  I think this is valid based on the ELF specification, but
it may have a backwards compatibility impact.

> The program header is not required to be mapped or be first, (AKA
> p_offset and p_vaddr can have a somewhat arbitrary relationship) so any
> mention of the program header in your logic seems confusing to me.

It's the ELF header.

> I think your basic structure will work.  Just the first check needs to
> check whether e_entry lands inside the virtual address range of a
> PT_LOAD segment.  The second check should just be checking a variable to
> see if e_entry was inside any PT_LOAD segment, and there is no interpreter.

I think the range check doesn't help here.  Just checking p_vaddr for
zero in addition to p_offset should be sufficient.  If you agree, I can
test and send an updated patch.

Thanks,
Florian
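
The alternative described above (checking p_vaddr for zero in addition to
p_offset) might be sketched in user space as follows.  This is an
illustration only, not the updated patch: should_refuse_exec() is a
hypothetical name, the has_interp parameter stands in for the kernel's
interpreter pointer, and the 64-bit <elf.h> types are an assumption.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when execution should fail with
 * ELIBEXEC, i.e. the ELF header is mapped at virtual address zero, the
 * entry point is zero, and there is no program interpreter.
 */
static bool should_refuse_exec(const Elf64_Ehdr *ehdr,
			       const Elf64_Phdr *phdr, bool has_interp)
{
	bool elf_header_at_zero = false;

	for (size_t i = 0; i < ehdr->e_phnum; i++) {
		/* File offset zero mapped at virtual address zero. */
		if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset == 0 &&
		    phdr[i].p_vaddr == 0)
			elf_header_at_zero = true;
	}

	return elf_header_at_zero && ehdr->e_entry == 0 && !has_interp;
}
```

A file that maps other content at virtual address zero (non-zero
p_offset) is still allowed to run, which preserves the embedded use case
mentioned in the review.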



* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 17:53     ` Jeff Xu
@ 2024-07-08 18:48       ` Mickaël Salaün
  2024-07-08 21:15         ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-08 18:48 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 08, 2024 at 10:53:11AM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote:
> >
> > Hi
> >
> > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> > > their *_LOCKED counterparts are designed to be set by processes setting
> > > up an execution environment, such as a user session, a container, or a
> > > security sandbox.  Like seccomp filters or Landlock domains, the
> > > > securebits are inherited across processes.
> > >
> > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> > > check executable resources with execveat(2) + AT_CHECK (see previous
> > > patch).
> > >
> > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
> > >
> > > Do we need both bits?
> > > When CHECK is set and RESTRICT is not, the "check fail" executable
> > > will still get executed, so is CHECK for logging?
> > > Does RESTRICT imply that CHECK is set?  E.g., what if CHECK=0 and RESTRICT=1?
> >
> The intention might be "permissive mode"?  If so, consider reusing the
> existing SELinux concept, still with 2 bits:
> SECBIT_SHOULD_EXEC_RESTRICT
> SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE

SECBIT_SHOULD_EXEC_CHECK is for user space to check with execveat+AT_CHECK.

SECBIT_SHOULD_EXEC_RESTRICT is for user space to restrict execution by
default, and potentially allow some exceptions from the list of
checked-and-allowed files, if SECBIT_SHOULD_EXEC_CHECK is set.

Without SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT is to deny
any kind of execution/interpretation.

With only SECBIT_SHOULD_EXEC_CHECK, user space should just check and log
any denied accesses, but ignore them.  So yes, it is similar to
SELinux's permissive mode.

This is explained in the next patch as comments.

The *_LOCKED variants are useful and part of the securebits concept.
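
One possible reading of the four bit combinations described above, written
as a decision function an interpreter could apply.  This is a sketch of
the intended user-space behavior as explained in this thread, not kernel
code: the enum, the function name interp_exec_decision(), and its
parameters are all hypothetical.

```c
#include <stdbool.h>

enum exec_decision { EXEC_ALLOW, EXEC_ALLOW_AND_LOG, EXEC_DENY };

/*
 * should_check:    SECBIT_SHOULD_EXEC_CHECK is set
 * should_restrict: SECBIT_SHOULD_EXEC_RESTRICT is set
 * from_file:       all code to interpret comes from a regular file
 * check_passed:    execveat(2) + AT_CHECK succeeded on that file
 *                  (only meaningful when should_check and from_file)
 */
static enum exec_decision interp_exec_decision(bool should_check,
					       bool should_restrict,
					       bool from_file,
					       bool check_passed)
{
	if (!should_check && !should_restrict)
		return EXEC_ALLOW;	/* No policy requested. */

	if (should_check && !should_restrict) {
		/* Permissive: log a denied file, but run it anyway. */
		if (from_file && !check_passed)
			return EXEC_ALLOW_AND_LOG;
		return EXEC_ALLOW;
	}

	if (!should_check)
		return EXEC_DENY;	/* RESTRICT alone: deny everything. */

	/* Both bits: only checked-and-allowed files may run. */
	if (from_file && check_passed)
		return EXEC_ALLOW;
	return EXEC_DENY;
}
```

With both bits set, a script snippet passed directly as an argument
(from_file == false) is denied, while a script file that passes the check
is interpreted.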

> 
> 
> -Jeff
> 
> 
> 
> 
> > > For a secure environment, we might also want
> > > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> > > to be set.  For a test environment (e.g. testing on a fleet to identify
> > > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> > > still be able to identify potential issues (e.g. with interpreters logs
> > > or LSMs audit entries).
> > >
> > > It should be noted that unlike other security bits, the
> > > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> > > dedicated to user space willing to restrict itself.  Because of that,
> > > they only make sense in the context of a trusted environment (e.g.
> > > sandbox, container, user session, full system) where the process
> > > changing its behavior (according to these bits) and all its parent
> > > processes are trusted.  Otherwise, any parent process could just execute
> > > its own malicious code (interpreting a script or not), or even enforce a
> > > seccomp filter to mask these bits.
> > >
> > > Such a secure environment can be achieved with an appropriate access
> > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > > configuration) and an enlightened ld.so checking that libraries are
> > > allowed for execution e.g., to protect against illegitimate use of
> > > LD_PRELOAD.
> > >
> > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > environment variables), but that is outside the scope of the kernel.
> > >
> > > The only restriction enforced by the kernel is the right to ptrace
> > > > another process.  Processes may not ptrace less restricted ones,
> > > unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> > > avoid trivial privilege escalations e.g., by a debugging process being
> > > abused with a confused deputy attack.
> > >
> > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
> > > ---


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-05 17:53     ` Mickaël Salaün
@ 2024-07-08 19:38       ` Kees Cook
  0 siblings, 0 replies; 103+ messages in thread
From: Kees Cook @ 2024-07-08 19:38 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 05, 2024 at 07:53:10PM +0200, Mickaël Salaün wrote:
> On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote:
> > On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > allowed for execution.  The main use case is for script interpreters and
> > > dynamic linkers to check execution permission according to the kernel's
> > > security policy. Another use case is to add context to access logs e.g.,
> > > which script (instead of interpreter) accessed a file.  As any
> > > executable code, scripts could also use this check [1].
> > > 
> > > This is different than faccessat(2) which only checks file access
> > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > real execution, user space gets the same error codes.
> > 
> > Nice! I much prefer this method of going through the exec machinery so
> > we always have a single code path for these kinds of checks.
> > 
> > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > make sense for the kernel to parse the checked files, look for
> > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > if the format is unknown.  Because of that, security_bprm_check() is
> > > never called when AT_CHECK is used.
> > 
> > I'd like some additional comments in the code that remind us that
> > access control checks have finished past a certain point.
> 
> Where in the code? Just before the bprm->is_check assignment?

Yeah, that's what I was thinking.

-- 
Kees Cook


* Re: [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec
  2024-07-04 19:01 ` [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec Mickaël Salaün
@ 2024-07-08 19:40   ` Mimi Zohar
  2024-07-09 20:42     ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Mimi Zohar @ 2024-07-08 19:40 UTC (permalink / raw)
  To: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o
  Cc: Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Hi Mickaël,

On Thu, 2024-07-04 at 21:01 +0200, Mickaël Salaün wrote:
> Add a simple tool to set SECBIT_SHOULD_EXEC_CHECK,
> SECBIT_SHOULD_EXEC_RESTRICT, and their lock counterparts before
> executing a command.  This should be useful to easily test against
> script interpreters.

The print_usage() provides the calling syntax.  Could you provide an example of
how to use it and what to expect?

thanks,

Mimi



* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (4 preceding siblings ...)
  2024-07-04 19:01 ` [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec Mickaël Salaün
@ 2024-07-08 20:35 ` Mimi Zohar
  2024-07-09 20:43   ` Mickaël Salaün
  2024-07-15 20:16 ` Jonathan Corbet
  6 siblings, 1 reply; 103+ messages in thread
From: Mimi Zohar @ 2024-07-08 20:35 UTC (permalink / raw)
  To: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o
  Cc: Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Hi Mickaël,

On Thu, 2024-07-04 at 21:01 +0200, Mickaël Salaün wrote:
> Hi,
> 
> The ultimate goal of this patch series is to be able to ensure that
> direct file execution (e.g. ./script.sh) and indirect file execution
> (e.g. sh script.sh) lead to the same result, especially from a security
> point of view.
> 
> Overview
> --------
> 
> This patch series is a new approach of the initial O_MAYEXEC feature,
> and a revamp of the previous patch series.  Taking into account the last
> reviews [1], we now stick to the kernel semantic for file executability.
> One major change is the clear split between access check and policy
> management.
> 
> The first patch brings the AT_CHECK flag to execveat(2).  The goal is to
> enable user space to check if a file could be executed (by the kernel).
> Unlike stat(2), which only checks file permissions, execveat(2) +
> AT_CHECK take into account the full context, including mount points
> (noexec), caller's limits, and all potential LSM extra checks (e.g.
> argv, envp, credentials).
> 
> The second patch brings two new securebits used to set or get a security
> policy for a set of processes.  For this to be meaningful, all
> executable code needs to be trusted.  In practice, this means that
> (malicious) users can be restricted to only run scripts provided (and
> trusted) by the system.
> 
> [1] https://lore.kernel.org/r/CAHk-=wjPGNLyzeBMWdQu+kUdQLHQugznwY7CvWjmvNW47D5sog@mail.gmail.com
> 
> Script execution
> ----------------
> 
> One important thing to keep in mind is that the goal of this patch
> series is to get the same security restrictions with these commands:
> * ./script.py
> * python script.py
> * python < script.py
> * python -m script.py

This is really needed, but is it the "only" purpose of this patch set, or can it
also be used to monitor files the script opens (for read) with the intention of
executing them?

> 
> However, on secure systems, we should be able to forbid these commands
> because there is no way to reliably identify the origin of the script:
> * xargs -a script.py -d '\r' -- python -c
> * cat script.py | python
> * python
> 
> Background
> ----------
> 
> Compared to the previous patch series, there is no more dedicated
> syscall nor sysctl configuration.  This new patch series only adds new
> flags: one for execveat(2) and four for prctl(2).
> 
> This kind of script interpreter restriction may already be used in
> hardened systems, which may need to fork interpreters and install
> different versions of the binaries.  This mechanism should make it
> possible to avoid duplicate binaries (and potentially forked source code)
> for secure interpreters (e.g. secure Python [2]) by letting them
> dynamically enforce restrictions or not.
> 
> The ability to control script execution is also required to close a
> major IMA measurement/appraisal interpreter integrity gap [3].

Definitely.  However, it isn't limited to controlling script execution; it also
involves measuring the script.  Will it be possible to measure and appraise the
indirect script calls with this patch set?

Mimi

> This new execveat + AT_CHECK should not be confused with the O_EXEC flag
> (for open) which is intended for execute-only, which obviously doesn't
> work for scripts.
> 
> I gave a talk about controlling script execution where I explain the
> previous approaches [4].  The design of the WIP RFC I talked about
> changed quite a bit since then.
> 
> [2] https://github.com/zooba/spython
> [3] https://lore.kernel.org/lkml/20211014130125.6991-1-zohar@linux.ibm.com/
> [4] https://lssna2023.sched.com/event/1K7bO
> 



* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 18:48       ` Mickaël Salaün
@ 2024-07-08 21:15         ` Jeff Xu
  2024-07-08 21:25           ` Steve Dower
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 21:15 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 8, 2024 at 11:48 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Jul 08, 2024 at 10:53:11AM -0700, Jeff Xu wrote:
> > On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote:
> > >
> > > Hi
> > >
> > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> > > > their *_LOCKED counterparts are designed to be set by processes setting
> > > > up an execution environment, such as a user session, a container, or a
> > > > security sandbox.  Like seccomp filters or Landlock domains, the
> > > > securebits are inherited across processes.
> > > >
> > > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> > > > check executable resources with execveat(2) + AT_CHECK (see previous
> > > > patch).
> > > >
> > > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
> > > >
> > > > Do we need both bits?
> > > > When CHECK is set and RESTRICT is not, the "check fail" executable
> > > > will still get executed, so is CHECK for logging?
> > > > Does RESTRICT imply CHECK is set?  E.g., what if CHECK=0 and RESTRICT=1?
> > >
> > The intention might be "permissive mode"?  If so, consider reusing
> > SELinux's existing concept, still with 2 bits:
> > SECBIT_SHOULD_EXEC_RESTRICT
> > SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE
>
> SECBIT_SHOULD_EXEC_CHECK is for user space to check with execveat+AT_CHECK.
>
> SECBIT_SHOULD_EXEC_RESTRICT is for user space to restrict execution by
> default, and potentially allow some exceptions from the list of
> checked-and-allowed files, if SECBIT_SHOULD_EXEC_CHECK is set.
>
> Without SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT is to deny
> any kind of execution/interpretation.
>
Do you mean "deny any kind of execution/interpretation" or just
those that failed "AT_CHECK" (I assume the latter)?

> With only SECBIT_SHOULD_EXEC_CHECK, user space should just check and log
> any denied access, but ignore them.  So yes, it is similar to
> SELinux's permissive mode.
>
IIUC:
CHECK=0, RESTRICT=0: do nothing, current behavior
CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
those in the "checked-and-allowed" list.

So is CHECK basically trying to form an allowlist?
If there is a need for an allowlist, it is the task of the "interpreter
or dynamic linker" to maintain this list, and the list is known in
advance, i.e. not something from execveat(AT_CHECK), and the kernel
shouldn't have knowledge of this allowlist.
Secondly, the concept of an allowlist seems to be an attack vector to
me; I would rather have either full enforcement or permissive mode.
And CHECK=1, RESTRICT=1 is less secure than CHECK=0, RESTRICT=1,
which might also not be obvious to developers.

Unless I misunderstood CHECK.

> This is explained in the next patch as comments.
>
The next patch is a selftest patch; it is better to define them in the
current commit and in securebits.h.

> The *_LOCKED variants are useful and part of the securebits concept.
>
The locked state is easy to understand.

Thanks
Best regards
-Jeff

> >
> >
> > -Jeff
> >
> >
> >
> >
> > > > For a secure environment, we might also want
> > > > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> > > > to be set.  For a test environment (e.g. testing on a fleet to identify
> > > > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> > > > still be able to identify potential issues (e.g. with interpreters logs
> > > > or LSMs audit entries).
> > > >
> > > > It should be noted that unlike other security bits, the
> > > > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> > > > dedicated to user space willing to restrict itself.  Because of that,
> > > > they only make sense in the context of a trusted environment (e.g.
> > > > sandbox, container, user session, full system) where the process
> > > > changing its behavior (according to these bits) and all its parent
> > > > processes are trusted.  Otherwise, any parent process could just execute
> > > > its own malicious code (interpreting a script or not), or even enforce a
> > > > seccomp filter to mask these bits.
> > > >
> > > > Such a secure environment can be achieved with an appropriate access
> > > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > > configuration) and an enlightened ld.so checking that libraries are
> > > > allowed for execution e.g., to protect against illegitimate use of
> > > > LD_PRELOAD.
> > > >
> > > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > > environment variables), but that is outside the scope of the kernel.
> > > >
> > > > The only restriction enforced by the kernel is the right to ptrace
> > > > another process.  Processes are denied to ptrace less restricted ones,
> > > > unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> > > > avoid trivial privilege escalations e.g., by a debugging process being
> > > > abused with a confused deputy attack.
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
> > > > ---


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 21:15         ` Jeff Xu
@ 2024-07-08 21:25           ` Steve Dower
  2024-07-08 22:07             ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Steve Dower @ 2024-07-08 21:25 UTC (permalink / raw)
  To: Jeff Xu, Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On 08/07/2024 22:15, Jeff Xu wrote:
> IIUC:
> CHECK=0, RESTRICT=0: do nothing, current behavior
> CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
> CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
> CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
> those in the "checked-and-allowed" list.

I had much the same question for Mickaël while working on this.

Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. 
In the context of a script or macro interpreter, this just means it will 
never interpret any scripts. Non-binary code execution is fully disabled 
in any part of the process that respects these bits.

"CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This 
case is the allow list (or whatever mechanism is being used to determine 
the result of an AT_CHECK check). The actual mechanism isn't the 
business of the script interpreter at all, it just has to refuse to 
execute anything that doesn't pass the check. So a generic interpreter 
can implement a generic mechanism and leave the specifics to whoever 
configures the machine.

The other two cases are more obvious. "CHECK=0, RESTRICT=0" is the
zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or 
otherwise audit the result of the check, but it won't restrict execution.
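
The four combinations described above can be summarized as a small decision
function.  This is only an illustrative sketch of the user space logic, not
code from the patch series: the enum, the function name, and its parameters
are hypothetical, and the at_check_ok argument stands for the result of an
execveat(2) + AT_CHECK call performed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of them come from the patches. */
enum exec_decision {
	EXEC_ALLOW,       /* interpret the script, nothing to report */
	EXEC_ALLOW_AUDIT, /* interpret the script, but log the failed check */
	EXEC_DENY,        /* refuse to interpret the script */
};

/*
 * check:         SECBIT_SHOULD_EXEC_CHECK is set for the process
 * restrict_exec: SECBIT_SHOULD_EXEC_RESTRICT is set for the process
 * at_check_ok:   execveat(2) + AT_CHECK succeeded on the file; ignored
 *                when check is false, because no check is performed then
 */
static enum exec_decision script_exec_decision(bool check, bool restrict_exec,
					       bool at_check_ok)
{
	/* Without CHECK: either the zero-overhead case or "never interpret". */
	if (!check)
		return restrict_exec ? EXEC_DENY : EXEC_ALLOW;

	/* With CHECK, a file that passes the check is always accepted. */
	if (at_check_ok)
		return EXEC_ALLOW;

	/* Failed check: enforce with RESTRICT, otherwise only audit. */
	return restrict_exec ? EXEC_DENY : EXEC_ALLOW_AUDIT;
}
```

An interpreter following this sketch would only need to act on the returned
value; how the kernel reaches the AT_CHECK result stays out of its hands.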

Cheers,
Steve


* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 21:25           ` Steve Dower
@ 2024-07-08 22:07             ` Jeff Xu
  2024-07-09 20:42               ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-08 22:07 UTC (permalink / raw)
  To: Steve Dower
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote:
>
> On 08/07/2024 22:15, Jeff Xu wrote:
> > IIUC:
> > CHECK=0, RESTRICT=0: do nothing, current behavior
> > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
> > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
> > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
> > those in the "checked-and-allowed" list.
>
> I had much the same question for Mickaël while working on this.
>
> Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking.
> In the context of a script or macro interpreter, this just means it will
> never interpret any scripts. Non-binary code execution is fully disabled
> in any part of the process that respects these bits.
>
I see, so Mickaël does mean this will block all scripts.
I guess, in the context of the dynamic linker, this means no more .so
loading, even if "dlopen" is called by an app?  But this will make
execve() fail.

> "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This
> case is the allow list (or whatever mechanism is being used to determine
> the result of an AT_CHECK check). The actual mechanism isn't the
> business of the script interpreter at all, it just has to refuse to
> execute anything that doesn't pass the check. So a generic interpreter
> can implement a generic mechanism and leave the specifics to whoever
> configures the machine.
>
In the context of the dynamic linker, this means:
if the .so passes AT_CHECK, dlopen() can still load it.
If the .so fails AT_CHECK, dlopen() will fail too.

Thanks
-Jeff

> The other two cases are more obvious. "CHECK=0, RESTRICT=0" is the
> zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or
> otherwise audit the result of the check, but it won't restrict execution.
>
> Cheers,
> Steve


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-08 17:52             ` Jeff Xu
@ 2024-07-09  9:18               ` Mickaël Salaün
  2024-07-09 10:05                 ` Florian Weimer
  2024-07-09 18:57                 ` Jeff Xu
  0 siblings, 2 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09  9:18 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Florian Weimer, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Jeff Xu:
> >
> > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * Jeff Xu:
> > >>
> > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > >> > libraries too ?  or just the main executable itself.
> > >>
> > >> I expect that dynamic linkers will have to do this for everything they
> > >> map.
> > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > be specific for the main executable ?

Yes, we should check all executable code (including seccomp filters)
to get a consistent policy.

What do you mean by "specific for the main executable"?

> >
> > If we want to avoid that, we could have an agreed-upon error code which
> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
> > have to perform the extra system call once.

I'm not sure I follow.  Either we check executable code or we don't,
but it doesn't make sense to only check some parts (except for migration
of user space code in a system, which is one purpose of the securebits
added with the next patch).

The idea with AT_CHECK is to unconditionally check the executable right the
same way it is checked when a file is executed.  User space can decide
to check that or not according to its policy (i.e. securebits).

> >
> Right, something like that.
> I would prefer not having AT_CHECK specific code in LSM code as an
> initial goal, if that works, great.

LSMs should not need to change anything, but they are free to implement
new access rights according to AT_CHECK.

> 
> -Jeff
> 
> > Thanks,
> > Florian
> >


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-09  9:18               ` Mickaël Salaün
@ 2024-07-09 10:05                 ` Florian Weimer
  2024-07-09 20:42                   ` Mickaël Salaün
  2024-07-09 18:57                 ` Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Florian Weimer @ 2024-07-09 10:05 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

* Mickaël Salaün:

>> > If we want to avoid that, we could have an agreed-upon error code which
>> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
>> > have to perform the extra system call once.
>
> I'm not sure I follow.  Either we check executable code or we don't,
> but it doesn't make sense to only check some parts (except for migration
> of user space code in a system, which is one purpose of the securebits
> added with the next patch).
>
> The idea with AT_CHECK is to unconditionally check the executable right the
> same way it is checked when a file is executed.  User space can decide
> to check that or not according to its policy (i.e. securebits).

I meant it purely as a performance optimization, to skip future system
calls if we know they won't provide any useful information for this
process.  In the grand scheme of things, the extra system call probably
does not matter because we already have to do costly things like mmap.

Thanks,
Florian



* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-09  9:18               ` Mickaël Salaün
  2024-07-09 10:05                 ` Florian Weimer
@ 2024-07-09 18:57                 ` Jeff Xu
  2024-07-09 20:41                   ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-09 18:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Florian Weimer, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * Jeff Xu:
> > >
> > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * Jeff Xu:
> > > >>
> > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > > >> > libraries too ?  or just the main executable itself.
> > > >>
> > > >> I expect that dynamic linkers will have to do this for everything they
> > > >> map.
> > > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > > be specific for the main executable ?
>
> Yes, we should check all executable code (including seccomp filters)
> to get a consistent policy.
>
> What do you mean by "specific for the main executable"?
>
I meant:

The check is for the exe itself, not .so, etc.

For example:  /usr/bin/touch is checked.
not the shared objects:
ldd /usr/bin/touch
linux-vdso.so.1 (0x00007ffdc988f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000)
/lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000)

Basically, I asked if the check can be extended to shared-objects,
seccomp filters, etc, without modifying existing LSMs.
You pointed out "LSM should not need to be updated with this patch
series.", which already answered my question.

Thanks.
-Jeff



* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-09 18:57                 ` Jeff Xu
@ 2024-07-09 20:41                   ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09 20:41 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Florian Weimer, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 09, 2024 at 11:57:27AM -0700, Jeff Xu wrote:
> On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> > > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * Jeff Xu:
> > > >
> > > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > > >>
> > > > >> * Jeff Xu:
> > > > >>
> > > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > > > >> > libraries too ?  or just the main executable itself.
> > > > >>
> > > > >> I expect that dynamic linkers will have to do this for everything they
> > > > >> map.
> > > > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > > > be specific for the main executable ?
> >
> > Yes, we should check all executable code (including seccomp filters)
> > to get a consistent policy.
> >
> > What do you mean by "specific for the main executable"?
> >
> I meant:
> 
> The check is for the exe itself, not .so, etc.
> 
> For example:  /usr/bin/touch is checked.
> not the shared objects:
> ldd /usr/bin/touch
> linux-vdso.so.1 (0x00007ffdc988f000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000)

ld.so should be patched to check shared-objects.

> 
> Basically, I asked if the check can be extended to shared-objects,
> seccomp filters, etc, without modifying existing LSMs.

Yes, the check should be used against any piece of code such as
shared-objects, seccomp filters...

> you pointed out "LSM should not need to be updated with this patch
> series.", which already answered my question.
> 
> Thanks.
> -Jeff
> 
> -Jeff


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-09 10:05                 ` Florian Weimer
@ 2024-07-09 20:42                   ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09 20:42 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, Jul 09, 2024 at 12:05:50PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> >> > If we want to avoid that, we could have an agreed-upon error code which
> >> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
> >> > have to perform the extra system call once.
> >
> > I'm not sure I follow.  Either we check executable code or we don't,
> > but it doesn't make sense to only check some parts (except for migration
> > of user space code in a system, which is one purpose of the securebits
> > added with the next patch).
> >
> > The idea with AT_CHECK is to unconditionally check the execute right the
> > same way it is checked when a file is executed.  User space can decide
> > to check that or not according to its policy (i.e. securebits).
> 
> I meant it purely as a performance optimization, to skip future system
> calls if we know they won't provide any useful information for this
> process.  In the grand scheme of things, the extra system call probably
> does not matter because we already have to do costly things like mmap.

Indeed, the performance impact of execveat+AT_CHECK should be negligible
compared to everything else needed to interpret a script or spawn a
process.  Moreover, these checks should only be performed when
SECBIT_SHOULD_EXEC_CHECK is set for the caller.

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-08 22:07             ` Jeff Xu
@ 2024-07-09 20:42               ` Mickaël Salaün
  2024-07-09 21:57                 ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09 20:42 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote:
> >
> > On 08/07/2024 22:15, Jeff Xu wrote:
> > > IIUC:
> > > CHECK=0, RESTRICT=0: do nothing, current behavior
> > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
> > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
> > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
> > > those in the "checked-and-allowed" list.
> >
> > I had much the same question for Mickaël while working on this.
> >
> > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking.
> > In the context of a script or macro interpreter, this just means it will
> > never interpret any scripts. Non-binary code execution is fully disabled
> > in any part of the process that respects these bits.
> >
> I see, so Mickaël does mean this will block all scripts.

That is the initial idea.

> I guess, in the context of the dynamic linker, this means: no more .so
> loading, even if dlopen() is called by an app?  But this will make the
> execve() fail.

Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make
sense for a dynamic linker except maybe if we want to only allow static
binaries?

The CHECK and RESTRICT securebits are designed to make both a
"permissive mode" and an enforcement mode possible, together with the
related locked securebits.  This is why this "CHECK=0, RESTRICT=1"
combination looks a bit weird.  We can replace these securebits with
others but I didn't find a better (and simple) option.  I don't think
this is an issue because with any security policy we can create unusable
combinations.  The three other combinations make a lot of sense though.
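
As an illustration only, the four combinations can be modeled in a few
lines of Python.  This is a sketch of the reading given in this thread
(Steve's description of each mode), not code from the patch series.  The
names decide(), Decision and probe are hypothetical, and probe merely
stands in for an execveat(2) + AT_CHECK call.

```python
from typing import Callable, NamedTuple


class Decision(NamedTuple):
    performed_check: bool  # whether the (modeled) AT_CHECK probe was issued
    allowed: bool          # whether the interpreter goes on to run the file


def decide(check: bool, restrict: bool, probe: Callable[[], bool]) -> Decision:
    """Model the CHECK/RESTRICT combinations as read in this thread.

    `probe` is a hypothetical stand-in for execveat(2) + AT_CHECK and
    returns True when the file would be allowed to execute.
    """
    if not check:
        # CHECK=0: no extra system call is made.  With RESTRICT=1 this
        # means "restrict without checking", so nothing is interpreted.
        return Decision(performed_check=False, allowed=not restrict)
    passed = probe()
    # CHECK=1, RESTRICT=0 is the permissive mode: the result may be logged
    # but never blocks.  CHECK=1, RESTRICT=1 enforces the result.
    return Decision(performed_check=True, allowed=passed or not restrict)
```

With this model, only the CHECK=1 rows ever call probe, which matches the
point that the extra system call is paid only when the CHECK bit is set.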

> 
> > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This
> > case is the allow list (or whatever mechanism is being used to determine
> > the result of an AT_CHECK check). The actual mechanism isn't the
> > business of the script interpreter at all, it just has to refuse to
> > execute anything that doesn't pass the check. So a generic interpreter
> > can implement a generic mechanism and leave the specifics to whoever
> > configures the machine.
> >
> In the context of the dynamic linker, this means:
> if the .so passes AT_CHECK, dlopen() can still load it.
> If the .so fails AT_CHECK, dlopen() will fail too.

Correct

> 
> Thanks
> -Jeff
> 
> > The other two cases are more obvious. "CHECK=0, RESTRICT=0" is the
> > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or
> > otherwise audit the result of the check, but it won't restrict execution.
> >
> > Cheers,
> > Steve

* Re: [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec
  2024-07-08 19:40   ` Mimi Zohar
@ 2024-07-09 20:42     ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09 20:42 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 08, 2024 at 03:40:42PM -0400, Mimi Zohar wrote:
> Hi Mickaël,
> 
> On Thu, 2024-07-04 at 21:01 +0200, Mickaël Salaün wrote:
> > Add a simple tool to set SECBIT_SHOULD_EXEC_CHECK,
> > SECBIT_SHOULD_EXEC_RESTRICT, and their lock counterparts before
> > executing a command.  This should be useful to easily test against
> > script interpreters.
> 
> The print_usage() provides the calling syntax.  Could you provide an example of
> how to use it and what to expect?

To set SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and lock
them on a new shell (session) we can use this:

./set-should-exec -crl -- bash -i

This would have no impact unless Bash, ld.so, or some code they launch
is patched to restrict execution (e.g. with an execveat+AT_CHECK check).
Script interpreters and dynamic linkers need to be patched on a secure
system.  Steve is enlightening Python, and we'll need more similar
changes for common user space code.  This can be incremental work,
enforced only on some user sessions or containers for instance.
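
To make the -crl example concrete, here is a small Python model of how
the tool's option letters could map to securebit names.  It is a sketch
under assumptions: the function name requested_securebits() is
hypothetical, the *_LOCKED names are assumed by analogy with other
securebits, and "-l" is assumed to lock whichever bits were selected.
For "-crl" any reasonable reading gives the same four bits.

```python
from __future__ import annotations


def requested_securebits(flags: str) -> set[str]:
    """Return the securebit names selected by option letters such as "crl".

    Assumption: "c" selects CHECK, "r" selects RESTRICT, and "l" adds the
    lock counterpart of every bit selected so far.
    """
    names: set[str] = set()
    if "c" in flags:
        names.add("SECBIT_SHOULD_EXEC_CHECK")
    if "r" in flags:
        names.add("SECBIT_SHOULD_EXEC_RESTRICT")
    if "l" in flags:
        # The right-hand side is built before the in-place union is applied.
        names |= {name + "_LOCKED" for name in names}
    return names
```

The model only names the bits.  Their numeric values and the prctl(2)
call that sets them are left out on purpose.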

> 
> thanks,
> 
> Mimi
> 
> 

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-08 20:35 ` [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mimi Zohar
@ 2024-07-09 20:43   ` Mickaël Salaün
  2024-07-16 15:57     ` Roberto Sassu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-09 20:43 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 08, 2024 at 04:35:38PM -0400, Mimi Zohar wrote:
> Hi Mickaël,
> 
> On Thu, 2024-07-04 at 21:01 +0200, Mickaël Salaün wrote:
> > Hi,
> > 
> > The ultimate goal of this patch series is to be able to ensure that
> > direct file execution (e.g. ./script.sh) and indirect file execution
> > (e.g. sh script.sh) lead to the same result, especially from a security
> > point of view.
> > 
> > Overview
> > --------
> > 
> > This patch series is a new approach of the initial O_MAYEXEC feature,
> > and a revamp of the previous patch series.  Taking into account the last
> > reviews [1], we now stick to the kernel semantic for file executability.
> > One major change is the clear split between access check and policy
> > management.
> > 
> > The first patch brings the AT_CHECK flag to execveat(2).  The goal is to
> > enable user space to check if a file could be executed (by the kernel).
> > Unlike stat(2), which only checks file permissions, execveat(2) +
> > AT_CHECK takes into account the full context, including mount points
> > (noexec), caller's limits, and all potential LSM extra checks (e.g.
> > argv, envp, credentials).
> > 
> > The second patch brings two new securebits used to set or get a security
> > policy for a set of processes.  For this to be meaningful, all
> > executable code needs to be trusted.  In practice, this means that
> > (malicious) users can be restricted to only run scripts provided (and
> > trusted) by the system.
> > 
> > [1] https://lore.kernel.org/r/CAHk-=wjPGNLyzeBMWdQu+kUdQLHQugznwY7CvWjmvNW47D5sog@mail.gmail.com
> > 
> > Script execution
> > ----------------
> > 
> > One important thing to keep in mind is that the goal of this patch
> > series is to get the same security restrictions with these commands:
> > * ./script.py
> > * python script.py
> > * python < script.py
> > * python -m script.py
> 
> This is really needed, but is it the "only" purpose of this patch set, or can it
> also be used to monitor files the script opens (for read) with the intention of
> executing them?

This feature can indeed also be used to monitor files requested by
scripts to be executed e.g. using
https://docs.python.org/3/library/io.html#io.open_code

IMA/EVM can include this check in its logs.

> 
> > 
> > However, on secure systems, we should be able to forbid these commands
> > because there is no way to reliably identify the origin of the script:
> > * xargs -a script.py -d '\r' -- python -c
> > * cat script.py | python
> > * python
> > 
> > Background
> > ----------
> > 
> > Compared to the previous patch series, there is no more dedicated
> > syscall nor sysctl configuration.  This new patch series only adds new
> > flags: one for execveat(2) and four for prctl(2).
> > 
> > This kind of script interpreter restriction may already be used in
> > hardened systems, which may need to fork interpreters and install
> > different versions of the binaries.  This mechanism should remove the
> > need for duplicate binaries (and potential forked source code)
> > for secure interpreters (e.g. secure Python [2]) by making it possible
> > to dynamically enforce restrictions or not.
> > 
> > The ability to control script execution is also required to close a
> > major IMA measurement/appraisal interpreter integrity gap [3].
> 
> Definitely.  But it isn't limited to controlling script execution, but also
> measuring the script.  Will it be possible to measure and appraise the indirect
> script calls with this patch set?

Yes. You should only need to implement security_bprm_creds_for_exec()
for IMA/EVM.

BTW, I noticed that IMA only uses the security_bprm_check() hook (which
can be called several times for one execve), but
security_bprm_creds_for_exec() might be more appropriate.

> 
> Mimi
> 
> > This new execveat + AT_CHECK should not be confused with the O_EXEC flag
> > (for open) which is intended for execute-only, which obviously doesn't
> > work for scripts.
> > 
> > I gave a talk about controlling script execution where I explain the
> > previous approaches [4].  The design of the WIP RFC I talked about
> > changed quite a bit since then.
> > 
> > [2] https://github.com/zooba/spython
> > [3] https://lore.kernel.org/lkml/20211014130125.6991-1-zohar@linux.ibm.com/
> > [4] https://lssna2023.sched.com/event/1K7bO
> > 
> 
> 

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-09 20:42               ` Mickaël Salaün
@ 2024-07-09 21:57                 ` Jeff Xu
  2024-07-10  9:58                   ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-09 21:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 9, 2024 at 1:42 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote:
> > On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote:
> > >
> > > On 08/07/2024 22:15, Jeff Xu wrote:
> > > > IIUC:
> > > > CHECK=0, RESTRICT=0: do nothing, current behavior
> > > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
> > > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
> > > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
> > > > those in the "checked-and-allowed" list.
> > >
> > > I had much the same question for Mickaël while working on this.
> > >
> > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking.
> > > In the context of a script or macro interpreter, this just means it will
> > > never interpret any scripts. Non-binary code execution is fully disabled
> > > in any part of the process that respects these bits.
> > >
> > I see, so Mickaël does mean this will block all scripts.
>
> That is the initial idea.
>
> > I guess, in the context of the dynamic linker, this means: no more .so
> > loading, even if dlopen() is called by an app?  But this will make the
> > execve() fail.
>
> Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make
> sense for a dynamic linker except maybe if we want to only allow static
> binaries?
>
> The CHECK and RESTRICT securebits are designed to make both a
> "permissive mode" and an enforcement mode possible, together with the
> related locked securebits.  This is why this "CHECK=0, RESTRICT=1"
> combination looks a bit weird.  We can replace these securebits with
> others but I didn't find a better (and simple) option.  I don't think
> this is an issue because with any security policy we can create unusable
> combinations.  The three other combinations make a lot of sense though.
>
If we only need to handle 3 combinations, I would think something like
below is easier to understand, and doesn't have a weird state like
CHECK=0, RESTRICT=1

XX_RESTRICT: when true, perform the AT_CHECK and deny the executable
if AT_CHECK fails.
XX_RESTRICT_PERMISSIVE: takes effect only when XX_RESTRICT is true.  True
means ignore the AT_CHECK result.

Or

XX_CHECK: when true: Perform the AT_CHECK.
XX_CHECK_ENFORCE takes effect only when XX_CHECK is true.   True means
restrict the executable when AT_CHECK failed; false means ignore the
AT_CHECK failure.

Of course, we can replace XX_CHECK_ENFORCE with XX_RESTRICT.
Personally I think having _CHECK_ in the name implies the XX_CHECK
needs to be true as a prerequisite for this flag , but that is my
opinion only. As long as the semantics are clear as part of the
comments of definition in code,  it is fine.

Thanks
-Jeff


> >
> > > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This
> > > case is the allow list (or whatever mechanism is being used to determine
> > > the result of an AT_CHECK check). The actual mechanism isn't the
> > > business of the script interpreter at all, it just has to refuse to
> > > execute anything that doesn't pass the check. So a generic interpreter
> > > can implement a generic mechanism and leave the specifics to whoever
> > > configures the machine.
> > >
> > In the context of the dynamic linker, this means:
> > if the .so passes AT_CHECK, dlopen() can still load it.
> > If the .so fails AT_CHECK, dlopen() will fail too.
>
> Correct
>
> >
> > Thanks
> > -Jeff
> >
> > > The other two cases are more obvious. "CHECK=0, RESTRICT=0" is the
> > > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or
> > > otherwise audit the result of the check, but it won't restrict execution.
> > >
> > > Cheers,
> > > Steve

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-09 21:57                 ` Jeff Xu
@ 2024-07-10  9:58                   ` Mickaël Salaün
  2024-07-10 16:26                     ` Kees Cook
  2024-07-10 16:32                     ` Steve Dower
  0 siblings, 2 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-10  9:58 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 09, 2024 at 02:57:43PM -0700, Jeff Xu wrote:
> On Tue, Jul 9, 2024 at 1:42 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote:
> > > On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote:
> > > >
> > > > On 08/07/2024 22:15, Jeff Xu wrote:
> > > > > IIUC:
> > > > > CHECK=0, RESTRICT=0: do nothing, current behavior
> > > > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results.
> > > > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception.
> > > > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except
> > > > > those in the "checked-and-allowed" list.
> > > >
> > > > I had much the same question for Mickaël while working on this.
> > > >
> > > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking.
> > > > In the context of a script or macro interpreter, this just means it will
> > > > never interpret any scripts. Non-binary code execution is fully disabled
> > > > in any part of the process that respects these bits.
> > > >
> > > I see, so Mickaël does mean this will block all scripts.
> >
> > That is the initial idea.
> >
> > > I guess, in the context of the dynamic linker, this means: no more .so
> > > loading, even if dlopen() is called by an app?  But this will make the
> > > execve() fail.
> >
> > Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make
> > sense for a dynamic linker except maybe if we want to only allow static
> > binaries?
> >
> > The CHECK and RESTRICT securebits are designed to make both a
> > "permissive mode" and an enforcement mode possible, together with the
> > related locked securebits.  This is why this "CHECK=0, RESTRICT=1"
> > combination looks a bit weird.  We can replace these securebits with
> > others but I didn't find a better (and simple) option.  I don't think
> > this is an issue because with any security policy we can create unusable
> > combinations.  The three other combinations make a lot of sense though.
> >
> If we only need to handle 3 combinations, I would think something like
> below is easier to understand, and doesn't have a weird state like
> CHECK=0, RESTRICT=1

The "CHECK=0, RESTRICT=1" combination is useful e.g. for script
interpreter instances that should not interpret any command from users,
but only execute script files.

> 
> XX_RESTRICT: when true: Perform the AT_CHECK, and deny the executable
> after AT_CHECK fails.

> XX_RESTRICT_PERMISSIVE:  take effect when XX_RESTRICT is true. True
> means Ignoring the AT_CHECK result.

We get a similar weird state with XX_RESTRICT_PERMISSIVE=1 and
XX_RESTRICT=0

As a side note, for compatibility reasons, by default all securebits
must be 0, and this must translate to no restriction.

> 
> Or
> 
> XX_CHECK: when true: Perform the AT_CHECK.
> XX_CHECK_ENFORCE takes effect only when XX_CHECK is true.   True means
> restrict the executable when AT_CHECK failed; false means ignore the
> AT_CHECK failure.

We get a similar weird state with XX_CHECK_ENFORCE=1 and XX_CHECK=0

> 
> Of course, we can replace XX_CHECK_ENFORCE with XX_RESTRICT.
> Personally I think having _CHECK_ in the name implies the XX_CHECK
> needs to be true as a prerequisite for this flag , but that is my
> opinion only. As long as the semantics are clear as part of the
> comments of definition in code,  it is fine.

Here is another proposal:

We can change the semantics a bit by making it the norm to always check
file executability with AT_CHECK, and using the securebits to restrict
file interpretation and/or command injection (e.g. user supplied shell
commands).  Non-executable checked files can be reported/logged at the
kernel level, with audit, configured by sysadmins.

New securebits (feel free to propose better names):

- SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.

- SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
  command line arguments, environment variables, or configuration files.
  This should be ignored by dynamic linkers.  We could also have an
  allow-list of shells for which this bit is not set, managed by an
  LSM's policy, if the native securebits scoping approach is not enough.

Different modes for script interpreters:

1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
   Always interpret scripts, and allow arbitrary user commands.
   => No threat, everyone and everything is trusted, but we can get
   ahead of potential issues with logs to prepare for a migration to a
   restrictive mode.

2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
   Deny interpretation of scripts that are not executable, and allow
   arbitrary user commands.
   => Threat: (potential) malicious scripts run by trusted (and not
      fooled) users.  That could protect against unintended script
      executions (e.g. sh /tmp/*.sh).
   ==> Makes sense for (semi-restricted) user sessions.

3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
   Deny interpretation of scripts that are not executable, and also deny
   any arbitrary user commands.
   => Threat: malicious scripts run by untrusted users.
   ==> Makes sense for system services executing scripts.

4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
   Always interpret scripts, but deny arbitrary user commands.
   => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
      a system where the access rights are not (yet) ready.  Arbitrary
      user commands would be much more difficult to monitor.
   ==> First step of restricting system services that should not
       directly pass arbitrary commands to shells.
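
The four modes above can be summarized with a short Python model.  This
is only a sketch of the proposal as worded here: Mode, may_run_file() and
may_run_command() are hypothetical names, and at_check_passed stands for
the result of an execveat(2) + AT_CHECK call that the interpreter is
assumed to have made already.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    restrict_file: bool     # proposed SECBIT_EXEC_RESTRICT_FILE
    deny_interactive: bool  # proposed SECBIT_EXEC_DENY_INTERACTIVE


def may_run_file(mode: Mode, at_check_passed: bool) -> bool:
    """A script file runs unless RESTRICT_FILE is set and the check failed."""
    return at_check_passed or not mode.restrict_file


def may_run_command(mode: Mode) -> bool:
    """Commands injected via command line arguments, environment
    variables, or configuration files run unless DENY_INTERACTIVE is set."""
    return not mode.deny_interactive
```

For example, mode 4 (RESTRICT_FILE=0, DENY_INTERACTIVE=1) lets a
non-executable script run while refusing a command passed as an argument.
The two bits are independent in this model, so no combination is
meaningless.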

> 
> Thanks
> -Jeff
> 
> 
> > >
> > > > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This
> > > > case is the allow list (or whatever mechanism is being used to determine
> > > > the result of an AT_CHECK check). The actual mechanism isn't the
> > > > business of the script interpreter at all, it just has to refuse to
> > > > execute anything that doesn't pass the check. So a generic interpreter
> > > > can implement a generic mechanism and leave the specifics to whoever
> > > > configures the machine.
> > > >
> > > In the context of the dynamic linker, this means:
> > > if the .so passes AT_CHECK, dlopen() can still load it.
> > > If the .so fails AT_CHECK, dlopen() will fail too.
> >
> > Correct
> >
> > >
> > > Thanks
> > > -Jeff
> > >
> > > > The other two cases are more obvious. "CHECK=0, RESTRICT=0" is the
> > > > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or
> > > > otherwise audit the result of the check, but it won't restrict execution.
> > > >
> > > > Cheers,
> > > > Steve

* Re: [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2))
  2024-07-08 16:37           ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Florian Weimer
  2024-07-08 17:34             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC Eric W. Biederman
@ 2024-07-10 10:05             ` Mickaël Salaün
  1 sibling, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-10 10:05 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Eric Biederman, linux-mm

On Mon, Jul 08, 2024 at 06:37:14PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote:
> >> * Mickaël Salaün:
> >> 
> >> > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> >> >> * Mickaël Salaün:
> >> >> 
> >> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> >> >> > allowed for execution.  The main use case is for script interpreters and
> >> >> > dynamic linkers to check execution permission according to the kernel's
> >> >> > security policy. Another use case is to add context to access logs e.g.,
> >> >> > which script (instead of interpreter) accessed a file.  As any
> >> >> > executable code, scripts could also use this check [1].
> >> >> 
> >> >> Some distributions no longer set executable bits on most shared objects,
> >> >> which I assume would interfere with AT_CHECK probing for shared objects.
> >> >
> >> > A file without the execute permission is not considered as executable by
> >> > the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> >> > note that this is just a check, not a restriction.  See the next patch
> >> > for the optional policy enforcement.
> >> >
> >> > Anyway, we need to define the policy, and for Linux this is done with
> >> > the file permission bits.  So for systems willing to have a consistent
> >> > execution policy, we need to rely on the same bits.
> >> 
> >> Yes, that makes complete sense.  I just wanted to point out the odd
> >> interaction with the old binutils bug and the (sadly still current)
> >> kernel bug.
> >> 
> >> >> Removing the executable bit is attractive because of a combination of
> >> >> two bugs: a binutils wart which until recently always set the entry
> >> >> point address in the ELF header to zero, and the kernel not checking for
> >> >> a zero entry point (maybe in combination with an absent program
> >> >> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> >> >> execve and then faulting at virtual address zero.  Removing the
> >> >> executable bit is currently the only way to avoid these confusing
> >> >> crashes, so I understand the temptation.
> >> >
> >> > Interesting.  Can you please point to the bug report and the fix?  I
> >> > don't see any ELIBEXEC in the kernel.
> >> 
> >> The kernel hasn't been fixed yet.  I do think this should be fixed, so
> >> that distributions can bring back the executable bit.
> >
> > Can you please point to the mailing list discussion or the bug report?
> 
> I'm not sure if this was ever reported upstream as an RFE to fail with
> ELIBEXEC.  We have downstream bug report:
> 
>   Prevent executed .so files with e_entry == 0 from attempting to become
>   a process.
>   <https://bugzilla.redhat.com/show_bug.cgi?id=2004942>

Thanks for the info.

> 
> I've put together a patch which seems to work, see below.
> 
> I don't think there's any impact on AT_CHECK with execveat because that
> mode will never get to this point.

Correct, that is not an issue for AT_CHECK use cases.

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-10  9:58                   ` Mickaël Salaün
@ 2024-07-10 16:26                     ` Kees Cook
  2024-07-11  8:57                       ` Mickaël Salaün
  2024-07-10 16:32                     ` Steve Dower
  1 sibling, 1 reply; 103+ messages in thread
From: Kees Cook @ 2024-07-10 16:26 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jeff Xu, Steve Dower, Al Viro, Christian Brauner, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Grubb,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote:
> Here is another proposal:
> 
> We can change the semantics a bit by making it the norm to always check
> file executability with AT_CHECK, and using the securebits to restrict
> file interpretation and/or command injection (e.g. user supplied shell
> commands).  Non-executable checked files can be reported/logged at the
> kernel level, with audit, configured by sysadmins.
> 
> New securebits (feel free to propose better names):
> 
> - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.

Would you want the enforcement of this bit done by userspace or the
kernel?

IIUC, userspace would always perform AT_CHECK regardless of
SECBIT_EXEC_RESTRICT_FILE, and then which would happen?

1) userspace would ignore errors from AT_CHECK when
   SECBIT_EXEC_RESTRICT_FILE is unset

or

2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is
   unset

I suspect 1 is best and what you intend, given that
SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace.
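
The difference between the two options can be sketched in Python.  This
is an illustrative model, not kernel or interpreter code: the function
names and the Outcome type are hypothetical, and executable stands for
what the kernel would answer to AT_CHECK when no securebit is involved.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    allowed: bool          # does the interpreter run the file?
    failure_visible: bool  # did user space see an AT_CHECK failure?


def option1_userspace_enforces(restrict_file: bool, executable: bool) -> Outcome:
    # Option 1: the kernel answers the same way whatever the securebit says,
    # and user space decides whether a failure is fatal.
    check_passed = executable
    return Outcome(allowed=check_passed or not restrict_file,
                   failure_visible=not check_passed)


def option2_kernel_enforces(restrict_file: bool, executable: bool) -> Outcome:
    # Option 2: the kernel reports success whenever the securebit is unset,
    # so user space simply follows the result.
    check_passed = executable or not restrict_file
    return Outcome(allowed=check_passed, failure_visible=not check_passed)
```

Both options allow and deny the same files.  They differ only when the
bit is unset and the file is not executable: with option 1 user space
still sees the failure and could log it, with option 2 it cannot.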

> - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
>   command line arguments, environment variables, or configuration files.
>   This should be ignored by dynamic linkers.  We could also have an
>   allow-list of shells for which this bit is not set, managed by an
>   LSM's policy, if the native securebits scoping approach is not enough.
> 
> Different modes for script interpreters:
> 
> 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
>    Always interpret scripts, and allow arbitrary user commands.
>    => No threat, everyone and everything is trusted, but we can get
>    ahead of potential issues with logs to prepare for a migration to a
>    restrictive mode.
> 
> 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
>    Deny script interpretation if they are not executable, and allow
>    arbitrary user commands.
>    => Threat: (potential) malicious scripts run by trusted (and not
>       fooled) users.  That could protect against unintended script
>       executions (e.g. sh /tmp/*.sh).
>    ==> Makes sense for (semi-restricted) user sessions.
> 
> 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
>    Deny script interpretation if they are not executable, and also deny
>    any arbitrary user commands.
>    => Threat: malicious scripts run by untrusted users.
>    ==> Makes sense for system services executing scripts.
> 
> 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
>    Always interpret scripts, but deny arbitrary user commands.
>    => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
>       a system where the access rights are not (yet) ready.  Arbitrary
>       user commands would be much more difficult to monitor.
>    ==> First step of restricting system services that should not
>        directly pass arbitrary commands to shells.

I like these bits!

-- 
Kees Cook

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-10  9:58                   ` Mickaël Salaün
  2024-07-10 16:26                     ` Kees Cook
@ 2024-07-10 16:32                     ` Steve Dower
  1 sibling, 0 replies; 103+ messages in thread
From: Steve Dower @ 2024-07-10 16:32 UTC (permalink / raw)
  To: Mickaël Salaün, Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On 10/07/2024 10:58, Mickaël Salaün wrote:
> On Tue, Jul 09, 2024 at 02:57:43PM -0700, Jeff Xu wrote:
>>> Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make
>>> sense for a dynamic linker except maybe if we want to only allow static
>>> binaries?
>>>
>>> The CHECK and RESTRICT securebits are designed to make possible a
>>> "permissive mode" and an enforcement mode with the related locked
>>> securebits.  This is why this "CHECK=0, RESTRICT=1" combination looks a
>>> bit weird.  We can replace these securebits with others but I didn't
>>> find a better (and simple) option.  I don't think this is an issue
>>> because with any security policy we can create unusable combinations.
>>> The three other combinations make a lot of sense though.
>>>
>> If we only need to handle 3 combinations, I would think something like
>> below is easier to understand, and doesn't have a weird state like
>> CHECK=0, RESTRICT=1
> 
> The "CHECK=0, RESTRICT=1" is useful for script interpreter instances
> that should not interpret any command from users e.g., but only execute
> script files.

I see this case as being most relevant to something that doesn't usually
need any custom scripts, but may have them. For example, macros in a
document, or pre/post-install scripts for a package manager.

For something whose sole purpose is to execute scripts, it doesn't make 
much sense. But there are other cases that can be reasonably controlled 
with this option.

Cheers,
Steve

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-10 16:26                     ` Kees Cook
@ 2024-07-11  8:57                       ` Mickaël Salaün
  2024-07-16 15:02                         ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-11  8:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jeff Xu, Steve Dower, Al Viro, Christian Brauner, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Grubb,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote:
> On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote:
> > Here is another proposal:
> > 
> > We can change a bit the semantic by making it the norm to always check
> > file executability with AT_CHECK, and using the securebits to restrict
> > file interpretation and/or command injection (e.g. user supplied shell
> > commands).  Non-executable checked files can be reported/logged at the
> > kernel level, with audit, configured by sysadmins.
> > 
> > New securebits (feel free to propose better names):
> > 
> > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.
> 
> Would you want the enforcement of this bit done by userspace or the
> kernel?
> 
> IIUC, userspace would always perform AT_CHECK regardless of
> SECBIT_EXEC_RESTRICT_FILE, and then which would happen?
> 
> 1) userspace would ignore errors from AT_CHECK when
>    SECBIT_EXEC_RESTRICT_FILE is unset

Yes, that's the idea.

> 
> or
> 
> 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is
>    unset
> 
> I suspect 1 is best and what you intend, given that
> SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace.

Indeed. We don't want AT_CHECK's behavior to change according to
securebits.

> 
> > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
> >   command line arguments, environment variables, or configuration files.
> >   This should be ignored by dynamic linkers.  We could also have an
> >   allow-list of shells for which this bit is not set, managed by an
> >   LSM's policy, if the native securebits scoping approach is not enough.
> > 
> > Different modes for script interpreters:
> > 
> > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
> >    Always interpret scripts, and allow arbitrary user commands.
> >    => No threat, everyone and everything is trusted, but we can get
> >    ahead of potential issues with logs to prepare for a migration to a
> >    restrictive mode.
> > 
> > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
> >    Deny script interpretation if they are not executable, and allow
> >    arbitrary user commands.
> >    => Threat: (potential) malicious scripts run by trusted (and not
> >       fooled) users.  That could protect against unintended script
> >       executions (e.g. sh /tmp/*.sh).
> >    ==> Makes sense for (semi-restricted) user sessions.
> > 
> > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
> >    Deny script interpretation if they are not executable, and also deny
> >    any arbitrary user commands.
> >    => Threat: malicious scripts run by untrusted users.
> >    ==> Makes sense for system services executing scripts.
> > 
> > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
> >    Always interpret scripts, but deny arbitrary user commands.
> >    => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
> >       a system where the access rights are not (yet) ready.  Arbitrary
> >       user commands would be much more difficult to monitor.
> >    ==> First step of restricting system services that should not
> >        directly pass arbitrary commands to shells.
> 
> I like these bits!

Good! Jeff, Steve, Florian, Matt, others, what do you think?

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (5 preceding siblings ...)
  2024-07-08 20:35 ` [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mimi Zohar
@ 2024-07-15 20:16 ` Jonathan Corbet
  2024-07-16  7:13   ` Mickaël Salaün
  6 siblings, 1 reply; 103+ messages in thread
From: Jonathan Corbet @ 2024-07-15 20:16 UTC (permalink / raw)
  To: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o
  Cc: Mickaël Salaün, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jeff Xu, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

Mickaël Salaün <mic@digikod.net> writes:

FYI:

> User space patches can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

That link appears to be broken.

Thanks,

jon

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-15 20:16 ` Jonathan Corbet
@ 2024-07-16  7:13   ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-16  7:13 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Mon, Jul 15, 2024 at 02:16:41PM -0600, Jonathan Corbet wrote:
> Mickaël Salaün <mic@digikod.net> writes:
> 
> FYI:
> 
> > User space patches can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> 
> That link appears to be broken.

Unfortunately, GitHub's code search links only work with an account.
git grep prints a similar output though.

> 
> Thanks,
> 
> jon

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-11  8:57                       ` Mickaël Salaün
@ 2024-07-16 15:02                         ` Jeff Xu
  2024-07-16 15:10                           ` Steve Dower
  2024-07-16 15:15                           ` Mickaël Salaün
  0 siblings, 2 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-16 15:02 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Kees Cook, Steve Dower, Al Viro, Christian Brauner,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote:
> > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote:
> > > Here is another proposal:
> > >
> > > We can change a bit the semantic by making it the norm to always check
> > > file executability with AT_CHECK, and using the securebits to restrict
> > > file interpretation and/or command injection (e.g. user supplied shell
> > > commands).  Non-executable checked files can be reported/logged at the
> > > kernel level, with audit, configured by sysadmins.
> > >
> > > New securebits (feel free to propose better names):
> > >
> > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.
> >
> > Would you want the enforcement of this bit done by userspace or the
> > kernel?
> >
> > IIUC, userspace would always perform AT_CHECK regardless of
> > SECBIT_EXEC_RESTRICT_FILE, and then which would happen?
> >
> > 1) userspace would ignore errors from AT_CHECK when
> >    SECBIT_EXEC_RESTRICT_FILE is unset
>
> Yes, that's the idea.
>
> >
> > or
> >
> > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is
> >    unset
> >
> > I suspect 1 is best and what you intend, given that
> > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace.
>
> Indeed. We don't want AT_CHECK's behavior to change according to
> securebits.
>
One bit is good.

> >
> > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
> > >   command line arguments, environment variables, or configuration files.
> > >   This should be ignored by dynamic linkers.  We could also have an
> > >   allow-list of shells for which this bit is not set, managed by an
> > >   LSM's policy, if the native securebits scoping approach is not enough.
> > >
> > > Different modes for script interpreters:
> > >
> > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
> > >    Always interpret scripts, and allow arbitrary user commands.
> > >    => No threat, everyone and everything is trusted, but we can get
> > >    ahead of potential issues with logs to prepare for a migration to a
> > >    restrictive mode.
> > >
> > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
> > >    Deny script interpretation if they are not executable, and allow
> > >    arbitrary user commands.
> > >    => Threat: (potential) malicious scripts run by trusted (and not
> > >       fooled) users.  That could protect against unintended script
> > >       executions (e.g. sh /tmp/*.sh).
> > >    ==> Makes sense for (semi-restricted) user sessions.
> > >
> > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
> > >    Deny script interpretation if they are not executable, and also deny
> > >    any arbitrary user commands.
> > >    => Threat: malicious scripts run by untrusted users.
> > >    ==> Makes sense for system services executing scripts.
> > >
> > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
> > >    Always interpret scripts, but deny arbitrary user commands.
> > >    => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
> > >       a system where the access rights are not (yet) ready.  Arbitrary
> > >       user commands would be much more difficult to monitor.
> > >    ==> First step of restricting system services that should not
> > >        directly pass arbitrary commands to shells.
> >
> > I like these bits!
>
> Good! Jeff, Steve, Florian, Matt, others, what do you think?

For the two cases below: will they be restricted by one (or more) of the modes above?

1> cat /tmp/a.sh | sh

2> sh -c "$(cat /tmp/a.sh)"

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-16 15:02                         ` Jeff Xu
@ 2024-07-16 15:10                           ` Steve Dower
  2024-07-16 15:15                           ` Mickaël Salaün
  1 sibling, 0 replies; 103+ messages in thread
From: Steve Dower @ 2024-07-16 15:10 UTC (permalink / raw)
  To: Jeff Xu, Mickaël Salaün
  Cc: Kees Cook, Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On 16/07/2024 16:02, Jeff Xu wrote:
> For the two cases below: will they be restricted by one (or more) of the modes above?
> 
> 1> cat /tmp/a.sh | sh
> 
> 2> sh -c "$(cat /tmp/a.sh)"

It will almost certainly depend on your context, but to properly lock 
down a system, they must be restricted. "We were unable to check the 
file" ought to be treated the same as "the file failed the check".

If your goal is to only execute files that have been pre-approved in 
some manner, you're implying that you don't want interactive execution 
at all (since that is not a file that's been pre-approved). So a mere 
"sh" or "sh -c ..." would be restricted without checking anything other 
than the secure bit.
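
One way to picture the fail-closed rule above, as a hedged sketch rather
than anything from the patches: a three-valued check result where "could
not check" is folded into "failed" once the system is locked down.  The
names CheckResult and allow_execution are illustrative only, and
"locked_down" is assumed to mean that both proposed securebits are set.

```python
from enum import Enum


class CheckResult(Enum):
    PASSED = "passed"            # the file was checked and is executable
    FAILED = "failed"            # the file was checked and rejected
    UNCHECKABLE = "uncheckable"  # nothing to check (e.g. piped or -c input)


def allow_execution(result: CheckResult, locked_down: bool) -> bool:
    """Fail closed: when locked down, only a passed check allows execution."""
    if not locked_down:
        return True
    return result is CheckResult.PASSED
```

With this shape, a bare "sh" or "sh -c ..." maps to UNCHECKABLE and is
refused on a locked-down system without inspecting any file.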

Cheers,
Steve

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-16 15:02                         ` Jeff Xu
  2024-07-16 15:10                           ` Steve Dower
@ 2024-07-16 15:15                           ` Mickaël Salaün
  2024-07-16 15:18                             ` Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-16 15:15 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Kees Cook, Steve Dower, Al Viro, Christian Brauner,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 16, 2024 at 08:02:37AM -0700, Jeff Xu wrote:
> On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote:
> > > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote:
> > > > Here is another proposal:
> > > >
> > > > We can change a bit the semantic by making it the norm to always check
> > > > file executability with AT_CHECK, and using the securebits to restrict
> > > > file interpretation and/or command injection (e.g. user supplied shell
> > > > commands).  Non-executable checked files can be reported/logged at the
> > > > kernel level, with audit, configured by sysadmins.
> > > >
> > > > New securebits (feel free to propose better names):
> > > >
> > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.
> > >
> > > Would you want the enforcement of this bit done by userspace or the
> > > kernel?
> > >
> > > IIUC, userspace would always perform AT_CHECK regardless of
> > > SECBIT_EXEC_RESTRICT_FILE, and then which would happen?
> > >
> > > 1) userspace would ignore errors from AT_CHECK when
> > >    SECBIT_EXEC_RESTRICT_FILE is unset
> >
> > Yes, that's the idea.
> >
> > >
> > > or
> > >
> > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is
> > >    unset
> > >
> > > I suspect 1 is best and what you intend, given that
> > > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace.
> >
> > Indeed. We don't want AT_CHECK's behavior to change according to
> > securebits.
> >
> One bit is good.
> 
> > >
> > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
> > > >   command line arguments, environment variables, or configuration files.
> > > >   This should be ignored by dynamic linkers.  We could also have an
> > > >   allow-list of shells for which this bit is not set, managed by an
> > > >   LSM's policy, if the native securebits scoping approach is not enough.
> > > >
> > > > Different modes for script interpreters:
> > > >
> > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
> > > >    Always interpret scripts, and allow arbitrary user commands.
> > > >    => No threat, everyone and everything is trusted, but we can get
> > > >    ahead of potential issues with logs to prepare for a migration to a
> > > >    restrictive mode.
> > > >
> > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
> > > >    Deny script interpretation if they are not executable, and allow
> > > >    arbitrary user commands.
> > > >    => Threat: (potential) malicious scripts run by trusted (and not
> > > >       fooled) users.  That could protect against unintended script
> > > >       executions (e.g. sh /tmp/*.sh).
> > > >    ==> Makes sense for (semi-restricted) user sessions.
> > > >
> > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
> > > >    Deny script interpretation if they are not executable, and also deny
> > > >    any arbitrary user commands.
> > > >    => Threat: malicious scripts run by untrusted users.
> > > >    ==> Makes sense for system services executing scripts.
> > > >
> > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
> > > >    Always interpret scripts, but deny arbitrary user commands.
> > > >    => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
> > > >       a system where the access rights are not (yet) ready.  Arbitrary
> > > >       user commands would be much more difficult to monitor.
> > > >    ==> First step of restricting system services that should not
> > > >        directly pass arbitrary commands to shells.
> > >
> > > I like these bits!
> >
> > Good! Jeff, Steve, Florian, Matt, others, what do you think?
> 
> > For the two cases below: will they be restricted by one (or more) of the modes above?
> 
> 1> cat /tmp/a.sh | sh
> 
> 2> sh -c "$(cat /tmp/a.sh)"

Yes, DENY_INTERACTIVE=1 is to deny both of these cases (i.e. arbitrary
user command).

These other examples should be allowed with AT_CHECK and RESTRICT_FILE=1
if a.sh is executable though:
* sh /tmp/a.sh
* sh < /tmp/a.sh
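
The four invocations above could be told apart roughly as follows.  This is
a sketch under assumptions, not code from the series: the interpreter is
assumed to classify its input with fstat(2), treating a regular file on
stdin like a script path and anything else (a pipe, a terminal, a -c
string) as arbitrary user commands.  classify_stdin and may_run are
hypothetical helpers, and check_passed stands in for the result of AT_CHECK
on the file.

```python
import os
import stat


def classify_stdin(fd: int = 0) -> str:
    """Return "file" for `sh < a.sh`, "commands" for `cat a.sh | sh` or a tty."""
    mode = os.fstat(fd).st_mode
    return "file" if stat.S_ISREG(mode) else "commands"


def may_run(kind: str, check_passed: bool,
            restrict_file: bool, deny_interactive: bool) -> bool:
    """Apply the two proposed securebits to one input source.

    kind is "file" (script path, or regular file on stdin) or
    "commands" (pipe, terminal, or -c string).
    """
    if kind == "file":
        # File-backed input: only RESTRICT_FILE matters.
        return check_passed or not restrict_file
    # No file to check: only DENY_INTERACTIVE matters.
    return not deny_interactive
```

For example, with both bits set, may_run("file", True, True, True) allows
"sh < /tmp/a.sh" for an executable a.sh, while may_run("commands", False,
True, True) refuses "cat /tmp/a.sh | sh".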

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-16 15:15                           ` Mickaël Salaün
@ 2024-07-16 15:18                             ` Jeff Xu
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-16 15:18 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Kees Cook, Steve Dower, Al Viro, Christian Brauner,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Tue, Jul 16, 2024 at 8:15 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Jul 16, 2024 at 08:02:37AM -0700, Jeff Xu wrote:
> > On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote:
> > > > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote:
> > > > > Here is another proposal:
> > > > >
> > > > > We can change a bit the semantic by making it the norm to always check
> > > > > file executability with AT_CHECK, and using the securebits to restrict
> > > > > file interpretation and/or command injection (e.g. user supplied shell
> > > > > commands).  Non-executable checked files can be reported/logged at the
> > > > > kernel level, with audit, configured by sysadmins.
> > > > >
> > > > > New securebits (feel free to propose better names):
> > > > >
> > > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass.
> > > >
> > > > Would you want the enforcement of this bit done by userspace or the
> > > > kernel?
> > > >
> > > > IIUC, userspace would always perform AT_CHECK regardless of
> > > > SECBIT_EXEC_RESTRICT_FILE, and then which would happen?
> > > >
> > > > 1) userspace would ignore errors from AT_CHECK when
> > > >    SECBIT_EXEC_RESTRICT_FILE is unset
> > >
> > > Yes, that's the idea.
> > >
> > > >
> > > > or
> > > >
> > > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is
> > > >    unset
> > > >
> > > > I suspect 1 is best and what you intend, given that
> > > > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace.
> > >
> > > Indeed. We don't want AT_CHECK's behavior to change according to
> > > securebits.
> > >
> > One bit is good.
> >
> > > >
> > > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via
> > > > >   command line arguments, environment variables, or configuration files.
> > > > >   This should be ignored by dynamic linkers.  We could also have an
> > > > >   allow-list of shells for which this bit is not set, managed by an
> > > > >   LSM's policy, if the native securebits scoping approach is not enough.
> > > > >
> > > > > Different modes for script interpreters:
> > > > >
> > > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default)
> > > > >    Always interpret scripts, and allow arbitrary user commands.
> > > > >    => No threat, everyone and everything is trusted, but we can get
> > > > >    ahead of potential issues with logs to prepare for a migration to a
> > > > >    restrictive mode.
> > > > >
> > > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0
> > > > >    Deny script interpretation if they are not executable, and allow
> > > > >    arbitrary user commands.
> > > > >    => Threat: (potential) malicious scripts run by trusted (and not
> > > > >       fooled) users.  That could protect against unintended script
> > > > >       executions (e.g. sh /tmp/*.sh).
> > > > >    ==> Makes sense for (semi-restricted) user sessions.
> > > > >
> > > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1
> > > > >    Deny script interpretation if they are not executable, and also deny
> > > > >    any arbitrary user commands.
> > > > >    => Threat: malicious scripts run by untrusted users.
> > > > >    ==> Makes sense for system services executing scripts.
> > > > >
> > > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1
> > > > >    Always interpret scripts, but deny arbitrary user commands.
> > > > >    => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in
> > > > >       a system where the access rights are not (yet) ready.  Arbitrary
> > > > >       user commands would be much more difficult to monitor.
> > > > >    ==> First step of restricting system services that should not
> > > > >        directly pass arbitrary commands to shells.
> > > >
> > > > I like these bits!
> > >
> > > Good! Jeff, Steve, Florian, Matt, others, what do you think?
> >
> > > For the two cases below: will they be restricted by one (or more) of the modes above?
> >
> > 1> cat /tmp/a.sh | sh
> >
> > 2> sh -c "$(cat /tmp/a.sh)"
>
> Yes, DENY_INTERACTIVE=1 is to deny both of these cases (i.e. arbitrary
> user command).
>
> These other examples should be allowed with AT_CHECK and RESTRICT_FILE=1
> if a.sh is executable though:
> * sh /tmp/a.sh
> * sh < /tmp/a.sh
That looks good. Thanks for clarifying.

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-09 20:43   ` Mickaël Salaün
@ 2024-07-16 15:57     ` Roberto Sassu
  2024-07-16 16:12       ` James Bottomley
  0 siblings, 1 reply; 103+ messages in thread
From: Roberto Sassu @ 2024-07-16 15:57 UTC (permalink / raw)
  To: Mickaël Salaün, Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, 2024-07-09 at 22:43 +0200, Mickaël Salaün wrote:
> On Mon, Jul 08, 2024 at 04:35:38PM -0400, Mimi Zohar wrote:
> > Hi Mickaël,
> > 
> > On Thu, 2024-07-04 at 21:01 +0200, Mickaël Salaün wrote:
> > > Hi,
> > > 
> > > The ultimate goal of this patch series is to be able to ensure that
> > > direct file execution (e.g. ./script.sh) and indirect file execution
> > > (e.g. sh script.sh) lead to the same result, especially from a security
> > > point of view.
> > > 
> > > Overview
> > > --------
> > > 
> > > This patch series is a new approach of the initial O_MAYEXEC feature,
> > > and a revamp of the previous patch series.  Taking into account the last
> > > reviews [1], we now stick to the kernel semantic for file executability.
> > > One major change is the clear split between access check and policy
> > > management.
> > > 
> > > The first patch brings the AT_CHECK flag to execveat(2).  The goal is to
> > > enable user space to check if a file could be executed (by the kernel).
> > > > Unlike stat(2) that only checks file permissions, execveat(2) +
> > > AT_CHECK take into account the full context, including mount points
> > > (noexec), caller's limits, and all potential LSM extra checks (e.g.
> > > argv, envp, credentials).
> > > 
> > > The second patch brings two new securebits used to set or get a security
> > > policy for a set of processes.  For this to be meaningful, all
> > > executable code needs to be trusted.  In practice, this means that
> > > (malicious) users can be restricted to only run scripts provided (and
> > > trusted) by the system.
> > > 
> > > [1] https://lore.kernel.org/r/CAHk-=wjPGNLyzeBMWdQu+kUdQLHQugznwY7CvWjmvNW47D5sog@mail.gmail.com
> > > 
> > > Script execution
> > > ----------------
> > > 
> > > One important thing to keep in mind is that the goal of this patch
> > > series is to get the same security restrictions with these commands:
> > > * ./script.py
> > > * python script.py
> > > * python < script.py
> > > * python -m script.py
> > 
> > This is really needed, but is it the "only" purpose of this patch set or can it
> > be used to also monitor files the script opens (for read) with the intention of
> > executing?
> 
> This feature can indeed also be used to monitor files requested by
> scripts to be executed e.g. using
> https://docs.python.org/3/library/io.html#io.open_code
> 
> IMA/EVM can include this check in its logs.
> 
> > 
> > > 
> > > However, on secure systems, we should be able to forbid these commands
> > > because there is no way to reliably identify the origin of the script:
> > > * xargs -a script.py -d '\r' -- python -c
> > > * cat script.py | python
> > > * python
> > > 
> > > Background
> > > ----------
> > > 
> > > Compared to the previous patch series, there is no more dedicated
> > > syscall nor sysctl configuration.  This new patch series only adds new
> > > flags: one for execveat(2) and four for prctl(2).
> > > 
> > > This kind of script interpreter restriction may already be used in
> > > hardened systems, which may need to fork interpreters and install
> > > different versions of the binaries.  This mechanism should enable to
> > > avoid the use of duplicate binaries (and potential forked source code)
> > > for secure interpreters (e.g. secure Python [2]) by making it possible
> > > to dynamically enforce restrictions or not.
> > > 
> > > The ability to control script execution is also required to close a
> > > major IMA measurement/appraisal interpreter integrity gap [3].
> > 
> > Definitely.  But it isn't limited to controlling script execution, but also
> > measuring the script.  Will it be possible to measure and appraise the indirect
> > script calls with this patch set?
> 
> Yes. You should only need to implement security_bprm_creds_for_exec()
> for IMA/EVM.
> 
> BTW, I noticed that IMA only uses the security_bprm_check() hook (which
> can be called several times for one execve), but
> security_bprm_creds_for_exec() might be more appropriate.

Ok, I tried a trivial modification to have this working:

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index f04f43af651c..2a6b04c91601 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -554,6 +554,14 @@ static int ima_bprm_check(struct linux_binprm *bprm)
                                   MAY_EXEC, CREDS_CHECK);
 }
 
+static int ima_bprm_creds_for_exec(struct linux_binprm *bprm)
+{
+       if (!bprm->is_check)
+               return 0;
+
+       return ima_bprm_check(bprm);
+}
+
 /**
  * ima_file_check - based on policy, collect/store measurement.
  * @file: pointer to the file to be measured
@@ -1177,6 +1185,7 @@ static int __init init_ima(void)
 
 static struct security_hook_list ima_hooks[] __ro_after_init = {
        LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
+       LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
        LSM_HOOK_INIT(file_post_open, ima_file_check),
        LSM_HOOK_INIT(inode_post_create_tmpfile, ima_post_create_tmpfile),
        LSM_HOOK_INIT(file_release, ima_file_free),


I also adapted the Clip OS 4 patch for bash.

The result seems good so far:

# echo "measure fowner=2000 func=BPRM_CHECK" > /sys/kernel/security/ima/policy

# ./bash /root/test.sh
Hello World

# cat /sys/kernel/security/ima/ascii_runtime_measurements
10 35435d0858d895b90097306171a2e5fcc7f5da9e ima-ng sha256:0e4acf326a82c6bded9d86f48d272d7a036b6490081bb6466ecc2a0e416b244a boot_aggregate
10 4cd9df168a2cf8d18be46543e66c76a53ca6a03d ima-ng sha256:e7f3c2dab66f56fef963fbab55fc6d64bc22a5f900c29042e6ecd87e08f2b535 /root/test.sh

So, it is there.

It works only with +x permission. If not, I get:

# ./bash /root/test.sh
./bash: /root/test.sh: Permission denied

But the Clip OS 4 patch does not cover the redirection case:

# ./bash < /root/test.sh
Hello World

Do you have a more recent patch for that?

Thanks

Roberto

> > 
> > Mimi
> > 
> > > This new execveat + AT_CHECK should not be confused with the O_EXEC flag
> > > (for open) which is intended for execute-only, which obviously doesn't
> > > work for scripts.
> > > 
> > > I gave a talk about controlling script execution where I explain the
> > > previous approaches [4].  The design of the WIP RFC I talked about
> > > changed quite a bit since then.
> > > 
> > > [2] https://github.com/zooba/spython
> > > [3] https://lore.kernel.org/lkml/20211014130125.6991-1-zohar@linux.ibm.com/
> > > [4] https://lssna2023.sched.com/event/1K7bO
> > > 
> > 
> > 


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 15:57     ` Roberto Sassu
@ 2024-07-16 16:12       ` James Bottomley
  2024-07-16 17:29         ` Boris Lukashev
  2024-07-16 17:31         ` Mickaël Salaün
  0 siblings, 2 replies; 103+ messages in thread
From: James Bottomley @ 2024-07-16 16:12 UTC (permalink / raw)
  To: Roberto Sassu, Mickaël Salaün, Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> But the Clip OS 4 patch does not cover the redirection case:
> 
> # ./bash < /root/test.sh
> Hello World
> 
> Do you have a more recent patch for that?

How far down the rabbit hole do you want to go?  You can't forbid a
shell from executing commands from stdin because logging in then won't
work.  It may be possible to allow from a tty backed file and not from
a file backed one, but you still have the problem of the attacker
manually typing in the script.
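
To make the tty-backed vs file-backed distinction concrete, here is a
minimal sketch of how a shell could classify its command stream.  It is
not taken from any posted patch: classify_input() and enum input_kind
are hypothetical names, and only isatty(3) and fstat(2) are real
interfaces.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of where a shell's commands come from. */
enum input_kind {
	INPUT_TTY,	/* interactive terminal, e.g. a login shell */
	INPUT_FILE,	/* regular file, e.g. "bash < script.sh" */
	INPUT_PIPE,	/* pipe or FIFO, e.g. "cat script.sh | bash" */
	INPUT_OTHER,	/* socket, other device, or fstat() failure */
};

/* Classify the file descriptor the interpreter reads commands from. */
static enum input_kind classify_input(int fd)
{
	struct stat st;

	assert(fd >= 0);

	if (isatty(fd))
		return INPUT_TTY;
	if (fstat(fd, &st) != 0)
		return INPUT_OTHER;
	if (S_ISREG(st.st_mode))
		return INPUT_FILE;
	if (S_ISFIFO(st.st_mode))
		return INPUT_PIPE;
	return INPUT_OTHER;
}
```

A shell could refuse INPUT_FILE and INPUT_PIPE while still accepting
INPUT_TTY, but this says nothing about who is typing at the terminal.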

The saving grace for this for shells is that they pretty much do
nothing on their own (unlike python) so you can still measure all the
executables they call out to, which provides reasonable safety.

James


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 16:12       ` James Bottomley
@ 2024-07-16 17:29         ` Boris Lukashev
  2024-07-16 17:47           ` Mickaël Salaün
  2024-07-16 17:31         ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Boris Lukashev @ 2024-07-16 17:29 UTC (permalink / raw)
  To: kernel-hardening

Wouldn't count those shell chickens - awk alone is enough and we can
use ssh and openssl clients (all in metasploit public code). As one of
the people who makes novel shell types, I can assure you that this
effort is only going to slow skiddies and only until the rest of us
publish mitigations for this mitigation :)

-Boris (RageLtMan)

On July 16, 2024 12:12:49 PM EDT, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
>> But the Clip OS 4 patch does not cover the redirection case:
>> 
>> # ./bash < /root/test.sh
>> Hello World
>> 
>> Do you have a more recent patch for that?
>
>How far down the rabbit hole do you want to go?  You can't forbid a
>shell from executing commands from stdin because logging in then won't
>work.  It may be possible to allow from a tty backed file and not from
>a file backed one, but you still have the problem of the attacker
>manually typing in the script.
>
>The saving grace for this for shells is that they pretty much do
>nothing on their own (unlike python) so you can still measure all the
>executables they call out to, which provides reasonable safety.
>
>James
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 16:12       ` James Bottomley
  2024-07-16 17:29         ` Boris Lukashev
@ 2024-07-16 17:31         ` Mickaël Salaün
  2024-07-18 16:21           ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-16 17:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Roberto Sassu, Mimi Zohar, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, Jul 16, 2024 at 12:12:49PM -0400, James Bottomley wrote:
> On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> > But the Clip OS 4 patch does not cover the redirection case:
> > 
> > # ./bash < /root/test.sh
> > Hello World
> > 
> > Do you have a more recent patch for that?

Bash was only partially restricted for CLIP OS because it was used for
administrative tasks (interactive shell).

Python was also restricted for user commands though:
https://github.com/clipos-archive/clipos4_portage-overlay/blob/master/dev-lang/python/files/python-2.7.9-clip-mayexec.patch

Steve and Christian could help with a better Python implementation.

> 
> How far down the rabbit hole do you want to go?  You can't forbid a
> shell from executing commands from stdin because logging in then won't
> work.  It may be possible to allow from a tty backed file and not from
> a file backed one, but you still have the problem of the attacker
> manually typing in the script.

Yes, that's why we'll have the (optional) SECBIT_EXEC_DENY_INTERACTIVE:
https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/

> 
> The saving grace for this for shells is that they pretty much do
> nothing on their own (unlike python) so you can still measure all the
> executables they call out to, which provides reasonable safety.

Exactly. Python is a much more interesting target for attackers because
it opens the door to arbitrary syscalls (see the cover letter).

If we want to have a more advanced access control (e.g. allow Bash but
not Python), we should extend existing LSMs to manage the appropriate
securebits according to programs/subjects.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 17:29         ` Boris Lukashev
@ 2024-07-16 17:47           ` Mickaël Salaün
  2024-07-17 17:59             ` Boris Lukashev
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-16 17:47 UTC (permalink / raw)
  To: Boris Lukashev
  Cc: James Bottomley, Roberto Sassu, Mimi Zohar, Al Viro,
	Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

(adding back other people in Cc)

On Tue, Jul 16, 2024 at 01:29:43PM -0400, Boris Lukashev wrote:
> Wouldn't count those shell chickens - awk alone is enough and we can
> use ssh and openssl clients (all in metasploit public code). As one of
> the people who makes novel shell types, I can assure you that this
> effort is only going to slow skiddies and only until the rest of us
> publish mitigations for this mitigation :)

Security is not binary. :)

Not all Linux systems are equal. Some hardened systems need this kind
of feature and they can get guarantees because they fully control and
trust their executable binaries (e.g. CLIP OS, chromeOS) or they
properly sandbox them.  See context in the cover letter.

awk is a script interpreter that should be patched too, like other Linux
tools.

> 
> -Boris (RageLtMan)
> 
> On July 16, 2024 12:12:49 PM EDT, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> >> But the Clip OS 4 patch does not cover the redirection case:
> >> 
> >> # ./bash < /root/test.sh
> >> Hello World
> >> 
> >> Do you have a more recent patch for that?
> >
> >How far down the rabbit hole do you want to go?  You can't forbid a
> >shell from executing commands from stdin because logging in then won't
> >work.  It may be possible to allow from a tty backed file and not from
> >a file backed one, but you still have the problem of the attacker
> >manually typing in the script.
> >
> >The saving grace for this for shells is that they pretty much do
> >nothing on their own (unlike python) so you can still measure all the
> >executables they call out to, which provides reasonable safety.
> >
> >James
> >

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
                     ` (2 preceding siblings ...)
  2024-07-06  8:52   ` Andy Lutomirski
@ 2024-07-17  6:33   ` Jeff Xu
  2024-07-17  8:26     ` Steve Dower
  2024-07-17 10:01     ` Mickaël Salaün
  3 siblings, 2 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-17  6:33 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
>
> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.
>
So we concluded that execveat(AT_CHECK) will be used to check the
exec, shared object, script and config file (such as seccomp config).
I'm still thinking about execveat(AT_CHECK) vs faccessat(AT_CHECK) in
different use cases:

execveat clearly has less code change, but that also means: we can't
add logic specific to exec (i.e. logic that can't be applied to
config) for this part (from do_execveat_common to
security_bprm_creds_for_exec) in future.  This would require some
agreement/sign-off, I'm not sure from whom.

--------------------------
Now looking at use cases (focusing on ELF for now):

1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)

2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
/usr/bin/some.out will pass AT_CHECK

3> execve(/usr/bin/some.out) and some.out uses custom /tmp/ld.so
/usr/bin/some.out will pass AT_CHECK, however, it uses a custom
/tmp/ld.so (I assume this is possible: the ELF header sets the path
for ld.so because the kernel has no knowledge of it, and binfmt_elf.c
allocates memory for ld.so during the execveat call)

4> dlopen(/tmp/a.so)
I assume the dynamic linker will call execveat(AT_CHECK) before mapping
a.so into memory.
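
The check-then-map sequence assumed in cases 1 and 4 could look roughly
like the sketch below.  It is only an illustration, not glibc or ld.so
code: exec_check_fd() and map_checked() are hypothetical helpers, and
AT_CHECK uses the 0x10000 value proposed in this patch's fcntl.h hunk,
which is an assumption for any other kernel.  A kernel that does not
know the flag returns EINVAL, so every check fails there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: value proposed by this RFC, not a released uapi name. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/*
 * Ask the kernel whether executing the file behind fd would be allowed.
 * Returns 0 if allowed, -1 with errno set otherwise.  Nothing is
 * executed when the kernel honours AT_CHECK.  Empty argv/envp are used
 * for brevity; a real caller may want to pass the intended ones.
 */
static int exec_check_fd(int fd)
{
	char *const empty[] = { NULL };

	return (int)syscall(SYS_execveat, fd, "", empty, empty,
			    AT_EMPTY_PATH | AT_CHECK);
}

/*
 * Hypothetical loader helper: map a file executable only if the check
 * passes.  The same fd is checked and mapped, which avoids a path-based
 * time-of-check to time-of-use race.  Returns MAP_FAILED on any error.
 */
static void *map_checked(const char *path, size_t *len)
{
	struct stat st;
	void *addr = MAP_FAILED;
	int saved_errno;
	int fd;

	assert(path != NULL && len != NULL);

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return MAP_FAILED;

	if (exec_check_fd(fd) == 0 && fstat(fd, &st) == 0) {
		*len = (size_t)st.st_size;
		addr = mmap(NULL, *len, PROT_READ | PROT_EXEC,
			    MAP_PRIVATE, fd, 0);
	}

	/* Preserve the errno of the failing step across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return addr;
}
```

Nothing here stops a process from skipping exec_check_fd() and calling
mmap() directly, which is the concern raised for case 2.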

For case 1>
Alternative solution: Because AT_CHECK is always called, I think we
can avoid the first AT_CHECK call and check during execveat(fd).
This means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1.  The
benefit is that there is no TOCTOU, and it saves one syscall round trip
for a successful execveat() case.

For case 2>
The dynamic linker will call execveat(AT_CHECK), then mmap(fd) into
memory.  However, the process can call open() then mmap() directly; it
seems minimal effort for an attacker to work around such a defence in
the dynamic linker.

Alternative solution:
The dynamic linker calls AT_CHECK for each .so, and the kernel saves the
state (associated with the fd).  The kernel then checks the fd state at
the time of mmap(fd, executable memory) and enforces
SECBIT_EXEC_RESTRICT_FILE = 1.

Alternative solution 2:
a new syscall to load the .so and enforce the AT_CHECK in kernel

This also means, for the solution to be complete, we might want to
block creation of executable anonymous memory (e.g. by seccomp),
unless user space can harden the creation of executable anonymous
memory in some way.

For case 3>
I think binfmt_elf.c in the kernel needs to check the ld.so to make
sure it passes AT_CHECK, before loading it into memory.

For case 4>
same as case 2.

Considering those cases, I think:
a> relying purely on userspace for enforcement doesn't seem to be
effective, e.g. it is trivial to call open(), then mmap() it into
executable memory.
b> if both user space and kernel need to call AT_CHECK, the faccessat
seems to be a better place for AT_CHECK, e.g. kernel can call
do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
avoid complicating the execveat() code path.

What do you think ?

Thanks
-Jeff

> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
>
> Because LSMs may only change bprm's credentials, use of AT_CHECK with
> current kernel code should not be a security issue (e.g. unexpected role
> transition).  LSMs willing to update the caller's credential could now
> do so when bprm->is_check is set.  Of course, such policy change should
> be in line with the new user space code.
>
> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_CHECK flag) because this could lead to
> unexpected behaviors e.g., `python script.sh` could lead to Bash being
> executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> SECBIT_SHOULD_EXEC_RESTRICT securebits.
>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> ---
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  fs/exec.c                  |  5 +++--
>  include/linux/binfmts.h    |  7 ++++++-
>  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
>  kernel/audit.h             |  1 +
>  kernel/auditsc.c           |  1 +
>  5 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..ea2a1867afdc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>                 .lookup_flags = LOOKUP_FOLLOW,
>         };
>
> -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
>                 return ERR_PTR(-EINVAL);
>         if (flags & AT_SYMLINK_NOFOLLOW)
>                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
>                 bprm->filename = bprm->fdpath;
>         }
>         bprm->interp = bprm->filename;
> +       bprm->is_check = !!(flags & AT_CHECK);
>
>         retval = bprm_mm_init(bprm);
>         if (!retval)
> @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
>
>         /* Set the unchanging part of bprm->cred */
>         retval = security_bprm_creds_for_exec(bprm);
> -       if (retval)
> +       if (retval || bprm->is_check)
>                 goto out;
>
>         retval = exec_binprm(bprm);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 70f97f685bff..8ff9c9e33aed 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -42,7 +42,12 @@ struct linux_binprm {
>                  * Set when errors can no longer be returned to the
>                  * original userspace.
>                  */
> -               point_of_no_return:1;
> +               point_of_no_return:1,
> +               /*
> +                * Set by user space to check executability according to the
> +                * caller's environment.
> +                */
> +               is_check:1;
>         struct file *executable; /* Executable to pass to the interpreter */
>         struct file *interpreter;
>         struct file *file;
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..bcd05c59b7df 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -118,6 +118,36 @@
>  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
>                                         compare object identity and may not
>                                         be usable to open_by_handle_at(2) */
> +
> +/*
> + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> + * of this file would be allowed, ignoring the file format and then the related
> + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> + * thread.  See securebits.h documentation.
> + *
> + * Programs should use this check to apply kernel-level checks against files
> + * that are not directly executed by the kernel but directly passed to a user
> + * space interpreter instead.  All files that contain executable code, from the
> + * point of view of the interpreter, should be checked.  The main purpose of
> + * this flag is to improve the security and consistency of an execution
> + * environment to ensure that direct file execution (e.g. ./script.sh) and
> + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> + * instance, this can be used to check if a file is trustworthy according to
> + * the caller's environment.
> + *
> + * In a secure environment, libraries and any executable dependencies should
> + * also be checked.  For instance dynamic linking should make sure that all
> + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> + * LD_PRELOAD).  For such secure execution environment to make sense, only
> + * trusted code should be executable, which also requires integrity guarantees.
> + *
> + * To avoid race conditions leading to time-of-check to time-of-use issues,
> + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> + * descriptor instead of a path.
> + */
> +#define AT_CHECK               0x10000
> +
>  #if defined(__KERNEL__)
>  #define AT_GETATTR_NOSEC       0x80000000
>  #endif
> diff --git a/kernel/audit.h b/kernel/audit.h
> index a60d2840559e..8ebdabd2ab81 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -197,6 +197,7 @@ struct audit_context {
>                 struct open_how openat2;
>                 struct {
>                         int                     argc;
> +                       bool                    is_check;
>                 } execve;
>                 struct {
>                         char                    *name;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 6f0d6fb6523f..b6316e284342 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
>
>         context->type = AUDIT_EXECVE;
>         context->execve.argc = bprm->argc;
> +       context->execve.is_check = bprm->is_check;
>  }
>
>
> --
> 2.45.2
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17  6:33   ` Jeff Xu
@ 2024-07-17  8:26     ` Steve Dower
  2024-07-17 10:00       ` Mickaël Salaün
  2024-07-17 10:01     ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Steve Dower @ 2024-07-17  8:26 UTC (permalink / raw)
  To: Jeff Xu, Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On 17/07/2024 07:33, Jeff Xu wrote:
> Considering those cases, I think:
> a> relying purely on userspace for enforcement doesn't seem to be
> effective, e.g. it is trivial to call open(), then mmap() it into
> executable memory.

If there's a way to do this without running executable code that had to 
pass a previous execveat() check, then yeah, it's not effective (e.g. a 
Python interpreter that *doesn't* enforce execveat() is a trivial way to 
do it).

Once arbitrary code is running, all bets are off. So long as all 
arbitrary code is being checked itself, it's allowed to do things that 
would bypass later checks (and it's up to whoever audited it in the 
first place to prevent this by not giving it the special mark that 
allows it to pass the check).

> b> if both user space and kernel need to call AT_CHECK, the faccessat
> seems to be a better place for AT_CHECK, e.g. kernel can call
> do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> avoid complicating the execveat() code path.
> 
> What do you think ?
> 
> Thanks
> -Jeff

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17  8:26     ` Steve Dower
@ 2024-07-17 10:00       ` Mickaël Salaün
  2024-07-18  1:02         ` Andy Lutomirski
  2024-07-18  1:51         ` Jeff Xu
  0 siblings, 2 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-17 10:00 UTC (permalink / raw)
  To: Steve Dower
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Grubb,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> On 17/07/2024 07:33, Jeff Xu wrote:
> > Considering those cases, I think:
> > a> relying purely on userspace for enforcement doesn't seem to be
> > effective, e.g. it is trivial to call open(), then mmap() it into
> > executable memory.
> 
> If there's a way to do this without running executable code that had to pass
> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> 
> Once arbitrary code is running, all bets are off. So long as all arbitrary
> code is being checked itself, it's allowed to do things that would bypass
> later checks (and it's up to whoever audited it in the first place to
> prevent this by not giving it the special mark that allows it to pass the
> check).

Exactly.  As explained in the patches, one crucial prerequisite is that
the executable code is trusted, and the system must provide integrity
guarantees.  We cannot do anything without that.  This patch series is
a building block to fix a blind spot on Linux systems to be able to
fully control executability.

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17  6:33   ` Jeff Xu
  2024-07-17  8:26     ` Steve Dower
@ 2024-07-17 10:01     ` Mickaël Salaün
  2024-07-18  2:08       ` Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-17 10:01 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> >
> So we concluded that execveat(AT_CHECK) will be used to check the
> exec, shared object, script and config file (such as seccomp config),

"config file" that contains executable code.

> I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> different use cases:
> 
> execveat clearly has less code change, but that also means: we can't
> add logic specific to exec (i.e. logic that can't be applied to
> config) for this part (from do_execveat_common to
> security_bprm_creds_for_exec) in future.  This would require some
> agreement/sign-off, I'm not sure from whom.

I'm not sure I follow. We could still add new flags, but for now I
don't see use cases.  This patch series is not meant to handle all
possible "trust checks", only executable code, which makes sense for the
kernel.

If we want other checks, we'll need to clearly define their semantics and
align with the kernel.  faccessat2(2) might be used to check other file
properties, but the executable property is not only defined by the file
attributes.

> 
> --------------------------
> now looked at user cases (focus on elf for now)
> 
> 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> 
> 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> /usr/bin/some.out will pass AT_CHECK
> 
> 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> /tmp/ld.so (I assume this is possible  for elf header will set the
> path for ld.so because kernel has no knowledge of that, and
> binfmt_elf.c allocate memory for ld.so during execveat call)
> 
> 4> dlopen(/tmp/a.so)
> I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> into memory.
> 
> For case 1>
> Alternative solution: Because AT_CHECK is always called, I think we
> can avoid the first AT_CHECK call, and check during execveat(fd),

There is no need to use AT_CHECK if we're going to call execveat(2) on
the same file descriptor.  By design, AT_CHECK is implicit for any
execve(2).

> this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> benefit is that there is no TOCTOU and save one round trip of syscall
> for a succesful execveat() case.

As long as user space uses the same file descriptor, there is no TOCTOU.

SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
the user space security policy.  The kernel already enforces the same
security policy for any execve(2), whatever the calling process's
securebits are.

> 
> For case 2>
> dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> However,  the process can all open then mmap() directly, it seems
> minimal effort for an attacker to walk around such a defence from
> dynamic linker.

Which process?  What do you mean by "can all open then mmap() directly"?

In this context the dynamic linker (like its parent processes) is
trusted (guaranteed by the system).

For case 2, the dynamic linker must check with AT_CHECK all files that
will be mapped, which includes /usr/bin/some.out and /tmp/a.so.
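
As a rough sketch of "check everything before mapping anything": the
loop below refuses the whole set if any descriptor fails.  The callback
stands in for a wrapper around execveat(2) with AT_CHECK; all names here
are hypothetical, and stub_check() exists only to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type: returns 0 if the file behind fd may be executed. */
typedef int (*exec_check_fn)(int fd);

/*
 * Check every descriptor before mapping any of them.
 * Returns -1 if all pass, else the index of the first one refused.
 */
long first_refused(const int *fds, size_t n, exec_check_fn check)
{
	size_t i;

	assert(check != NULL);
	for (i = 0; i < n; i++) {
		if (check(fds[i]) != 0)
			return (long)i;
	}
	return -1;
}

/* Demonstration stub only: treats negative descriptors as refused. */
int stub_check(int fd)
{
	return fd < 0 ? -1 : 0;
}
```

A linker following this approach would only start mmap()ing once
first_refused() returns -1 for the executable and all its libraries.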

> 
> Alternative solution:
> dynamic linker call AT_CHECK for each .so, kernel will save the state
> (associated with fd)
> kernel will check fd state at the time of mmap(fd, executable memory)
> and enforce SECBIT_EXEC_RESTRICT_FILE = 1

The idea with AT_CHECK is that there is no kernel side effect, no extra
kernel state, and the semantics are the same as with execve(2).

This also enables us to check a file's executable permission and ignore
it, which is useful in a "permissive mode" when preparing for a
migration without breaking a system, or to do extra integrity checks.
BTW, this use case would also be more complex with a new openat2(2) flag
like the original O_MAYEXEC.
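
A possible shape for such a permissive mode in an interpreter, as a
sketch only: the result of the check is logged but not enforced unless
the interpreter is told to.  The "enforce" switch and all names are
hypothetical; how it would be derived from the securebits is left out on
purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum exec_decision { EXEC_ALLOW, EXEC_DENY };

/*
 * check_err: 0 if the AT_CHECK call succeeded, negative errno otherwise.
 * enforce:   hypothetical interpreter setting; false means permissive.
 */
enum exec_decision decide_exec(const char *path, int check_err, bool enforce)
{
	assert(path != NULL);
	if (check_err == 0)
		return EXEC_ALLOW;
	if (enforce)
		return EXEC_DENY;
	/* Permissive mode: record the denial, but run the file anyway. */
	fprintf(stderr, "exec check failed for %s (error %d), allowing\n",
		path, -check_err);
	return EXEC_ALLOW;
}
```

Since the kernel keeps no state about the check, the decision stays
entirely in the caller, which is what makes this kind of dry run
straightforward.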

> 
> Alternative solution 2:
> a new syscall to load the .so and enforce the AT_CHECK in kernel

A new syscall would be overkill for this feature.  Please see Linus's
comment.

> 
> This also means, for the solution to be complete, we might want to
> block creation of executable anonymous memory (e.g. by seccomp, ),

How could seccomp create anonymous memory in user space?
seccomp filters should be treated (and checked with AT_CHECK) as
executable code anyway.

> unless the user space can harden the creation of  executable anonymous
> memory in some way.

User space is already in charge of mmapping its own memory.  I don't see
what is missing.

> 
> For case 3>
> I think binfmt_elf.c in the kernel needs to check the ld.so to make
> sure it passes AT_CHECK, before loading it into memory.

All ELF dependencies are opened and checked with open_exec(), which
performs the main executability checks (with the __FMODE_EXEC flag).
Did I miss something?

However, we must be careful with programs using the (deprecated)
uselib(2). They should also check with AT_CHECK because this syscall
opens the shared library without __FMODE_EXEC (similar to a simple file
open). See
https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/

> 
> For case 4>
> same as case 2.
> 
> Consider those cases: I think:
> a> relying purely on userspace for enforcement does't seem to be
> effective,  e.g. it is trivial  to call open(), then mmap() it into
> executable memory.

As Steve explained (and is also explained in the patches), it is trivial
if the attacker can already execute its own code, which is too late to
enforce any script execution control.

> b> if both user space and kernel need to call AT_CHECK, the faccessat
> seems to be a better place for AT_CHECK, e.g. kernel can call
> do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> avoid complicating the execveat() code path.

A previous version of this patch series already patched faccessat(2),
but this is not the right place.  faccessat2(2) is dedicated to checking
file permissions, not executability (e.g. with mount's noexec).

> 
> What do you think ?

I think there are some misunderstandings.  Please let me know if it's
clearer now.

> 
> Thanks
> -Jeff
> 
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > current kernel code should not be a security issue (e.g. unexpected role
> > transition).  LSMs willing to update the caller's credential could now
> > do so when bprm->is_check is set.  Of course, such policy change should
> > be in line with the new user space code.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > ---
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  fs/exec.c                  |  5 +++--
> >  include/linux/binfmts.h    |  7 ++++++-
> >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> >  kernel/audit.h             |  1 +
> >  kernel/auditsc.c           |  1 +
> >  5 files changed, 41 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..ea2a1867afdc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >                 .lookup_flags = LOOKUP_FOLLOW,
> >         };
> >
> > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >                 return ERR_PTR(-EINVAL);
> >         if (flags & AT_SYMLINK_NOFOLLOW)
> >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> >                 bprm->filename = bprm->fdpath;
> >         }
> >         bprm->interp = bprm->filename;
> > +       bprm->is_check = !!(flags & AT_CHECK);
> >
> >         retval = bprm_mm_init(bprm);
> >         if (!retval)
> > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> >
> >         /* Set the unchanging part of bprm->cred */
> >         retval = security_bprm_creds_for_exec(bprm);
> > -       if (retval)
> > +       if (retval || bprm->is_check)
> >                 goto out;
> >
> >         retval = exec_binprm(bprm);
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 70f97f685bff..8ff9c9e33aed 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -42,7 +42,12 @@ struct linux_binprm {
> >                  * Set when errors can no longer be returned to the
> >                  * original userspace.
> >                  */
> > -               point_of_no_return:1;
> > +               point_of_no_return:1,
> > +               /*
> > +                * Set by user space to check executability according to the
> > +                * caller's environment.
> > +                */
> > +               is_check:1;
> >         struct file *executable; /* Executable to pass to the interpreter */
> >         struct file *interpreter;
> >         struct file *file;
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index c0bcc185fa48..bcd05c59b7df 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -118,6 +118,36 @@
> >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> >                                         compare object identity and may not
> >                                         be usable to open_by_handle_at(2) */
> > +
> > +/*
> > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > + * of this file would be allowed, ignoring the file format and then the related
> > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > + * thread.  See securebits.h documentation.
> > + *
> > + * Programs should use this check to apply kernel-level checks against files
> > + * that are not directly executed by the kernel but directly passed to a user
> > + * space interpreter instead.  All files that contain executable code, from the
> > + * point of view of the interpreter, should be checked.  The main purpose of
> > + * this flag is to improve the security and consistency of an execution
> > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > + * instance, this can be used to check if a file is trustworthy according to
> > + * the caller's environment.
> > + *
> > + * In a secure environment, libraries and any executable dependencies should
> > + * also be checked.  For instance dynamic linking should make sure that all
> > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > + * trusted code should be executable, which also requires integrity guarantees.
> > + *
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> > + */
> > +#define AT_CHECK               0x10000
> > +
> >  #if defined(__KERNEL__)
> >  #define AT_GETATTR_NOSEC       0x80000000
> >  #endif
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index a60d2840559e..8ebdabd2ab81 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -197,6 +197,7 @@ struct audit_context {
> >                 struct open_how openat2;
> >                 struct {
> >                         int                     argc;
> > +                       bool                    is_check;
> >                 } execve;
> >                 struct {
> >                         char                    *name;
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 6f0d6fb6523f..b6316e284342 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> >
> >         context->type = AUDIT_EXECVE;
> >         context->execve.argc = bprm->argc;
> > +       context->execve.is_check = bprm->is_check;
> >  }
> >
> >
> > --
> > 2.45.2
> >
> 

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 17:47           ` Mickaël Salaün
@ 2024-07-17 17:59             ` Boris Lukashev
  2024-07-18 13:00               ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Boris Lukashev @ 2024-07-17 17:59 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: James Bottomley, Roberto Sassu, Mimi Zohar, Al Viro,
	Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Apologies, sent from phone so plain-text wasn't flying.
To elaborate a bit on the quick commentary there - i'm the happy
camper behind most of the SSL shells, SSH stuff, AWS shells, and so on
in Metasploit. So please take the following with a grain of
tinfoil-hat salt as i'm well aware that there is no perfect defense
against these things which covers all bases while permitting any level
of sane operation in a general-purpose linux system (also work w/
GrapheneOS which is a far more suitable context for this sort of
thing). Having loosely followed the discussion thread, my offsec-brain
$0.02 are:

Shells are the province of the post-exploitation world - it's what
we want to get as a result of the exploit succeeding. So i think we
want to keep clear delineation between exploit and post-exp mitigation
as they're actually separate concerns of the killchain.
1. Command shells tend to differentiate from interpreted or binary
execution environments in their use of POSIX file descriptor
primitives such as pipes. How those are marshalled, chained, and
maintained (in a loop or whatever, hiding args, etc) are the only real
IOCs available at this tier for interdiction as observation of data
flow through the pipes is too onerous and complex. Target systems vary
in the post-exp surfaces exposed (/dev/tcp for example) with the
mechanics of that exposure necessitating adaptation of marshalling,
chaining, and maintenance to fit the environment; but the basic
premise of what forms a command shell cannot be mitigated without
breaking POSIX mechanics themselves - offsec devs are no different
from anyone else, we want our code to utilize architectural primitives
instead of undefined behavior for longevity and ecosystem
persistence/relevance.
2. The conversation about interpreted languages is probably a dead-end
unless you want to neuter the interpreter - check out Spencer
McIntyre's work re Python meterpreter or HDs/mine/etc on the PHP side.
The stagers, loaded contexts, execution patterns, etc are all
trivially modified to avoid detection (private versions not submitted
for free ripping by lazy commercial entities to the FOSS ecosystem,
yet). Dynamic code loading of interpreted languages is trivial and
requires no syscalls, just text/serialized IL/etc. The complexity of
loaded context available permits much more advanced functionality than
we get in most basic command interpreter shells - <advanced evasions
go here before doing something that'll get you caught> sort of thing.
3. Lastly, binary payloads such as Mettle have their own advantages re
portability, skipping over libc, etc but need to be "harnessed-in"
from say a command-injection exploit via memfd or similar. We haven't
published our memfd stagers while the relevant sysctl gets adopted
more widely, but we've had them for a long time (meaning real bad guys
have as well) and have other ways to get binary content into
executable memory or make memory containing it executable
(to-the-gills Grsec/PaX systems notwithstanding). IMO, interdiction of
the harnessed injection from a command context is the last time when
anything of use can be done at this layer unless we're sure that we
can trace all related and potentially async (not within the process
tree anyway) syscalls emanating from what happens next. Subsequent
actions are separate "remedial" workflows which is a wholly separate
philosophical discussion about how to handle having been compromised
already.

Security is very much not binary and in that vein of logic i think
that we should probably define our shades of gray as ranges of what we
want to protect/how and at what operational cost to then permit
"dial-in" knobs to actually garner adoption from a broad range of
systems outside the "real hardened efforts." At some point this turns
into "limit users to sftp or git shells" which is a perfectly valid
approach when the context permits that level of draconian restriction
but the architectural breakdown of "native command, interpreted
context, fully binary" shell types is pretty universal with new ones
being API access into runtimes of clouds (SSM/serial/etc) which have
their own set of limitations at execution and interface layers.
Organizing defensive functions to handle the primitives necessary for
each of these shell classes would likely help stratify/simplify this
conversation and allow for more granular tasking toward those specific
objectives.

Thanks,
-Boris


On Tue, Jul 16, 2024 at 1:48 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> (adding back other people in Cc)
>
> On Tue, Jul 16, 2024 at 01:29:43PM -0400, Boris Lukashev wrote:
> > Wouldn't count those shell chickens - awk alone is enough and we can
> > use ssh and openssl clients (all in metasploit public code). As one of
> > the people who makes novel shell types, I can assure you that this
> > effort is only going to slow skiddies and only until the rest of us
> > publish mitigations for this mitigation :)
>
> Security is not binary. :)
>
> Not all Linux systems are equals. Some hardened systems need this kind
> of feature and they can get guarantees because they fully control and
> trust their executable binaries (e.g. CLIP OS, chromeOS) or they
> properly sandbox them.  See context in the cover letter.
>
> awk is a script interpreter that should be patched too, like other Linux
> tools.
>
> >
> > -Boris (RageLtMan)
> >
> > On July 16, 2024 12:12:49 PM EDT, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > >On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> > >> But the Clip OS 4 patch does not cover the redirection case:
> > >>
> > >> # ./bash < /root/test.sh
> > >> Hello World
> > >>
> > >> Do you have a more recent patch for that?
> > >
> > >How far down the rabbit hole do you want to go?  You can't forbid a
> > >shell from executing commands from stdin because logging in then won't
> > >work.  It may be possible to allow from a tty backed file and not from
> > >a file backed one, but you still have the problem of the attacker
> > >manually typing in the script.
> > >
> > >The saving grace for this for shells is that they pretty much do
> > >nothing on their own (unlike python) so you can still measure all the
> > >executables they call out to, which provides reasonable safety.
> > >
> > >James
> > >

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17 10:00       ` Mickaël Salaün
@ 2024-07-18  1:02         ` Andy Lutomirski
  2024-07-18 12:22           ` Mickaël Salaün
  2024-07-18  1:51         ` Jeff Xu
  1 sibling, 1 reply; 103+ messages in thread
From: Andy Lutomirski @ 2024-07-18  1:02 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Steve Dower, Jeff Xu, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
>>> On 17/07/2024 07:33, Jeff Xu wrote:
>>> Consider those cases: I think:
>>> a> relying purely on userspace for enforcement does't seem to be
>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
>>> executable memory.
>>
>> If there's a way to do this without running executable code that had to pass
>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
>>
>> Once arbitrary code is running, all bets are off. So long as all arbitrary
>> code is being checked itself, it's allowed to do things that would bypass
>> later checks (and it's up to whoever audited it in the first place to
>> prevent this by not giving it the special mark that allows it to pass the
>> check).
>
> Exactly.  As explained in the patches, one crucial prerequisite is that
> the executable code is trusted, and the system must provide integrity
> guarantees.  We cannot do anything without that.  This patches series is
> a building block to fix a blind spot on Linux systems to be able to
> fully control executability.

Circling back to my previous comment (did that ever get noticed?), I
don’t think this is quite right:

https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/

On a basic system configuration, a given path either may or may not be
executed. And maybe that path has some integrity check (dm-verity,
etc).  So the kernel should tell the interpreter/loader whether the
target may be executed. All fine.

But I think the more complex cases are more interesting, and the
“execute a program” process IS NOT BINARY.  An attempt to execute can
be rejected outright, or it can be allowed *with a change to creds or
security context*.  It would be entirely reasonable to have a policy
that allows execution of non-integrity-checked files but in a very
locked down context only.

So… shouldn’t a patch series to this effect actually support this?

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17 10:00       ` Mickaël Salaün
  2024-07-18  1:02         ` Andy Lutomirski
@ 2024-07-18  1:51         ` Jeff Xu
  2024-07-18 12:23           ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-18  1:51 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > On 17/07/2024 07:33, Jeff Xu wrote:
> > > Consider those cases: I think:
> > > a> relying purely on userspace for enforcement does't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > If there's a way to do this without running executable code that had to pass
> > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >
> > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > code is being checked itself, it's allowed to do things that would bypass
> > later checks (and it's up to whoever audited it in the first place to
> > prevent this by not giving it the special mark that allows it to pass the
> > check).
>
We will want to define what counts as "arbitrary code is running".

Take ROP as an example: the attacker overwrites a return address on the
stack to direct execution to a gadget that calls "ld.so /tmp/a.out".
Do you consider "arbitrary code is running" to start when the stack is
overwritten, or only after execve() is called?
If it is the latter, this patch can prevent "ld.so /tmp/a.out".

> Exactly.  As explained in the patches, one crucial prerequisite is that
> the executable code is trusted, and the system must provide integrity
> guarantees.  We cannot do anything without that.  This patches series is
> a building block to fix a blind spot on Linux systems to be able to
> fully control executability.

Even a trusted executable can have a bug.

I'm thinking in the context of ChromeOS, where all its system services
are from trusted partitions, and legit code won't load .so from a
non-exec mount.  But we want to sandbox those services, so even under
some kind of ROP attack, the service still won't be able to load .so
from /tmp. Of course, if an attacker can already write an arbitrary
length of data onto the stack, it is probably already game over.

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-17 10:01     ` Mickaël Salaün
@ 2024-07-18  2:08       ` Jeff Xu
  2024-07-18 12:24         ` Mickaël Salaün
  2024-07-18 14:46         ` enh
  0 siblings, 2 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-18  2:08 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > allowed for execution.  The main use case is for script interpreters and
> > > dynamic linkers to check execution permission according to the kernel's
> > > security policy. Another use case is to add context to access logs e.g.,
> > > which script (instead of interpreter) accessed a file.  As any
> > > executable code, scripts could also use this check [1].
> > >
> > > This is different than faccessat(2) which only checks file access
> > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > real execution, user space gets the same error codes.
> > >
> > So we concluded that execveat(AT_CHECK) will be used to check the
> > exec, shared object, script and config file (such as seccomp config),
>
> "config file" that contains executable code.
>
Is a seccomp config considered to "contain executable code"?  The seccomp
config is translated into BPF, so maybe yes, but BPF runs in the
kernel.

> > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > different use cases:
> >
> > execveat clearly has less code change, but that also means: we can't
> > add logic specific to exec (i.e. logic that can't be applied to
> > config) for this part (from do_execveat_common to
> > security_bprm_creds_for_exec) in future.  This would require some
> > agreement/sign-off, I'm not sure from whom.
>
> I'm not sure to follow. We could still add new flags, but for now I
> don't see use cases.  This patch series is not meant to handle all
> possible "trust checks", only executable code, which makes sense for the
> kernel.
>
I guess the "config file" discussion is where I got confused: at one
point, I thought this would become a generic "trust checks" API for
everything related to "generating executable code", e.g. JavaScript,
Java code, and more.
We will want to clearly define the scope of execveat(AT_CHECK).

> If we want other checks, we'll need to clearly define their semantic and
> align with the kernel.  faccessat2(2) might be used to check other file
> properties, but the executable property is not only defined by the file
> attributes.
>
Agreed.

> >
> > --------------------------
> > now looked at user cases (focus on elf for now)
> >
> > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> >
> > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > /usr/bin/some.out will pass AT_CHECK
> >
> > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > /tmp/ld.so (I assume this is possible  for elf header will set the
> > path for ld.so because kernel has no knowledge of that, and
> > binfmt_elf.c allocate memory for ld.so during execveat call)
> >
> > 4> dlopen(/tmp/a.so)
> > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > into memory.
> >
> > For case 1>
> > Alternative solution: Because AT_CHECK is always called, I think we
> > can avoid the first AT_CHECK call, and check during execveat(fd),
>
> There is no need to use AT_CHECK if we're going to call execveat(2) on
> the same file descriptor.  By design, AT_CHECK is implicit for any
> execve(2).
>
Yes. I realized I was wrong to say that ld.so will call execve() for
/tmp/a.out: there is no execve() call, otherwise it would already be
blocked today.
ld.so will mmap /tmp/a.out directly, so case 1 is no different from
cases 2 and 4 (the ELF objects are mapped into memory by the dynamic
linker).
I'm not familiar with the dynamic linker; Florian is on this thread and
can correct me if my guess is wrong.

> > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > benefit is that there is no TOCTOU and save one round trip of syscall
> > for a successful execveat() case.
>
> As long as user space uses the same file descriptor, there is no TOCTOU.
>
> SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> the user space security policy.  The kernel already enforces the same
> security policy for any execve(2), whatever are the calling process's
> securebits.
>
> >
> > For case 2>
> > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > However,  the process can all open then mmap() directly, it seems
> > minimal effort for an attacker to walk around such a defence from
> > dynamic linker.
>
> Which process?  What do you mean by "can all open then mmap() directly"?
>
> In this context the dynamic linker (like its parent processes) is
> trusted (guaranteed by the system).
>
> For case 2, the dynamic linker must check with AT_CHECK all files that
> will be mapped, which include /usr/bin/some.out and /tmp/a.so
>
My point is that the process can work around this by calling mmap() on
the file directly.

> >
> > Alternative solution:
> > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > (associated with fd)
> > kernel will check fd state at the time of mmap(fd, executable memory)
> > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
>
> The idea with AT_CHECK is that there is no kernel side effect, no extra
> kernel state, and the semantic is the same as with execve(2).
>
> This also enables us to check file's executable permission and ignore
> it, which is useful in a "permissive mode" when preparing for a
> migration without breaking a system, or to do extra integrity checks.
For preparing a migration (detecting all violations), this is useful.
But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1), this
seems to be weak, at least for the ELF loading case.
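
To make the permissive versus enforcing distinction concrete, here is a
minimal sketch of the decision an interpreter could take once it has the
result of an execveat(AT_CHECK) call.  Everything here is an assumption
for illustration: decide_run() and enum run_decision are hypothetical
names, and restrict_file stands for "SECBIT_EXEC_RESTRICT_FILE is set for
the caller", however user space ends up reading that bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum run_decision { RUN_SCRIPT, REFUSE_SCRIPT };

/*
 * Hypothetical interpreter policy (not part of the patch series).
 * check_ret:     0 if execveat(AT_CHECK) accepted the file, non-zero if not.
 * restrict_file: true if the caller is expected to enforce the result
 *                (assumed to mirror SECBIT_EXEC_RESTRICT_FILE = 1).
 */
enum run_decision decide_run(int check_ret, bool restrict_file,
			     const char *path)
{
	assert(path != NULL);

	if (check_ret == 0)
		return RUN_SCRIPT;

	if (!restrict_file) {
		/* Permissive mode: report the violation but keep running. */
		fprintf(stderr, "exec check failed for %s (not enforced)\n",
			path);
		return RUN_SCRIPT;
	}

	/* Enforcing mode: a failed check means the file is not interpreted. */
	return REFUSE_SCRIPT;
}
```

The permissive branch is what makes a migration possible; the weakness
I describe above only concerns the enforcing branch, which holds only
as long as the code making this decision is not itself compromised.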

> BTW, this use case would also be more complex with a new openat2(2) flag
> like the original O_MAYEXEC.
>
> >
> > Alternative solution 2:
> > a new syscall to load the .so and enforce the AT_CHECK in kernel
>
> A new syscall would be overkill for this feature.  Please see Linus's
> comment.
>
Maybe. I was thinking about how to prevent "/tmp/a.o" from being
mmap()ed into executable memory.

> >
> > This also means, for the solution to be complete, we might want to
> > block creation of executable anonymous memory (e.g. by seccomp, ),
>
> How seccomp could create anonymous memory in user space?
> seccomp filters should be treated (and checked with AT_CHECK) as
> executable code anyway.
>
> > unless the user space can harden the creation of  executable anonymous
> > memory in some way.
>
> User space is already in charge of mmapping its own memory.  I don't see
> what is missing.
>
> >
> > For case 3>
> > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > sure it passes AT_CHECK, before loading it into memory.
>
> All ELF dependencies are opened and checked with open_exec(), which
> perform the main executability checks (with the __FMODE_EXEC flag).
> Did I miss something?
>
I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
The app can choose its own dynamic linker path at build time (maybe
even statically link one?).  This is another reason that relying on
user space only is not enough.

> However, we must be careful with programs using the (deprecated)
> uselib(2). They should also check with AT_CHECK because this syscall
> opens the shared library without __FMODE_EXEC (similar to a simple file
> open). See
> https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
>
> >
> > For case 4>
> > same as case 2.
> >
> > Consider those cases: I think:
> > a> relying purely on userspace for enforcement doesn't seem to be
> > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > executable memory.
>
> As Steve explained (and is also explained in the patches), it is trivial
> if the attacker can already execute its own code, which is too late to
> enforce any script execution control.
>
> > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > seems to be a better place for AT_CHECK, e.g. kernel can call
> > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > avoid complicating the execveat() code path.
>
> A previous version of this patches series already patched faccessat(2),
> but this is not the right place.  faccessat2(2) is dedicated to check
> file permissions, not executability (e.g. with mount's noexec).
>
> >
> > What do you think ?
>
> I think there are some misunderstandings.  Please let me know if it's
> clearer now.
>
I'm still not sure about the use case for the dynamic linker (ELF
loading).  Maybe this patch is more suitable for scripts?
A detailed example would help demonstrate the use case for the dynamic
linker, e.g. what kind of app will benefit from
SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model we are
dealing with, and what kind of attack chain is blocked as a result.

> >
> > Thanks
> > -Jeff
> >
> > > With the information that a script interpreter is about to interpret a
> > > script, an LSM security policy can adjust caller's access rights or log
> > > execution request as for native script execution (e.g. role transition).
> > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > >
> > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > current kernel code should not be a security issue (e.g. unexpected role
> > > transition).  LSMs willing to update the caller's credential could now
> > > do so when bprm->is_check is set.  Of course, such policy change should
> > > be in line with the new user space code.
> > >
> > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > make sense for the kernel to parse the checked files, look for
> > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > if the format is unknown.  Because of that, security_bprm_check() is
> > > never called when AT_CHECK is used.
> > >
> > > It should be noted that script interpreters cannot directly use
> > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > executed to interpret the script.  Unlike the kernel, script
> > > interpreters may just interpret the shebang as a simple comment, which
> > > should not change for backward compatibility reasons.
> > >
> > > Because scripts or libraries files might not currently have the
> > > executable permission set, or because we might want specific users to be
> > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > >
> > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > This patch has been used for more than a decade with customized script
> > > interpreters.  Some examples can be found here:
> > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > >
> > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > ---
> > >
> > > New design since v18:
> > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > ---
> > >  fs/exec.c                  |  5 +++--
> > >  include/linux/binfmts.h    |  7 ++++++-
> > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > >  kernel/audit.h             |  1 +
> > >  kernel/auditsc.c           |  1 +
> > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index 40073142288f..ea2a1867afdc 100644
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > >                 .lookup_flags = LOOKUP_FOLLOW,
> > >         };
> > >
> > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > >                 return ERR_PTR(-EINVAL);
> > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > >                 bprm->filename = bprm->fdpath;
> > >         }
> > >         bprm->interp = bprm->filename;
> > > +       bprm->is_check = !!(flags & AT_CHECK);
> > >
> > >         retval = bprm_mm_init(bprm);
> > >         if (!retval)
> > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > >
> > >         /* Set the unchanging part of bprm->cred */
> > >         retval = security_bprm_creds_for_exec(bprm);
> > > -       if (retval)
> > > +       if (retval || bprm->is_check)
> > >                 goto out;
> > >
> > >         retval = exec_binprm(bprm);
> > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > index 70f97f685bff..8ff9c9e33aed 100644
> > > --- a/include/linux/binfmts.h
> > > +++ b/include/linux/binfmts.h
> > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > >                  * Set when errors can no longer be returned to the
> > >                  * original userspace.
> > >                  */
> > > -               point_of_no_return:1;
> > > +               point_of_no_return:1,
> > > +               /*
> > > +                * Set by user space to check executability according to the
> > > +                * caller's environment.
> > > +                */
> > > +               is_check:1;
> > >         struct file *executable; /* Executable to pass to the interpreter */
> > >         struct file *interpreter;
> > >         struct file *file;
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index c0bcc185fa48..bcd05c59b7df 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -118,6 +118,36 @@
> > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > >                                         compare object identity and may not
> > >                                         be usable to open_by_handle_at(2) */
> > > +
> > > +/*
> > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > + * of this file would be allowed, ignoring the file format and then the related
> > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > + * thread.  See securebits.h documentation.
> > > + *
> > > + * Programs should use this check to apply kernel-level checks against files
> > > + * that are not directly executed by the kernel but directly passed to a user
> > > + * space interpreter instead.  All files that contain executable code, from the
> > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > + * this flag is to improve the security and consistency of an execution
> > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > + * instance, this can be used to check if a file is trustworthy according to
> > > + * the caller's environment.
> > > + *
> > > + * In a secure environment, libraries and any executable dependencies should
> > > + * also be checked.  For instance dynamic linking should make sure that all
> > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > + * trusted code should be executable, which also requires integrity guarantees.
> > > + *
> > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > + * descriptor instead of a path.
> > > + */
> > > +#define AT_CHECK               0x10000
> > > +
> > >  #if defined(__KERNEL__)
> > >  #define AT_GETATTR_NOSEC       0x80000000
> > >  #endif
> > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > index a60d2840559e..8ebdabd2ab81 100644
> > > --- a/kernel/audit.h
> > > +++ b/kernel/audit.h
> > > @@ -197,6 +197,7 @@ struct audit_context {
> > >                 struct open_how openat2;
> > >                 struct {
> > >                         int                     argc;
> > > +                       bool                    is_check;
> > >                 } execve;
> > >                 struct {
> > >                         char                    *name;
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 6f0d6fb6523f..b6316e284342 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > >
> > >         context->type = AUDIT_EXECVE;
> > >         context->execve.argc = bprm->argc;
> > > +       context->execve.is_check = bprm->is_check;
> > >  }
> > >
> > >
> > > --
> > > 2.45.2
> > >
> >

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18  1:02         ` Andy Lutomirski
@ 2024-07-18 12:22           ` Mickaël Salaün
  2024-07-20  1:59             ` Andy Lutomirski
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 12:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steve Dower, Jeff Xu, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> > On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>> Consider those cases: I think:
> >>> a> relying purely on userspace for enforcement doesn't seem to be
> >>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>> executable memory.
> >>
> >> If there's a way to do this without running executable code that had to pass
> >> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>
> >> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >> code is being checked itself, it's allowed to do things that would bypass
> >> later checks (and it's up to whoever audited it in the first place to
> >> prevent this by not giving it the special mark that allows it to pass the
> >> check).
> >
> > Exactly.  As explained in the patches, one crucial prerequisite is that
> > the executable code is trusted, and the system must provide integrity
> > guarantees.  We cannot do anything without that.  This patches series is
> > a building block to fix a blind spot on Linux systems to be able to
> > fully control executability.
> 
> Circling back to my previous comment (did that ever get noticed?), I

Yes, I replied to your comments.  Did I miss something?

> don’t think this is quite right:
> 
> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> 
> On a basic system configuration, a given path either may or may not be
> executed. And maybe that path has some integrity check (dm-verity,
> etc).  So the kernel should tell the interpreter/loader whether the
> target may be executed. All fine.
> 
>  But I think the more complex cases are more interesting, and the
> “execute a program” process IS NOT BINARY.  An attempt to execute can
> be rejected outright, or it can be allowed *with a change to creds or
> security context*.  It would be entirely reasonable to have a policy
> that allows execution of non-integrity-checked files but in a very
> locked down context only.

I guess you mean to transition to a sandbox when executing an untrusted
file.  This is a good idea.  I talked about role transition in the
patch's description:

With the information that a script interpreter is about to interpret a
script, an LSM security policy can adjust caller's access rights or log
execution request as for native script execution (e.g. role transition).
This is possible thanks to the call to security_bprm_creds_for_exec().

> 
> So… shouldn’t a patch series to this effect actually support this?
> 

This patch series brings the minimal building blocks to have a
consistent execution environment.  Role transitions for script execution
are left to LSMs.  For instance, we could extend Landlock to
automatically sandbox untrusted scripts.

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18  1:51         ` Jeff Xu
@ 2024-07-18 12:23           ` Mickaël Salaün
  2024-07-18 22:54             ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 12:23 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote:
> On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > > On 17/07/2024 07:33, Jeff Xu wrote:
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement doesn't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > If there's a way to do this without running executable code that had to pass
> > > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > >
> > > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > > code is being checked itself, it's allowed to do things that would bypass
> > > later checks (and it's up to whoever audited it in the first place to
> > > prevent this by not giving it the special mark that allows it to pass the
> > > check).
> >
> We will want to define what is considered as "arbitrary code is running"
> 
> Taking ROP as an example: attackers change the return address on the
> stack, e.g. to direct the execution flow to a gadget that calls
> "ld.so /tmp/a.out".  Do you consider "arbitrary code is running" when the
> stack is overwritten, or only after execve() is called?

Yes, ROP is arbitrary code execution (which can be mitigated with CFI).
ROP could be enough to interpret custom commands and create a small
interpreter/VM.

> If it is the latter, this patch can prevent "ld.so /tmp/a.out".
> 
> > Exactly.  As explained in the patches, one crucial prerequisite is that
> > the executable code is trusted, and the system must provide integrity
> > guarantees.  We cannot do anything without that.  This patches series is
> > a building block to fix a blind spot on Linux systems to be able to
> > fully control executability.
> 
> Even a trusted executable can have a bug.

Definitely, but this patch series is dedicated to script execution
control.

> 
> I'm thinking in the context of ChromeOS, where all its system services
> are from trusted partitions, and legit code won't load .so from a
> non-exec mount.  But we want to sandbox those services, so even under
> some kind of ROP attack, the service still won't be able to load .so
> from /tmp. Of course, if an attacker can already write data of arbitrary
> length to the stack, it is probably already game over.
> 

OK, you want to tie executable file permission to mmap.  That makes
sense if you have a consistent execution model.  This can be enforced by
LSMs.  Contrary to script interpretation which is a full user space
implementation (and then controlled by user space), mmap restrictions
should indeed be enforced by the kernel.
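
For illustration only, the user space half of such a check (a loader
verifying a file descriptor before mapping it executable) might look like
the sketch below.  It assumes the AT_CHECK value proposed in this series
(0x10000), which is not in released headers, and it assumes that a kernel
without this flag rejects it with EINVAL.  classify_check(),
check_fd_executable() and map_code_if_allowed() are hypothetical names.
This does not replace kernel-side mmap enforcement: code that is already
compromised can simply skip the check.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
/* Value proposed by this RFC; an assumption, not a released UAPI constant. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

enum exec_verdict { EXEC_ALLOWED, EXEC_DENIED, EXEC_CHECK_UNSUPPORTED };

/* Hypothetical helper: turn the syscall result into a verdict. */
enum exec_verdict classify_check(long ret, int err)
{
	if (ret == 0)
		return EXEC_ALLOWED;
	/* Assumed: kernels lacking the flag (or the syscall) report this. */
	if (err == EINVAL || err == ENOSYS)
		return EXEC_CHECK_UNSUPPORTED;
	/* e.g. EACCES for a noexec mount or an LSM denial */
	return EXEC_DENIED;
}

/* Ask the kernel whether the file behind fd would be allowed to execute. */
enum exec_verdict check_fd_executable(int fd)
{
	long ret = syscall(SYS_execveat, fd, "", NULL, NULL,
			   AT_EMPTY_PATH | AT_CHECK);

	return classify_check(ret, ret ? errno : 0);
}

/* Hypothetical loader step: map fd executable only if the check passed. */
void *map_code_if_allowed(int fd, size_t len)
{
	assert(len > 0);

	if (check_fd_executable(fd) != EXEC_ALLOWED)
		return MAP_FAILED;
	/* Same fd for the check and the mapping, so no path-based TOCTOU. */
	return mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
}
```

How EXEC_CHECK_UNSUPPORTED should be treated (fail open or fail closed)
is a policy decision that the sketch deliberately leaves to the caller;
here it is treated as a refusal.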

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18  2:08       ` Jeff Xu
@ 2024-07-18 12:24         ` Mickaël Salaün
  2024-07-18 13:03           ` James Bottomley
                             ` (2 more replies)
  2024-07-18 14:46         ` enh
  1 sibling, 3 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 12:24 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > allowed for execution.  The main use case is for script interpreters and
> > > > dynamic linkers to check execution permission according to the kernel's
> > > > security policy. Another use case is to add context to access logs e.g.,
> > > > which script (instead of interpreter) accessed a file.  As any
> > > > executable code, scripts could also use this check [1].
> > > >
> > > > This is different than faccessat(2) which only checks file access
> > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > real execution, user space gets the same error codes.
> > > >
> > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > exec, shared object, script and config file (such as seccomp config),
> >
> > "config file" that contains executable code.
> >
> Is a seccomp config considered to "contain executable code"?  The seccomp
> config is translated into BPF, so maybe yes, but BPF runs in the
> kernel.

Because seccomp filters alter syscalls, they are similar to code
injection.

> 
> > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > different use cases:
> > >
> > > execveat clearly has less code change, but that also means: we can't
> > > add logic specific to exec (i.e. logic that can't be applied to
> > > config) for this part (from do_execveat_common to
> > > security_bprm_creds_for_exec) in future.  This would require some
> > > agreement/sign-off, I'm not sure from whom.
> >
> > I'm not sure to follow. We could still add new flags, but for now I
> > don't see use cases.  This patch series is not meant to handle all
> > possible "trust checks", only executable code, which makes sense for the
> > kernel.
> >
> I guess the "config file" discussion is where I got confused: at one
> point, I thought this would become a generic "trust checks" API for
> everything related to "generating executable code", e.g. JavaScript,
> Java code, and more.
> We will want to clearly define the scope of execveat(AT_CHECK).

The line between data and code is blurry.  For instance, a configuration
file can impact the execution flow of a program.  So, where to draw the
line?

It might make sense to follow the kernel and interpreter semantics: if a
file can be executed by the kernel (e.g. ELF binary, file containing a
shebang, or just configured with binfmt_misc), then this should be
considered as executable code.  This applies to Bash, Python,
JavaScript, Node.js, PE, PHP...  However, we can also make a picture
executable with binfmt_misc.  So, again, where to draw the line?

I'd recommend thinking about interactions with the outside, through
function calls, IPCs, syscalls...  For instance, "running" an image
should not lead to reading or writing to arbitrary files, or accessing
the network, but in practice it is legitimate for some file formats...
PostScript is a programming language, but mostly used to draw pictures.
So, again, where to draw the line?

We should follow the principle of least astonishment.  What would most
users expect?  This should follow the *common usage* of executable
files.  In the end, the script interpreters will be patched by security
folks for security reasons.  I think the right question to ask should
be: could this file format be (ab)used to leak or modify arbitrary
files, or to perform arbitrary syscalls?  If the answer is yes, then it
should be checked for executability.  Of course, this excludes bugs
exploited in the file format parser.

I'll extend the next patch series with this rationale.

> 
> > If we want other checks, we'll need to clearly define their semantic and
> > align with the kernel.  faccessat2(2) might be used to check other file
> > properties, but the executable property is not only defined by the file
> > attributes.
> >
> Agreed.
> 
> > >
> > > --------------------------
> > > now looked at user cases (focus on elf for now)
> > >
> > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > >
> > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > /usr/bin/some.out will pass AT_CHECK
> > >
> > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > path for ld.so because kernel has no knowledge of that, and
> > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > >
> > > 4> dlopen(/tmp/a.so)
> > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > into memory.
> > >
> > > For case 1>
> > > Alternative solution: Because AT_CHECK is always called, I think we
> > > can avoid the first AT_CHECK call, and check during execveat(fd),
> >
> > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > the same file descriptor.  By design, AT_CHECK is implicit for any
> > execve(2).
> >
> Yes. I realized I was wrong to say that ld.so will call execve() for
> /tmp/a.out, there is no execve() call, otherwise it would have been
> blocked already today.
> The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> different than case 2 and 4.  ( the elf objects are mapped to memory
> by dynamic linker.)
> I'm not familiar with dynamic linker, Florian is on this thread, and
> can help to correct me if my guess is wrong.
> 
> > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > for a successful execveat() case.
> >
> > As long as user space uses the same file descriptor, there is no TOCTOU.
> >
> > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > the user space security policy.  The kernel already enforces the same
> > security policy for any execve(2), whatever are the calling process's
> > securebits.
> >
> > >
> > > For case 2>
> > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > However,  the process can all open then mmap() directly, it seems
> > > minimal effort for an attacker to walk around such a defence from
> > > dynamic linker.
> >
> > Which process?  What do you mean by "can all open then mmap() directly"?
> >
> > In this context the dynamic linker (like its parent processes) is
> > trusted (guaranteed by the system).
> >
> > For case 2, the dynamic linker must check with AT_CHECK all files that
> > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> >
> My point is that the process can work around this by mmap() the file directly.

Yes, see my answer in the other email. The process is trusted.

> 
> > >
> > > Alternative solution:
> > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > (associated with fd)
> > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> >
> > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > kernel state, and the semantic is the same as with execve(2).
> >
> > This also enables us to check file's executable permission and ignore
> > it, which is useful in a "permissive mode" when preparing for a
> > migration without breaking a system, or to do extra integrity checks.
> For preparing a migration (detect all violations), this is useful.
> But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> seems to be weak, at least for elf loading case.

We could add more restrictions, but that is outside the scope of this
patch series.

> 
> > BTW, this use case would also be more complex with a new openat2(2) flag
> > like the original O_MAYEXEC.
> >
> > >
> > > Alternative solution 2:
> > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> >
> > A new syscall would be overkill for this feature.  Please see Linus's
> > comment.
> >
> maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> to executable memory.

OK, this is another story.

> 
> > >
> > > This also means, for the solution to be complete, we might want to
> > > block creation of executable anonymous memory (e.g. by seccomp, ),
> >
> > How seccomp could create anonymous memory in user space?
> > seccomp filters should be treated (and checked with AT_CHECK) as
> > executable code anyway.
> >
> > > unless the user space can harden the creation of  executable anonymous
> > > memory in some way.
> >
> > User space is already in charge of mmapping its own memory.  I don't see
> > what is missing.
> >
> > >
> > > For case 3>
> > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > sure it passes AT_CHECK, before loading it into memory.
> >
> > All ELF dependencies are opened and checked with open_exec(), which
> > perform the main executability checks (with the __FMODE_EXEC flag).
> > Did I miss something?
> >
> I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> The app can choose its own dynamic linker path during build, (maybe
> even statically link one ?)  This is another reason that relying on a
> userspace only is not enough.

The kernel calls open_exec() on all dependencies, including
ld-linux-x86-64.so.2, so these files are checked for executability too.

> 
> > However, we must be careful with programs using the (deprecated)
> > uselib(2). They should also check with AT_CHECK because this syscall
> > opens the shared library without __FMODE_EXEC (similar to a simple file
> > open). See
> > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> >
> > >
> > > For case 4>
> > > same as case 2.
> > >
> > > Consider those cases: I think:
> > > a> relying purely on userspace for enforcement doesn't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > As Steve explained (and is also explained in the patches), it is trivial
> > if the attacker can already execute its own code, which is too late to
> > enforce any script execution control.
> >
> > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > avoid complicating the execveat() code path.
> >
> > A previous version of this patches series already patched faccessat(2),
> > but this is not the right place.  faccessat2(2) is dedicated to check
> > file permissions, not executability (e.g. with mount's noexec).
> >
> > >
> > > What do you think ?
> >
> > I think there are some misunderstandings.  Please let me know if it's
> > clearer now.
> >
> I'm still not sure about the user case for dynamic linker (elf
> loading) case. Maybe this patch is more suitable for scripts?

It's suitable for both, but we could add more restrictions on mmap
with an (existing) LSM.  The kernel already checks for mount's noexec
when mapping a file, but not for the file permission, which is OK
because it could be bypassed by copying the content of the file and
mprotecting it anyway.  For consistent memory execution control, all
memory mappings need to be restricted, which is out of scope for this
patch series.

> A detailed user case will help demonstrate the use case for dynamic
> linker, e.g. what kind of app will benefit from
> SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> dealing with , what kind of attack chain we blocked as a result.

I explained that in the patches and in the description of these new
securebits.  Please point out which part is not clear.  The full threat
model is simple: the TCB includes the kernel and system's files, which
are integrity-protected, but we don't trust arbitrary data/scripts that
can be written to user-owned files or directly provided to script
interpreters.  As for the ptrace restrictions, the dynamic linker
restrictions help to avoid trivial bypasses (e.g. with LD_PRELOAD)
with consistent executability checks.

> 
> > >
> > > Thanks
> > > -Jeff
> > >
> > > > With the information that a script interpreter is about to interpret a
> > > > script, an LSM security policy can adjust caller's access rights or log
> > > > execution request as for native script execution (e.g. role transition).
> > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > >
> > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > transition).  LSMs willing to update the caller's credential could now
> > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > be in line with the new user space code.
> > > >
> > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > make sense for the kernel to parse the checked files, look for
> > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > never called when AT_CHECK is used.
> > > >
> > > > It should be noted that script interpreters cannot directly use
> > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > executed to interpret the script.  Unlike the kernel, script
> > > > interpreters may just interpret the shebang as a simple comment, which
> > > > should not change for backward compatibility reasons.
> > > >
> > > > Because scripts or libraries files might not currently have the
> > > > executable permission set, or because we might want specific users to be
> > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > >
> > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > This patch has been used for more than a decade with customized script
> > > > interpreters.  Some examples can be found here:
> > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > ---
> > > >
> > > > New design since v18:
> > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > ---
> > > >  fs/exec.c                  |  5 +++--
> > > >  include/linux/binfmts.h    |  7 ++++++-
> > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > >  kernel/audit.h             |  1 +
> > > >  kernel/auditsc.c           |  1 +
> > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 40073142288f..ea2a1867afdc 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > >         };
> > > >
> > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > >                 return ERR_PTR(-EINVAL);
> > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > >                 bprm->filename = bprm->fdpath;
> > > >         }
> > > >         bprm->interp = bprm->filename;
> > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > >
> > > >         retval = bprm_mm_init(bprm);
> > > >         if (!retval)
> > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > >
> > > >         /* Set the unchanging part of bprm->cred */
> > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > -       if (retval)
> > > > +       if (retval || bprm->is_check)
> > > >                 goto out;
> > > >
> > > >         retval = exec_binprm(bprm);
> > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > --- a/include/linux/binfmts.h
> > > > +++ b/include/linux/binfmts.h
> > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > >                  * Set when errors can no longer be returned to the
> > > >                  * original userspace.
> > > >                  */
> > > > -               point_of_no_return:1;
> > > > +               point_of_no_return:1,
> > > > +               /*
> > > > +                * Set by user space to check executability according to the
> > > > +                * caller's environment.
> > > > +                */
> > > > +               is_check:1;
> > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > >         struct file *interpreter;
> > > >         struct file *file;
> > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > --- a/include/uapi/linux/fcntl.h
> > > > +++ b/include/uapi/linux/fcntl.h
> > > > @@ -118,6 +118,36 @@
> > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > >                                         compare object identity and may not
> > > >                                         be usable to open_by_handle_at(2) */
> > > > +
> > > > +/*
> > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > + * thread.  See securebits.h documentation.
> > > > + *
> > > > + * Programs should use this check to apply kernel-level checks against files
> > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > + * this flag is to improve the security and consistency of an execution
> > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > + * the caller's environment.
> > > > + *
> > > > + * In a secure environment, libraries and any executable dependencies should
> > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > + *
> > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > + * descriptor instead of a path.
> > > > + */
> > > > +#define AT_CHECK               0x10000
> > > > +
> > > >  #if defined(__KERNEL__)
> > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > >  #endif
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > >                 struct open_how openat2;
> > > >                 struct {
> > > >                         int                     argc;
> > > > +                       bool                    is_check;
> > > >                 } execve;
> > > >                 struct {
> > > >                         char                    *name;
> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > >
> > > >         context->type = AUDIT_EXECVE;
> > > >         context->execve.argc = bprm->argc;
> > > > +       context->execve.is_check = bprm->is_check;
> > > >  }
> > > >
> > > >
> > > > --
> > > > 2.45.2
> > > >
> > >
> 


* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-17 17:59             ` Boris Lukashev
@ 2024-07-18 13:00               ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 13:00 UTC (permalink / raw)
  To: Boris Lukashev
  Cc: James Bottomley, Roberto Sassu, Mimi Zohar, Al Viro,
	Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Wed, Jul 17, 2024 at 01:59:22PM -0400, Boris Lukashev wrote:
> Apologies, sent from phone so plain-text wasn't flying.
> To elaborate a bit on the quick commentary there - i'm the happy
> camper behind most of the SSL shells, SSH stuff, AWS shells, and so on
> in Metasploit. So please take the following with a grain of
> tinfoil-hat salt as i'm well aware that there is no perfect defense
> against these things which covers all bases while permitting any level
> of sane operation in a general-purpose linux system (also work w/
> GrapheneOS which is a far more suitable context for this sort of
> thing). Having loosely followed the discussion thread, my offsec-brain
> $0.02 are:
> 
> Shells are the provenance of the post-exploitation world - it's what
> we want to get as a result of the exploit succeeding. So i think we
> want to keep clear delineation between exploit and post-exp mitigation
> as they're actually separate concerns of the killchain.

Indeed.  The goal of this patch series is to control executable code, so
mostly to make exploitation more difficult. When an attacker can execute
code (e.g. with ROP), execution control is already bypassed.

> 1. Command shells tend to differentiate from interpreted or binary
> execution environments in their use of POSIX file descriptor
> primitives such as pipes. How those are marshalled, chained, and
> maintained (in a loop or whatever, hiding args, etc) are the only real
> IOCs available at this tier for interdiction as observation of data
> flow through the pipes is too onerous and complex.

I agree. Only files can reliably be inspected.

> Target systems vary
> in the post-exp surfaces exposed (/dev/tcp for example) with the
> mechanics of that exposure necessitating adaptation of marshalling,
> chaining, and maintenance to fit the environment; but the basic
> premise of what forms a command shell cannot be mitigated without
> breaking POSIX mechanics themselves - offsec devs are no different
> from anyone else, we want our code to utilize architectural primitives
> instead of undefined behavior for longevity and ecosystem
> persistence/relevance.
> 2. The conversation about interpreted languages is probably a dead-end
> unless you want to neuter the interpreter - check out Spencer
> McIntyre's work re Python meterpreter or HDs/mine/etc on the PHP side.
> The stagers, loaded contexts, execution patterns, etc are all
> trivially modified to avoid detection (private versions not submitted
> for free ripping by lazy commercial entities to the FOSS ecosystem,
> yet). Dynamic code loading of interpreted languages is trivial and
> requires no syscalls, just text/serialized IL/etc. The complexity of
> loaded context available permits much more advanced functionality than
> we get in most basic command interpreter shells - <advanced evasions
> go here before doing something that'll get you caught> sort of thing.

Right, if attackers can bring their own code (or even do ROP), it
doesn't matter what it interprets, it's arbitrary code execution.

> 3. Lastly, binary payloads such as Mettle have their own advantages re
> portability, skipping over libc, etc but need to be "harnessed-in"
> from say a command-injection exploit via memfd or similar. We haven't
> published our memfd stagers while the relevant sysctl gets adopted
> more widely, but we've had them for a long time (meaning real bad guys
> have as well) and have other ways to get binary content into
> executable memory or make memory containing it executable
> (to-the-gills Grsec/PaX systems notwithstanding). IMO, interdiction of
> the harnessed injection from a command context is the last time when
> anything of use can be done at this layer unless we're sure that we
> can trace all related and potentially async (not within the process
> tree anyway) syscalls emanating from what happens next. Subsequent
> actions are separate "remedial" workflows which is a wholly separate
> philosophical discussion about how to handle having been compromised
> already.

Indeed, there are some prerequisites for a secure system.  In this case
we trust all the system-installed executable code.  If attackers can
fill a memfd with arbitrary code, it means that they already have code
execution.  This patch series will help mitigate some ways to get code
execution.

> 
> Security is very much not binary and in that vein of logic i think
> that we should probably define our shades of gray as ranges of what we
> want to protect/how and at what operational cost to then permit
> "dial-in" knobs to actually garner adoption from a broad range of
> systems outside the "real hardened efforts." At some point this turns
> into "limit users to sftp or git shells" which is a perfectly valid
> approach when the context permits that level of draconian restriction
> but the architectural breakdown of "native command, interpreted
> context, fully binary" shell types is pretty universal with new ones
> being API access into runtimes of clouds (SSM/serial/etc) which have
> their own set of limitations at execution and interface layers.
> Organizing defensive functions to handle the primitives necessary for
> each of these shell classes would likely help stratify/simplify this
> conversation and allow for more granular tasking toward those specific
> objectives.

Thanks for the discussion.  I agree, but the difficulty with this patch
series is that it brings a simple *building block*.  Of course, this
will definitely not be enough to secure any systems, but it will fill a
gap in some secure systems, and it could also harden more generic
systems (e.g. restricted system services which should not need shell
access).  I listed some examples with the new securebits proposal:
https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/

> 
> Thanks,
> -Boris
> 
> 
> On Tue, Jul 16, 2024 at 1:48 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > (adding back other people in Cc)
> >
> > On Tue, Jul 16, 2024 at 01:29:43PM -0400, Boris Lukashev wrote:
> > > Wouldn't count those shell chickens - awk alone is enough and we can
> > > use ssh and openssl clients (all in metasploit public code). As one of
> > > the people who makes novel shell types, I can assure you that this
> > > effort is only going to slow skiddies and only until the rest of us
> > > publish mitigations for this mitigation :)
> >
> > Security is not binary. :)
> >
> > Not all Linux systems are equal. Some hardened systems need this kind
> > of feature and they can get guarantees because they fully control and
> > trust their executable binaries (e.g. CLIP OS, chromeOS) or they
> > properly sandbox them.  See context in the cover letter.
> >
> > awk is a script interpreter that should be patched too, like other Linux
> > tools.
> >
> > >
> > > -Boris (RageLtMan)
> > >
> > > On July 16, 2024 12:12:49 PM EDT, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > > >On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> > > >> But the Clip OS 4 patch does not cover the redirection case:
> > > >>
> > > >> # ./bash < /root/test.sh
> > > >> Hello World
> > > >>
> > > >> Do you have a more recent patch for that?
> > > >
> > > >How far down the rabbit hole do you want to go?  You can't forbid a
> > > >shell from executing commands from stdin because logging in then won't
> > > >work.  It may be possible to allow from a tty backed file and not from
> > > >a file backed one, but you still have the problem of the attacker
> > > >manually typing in the script.
> > > >
> > > >The saving grace for this for shells is that they pretty much do
> > > >nothing on their own (unlike python) so you can still measure all the
> > > >executables they call out to, which provides reasonable safety.
> > > >
> > > >James
> > > >


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 12:24         ` Mickaël Salaün
@ 2024-07-18 13:03           ` James Bottomley
  2024-07-18 15:35             ` Mickaël Salaün
  2024-07-19  1:29           ` Jeff Xu
  2024-07-19 15:12           ` Jeff Xu
  2 siblings, 1 reply; 103+ messages in thread
From: James Bottomley @ 2024-07-18 13:03 UTC (permalink / raw)
  To: Mickaël Salaün, Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote:
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net>
> > wrote:
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
[...]
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK)
> > > > in different use cases:
> > > > 
> > > > execveat clearly has less code change, but that also means: we
> > > > can't add logic specific to exec (i.e. logic that can't be
> > > > applied to config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require
> > > > some agreement/sign-off, I'm not sure from whom.
> > > 
> > > I'm not sure to follow. We could still add new flags, but for now
> > > I don't see use cases.  This patch series is not meant to handle
> > > all possible "trust checks", only executable code, which makes
> > > sense for the kernel.
> > > 
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g.
> > javascript, java code, and more. We will want to clearly define the
> > scope of execveat(AT_CHECK)
> 
> The line between data and code is blurry.  For instance, a
> configuration file can impact the execution flow of a program.  So,
> where to draw the line?

Having a way to have config files part of the trusted envelope, either
by signing or measurement would be really useful.  The current standard
distro IMA deployment is signed executables, but not signed config
because it's hard to construct a policy that doesn't force the signing
of too many extraneous files (and files which might change often).

> It might make sense to follow the kernel and interpreter semantics:
> if a file can be executed by the kernel (e.g. ELF binary, file
> containing a shebang, or just configured with binfmt_misc), then this
> should be considered as executable code.  This applies to Bash,
> Python, Javascript, NodeJS, PE, PHP...  However, we can also make a
> picture executable with binfmt_misc.  So, again, where to draw the
> line?

Possibly by making open for config an indication executables can give?
I'm not advocating doing it in this patch, but if we had an open for
config indication, the LSMs could do much finer grained policy,
especially if they knew which executable was trying to open the config
file.  It would allow things like an IMA policy saying if a signed
executable is opening a config file, then that file must also be
signed.

James



* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-06 14:56         ` Mickaël Salaün
@ 2024-07-18 14:16           ` Roberto Sassu
  2024-07-18 16:20             ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Roberto Sassu @ 2024-07-18 14:16 UTC (permalink / raw)
  To: Mickaël Salaün, Kees Cook
  Cc: Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat, 2024-07-06 at 16:56 +0200, Mickaël Salaün wrote:
> On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote:
> > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote:
> > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote:
> > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> > > > > Such a secure environment can be achieved with an appropriate access
> > > > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > > > configuration) and an enlightened ld.so checking that libraries are
> > > > > allowed for execution e.g., to protect against illegitimate use of
> > > > > LD_PRELOAD.
> > > > > 
> > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > > > environment variables), but that is outside the scope of the kernel.
> > > > 
> > > > If the threat model includes an attacker sitting at a shell prompt, we
> > > > need to be very careful about how process perform enforcement. E.g. even
> > > > on a locked down system, if an attacker has access to LD_PRELOAD or a
> > > 
> > > LD_PRELOAD should be OK once ld.so will be patched to check the
> > > libraries.  We can still imagine a debug library used to bypass security
> > > checks, but in this case the issue would be that this library is
> > > executable in the first place.
> > 
> > Ah yes, that's fair: the shell would discover the malicious library
> > while using AT_CHECK during resolution of the LD_PRELOAD.
> 
> That's the idea, but it would be checked by ld.so, not the shell.
> 
> > 
> > > > seccomp wrapper (which you both mention here), it would be possible to
> > > > run commands where the resulting process is tricked into thinking it
> > > > doesn't have the bits set.
> > > 
> > > As explained in the UAPI comments, all parent processes need to be
> > > trusted.  This means that their code is trusted, their seccomp filters
> > > are trusted, and that they are patched, if needed, to check file
> > > executability.
> > 
> > But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> > on Chrome OS, or even systemd on regular distros. In theory, this should
> > be handled via other ACLs.
> 
> Processes running with untrusted seccomp filter should be considered
> untrusted.  It would then make sense for these seccomp filters/programs
> to be considered executable code, and then for minijail and systemd to
> check them with AT_CHECK (according to the securebits policy).
> 
> > 
> > > > But this would be exactly true for calling execveat(): LD_PRELOAD or
> > > > seccomp policy could have it just return 0.
> > > 
> > > If an attacker is allowed/able to load an arbitrary seccomp filter on a
> > > process, we cannot trust this process.
> > > 
> > > > 
> > > > While I like AT_CHECK, I do wonder if it's better to do the checks via
> > > > open(), as was originally designed with O_MAYEXEC. Because then
> > > > enforcement is gated by the kernel -- the process does not get a file
> > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
> > > > doing.
> > > 
> > > Being able to check a path name or a file descriptor (with the same
> > > syscall) is more flexible and covers more use cases.
> > 
> > If flexibility costs us reliability, I think that flexibility is not
> > a benefit.
> 
> Well, it's a matter of letting user space do what they think is best,
> and I think there are legitimate and safe uses of path names, even if I
> agree that this should not be used in most use cases.  Would we want
> faccessat2(2) to only take file descriptor as argument and not file
> path? I don't think so but I'd defer to the VFS maintainers.
> 
> Christian, Al, Linus?
> 
> Steve, could you share a use case with file paths?
> 
> > 
> > > The execveat(2)
> > > interface, including current and future flags, is dedicated to file
> > > execution.  I then think that using execveat(2) for this kind of check
> > > makes more sense, and will easily evolve with this syscall.
> > 
> > Yeah, I do recognize that it feels much more natural, but I remain
> > unhappy about how difficult it will become to audit a system for safety
> > when the check is strictly per-process opt-in, and not enforced by the
> > kernel for a given process tree. But, I think this may have always been
> > a fiction in my mind. :)
> 
> Hmm, I'm not sure to follow. Securebits are inherited, so process tree.
> And we need the parent processes to be trusted anyway.
> 
> > 
> > > > And this thinking also applies to faccessat() too: if a process can be
> > > > tricked into thinking the access check passed, it'll happily interpret
> > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
> > > > is being checked seems substantially safer to me...
> > > 
> > > If attackers can filter execveat(2), they can also filter open(2) and
> > > any other syscalls.  In all cases, that would mean an issue in the
> > > security policy.
> > 
> > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass
> > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true.
> > 
> > I guess it does become morally equivalent.
> > 
> > Okay. Well, let me ask about usability. Right now, a process will need
> > to do:
> > 
> > - should I use AT_CHECK? (check secbit)
> > - if yes: perform execveat(AT_CHECK)
> > 
> > Why not leave the secbit test up to the kernel, and then the program can
> > just unconditionally call execveat(AT_CHECK)?
> 
> That was kind of the approach of the previous patch series and Linus
> wanted the new interface to follow the kernel semantic.  Enforcing this
> kind of restriction will always be the duty of user space anyway, so I
> think it's simpler (i.e. no mix of policy definition, access check, and
> policy enforcement, but a standalone execveat feature), more flexible,
> and it fully delegates the policy enforcement to user space instead of
> trying to enforce some part in the kernel which would only give the
> illusion of security/policy enforcement.

A problem could be that, from the IMA perspective, there is no indication
of whether or not the interpreter called execveat(). Sure, we can detect
that the binary supports it, but whether the enforcement was enabled or
disabled is not recorded.

Maybe setting the process flags should be influenced by the kernel, for
example by not allowing changes and by enforcing the check when a loaded
IMA policy requires scripts to be measured/appraised.

Roberto

> > 
> > Though perhaps the issue here is that an execveat() EINVAL doesn't
> > tell the program if AT_CHECK is unimplemented or if something else
> > went wrong, and the secbit prctl() will give the correct signal about
> > AT_CHECK availability?
> 
> This kind of check could indeed help to identify the issue.

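The call discussed above (execveat(2) with AT_CHECK on an already-open
file descriptor) and the EINVAL ambiguity Kees mentions can be sketched
from user space as follows.  This is only a sketch under assumptions:
AT_CHECK uses the 0x10000 value proposed in patch 1/5 of this RFC, the
enum and the check_exec_fd() helper are hypothetical names, and the way
errno values are grouped is one possible choice, not something the
series mandates.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed by the RFC v19 patch; assumed here, not from libc headers. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/* Hypothetical result codes used only by this sketch. */
enum check_result {
	CHECK_ALLOWED = 0,     /* the kernel would allow execution */
	CHECK_DENIED = 1,      /* the kernel policy refuses execution */
	CHECK_UNSUPPORTED = 2, /* EINVAL/ENOSYS: flag or syscall may be missing */
	CHECK_ERROR = 3        /* any other failure, e.g. EBADF */
};

/*
 * Ask the kernel whether the file behind fd would be allowed to execute.
 * AT_EMPTY_PATH makes the check apply to the descriptor itself, which
 * avoids a time-of-check to time-of-use race on the path name.
 */
enum check_result check_exec_fd(int fd)
{
	long rc = syscall(SYS_execveat, fd, "", NULL, NULL,
			  AT_EMPTY_PATH | AT_CHECK);

	if (rc == 0)
		return CHECK_ALLOWED;

	switch (errno) {
	case EACCES:
	case EPERM:
		return CHECK_DENIED;
	case EINVAL:
	case ENOSYS:
		/* Ambiguous: EINVAL alone does not say why the call failed. */
		return CHECK_UNSUPPORTED;
	default:
		return CHECK_ERROR;
	}
}
```

An interpreter would open the script once, pass that descriptor to
check_exec_fd(), and then read the script from the same descriptor.
CHECK_UNSUPPORTED is kept separate from CHECK_DENIED because, as noted
above, EINVAL by itself cannot tell the caller whether AT_CHECK is
unimplemented or something else went wrong.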

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18  2:08       ` Jeff Xu
  2024-07-18 12:24         ` Mickaël Salaün
@ 2024-07-18 14:46         ` enh
  2024-07-18 15:35           ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: enh @ 2024-07-18 14:46 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote:
>
> On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > allowed for execution.  The main use case is for script interpreters and
> > > > dynamic linkers to check execution permission according to the kernel's
> > > > security policy. Another use case is to add context to access logs e.g.,
> > > > which script (instead of interpreter) accessed a file.  As any
> > > > executable code, scripts could also use this check [1].
> > > >
> > > > This is different than faccessat(2) which only checks file access
> > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > real execution, user space gets the same error codes.
> > > >
> > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > exec, shared object, script and config file (such as seccomp config),
> >
> > "config file" that contains executable code.
> >
> Is seccomp config  considered as "contains executable code", seccomp
> config is translated into bpf, so maybe yes ? but bpf is running in
> the kernel.
>
> > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > different use cases:
> > >
> > > execveat clearly has less code change, but that also means: we can't
> > > add logic specific to exec (i.e. logic that can't be applied to
> > > config) for this part (from do_execveat_common to
> > > security_bprm_creds_for_exec) in future.  This would require some
> > > agreement/sign-off, I'm not sure from whom.
> >
> > I'm not sure to follow. We could still add new flags, but for now I
> > don't see use cases.  This patch series is not meant to handle all
> > possible "trust checks", only executable code, which makes sense for the
> > kernel.
> >
> I guess the "configfile" discussion is where I get confused, at one
> point, I think this would become a generic "trust checks" api for
> everything related to "generating executable code", e.g. javascript,
> java code, and more.
> We will want to clearly define the scope of execveat(AT_CHECK)
>
> > If we want other checks, we'll need to clearly define their semantic and
> > align with the kernel.  faccessat2(2) might be used to check other file
> > properties, but the executable property is not only defined by the file
> > attributes.
> >
> Agreed.
>
> > >
> > > --------------------------
> > > now looked at user cases (focus on elf for now)
> > >
> > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > >
> > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > /usr/bin/some.out will pass AT_CHECK
> > >
> > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > path for ld.so because kernel has no knowledge of that, and
> > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > >
> > > 4> dlopen(/tmp/a.so)
> > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > into memory.
> > >
> > > For case 1>
> > > Alternative solution: Because AT_CHECK is always called, I think we
> > > can avoid the first AT_CHECK call, and check during execveat(fd),
> >
> > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > the same file descriptor.  By design, AT_CHECK is implicit for any
> > execve(2).
> >
> Yes. I realized I was wrong to say that ld.so will call execve() for
> /tmp/a.out, there is no execve() call, otherwise it would have been
> blocked already today.
> The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> different than case 2 and 4.  ( the elf objects are mapped to memory
> by dynamic linker.)
> I'm not familiar with dynamic linker, Florian is on this thread, and
> can help to correct me if my guess is wrong.

for Android, this has been the nail in the coffin of previous attempts
to disallow running code from non-trusted filesystems --- instead of
execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out
as an argument. people are doing this already in some cases, because
we already have ineffectual "barriers" in place. [the usual argument
for doing such things anyway is "it makes it harder to be doing this
by _accident_".]

the other workaround for the attacker is to copy and paste the entire
dynamic linker source and change the bits they don't like :-) (if
you're thinking "is that a thing?", yes, so much so that the idea has
been independently reinvented multiple times by several major legit
apps and by basically every piece of DRM middleware. which is why --
although i'm excited by mseal(2) -- i expect to face significant
challenges rolling it out in Android _especially_ in places like
"dynamic linker internal data structures" where i've wanted it for
years!)

this proposal feels like it _ought_ to let a defender tighten their
seccomp filter to require a "safe" fd if i'm using mmap() with an fd,
but in practice as long as JITs exist i can always just copy code into
a non-fd-backed mmap() region. and -- from the perspective of Android,
where all "apps" are code loaded into a Java runtime -- there's not
much getting away from JITs. (and last i looked, ART -- Android's Java
runtime -- uses memfd() for the JIT cache.)

> > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > > for a successful execveat() case.
> >
> > As long as user space uses the same file descriptor, there is no TOCTOU.
> >
> > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > the user space security policy.  The kernel already enforces the same
> > security policy for any execve(2), whatever are the calling process's
> > securebits.
> >
> > >
> > > For case 2>
> > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > However,  the process can all open then mmap() directly, it seems
> > > minimal effort for an attacker to walk around such a defence from
> > > dynamic linker.
> >
> > Which process?  What do you mean by "can all open then mmap() directly"?
> >
> > In this context the dynamic linker (like its parent processes) is
> > trusted (guaranteed by the system).
> >
> > For case 2, the dynamic linker must check with AT_CHECK all files that
> > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> >
> My point is that the process can work around this by mmap() the file directly.
>
> > >
> > > Alternative solution:
> > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > (associated with fd)
> > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> >
> > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > kernel state, and the semantic is the same as with execve(2).
> >
> > This also enables us to check file's executable permission and ignore
> > it, which is useful in a "permissive mode" when preparing for a
> > migration without breaking a system, or to do extra integrity checks.
> For preparing a migration (detect all violations), this is useful.
> But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> seems to be weak, at least for elf loading case.
>
> > BTW, this use case would also be more complex with a new openat2(2) flag
> > like the original O_MAYEXEC.
> >
> > >
> > > Alternative solution 2:
> > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> >
> > A new syscall would be overkill for this feature.  Please see Linus's
> > comment.
> >
> maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> to executable memory.
>
> > >
> > > This also means, for the solution to be complete, we might want to
> > > block creation of executable anonymous memory (e.g. by seccomp, ),
> >
> > How seccomp could create anonymous memory in user space?
> > seccomp filters should be treated (and checked with AT_CHECK) as
> > executable code anyway.
> >
> > > unless the user space can harden the creation of  executable anonymous
> > > memory in some way.
> >
> > User space is already in charge of mmapping its own memory.  I don't see
> > what is missing.
> >
> > >
> > > For case 3>
> > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > sure it passes AT_CHECK, before loading it into memory.
> >
> > All ELF dependencies are opened and checked with open_exec(), which
> > > performs the main executability checks (with the __FMODE_EXEC flag).
> > Did I miss something?
> >
> I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> The app can choose its own dynamic linker path during build, (maybe
> even statically link one ?)  This is another reason that relying on a
> userspace only is not enough.
>
> > However, we must be careful with programs using the (deprecated)
> > uselib(2). They should also check with AT_CHECK because this syscall
> > opens the shared library without __FMODE_EXEC (similar to a simple file
> > open). See
> > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> >
> > >
> > > For case 4>
> > > same as case 2.
> > >
> > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement doesn't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > As Steve explained (and is also explained in the patches), it is trivial
> > if the attacker can already execute its own code, which is too late to
> > enforce any script execution control.
> >
> > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > avoid complicating the execveat() code path.
> >
> > > A previous version of this patch series already patched faccessat(2),
> > but this is not the right place.  faccessat2(2) is dedicated to check
> > file permissions, not executability (e.g. with mount's noexec).
> >
> > >
> > > What do you think ?
> >
> > I think there are some misunderstandings.  Please let me know if it's
> > clearer now.
> >
> I'm still not sure about the user case for dynamic linker (elf
> loading) case. Maybe this patch is more suitable for scripts?
> A detailed user case will help demonstrate the use case for dynamic
> linker, e.g. what kind of app will benefit from
> SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> dealing with , what kind of attack chain we blocked as a result.
>
> > >
> > > Thanks
> > > -Jeff
> > >
> > > > With the information that a script interpreter is about to interpret a
> > > > script, an LSM security policy can adjust caller's access rights or log
> > > > execution request as for native script execution (e.g. role transition).
> > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > >
> > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > transition).  LSMs willing to update the caller's credential could now
> > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > be in line with the new user space code.
> > > >
> > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > make sense for the kernel to parse the checked files, look for
> > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > never called when AT_CHECK is used.
> > > >
> > > > It should be noted that script interpreters cannot directly use
> > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > executed to interpret the script.  Unlike the kernel, script
> > > > interpreters may just interpret the shebang as a simple comment, which
> > > > should not change for backward compatibility reasons.
> > > >
> > > > Because scripts or libraries files might not currently have the
> > > > executable permission set, or because we might want specific users to be
> > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > >
> > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > This patch has been used for more than a decade with customized script
> > > > interpreters.  Some examples can be found here:
> > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > ---
> > > >
> > > > New design since v18:
> > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > ---
> > > >  fs/exec.c                  |  5 +++--
> > > >  include/linux/binfmts.h    |  7 ++++++-
> > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > >  kernel/audit.h             |  1 +
> > > >  kernel/auditsc.c           |  1 +
> > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 40073142288f..ea2a1867afdc 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > >         };
> > > >
> > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > >                 return ERR_PTR(-EINVAL);
> > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > >                 bprm->filename = bprm->fdpath;
> > > >         }
> > > >         bprm->interp = bprm->filename;
> > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > >
> > > >         retval = bprm_mm_init(bprm);
> > > >         if (!retval)
> > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > >
> > > >         /* Set the unchanging part of bprm->cred */
> > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > -       if (retval)
> > > > +       if (retval || bprm->is_check)
> > > >                 goto out;
> > > >
> > > >         retval = exec_binprm(bprm);
> > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > --- a/include/linux/binfmts.h
> > > > +++ b/include/linux/binfmts.h
> > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > >                  * Set when errors can no longer be returned to the
> > > >                  * original userspace.
> > > >                  */
> > > > -               point_of_no_return:1;
> > > > +               point_of_no_return:1,
> > > > +               /*
> > > > +                * Set by user space to check executability according to the
> > > > +                * caller's environment.
> > > > +                */
> > > > +               is_check:1;
> > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > >         struct file *interpreter;
> > > >         struct file *file;
> > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > --- a/include/uapi/linux/fcntl.h
> > > > +++ b/include/uapi/linux/fcntl.h
> > > > @@ -118,6 +118,36 @@
> > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > >                                         compare object identity and may not
> > > >                                         be usable to open_by_handle_at(2) */
> > > > +
> > > > +/*
> > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > + * thread.  See securebits.h documentation.
> > > > + *
> > > > + * Programs should use this check to apply kernel-level checks against files
> > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > + * this flag is to improve the security and consistency of an execution
> > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > + * the caller's environment.
> > > > + *
> > > > + * In a secure environment, libraries and any executable dependencies should
> > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > + *
> > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > + * descriptor instead of a path.
> > > > + */
> > > > +#define AT_CHECK               0x10000
> > > > +
> > > >  #if defined(__KERNEL__)
> > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > >  #endif
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > >                 struct open_how openat2;
> > > >                 struct {
> > > >                         int                     argc;
> > > > +                       bool                    is_check;
> > > >                 } execve;
> > > >                 struct {
> > > >                         char                    *name;
> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > >
> > > >         context->type = AUDIT_EXECVE;
> > > >         context->execve.argc = bprm->argc;
> > > > +       context->execve.is_check = bprm->is_check;
> > > >  }
> > > >
> > > >
> > > > --
> > > > 2.45.2
> > > >
> > >

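The control flow that the quoted fs/exec.c hunks add can be modelled in
a few lines of plain C.  This is a user-space sketch, not kernel code:
the sketch_* names and the reduced struct are hypothetical, the
AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH values are the usual UAPI ones,
and AT_CHECK is the value proposed by this RFC.

```c
#include <errno.h>

/* Flag values: the first two match the UAPI headers, the third is the
 * value proposed in the quoted patch. */
#define SKETCH_AT_SYMLINK_NOFOLLOW 0x100
#define SKETCH_AT_EMPTY_PATH       0x1000
#define SKETCH_AT_CHECK            0x10000

/* Reduced stand-in for struct linux_binprm: only the new bit. */
struct sketch_bprm {
	unsigned int is_check:1;
};

/*
 * Mirrors the flag handling in do_open_execat() and alloc_bprm():
 * unknown flags are rejected with -EINVAL, and AT_CHECK is recorded
 * in is_check.
 */
int sketch_parse_exec_flags(int flags, struct sketch_bprm *bprm)
{
	if ((flags & ~(SKETCH_AT_SYMLINK_NOFOLLOW | SKETCH_AT_EMPTY_PATH |
		       SKETCH_AT_CHECK)) != 0)
		return -EINVAL;

	bprm->is_check = !!(flags & SKETCH_AT_CHECK);
	return 0;
}

/*
 * Mirrors the test in bprm_execve(): after the LSM hook has run, the
 * exec stops either on error or because only a check was requested.
 * Returns 1 when exec_binprm() would not be reached.
 */
int sketch_stops_before_exec(int lsm_retval, const struct sketch_bprm *bprm)
{
	return lsm_retval != 0 || bprm->is_check;
}
```

The point of the model is that a check-only call goes through the same
flag validation and LSM hook as a real execution and only skips the
final exec_binprm() step, which is why user space sees the same error
codes in both cases.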
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 13:03           ` James Bottomley
@ 2024-07-18 15:35             ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 15:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Thu, Jul 18, 2024 at 09:03:36AM -0400, James Bottomley wrote:
> On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote:
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net>
> > > wrote:
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> [...]
> > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK)
> > > > > in different use cases:
> > > > > 
> > > > > execveat clearly has less code change, but that also means: we
> > > > > can't add logic specific to exec (i.e. logic that can't be
> > > > > applied to config) for this part (from do_execveat_common to
> > > > > security_bprm_creds_for_exec) in future.  This would require
> > > > > some agreement/sign-off, I'm not sure from whom.
> > > > 
> > > > I'm not sure to follow. We could still add new flags, but for now
> > > > I don't see use cases.  This patch series is not meant to handle
> > > > all possible "trust checks", only executable code, which makes
> > > > sense for the kernel.
> > > > 
> > > I guess the "configfile" discussion is where I get confused, at one
> > > point, I think this would become a generic "trust checks" api for
> > > everything related to "generating executable code", e.g.
> > > javascript, java code, and more. We will want to clearly define the
> > > scope of execveat(AT_CHECK)
> > 
> > The line between data and code is blurry.  For instance, a
> > configuration file can impact the execution flow of a program.  So,
> > where to draw the line?
> 
> Having a way to have config files part of the trusted envelope, either
> by signing or measurement would be really useful.  The current standard
> distro IMA deployment is signed executables, but not signed config
> because it's hard to construct a policy that doesn't force the signing
> of too many extraneous files (and files which might change often).
> 
> > It might make sense to follow the kernel and interpreter semantic:
> > if a file can be executed by the kernel (e.g. ELF binary, file
> > containing a shebang, or just configured with binfmt_misc), then this
> > should be considered as executable code.  This applies to Bash,
> > Python, Javascript, NodeJS, PE, PHP...  However, we can also make a
> > picture executable with binfmt_misc.  So, again, where to draw the
> > line?
> 
> Possibly by making open for config an indication executables can give?
> I'm not advocating doing it in this patch, but if we had an open for
> config indication, the LSMs could do much finer grained policy,
> especially if they knew which executable was trying to open the config
> file.  It would allow things like an IMA policy saying if a signed
> executable is opening a config file, then that file must also be
> signed.

Checking configuration could be a next step, but not with this patch
series.  FYI, the previous version was a (too) generic syscall:
https://lore.kernel.org/all/20220104155024.48023-1-mic@digikod.net/
One of the main concerns was alignment with kernel semantics.  For now,
let's focus on script execution control.

> 
> James
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 14:46         ` enh
@ 2024-07-18 15:35           ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 15:35 UTC (permalink / raw)
  To: enh
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Linus Torvalds,
	Paul Moore, Theodore Ts'o, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Fan Wu, Florian Weimer, Geert Uytterhoeven, James Morris,
	Jan Kara, Jann Horn, Jonathan Corbet, Jordan R Abrahams,
	Lakshmi Ramasubramanian, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module

On Thu, Jul 18, 2024 at 10:46:54AM -0400, enh wrote:
> On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote:
> >
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
> >
> > > If we want other checks, we'll need to clearly define their semantic and
> > > align with the kernel.  faccessat2(2) might be used to check other file
> > > properties, but the executable property is not only defined by the file
> > > attributes.
> > >
> > Agreed.
> >
> > > >
> > > > --------------------------
> > > > now looked at user cases (focus on elf for now)
> > > >
> > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > > >
> > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > > /usr/bin/some.out will pass AT_CHECK
> > > >
> > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > > path for ld.so because kernel has no knowledge of that, and
> > > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > > >
> > > > 4> dlopen(/tmp/a.so)
> > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > > into memory.
> > > >
> > > > For case 1>
> > > > Alternative solution: Because AT_CHECK is always called, I think we
> > > > can avoid the first AT_CHECK call, and check during execveat(fd),
> > >
> > > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > > the same file descriptor.  By design, AT_CHECK is implicit for any
> > > execve(2).
> > >
> > Yes. I realized I was wrong to say that ld.so will call execve() for
> > /tmp/a.out, there is no execve() call, otherwise it would have been
> > blocked already today.
> > The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> > different than case 2 and 4.  ( the elf objects are mapped to memory
> > by dynamic linker.)
> > I'm not familiar with dynamic linker, Florian is on this thread, and
> > can help to correct me if my guess is wrong.
> 
> for Android, this has been the nail in the coffin of previous attempts
> to disallow running code from non-trusted filesystems --- instead of
> execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out
> as an argument. people are doing this already in some cases, because
> we already have ineffectual "barriers" in place. [the usual argument
> for doing such things anyway is "it makes it harder to be doing this
> by _accident_".]

This AT_CHECK and related securebits should cover this case.
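
As a rough illustration of what such a check could look like in a dynamic
linker or an interpreter, here is a minimal sketch in C.  It assumes the
AT_CHECK value proposed in patch 1/5 (0x10000), which is not part of
released UAPI headers; classify_check() and check_fd_executable() are
hypothetical helper names, not something provided by this series.  EINVAL
is ambiguous: it may mean the kernel does not know the flag, or that
something else was wrong with the call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed in the quoted patch; an assumption on any other kernel. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif
/* Fallback in case the libc headers did not expose it. */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

enum check_result {
	CHECK_ALLOWED,           /* the kernel would allow execution */
	CHECK_DENIED,            /* any error other than EINVAL */
	CHECK_MAYBE_UNSUPPORTED, /* EINVAL: flag unknown, or bad arguments */
};

/* Map the syscall return value and errno to a decision (hypothetical helper). */
enum check_result classify_check(long ret, int err)
{
	if (ret == 0)
		return CHECK_ALLOWED;
	if (err == EINVAL)
		return CHECK_MAYBE_UNSUPPORTED;
	return CHECK_DENIED;
}

/*
 * Ask the kernel whether the file behind fd would be allowed for
 * execution.  Nothing is executed: with AT_CHECK the call returns, and
 * a kernel that does not know the flag rejects it with EINVAL.
 */
enum check_result check_fd_executable(int fd)
{
	long ret;

	assert(fd >= 0);
	errno = 0;
	ret = syscall(SYS_execveat, fd, "", NULL, NULL,
		      AT_EMPTY_PATH | AT_CHECK);
	return classify_check(ret, errno);
}
```

A loader would open the file, call check_fd_executable(fd), and only
mmap() or interpret that same fd when the result is CHECK_ALLOWED;
reusing the descriptor is what avoids a TOCTOU window.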

> 
> the other workaround for the attacker is to copy and paste the entire
> dynamic linker source and change the bits they don't like :-) (if
> you're thinking "is that a thing?", yes, so much so that the idea has
> been independently reinvented multiple times by several major legit
> apps and by basically every piece of DRM middleware. which is why --
> although i'm excited by mseal(2) -- i expect to face significant
> challenges rolling it out in Android _especially_ in places like
> "dynamic linker internal data structures" where i've wanted it for
> years!)
> 
> this proposal feels like it _ought_ to let a defender tighten their
> seccomp filter to require a "safe" fd if i'm using mmap() with an fd,
> but in practice as long as JITs exist i can always just copy code into
> a non-fd-backed mmap() region. and -- from the perspective of Android,
> where all "apps" are code loaded into a Java runtime -- there's not
> much getting away from JITs. (and last i looked, ART -- Android's Java
> runtime -- uses memfd() for the JIT cache.)

Using the feature brought by this patch series makes sense for trusted
executables willing to enforce a security policy.  Untrusted ones should
be sandboxed, which is the case for Android apps.

> 
> > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > > for a successful execveat() case.
> > >
> > > As long as user space uses the same file descriptor, there is no TOCTOU.
> > >
> > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > > the user space security policy.  The kernel already enforces the same
> > > security policy for any execve(2), whatever the calling process's
> > > securebits are.
> > >
> > > >
> > > > For case 2>
> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > > However,  the process can all open then mmap() directly, it seems
> > > > minimal effort for an attacker to walk around such a defence from
> > > > dynamic linker.
> > >
> > > Which process?  What do you mean by "can all open then mmap() directly"?
> > >
> > > In this context the dynamic linker (like its parent processes) is
> > > trusted (guaranteed by the system).
> > >
> > > For case 2, the dynamic linker must check with AT_CHECK all files that
> > > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> > >
> > My point is that the process can work around this by mmap() the file directly.
> >
> > > >
> > > > Alternative solution:
> > > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > > (associated with fd)
> > > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> > >
> > > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > > kernel state, and the semantic is the same as with execve(2).
> > >
> > > This also enables us to check file's executable permission and ignore
> > > it, which is useful in a "permissive mode" when preparing for a
> > > migration without breaking a system, or to do extra integrity checks.
> > For preparing a migration (detect all violations), this is useful.
> > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> > seems to be weak, at least for elf loading case.
> >
> > > BTW, this use case would also be more complex with a new openat2(2) flag
> > > like the original O_MAYEXEC.
> > >
> > > >
> > > > Alternative solution 2:
> > > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> > >
> > > A new syscall would be overkill for this feature.  Please see Linus's
> > > comment.
> > >
> > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> > to executable memory.
> >
> > > >
> > > > This also means, for the solution to be complete, we might want to
> > > > block creation of executable anonymous memory (e.g. by seccomp, ),
> > >
> > > > How could seccomp create anonymous memory in user space?
> > > seccomp filters should be treated (and checked with AT_CHECK) as
> > > executable code anyway.
> > >
> > > > unless the user space can harden the creation of  executable anonymous
> > > > memory in some way.
> > >
> > > User space is already in charge of mmapping its own memory.  I don't see
> > > what is missing.
> > >
> > > >
> > > > For case 3>
> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > sure it passes AT_CHECK, before loading it into memory.
> > >
> > > All ELF dependencies are opened and checked with open_exec(), which
> > > performs the main executability checks (with the __FMODE_EXEC flag).
> > > Did I miss something?
> > >
> > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > The app can choose its own dynamic linker path during build, (maybe
> > even statically link one ?)  This is another reason that relying on a
> > userspace only is not enough.
> >
> > > However, we must be careful with programs using the (deprecated)
> > > uselib(2). They should also check with AT_CHECK because this syscall
> > > opens the shared library without __FMODE_EXEC (similar to a simple file
> > > open). See
> > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> > >
> > > >
> > > > For case 4>
> > > > same as case 2.
> > > >
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement doesn't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > As Steve explained (and is also explained in the patches), it is trivial
> > > if the attacker can already execute its own code, which is too late to
> > > enforce any script execution control.
> > >
> > > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > > avoid complicating the execveat() code path.
> > >
> > > A previous version of this patch series already patched faccessat(2),
> > > but this is not the right place.  faccessat2(2) is dedicated to check
> > > file permissions, not executability (e.g. with mount's noexec).
> > >
> > > >
> > > > What do you think ?
> > >
> > > I think there are some misunderstandings.  Please let me know if it's
> > > clearer now.
> > >
> > I'm still not sure about the user case for dynamic linker (elf
> > loading) case. Maybe this patch is more suitable for scripts?
> > A detailed user case will help demonstrate the use case for dynamic
> > linker, e.g. what kind of app will benefit from
> > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > dealing with , what kind of attack chain we blocked as a result.
> >
> > > >
> > > > Thanks
> > > > -Jeff
> > > >
> > > > > With the information that a script interpreter is about to interpret a
> > > > > script, an LSM security policy can adjust caller's access rights or log
> > > > > execution request as for native script execution (e.g. role transition).
> > > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > > >
> > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > > transition).  LSMs willing to update the caller's credential could now
> > > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > > be in line with the new user space code.
> > > > >
> > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > > make sense for the kernel to parse the checked files, look for
> > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > never called when AT_CHECK is used.
> > > > >
> > > > > It should be noted that script interpreters cannot directly use
> > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > > executed to interpret the script.  Unlike the kernel, script
> > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > should not change for backward compatibility reasons.
> > > > >
> > > > > Because scripts or libraries files might not currently have the
> > > > > executable permission set, or because we might want specific users to be
> > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > > >
> > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > This patch has been used for more than a decade with customized script
> > > > > interpreters.  Some examples can be found here:
> > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > >
> > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > > ---
> > > > >
> > > > > New design since v18:
> > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > ---
> > > > >  fs/exec.c                  |  5 +++--
> > > > >  include/linux/binfmts.h    |  7 ++++++-
> > > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > > >  kernel/audit.h             |  1 +
> > > > >  kernel/auditsc.c           |  1 +
> > > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 40073142288f..ea2a1867afdc 100644
> > > > > --- a/fs/exec.c
> > > > > +++ b/fs/exec.c
> > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > >         };
> > > > >
> > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > > >                 return ERR_PTR(-EINVAL);
> > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > >                 bprm->filename = bprm->fdpath;
> > > > >         }
> > > > >         bprm->interp = bprm->filename;
> > > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > > >
> > > > >         retval = bprm_mm_init(bprm);
> > > > >         if (!retval)
> > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > >
> > > > >         /* Set the unchanging part of bprm->cred */
> > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > -       if (retval)
> > > > > +       if (retval || bprm->is_check)
> > > > >                 goto out;
> > > > >
> > > > >         retval = exec_binprm(bprm);
> > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > > --- a/include/linux/binfmts.h
> > > > > +++ b/include/linux/binfmts.h
> > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > >                  * Set when errors can no longer be returned to the
> > > > >                  * original userspace.
> > > > >                  */
> > > > > -               point_of_no_return:1;
> > > > > +               point_of_no_return:1,
> > > > > +               /*
> > > > > +                * Set by user space to check executability according to the
> > > > > +                * caller's environment.
> > > > > +                */
> > > > > +               is_check:1;
> > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > >         struct file *interpreter;
> > > > >         struct file *file;
> > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > > --- a/include/uapi/linux/fcntl.h
> > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > @@ -118,6 +118,36 @@
> > > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > > >                                         compare object identity and may not
> > > > >                                         be usable to open_by_handle_at(2) */
> > > > > +
> > > > > +/*
> > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > > + * thread.  See securebits.h documentation.
> > > > > + *
> > > > > + * Programs should use this check to apply kernel-level checks against files
> > > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > > + * this flag is to improve the security and consistency of an execution
> > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > > + * the caller's environment.
> > > > > + *
> > > > > + * In a secure environment, libraries and any executable dependencies should
> > > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > > + *
> > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > > + * descriptor instead of a path.
> > > > > + */
> > > > > +#define AT_CHECK               0x10000
> > > > > +
> > > > >  #if defined(__KERNEL__)
> > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > >  #endif
> > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > --- a/kernel/audit.h
> > > > > +++ b/kernel/audit.h
> > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > >                 struct open_how openat2;
> > > > >                 struct {
> > > > >                         int                     argc;
> > > > > +                       bool                    is_check;
> > > > >                 } execve;
> > > > >                 struct {
> > > > >                         char                    *name;
> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > > >  }
> > > > >
> > > > >
> > > > > --
> > > > > 2.45.2
> > > > >
> > > >
> 

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-18 14:16           ` Roberto Sassu
@ 2024-07-18 16:20             ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 16:20 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Kees Cook, Al Viro, Christian Brauner, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Thu, Jul 18, 2024 at 04:16:45PM +0200, Roberto Sassu wrote:
> On Sat, 2024-07-06 at 16:56 +0200, Mickaël Salaün wrote:
> > On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote:
> > > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote:
> > > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote:
> > > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote:
> > > > > > Such a secure environment can be achieved with an appropriate access
> > > > > > control policy (e.g. mount's noexec option, file access rights, LSM
> > > > > > > configuration) and an enlightened ld.so checking that libraries are
> > > > > > allowed for execution e.g., to protect against illegitimate use of
> > > > > > LD_PRELOAD.
> > > > > > 
> > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > > > > > environment variables), but that is outside the scope of the kernel.
> > > > > 
> > > > > If the threat model includes an attacker sitting at a shell prompt, we
> > > > > > need to be very careful about how processes perform enforcement. E.g. even
> > > > > on a locked down system, if an attacker has access to LD_PRELOAD or a
> > > > 
> > > > LD_PRELOAD should be OK once ld.so will be patched to check the
> > > > libraries.  We can still imagine a debug library used to bypass security
> > > > checks, but in this case the issue would be that this library is
> > > > executable in the first place.
> > > 
> > > Ah yes, that's fair: the shell would discover the malicious library
> > > while using AT_CHECK during resolution of the LD_PRELOAD.
> > 
> > That's the idea, but it would be checked by ld.so, not the shell.
> > 
> > > 
> > > > > seccomp wrapper (which you both mention here), it would be possible to
> > > > > run commands where the resulting process is tricked into thinking it
> > > > > doesn't have the bits set.
> > > > 
> > > > As explained in the UAPI comments, all parent processes need to be
> > > > > trusted.  This means that their code is trusted, their seccomp filters
> > > > are trusted, and that they are patched, if needed, to check file
> > > > executability.
> > > 
> > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail
> > > on Chrome OS, or even systemd on regular distros. In theory, this should
> > > be handled via other ACLs.
> > 
> > Processes running with untrusted seccomp filter should be considered
> > untrusted.  It would then make sense for these seccomp filters/programs
> > to be considered executable code, and then for minijail and systemd to
> > check them with AT_CHECK (according to the securebits policy).
> > 
> > > 
> > > > > But this would be exactly true for calling execveat(): LD_PRELOAD or
> > > > > seccomp policy could have it just return 0.
> > > > 
> > > > If an attacker is allowed/able to load an arbitrary seccomp filter on a
> > > > process, we cannot trust this process.
> > > > 
> > > > > 
> > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via
> > > > > open(), as was originally designed with O_MAYEXEC. Because then
> > > > > enforcement is gated by the kernel -- the process does not get a file
> > > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into
> > > > > doing.
> > > > 
> > > > Being able to check a path name or a file descriptor (with the same
> > > > syscall) is more flexible and cover more use cases.
> > > 
> > > If flexibility costs us reliability, I think that flexibility is not
> > > a benefit.
> > 
> > Well, it's a matter of letting user space do what they think is best,
> > and I think there are legitimate and safe uses of path names, even if I
> > agree that this should not be used in most use cases.  Would we want
> > faccessat2(2) to only take file descriptor as argument and not file
> > path? I don't think so but I'd defer to the VFS maintainers.
> > 
> > Christian, Al, Linus?
> > 
> > Steve, could you share a use case with file paths?
> > 
> > > 
> > > > The execveat(2)
> > > > interface, including current and future flags, is dedicated to file
> > > > execution.  I then think that using execveat(2) for this kind of check
> > > > makes more sense, and will easily evolve with this syscall.
> > > 
> > > Yeah, I do recognize that it feels much more natural, but I remain
> > > unhappy about how difficult it will become to audit a system for safety
> > > when the check is strictly per-process opt-in, and not enforced by the
> > > kernel for a given process tree. But, I think this may have always been
> > > a fiction in my mind. :)
> > 
> > Hmm, I'm not sure I follow. Securebits are inherited, so they cover the process tree.
> > And we need the parent processes to be trusted anyway.
> > 
> > > 
> > > > > And this thinking also applies to faccessat() too: if a process can be
> > > > > tricked into thinking the access check passed, it'll happily interpret
> > > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC
> > > > > is being checked seems substantially safer to me...
> > > > 
> > > > If attackers can filter execveat(2), they can also filter open(2) and
> > > > any other syscalls.  In all cases, that would mean an issue in the
> > > > security policy.
> > > 
> > > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass
> > > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true.
> > > 
> > > I guess it does become morally equivalent.
> > > 
> > > Okay. Well, let me ask about usability. Right now, a process will need
> > > to do:
> > > 
> > > - should I use AT_CHECK? (check secbit)
> > > - if yes: perform execveat(AT_CHECK)
> > > 
> > > Why not leave the secbit test up to the kernel, and then the program can
> > > just unconditionally call execveat(AT_CHECK)?
> > 
> > That was kind of the approach of the previous patch series and Linus
> > wanted the new interface to follow the kernel semantic.  Enforcing this
> > kind of restriction will always be the duty of user space anyway, so I
> > think it's simpler (i.e. no mix of policy definition, access check, and
> > policy enforcement, but a standalone execveat feature), more flexible,
> > and it fully delegates the policy enforcement to user space instead of
> > trying to enforce some part in the kernel which would only give the
> > illusion of security/policy enforcement.
> 
> A problem could be that, from the IMA perspective, there is no indication of
> whether the interpreter called execveat() or not. Sure, we can detect
> that the binary supports it, but whether the enforcement was
> enabled or disabled is not recorded.

We should assume that if the interpreter calls execveat+AT_CHECK, it will
enforce restrictions according to its securebits.
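
To make that assumption concrete, the flow Kees describes above (check
the secbit, then call execveat+AT_CHECK only if it is set) could be
sketched in C as follows.  The bit value used here is a placeholder I
picked for illustration, not the value defined by patch 2/5, and
should_interpret() and current_securebits() are hypothetical helper
names.  Only prctl(PR_GET_SECUREBITS) is an existing interface.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Placeholder bit for SECBIT_SHOULD_EXEC_CHECK: an assumption made for
 * this sketch only.  Real code must take the value from the securebits.h
 * shipped with the series.
 */
#define HYPOTHETICAL_SECBIT_SHOULD_EXEC_CHECK (1UL << 8)

/*
 * Decide whether the interpreter should run a file.
 * securebits: value read from prctl(PR_GET_SECUREBITS).
 * check_ret:  result of execveat(fd, "", ..., AT_EMPTY_PATH | AT_CHECK),
 *             i.e. 0 when the kernel would allow execution.
 * Returns 1 to interpret the file, 0 to refuse.
 */
int should_interpret(unsigned long securebits, int check_ret)
{
	/* Policy not requested for this task: keep the legacy behavior. */
	if (!(securebits & HYPOTHETICAL_SECBIT_SHOULD_EXEC_CHECK))
		return 1;
	/* Policy requested: follow the kernel's answer. */
	return check_ret == 0;
}

/* Read the calling thread's securebits; returns -1 on error. */
long current_securebits(void)
{
	return prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
}
```

With this split the kernel only answers "would this be executable?", and
the decision to refuse stays in the interpreter, driven by securebits
that an LSM or a parent process may have set and locked.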

> 
> Maybe, setting the process flags should be influenced by the kernel,
> for example not allowing changes and enforcing when there is an IMA
> policy loaded requiring to measure/appraise scripts.

LSMs can set the required securebits per task/interpreter according to
their policies.

> 
> Roberto
> 
> > > 
> > > Though perhaps the issue here is that an execveat() EINVAL doesn't
> > > tell the program if AT_CHECK is unimplemented or if something else
> > > went wrong, and the secbit prctl() will give the correct signal about
> > > AT_CHECK availability?
> > 
> > This kind of check could indeed help to identify the issue.
> 
> 

* Re: [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC)
  2024-07-16 17:31         ` Mickaël Salaün
@ 2024-07-18 16:21           ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-18 16:21 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: James Bottomley, Mimi Zohar, Al Viro, Christian Brauner,
	Kees Cook, Linus Torvalds, Paul Moore, Theodore Ts'o,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Tue, Jul 16, 2024 at 07:31:45PM +0200, Mickaël Salaün wrote:
> On Tue, Jul 16, 2024 at 12:12:49PM -0400, James Bottomley wrote:
> > On Tue, 2024-07-16 at 17:57 +0200, Roberto Sassu wrote:
> > > But the Clip OS 4 patch does not cover the redirection case:
> > > 
> > > # ./bash < /root/test.sh
> > > Hello World
> > > 
> > > Do you have a more recent patch for that?
> 
> Bash was only partially restricted for CLIP OS because it was used for
> administrative tasks (interactive shell).
> 
> Python was also restricted for user commands though:
> https://github.com/clipos-archive/clipos4_portage-overlay/blob/master/dev-lang/python/files/python-2.7.9-clip-mayexec.patch
> 
> Steve and Christian could help with a better Python implementation.

I'll include a toy interpreter in the next patch series.  That should
help for experiments.


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 12:23           ` Mickaël Salaün
@ 2024-07-18 22:54             ` Jeff Xu
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff Xu @ 2024-07-18 22:54 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Steve Dower, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Thu, Jul 18, 2024 at 5:23 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > > > On 17/07/2024 07:33, Jeff Xu wrote:
> > > > > Consider those cases: I think:
> > > > > a> relying purely on userspace for enforcement does't seem to be
> > > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > > executable memory.
> > > >
> > > > If there's a way to do this without running executable code that had to pass
> > > > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > > > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > > >
> > > > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > > > code is being checked itself, it's allowed to do things that would bypass
> > > > later checks (and it's up to whoever audited it in the first place to
> > > > prevent this by not giving it the special mark that allows it to pass the
> > > > check).
> > >
> > We will want to define what is considered as "arbitrary code is running"
> >
> > Using an example of ROP, attackers change the return address in stack,
> > e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out",
> > do you consider "arbitrary code is running" when stack is overwritten
> > ? or after execve() is called.
>
> Yes, ROP is arbitrary code execution (which can be mitigated with CFI).
> ROP could be enough to interpret custom commands and create a small
> interpreter/VM.
>
> > If it is later, this patch can prevent "ld.so /tmp/a.out".
> >
> > > Exactly.  As explained in the patches, one crucial prerequisite is that
> > > the executable code is trusted, and the system must provide integrity
> > > guarantees.  We cannot do anything without that.  This patches series is
> > > a building block to fix a blind spot on Linux systems to be able to
> > > fully control executability.
> >
> > Even trusted executable can have a bug.
>
> Definitely, but this patch series is dedicated to script execution
> control.
>
> >
> > I'm thinking in the context of ChromeOS, where all its system services
> > are from trusted partitions, and legit code won't load .so from a
> > non-exec mount.  But we want to sandbox those services, so even under
> > some kind of ROP attack, the service still won't be able to load .so
> > from /tmp. Of course, if an attacker can already write arbitrary
> > length of data into the stack, it is probably already a game over.
> >
>
> OK, you want to tie executable file permission to mmap.  That makes
> sense if you have a consistent execution model.  This can be enforced by
> LSMs.  Contrary to script interpretation which is a full user space
> implementation (and then controlled by user space), mmap restrictions
> should indeed be enforced by the kernel.
Yes, that is what I meant.  It can be out of scope for this patch series.
Indeed, as you point out, this series is dedicated to script execution
control, and fixing "ld.so /tmp/a.out" is an extra bonus on top of
script control.

Thanks
-Jeff


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 12:24         ` Mickaël Salaün
  2024-07-18 13:03           ` James Bottomley
@ 2024-07-19  1:29           ` Jeff Xu
  2024-07-19  8:44             ` Mickaël Salaün
  2024-07-19 15:12           ` Jeff Xu
  2 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-19  1:29 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
>
> Because seccomp filters alter syscalls, they are similar to code
> injection.
>
That makes sense.

> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
>
> The line between data and code is blurry.  For instance, a configuration
> file can impact the execution flow of a program.  So, where to draw the
> line?
>
> It might makes sense to follow the kernel and interpreter semantic: if a
> file can be executed by the kernel (e.g. ELF binary, file containing a
> shebang, or just configured with binfmt_misc), then this should be
> considered as executable code.  This applies to Bash, Python,
> Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> executable with binfmt_misc.  So, again, where to draw the line?
>
> I'd recommend to think about interaction with the outside, through
> function calls, IPCs, syscalls...  For instance, "running" an image
> should not lead to reading or writing to arbitrary files, or accessing
> the network, but in practice it is legitimate for some file formats...
> PostScript is a programming language, but mostly used to draw pictures.
> So, again, where to draw the line?
>
> We should follow the principle of least astonishment.  What most users
> would expect?  This should follow the *common usage* of executable
> files.  At the end, the script interpreters will be patched by security
> folks for security reasons.  I think the right question to ask should
> be: could this file format be (ab)used to leak or modify arbitrary
> files, or to perform arbitrary syscalls?  If the answer is yes, then it
> should be checked for executability.  Of course, this excludes bugs
> exploited in the file format parser.
>
> I'll extend the next patch series with this rationale.
>
> >
> > > If we want other checks, we'll need to clearly define their semantic and
> > > align with the kernel.  faccessat2(2) might be used to check other file
> > > properties, but the executable property is not only defined by the file
> > > attributes.
> > >
> > Agreed.
> >
> > > >
> > > > --------------------------
> > > > now looked at user cases (focus on elf for now)
> > > >
> > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > > >
> > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > > /usr/bin/some.out will pass AT_CHECK
> > > >
> > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > > path for ld.so because kernel has no knowledge of that, and
> > > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > > >
> > > > 4> dlopen(/tmp/a.so)
> > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > > into memory.
> > > >
> > > > For case 1>
> > > > Alternative solution: Because AT_CHECK is always called, I think we
> > > > can avoid the first AT_CHECK call, and check during execveat(fd),
> > >
> > > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > > the same file descriptor.  By design, AT_CHECK is implicit for any
> > > execve(2).
> > >
> > Yes. I realized I was wrong to say that ld.so will call execve() for
> > /tmp/a.out, there is no execve() call, otherwise it would have been
> > blocked already today.
> > The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> > different than case 2 and 4.  ( the elf objects are mapped to memory
> > by dynamic linker.)
> > I'm not familiar with dynamic linker, Florian is on this thread, and
> > can help to correct me if my guess is wrong.
> >
> > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > > for a succesful execveat() case.
> > >
> > > As long as user space uses the same file descriptor, there is no TOCTOU.
> > >
> > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > > the user space security policy.  The kernel already enforces the same
> > > security policy for any execve(2), whatever are the calling process's
> > > securebits.
> > >
> > > >
> > > > For case 2>
> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > > However,  the process can all open then mmap() directly, it seems
> > > > minimal effort for an attacker to walk around such a defence from
> > > > dynamic linker.
> > >
> > > Which process?  What do you mean by "can all open then mmap() directly"?
> > >
> > > In this context the dynamic linker (like its parent processes) is
> > > trusted (guaranteed by the system).
> > >
> > > For case 2, the dynamic linker must check with AT_CHECK all files that
> > > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> > >
> > My point is that the process can work around this by mmap() the file directly.
>
> Yes, see my answer in the other email. The process is trusted.
>
OK. Let's agree that this is out of scope for this patch series.

> >
> > > >
> > > > Alternative solution:
> > > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > > (associated with fd)
> > > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> > >
> > > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > > kernel state, and the semantic is the same as with execve(2).
> > >
> > > This also enables us to check file's executable permission and ignore
> > > it, which is useful in a "permissive mode" when preparing for a
> > > migration without breaking a system, or to do extra integrity checks.
> > For preparing a migration (detect all violations), this is useful.
> > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> > seems to be weak, at least for elf loading case.
>
> We could add more restrictions, but that is outside the scope of this
> patch series.
>
Agreed.

> >
> > > BTW, this use case would also be more complex with a new openat2(2) flag
> > > like the original O_MAYEXEC.
> > >
> > > >
> > > > Alternative solution 2:
> > > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> > >
> > > A new syscall would be overkill for this feature.  Please see Linus's
> > > comment.
> > >
> > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> > to executable memory.
>
> OK, this is another story.
>
> >
> > > >
> > > > This also means, for the solution to be complete, we might want to
> > > > block creation of executable anonymous memory (e.g. by seccomp, ),
> > >
> > > How seccomp could create anonymous memory in user space?
> > > seccomp filters should be treated (and checked with AT_CHECK) as
> > > executable code anyway.
> > >
> > > > unless the user space can harden the creation of  executable anonymous
> > > > memory in some way.
> > >
> > > User space is already in charge of mmapping its own memory.  I don't see
> > > what is missing.
> > >
> > > >
> > > > For case 3>
> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > sure it passes AT_CHECK, before loading it into memory.
> > >
> > > All ELF dependencies are opened and checked with open_exec(), which
> > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > Did I miss something?
> > >
> > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > The app can choose its own dynamic linker path during build, (maybe
> > even statically link one ?)  This is another reason that relying on a
> > userspace only is not enough.
>
> The kernel calls open_exec() on all dependencies, including
> ld-linux-x86-64.so.2, so these files are checked for executability too.
>
This might not be entirely true.  IIUC, the kernel calls open_exec() for
the interpreter, but not for all of its dependencies (e.g. libc.so.6):
load_elf_binary() {
   interpreter = open_exec(elf_interpreter);
}

libc.so.6 is opened and mapped by the dynamic linker,
so the call sequence is:
 execve(a.out)
  - open_exec(a.out)
  - security_bprm_creds_for_exec(a.out)
  - open_exec() for the interpreter (ld.so)
  - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
    the same check and code path as libc.so below?
  - transfer control to ld.so
  - ld.so open(libc.so)
  - ld.so calls execveat(AT_CHECK, libc.so) <-- proposed by this patch,
    requires a dynamic linker change.
  - ld.so mmap(libc.so, rx)
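
The last three steps of the sequence above could look roughly like this
in a dynamic linker.  This is only a sketch under assumptions:
check_exec_fd() and map_checked() are hypothetical helper names, and
AT_CHECK uses the 0x10000 value proposed in this RFC, which a kernel
without this series rejects with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed by this RFC; older uapi headers do not define it. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/*
 * Ask the kernel whether executing fd would be allowed, without
 * executing it.  Returns 0 if allowed, or a negative errno (e.g.
 * -EACCES on denial, -EINVAL if the kernel does not know the flag).
 */
static int check_exec_fd(int fd)
{
	char *const argv[] = { "", NULL };
	char *const envp[] = { NULL };

	if (syscall(SYS_execveat, fd, "", argv, envp,
		    AT_EMPTY_PATH | AT_CHECK) == 0)
		return 0;
	return -errno;
}

/*
 * Hypothetical loader helper: map len bytes of fd as read+exec only if
 * the check passes.  The same fd is used for the check and the mapping,
 * so there is no time-of-check to time-of-use window.
 */
static void *map_checked(int fd, size_t len, int *err)
{
	void *p;

	*err = check_exec_fd(fd);
	if (*err)
		return MAP_FAILED;

	p = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		*err = -errno;
	return p;
}
```

A loader would call map_checked() on the descriptor it already opened
for libc.so and treat any non-zero *err as a load failure.  Nothing here
stops a process from calling mmap() directly, which is the out-of-scope
case discussed in this thread.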


> >
> > > However, we must be careful with programs using the (deprecated)
> > > uselib(2). They should also check with AT_CHECK because this syscall
> > > opens the shared library without __FMODE_EXEC (similar to a simple file
> > > open). See
> > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> > >
> > > >
> > > > For case 4>
> > > > same as case 2.
> > > >
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement does't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > As Steve explained (and is also explained in the patches), it is trivial
> > > if the attacker can already execute its own code, which is too late to
> > > enforce any script execution control.
> > >
> > > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > > avoid complicating the execveat() code path.
> > >
> > > A previous version of this patches series already patched faccessat(2),
> > > but this is not the right place.  faccessat2(2) is dedicated to check
> > > file permissions, not executability (e.g. with mount's noexec).
> > >
> > > >
> > > > What do you think ?
> > >
> > > I think there are some misunderstandings.  Please let me know if it's
> > > clearer now.
> > >
> > I'm still not sure about the user case for dynamic linker (elf
> > loading) case. Maybe this patch is more suitable for scripts?
>
> It's suitable for both, but we could add more restriction on mmap
> with an (existing) LSM.  The kernel already checks for mount's noexec
> when mapping a file, but not for the file permission, which is OK
> because it could be bypassed by coping the content of the file and
> mprotecting it anyway.  For a consistent memory execution control, all
> memory mapping need to be restricted, which is out of scope for this
> patch series.
>
Ok.

> > A detailed user case will help demonstrate the use case for dynamic
> > linker, e.g. what kind of app will benefit from
> > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > dealing with , what kind of attack chain we blocked as a result.
>
> I explained that in the patches and in the description of these new
> securebits.  Please point which part is not clear.  The full threat
> model is simple: the TCB includes the kernel and system's files, which
> are integrity-protected, but we don't trust arbitrary data/scripts that
> can be written to user-owned files or directly provided to script
> interpreters.  As for the ptrace restrictions, the dynamic linker
> restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> with consistent executability checks.
>
On the ELF loading case, I'm clear after your last email.  However, I'm
not sure everyone else follows, so I will try to summarize here:
- Problem: "ld.so /tmp/a.out" will happily pass, even if /tmp/a.out is
  on a noexec mount.
  Solution: ld.so calls execveat(AT_CHECK) on a.out before mmapping it
  into memory.

- Problem: a poorly built application (a.out) can have a dependency on
  /tmp/a.o while /tmp/a.o is on a noexec mount.
  Solution: ld.so calls execveat(AT_CHECK) on a.o before mmapping it
  into memory.

- Problem: an application can call mmap(/tmp/a.out, rx), where /tmp is
  on a noexec mount.
  This is out of scope, i.e. it will require enforcement on mmap(), maybe
  through an LSM.

Thanks
Best regards
-Jeff



> >
> > > >
> > > > Thanks
> > > > -Jeff
> > > >
> > > > > With the information that a script interpreter is about to interpret a
> > > > > script, an LSM security policy can adjust caller's access rights or log
> > > > > execution request as for native script execution (e.g. role transition).
> > > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > > >
> > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > > transition).  LSMs willing to update the caller's credential could now
> > > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > > be in line with the new user space code.
> > > > >
> > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > > make sense for the kernel to parse the checked files, look for
> > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > never called when AT_CHECK is used.
> > > > >
> > > > > It should be noted that script interpreters cannot directly use
> > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > > executed to interpret the script.  Unlike the kernel, script
> > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > should not change for backward compatibility reasons.
> > > > >
> > > > > Because scripts or libraries files might not currently have the
> > > > > executable permission set, or because we might want specific users to be
> > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > > >
> > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > This patch has been used for more than a decade with customized script
> > > > > interpreters.  Some examples can be found here:
> > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > >
> > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > > ---
> > > > >
> > > > > New design since v18:
> > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > ---
> > > > >  fs/exec.c                  |  5 +++--
> > > > >  include/linux/binfmts.h    |  7 ++++++-
> > > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > > >  kernel/audit.h             |  1 +
> > > > >  kernel/auditsc.c           |  1 +
> > > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 40073142288f..ea2a1867afdc 100644
> > > > > --- a/fs/exec.c
> > > > > +++ b/fs/exec.c
> > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > >         };
> > > > >
> > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > > >                 return ERR_PTR(-EINVAL);
> > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > >                 bprm->filename = bprm->fdpath;
> > > > >         }
> > > > >         bprm->interp = bprm->filename;
> > > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > > >
> > > > >         retval = bprm_mm_init(bprm);
> > > > >         if (!retval)
> > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > >
> > > > >         /* Set the unchanging part of bprm->cred */
> > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > -       if (retval)
> > > > > +       if (retval || bprm->is_check)
> > > > >                 goto out;
> > > > >
> > > > >         retval = exec_binprm(bprm);
> > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > > --- a/include/linux/binfmts.h
> > > > > +++ b/include/linux/binfmts.h
> > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > >                  * Set when errors can no longer be returned to the
> > > > >                  * original userspace.
> > > > >                  */
> > > > > -               point_of_no_return:1;
> > > > > +               point_of_no_return:1,
> > > > > +               /*
> > > > > +                * Set by user space to check executability according to the
> > > > > +                * caller's environment.
> > > > > +                */
> > > > > +               is_check:1;
> > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > >         struct file *interpreter;
> > > > >         struct file *file;
> > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > > --- a/include/uapi/linux/fcntl.h
> > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > @@ -118,6 +118,36 @@
> > > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > > >                                         compare object identity and may not
> > > > >                                         be usable to open_by_handle_at(2) */
> > > > > +
> > > > > +/*
> > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > > + * thread.  See securebits.h documentation.
> > > > > + *
> > > > > + * Programs should use this check to apply kernel-level checks against files
> > > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > > + * this flag is to improve the security and consistency of an execution
> > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > > + * the caller's environment.
> > > > > + *
> > > > > + * In a secure environment, libraries and any executable dependencies should
> > > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > > + *
> > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > > + * descriptor instead of a path.
> > > > > + */
> > > > > +#define AT_CHECK               0x10000
> > > > > +
> > > > >  #if defined(__KERNEL__)
> > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > >  #endif
> > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > --- a/kernel/audit.h
> > > > > +++ b/kernel/audit.h
> > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > >                 struct open_how openat2;
> > > > >                 struct {
> > > > >                         int                     argc;
> > > > > +                       bool                    is_check;
> > > > >                 } execve;
> > > > >                 struct {
> > > > >                         char                    *name;
> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > > >  }
> > > > >
> > > > >
> > > > > --
> > > > > 2.45.2
> > > > >
> > > >
> >


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19  1:29           ` Jeff Xu
@ 2024-07-19  8:44             ` Mickaël Salaün
  2024-07-19 14:16               ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-19  8:44 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > executable code, scripts could also use this check [1].
> > > > > >
> > > > > > This is different than faccessat(2) which only checks file access
> > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > real execution, user space gets the same error codes.
> > > > > >
> > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > exec, shared object, script and config file (such as seccomp config),

> > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > sure it passes AT_CHECK, before loading it into memory.
> > > >
> > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > Did I miss something?
> > > >
> > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > The app can choose its own dynamic linker path during build, (maybe
> > > even statically link one ?)  This is another reason that relying on a
> > > userspace only is not enough.
> >
> > The kernel calls open_exec() on all dependencies, including
> > ld-linux-x86-64.so.2, so these files are checked for executability too.
> >
> This might not be entirely true. iiuc, kernel  calls open_exec for
> open_exec for interpreter, but not all its dependency (e.g. libc.so.6)

Correct, the dynamic linker is in charge of that, which is why it must
be enlightened with execveat+AT_CHECK and securebits checks.
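
A minimal sketch of how an enlightened dynamic linker or interpreter
could combine the two pieces of information.  The names decide_exec,
exec_verdict and restrict_file are hypothetical, and the sketch assumes
that SECBIT_EXEC_RESTRICT_FILE means "refuse files that fail the
AT_CHECK request"; it is not taken from the patches:

```c
#include <stdbool.h>

/* Hypothetical verdicts for an interpreter or dynamic linker. */
enum exec_verdict { EXEC_ALLOW, EXEC_DENY };

/*
 * restrict_file: assumed to mirror the SECBIT_EXEC_RESTRICT_FILE securebit
 *                of the calling process.
 * check_ret:     result of the execveat+AT_CHECK request on the file
 *                (0 when the kernel would allow execution, non-zero
 *                otherwise).
 */
enum exec_verdict decide_exec(bool restrict_file, int check_ret)
{
	if (!restrict_file)
		return EXEC_ALLOW;	/* Policy not enforced: legacy behavior. */
	return check_ret == 0 ? EXEC_ALLOW : EXEC_DENY;
}
```

With the securebit unset the file is loaded as before; with it set, a
failed check turns into a refusal to load the file.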

> load_elf_binary() {
>    interpreter = open_exec(elf_interpreter);
> }
> 
> libc.so.6 is opened and mapped by dynamic linker.
> so the call sequence is:
>  execve(a.out)
>   - open exec(a.out)
>   - security_bprm_creds(a.out)
>   - open the exec(ld.so)
>   - call open_exec() for interruptor (ld.so)
>   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> the same check and code path as libc.so below ?

open_exec() checks are enough.  LSMs can use this information (open +
__FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
request.

>   - transfer the control to ld.so)
>   - ld.so open (libc.so)
>   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> require dynamic linker change.
>   - ld.so mmap(libc.so,rx)

Explaining these steps is useful. I'll include that in the next patch
series.
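
For illustration, the user space side of these steps (open, check, then
mmap the same fd) could be sketched as below.  This is only a sketch:
the AT_CHECK value is the one assumed from this RFC and is not in
released uapi headers, and check_exec_fd/open_checked are hypothetical
helper names, not glibc functions:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: value proposed in this RFC; not in released uapi headers. */
#ifndef AT_CHECK
#define AT_CHECK 0x10000
#endif

/*
 * Ask the kernel whether the already-opened file would be allowed for
 * execution, without executing it.  Returns 0 if allowed, or a negative
 * errno (-EINVAL on kernels that do not know the flag).
 */
int check_exec_fd(int fd)
{
	char *const argv[] = { "check", NULL };
	char *const envp[] = { NULL };

	if (syscall(SYS_execveat, fd, "", argv, envp,
		    AT_EMPTY_PATH | AT_CHECK) == 0)
		return 0;
	return -errno;
}

/* Open a library and check it before any mmap(PROT_EXEC). */
int open_checked(const char *path)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	int err;

	if (fd < 0)
		return -errno;
	err = check_exec_fd(fd);
	if (err) {
		close(fd);
		return err;
	}
	return fd;	/* The caller may now mmap() this same fd. */
}
```

Checking the fd rather than the path means the file that gets mapped is
the one that was checked.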

> > > A detailed user case will help demonstrate the use case for dynamic
> > > linker, e.g. what kind of app will benefit from
> > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > > dealing with , what kind of attack chain we blocked as a result.
> >
> > I explained that in the patches and in the description of these new
> > securebits.  Please point which part is not clear.  The full threat
> > model is simple: the TCB includes the kernel and system's files, which
> > are integrity-protected, but we don't trust arbitrary data/scripts that
> > can be written to user-owned files or directly provided to script
> > interpreters.  As for the ptrace restrictions, the dynamic linker
> > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> > with consistent executability checks.
> >
> On elf loading case, I'm clear after your last email. However, I'm not
> sure if everyone else follows,  I will try to summarize here:
> - Problem:  ld.so /tmp/a.out will happily pass, even /tmp/a.out is
> mounted as non-exec.
>   Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out
> into memory.
> 
> - Problem: a poorly built application (a.out) can have a dependency on
> /tmp/a.o, when /tmp/a.o is on non-exec mount,
>   Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory.
> 
> - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is
> on non-exec mount

I'd say "malicious or non-enlightened processes" can call mmap without
execveat+AT_CHECK...

>   This is out of scope, i.e. will require enforcement on mmap(), maybe
> through LSM

Cool, I'll include that as well. Thanks.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19  8:44             ` Mickaël Salaün
@ 2024-07-19 14:16               ` Jeff Xu
  2024-07-19 15:04                 ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-19 14:16 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > executable code, scripts could also use this check [1].
> > > > > > >
> > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > real execution, user space gets the same error codes.
> > > > > > >
> > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > exec, shared object, script and config file (such as seccomp config),
>
> > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > >
> > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > Did I miss something?
> > > > >
> > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > The app can choose its own dynamic linker path during build, (maybe
> > > > even statically link one ?)  This is another reason that relying on a
> > > > userspace only is not enough.
> > >
> > > The kernel calls open_exec() on all dependencies, including
> > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > >
> > This might not be entirely true. iiuc, kernel  calls open_exec for
> > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
>
> Correct, the dynamic linker is in charge of that, which is why it must
> be enlighten with execveat+AT_CHECK and securebits checks.
>
> > load_elf_binary() {
> >    interpreter = open_exec(elf_interpreter);
> > }
> >
> > libc.so.6 is opened and mapped by dynamic linker.
> > so the call sequence is:
> >  execve(a.out)
> >   - open exec(a.out)
> >   - security_bprm_creds(a.out)
> >   - open the exec(ld.so)
> >   - call open_exec() for interruptor (ld.so)
> >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > the same check and code path as libc.so below ?
>
> open_exec() checks are enough.  LSMs can use this information (open +
> __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> request.
>
Then ld.so doesn't go through the same security_bprm_creds() check as
the other .so files.

As I said in my previous email, the ChromeOS LSM restricts executable
memfds through security_bprm_creds(); the end result is that ld.so can
still be an executable memfd, but other .so files cannot.

One way to address this is to refactor the necessary code out of the
execveat() code path and make it callable from both the kernel and
execveat() code paths, but if we do that, we might as well use
faccessat2(AT_CHECK).


> >   - transfer the control to ld.so)
> >   - ld.so open (libc.so)
> >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > require dynamic linker change.
> >   - ld.so mmap(libc.so,rx)
>
> Explaining these steps is useful. I'll include that in the next patch
> series.
>
> > > > A detailed user case will help demonstrate the use case for dynamic
> > > > linker, e.g. what kind of app will benefit from
> > > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > > > dealing with , what kind of attack chain we blocked as a result.
> > >
> > > I explained that in the patches and in the description of these new
> > > securebits.  Please point which part is not clear.  The full threat
> > > model is simple: the TCB includes the kernel and system's files, which
> > > are integrity-protected, but we don't trust arbitrary data/scripts that
> > > can be written to user-owned files or directly provided to script
> > > interpreters.  As for the ptrace restrictions, the dynamic linker
> > > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> > > with consistent executability checks.
> > >
> > On elf loading case, I'm clear after your last email. However, I'm not
> > sure if everyone else follows,  I will try to summarize here:
> > - Problem:  ld.so /tmp/a.out will happily pass, even /tmp/a.out is
> > mounted as non-exec.
> >   Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out
> > into memory.
> >
> > - Problem: a poorly built application (a.out) can have a dependency on
> > /tmp/a.o, when /tmp/a.o is on non-exec mount,
> >   Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory.
> >
> > - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is
> > on non-exec mount
>
> I'd say "malicious or non-enlightened processes" can call mmap without
> execveat+AT_CHECK...
>
> >   This is out of scope, i.e. will require enforcement on mmap(), maybe
> > through LSM
>
> Cool, I'll include that as well. Thanks.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 14:16               ` Jeff Xu
@ 2024-07-19 15:04                 ` Mickaël Salaün
  2024-07-19 15:27                   ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-19 15:04 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > >
> > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > real execution, user space gets the same error codes.
> > > > > > > >
> > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > exec, shared object, script and config file (such as seccomp config),
> >
> > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > >
> > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > Did I miss something?
> > > > > >
> > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > even statically link one ?)  This is another reason that relying on a
> > > > > userspace only is not enough.
> > > >
> > > > The kernel calls open_exec() on all dependencies, including
> > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > >
> > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> >
> > Correct, the dynamic linker is in charge of that, which is why it must
> > be enlighten with execveat+AT_CHECK and securebits checks.
> >
> > > load_elf_binary() {
> > >    interpreter = open_exec(elf_interpreter);
> > > }
> > >
> > > libc.so.6 is opened and mapped by dynamic linker.
> > > so the call sequence is:
> > >  execve(a.out)
> > >   - open exec(a.out)
> > >   - security_bprm_creds(a.out)
> > >   - open the exec(ld.so)
> > >   - call open_exec() for interruptor (ld.so)
> > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > the same check and code path as libc.so below ?
> >
> > open_exec() checks are enough.  LSMs can use this information (open +
> > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > request.
> >
> Then the ld.so doesn't go through the same security_bprm_creds() check
> as other .so.

Indeed, but...

> 
> As my previous email, the ChromeOS LSM restricts executable mfd
> through security_bprm_creds(), the end result is that ld.so can still
> be executable memfd, but not other .so.

The ChromeOS LSM can check that with the security_file_open() hook and
the __FMODE_EXEC flag; see Landlock's implementation.  I think this
should be the only hook implementation that the ChromeOS LSM needs to
add.
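
As a rough user space model of that idea (not kernel code): the struct,
the MOCK_FMODE_EXEC value and the is_memfd field below are hypothetical
stand-ins, and how an LSM actually identifies a memfd is left open here.
The only point is that a file_open hook sees whether the open is an
execution request:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's __FMODE_EXEC bit; the value is illustrative. */
#define MOCK_FMODE_EXEC 0x20

/* Stand-in for the parts of struct file that the hook would look at. */
struct mock_file {
	unsigned int f_flags;
	bool is_memfd;	/* Hypothetical: memfd detection is not modeled. */
};

/* Model of a security_file_open() hook denying executable memfds. */
int mock_hook_file_open(const struct mock_file *file)
{
	if (!(file->f_flags & MOCK_FMODE_EXEC))
		return 0;	/* Plain open(2): not an execution request. */
	if (file->is_memfd)
		return -EACCES;	/* memfd opened for execution: refuse. */
	return 0;
}
```

In this model, any open that carries the exec bit reaches the same
check, whether it is for the ELF interpreter or for another file.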

> 
> One way to address this is to refactor the necessary code from
> execveat() code patch, and make it available to call from both kernel
> and execveat() code paths., but if we do that, we might as well use
> faccessat2(AT_CHECK)

That's why I think it makes sense to rely on the existing __FMODE_EXEC
information.

> 
> 
> > >   - transfer the control to ld.so)
> > >   - ld.so open (libc.so)
> > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > require dynamic linker change.
> > >   - ld.so mmap(libc.so,rx)
> >
> > Explaining these steps is useful. I'll include that in the next patch
> > series.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 12:24         ` Mickaël Salaün
  2024-07-18 13:03           ` James Bottomley
  2024-07-19  1:29           ` Jeff Xu
@ 2024-07-19 15:12           ` Jeff Xu
  2024-07-19 15:31             ` Mickaël Salaün
  2 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-19 15:12 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
>
> Because seccomp filters alter syscalls, they are similar to code
> injection.
>
> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
>
> The line between data and code is blurry.  For instance, a configuration
> file can impact the execution flow of a program.  So, where to draw the
> line?
>
> It might makes sense to follow the kernel and interpreter semantic: if a
> file can be executed by the kernel (e.g. ELF binary, file containing a
> shebang, or just configured with binfmt_misc), then this should be
> considered as executable code.  This applies to Bash, Python,
> Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> executable with binfmt_misc.  So, again, where to draw the line?
>
> I'd recommend to think about interaction with the outside, through
> function calls, IPCs, syscalls...  For instance, "running" an image
> should not lead to reading or writing to arbitrary files, or accessing
> the network, but in practice it is legitimate for some file formats...
> PostScript is a programming language, but mostly used to draw pictures.
> So, again, where to draw the line?
>
JavaScript is run by the browser and Java code by the Java runtime; do
they meet the criteria? They do not interact with the kernel directly,
however they might have the same "executable" characteristics, and the
app might not want them to be put on a non-exec mount.

If the answer is yes, they can also use execveat(AT_CHECK). The next
question is: does it make sense for JavaScript/Java code to go through
the execveat() code path, allocate a bprm, etc.? (I don't have an
answer; maybe it does.)

> We should follow the principle of least astonishment.  What most users
> would expect?  This should follow the *common usage* of executable
> files.  At the end, the script interpreters will be patched by security
> folks for security reasons.  I think the right question to ask should
> be: could this file format be (ab)used to leak or modify arbitrary
> files, or to perform arbitrary syscalls?  If the answer is yes, then it
> should be checked for executability.  Of course, this excludes bugs
> exploited in the file format parser.
>
> I'll extend the next patch series with this rationale.
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 15:04                 ` Mickaël Salaün
@ 2024-07-19 15:27                   ` Jeff Xu
  2024-07-23 13:15                     ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-19 15:27 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > >
> > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > >
> > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > >
> > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > Did I miss something?
> > > > > > >
> > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > userspace only is not enough.
> > > > >
> > > > > The kernel calls open_exec() on all dependencies, including
> > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > >
> > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > >
> > > Correct, the dynamic linker is in charge of that, which is why it must
> > > be enlighten with execveat+AT_CHECK and securebits checks.
> > >
> > > > load_elf_binary() {
> > > >    interpreter = open_exec(elf_interpreter);
> > > > }
> > > >
> > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > so the call sequence is:
> > > >  execve(a.out)
> > > >   - open exec(a.out)
> > > >   - security_bprm_creds(a.out)
> > > >   - open the exec(ld.so)
> > > >   - call open_exec() for interruptor (ld.so)
> > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > the same check and code path as libc.so below ?
> > >
> > > open_exec() checks are enough.  LSMs can use this information (open +
> > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > request.
> > >
> > Then the ld.so doesn't go through the same security_bprm_creds() check
> > as other .so.
>
> Indeed, but...
>
My point is: we will want all the .so files to go through the same code
path, so that the security_ functions are called consistently across all
the objects. And in the future, if we want to develop additional LSM
functionality based on AT_CHECK, it will be applied to all objects.

Another thing to consider: we are asking userspace to make an
additional syscall before the file is loaded into memory or executed, so
there is a possibility for future expansion of the mechanism without
asking user space to add yet another syscall.

I'm still not convinced that execveat(AT_CHECK) is a better fit than
faccessat(AT_CHECK).


> >
> > As my previous email, the ChromeOS LSM restricts executable mfd
> > through security_bprm_creds(), the end result is that ld.so can still
> > be executable memfd, but not other .so.
>
> The chromeOS LSM can check that with the security_file_open() hook and
> the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> should be the only hook implementation that chromeOS LSM needs to add.
>
> >
> > One way to address this is to refactor the necessary code from
> > execveat() code patch, and make it available to call from both kernel
> > and execveat() code paths., but if we do that, we might as well use
> > faccessat2(AT_CHECK)
>
> That's why I think it makes sense to rely on the existing __FMODE_EXEC
> information.
>
> >
> >
> > > >   - transfer the control to ld.so)
> > > >   - ld.so open (libc.so)
> > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > require dynamic linker change.
> > > >   - ld.so mmap(libc.so,rx)
> > >
> > > Explaining these steps is useful. I'll include that in the next patch
> > > series.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 15:12           ` Jeff Xu
@ 2024-07-19 15:31             ` Mickaël Salaün
  2024-07-19 17:36               ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-19 15:31 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > executable code, scripts could also use this check [1].
> > > > > >
> > > > > > This is different than faccessat(2) which only checks file access
> > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > real execution, user space gets the same error codes.
> > > > > >
> > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > exec, shared object, script and config file (such as seccomp config),
> > > >
> > > > "config file" that contains executable code.
> > > >
> > > Is seccomp config  considered as "contains executable code", seccomp
> > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > the kernel.
> >
> > Because seccomp filters alter syscalls, they are similar to code
> > injection.
> >
> > >
> > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > different use cases:
> > > > >
> > > > > execveat clearly has less code change, but that also means: we can't
> > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > config) for this part (from do_execveat_common to
> > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > agreement/sign-off, I'm not sure from whom.
> > > >
> > > > I'm not sure to follow. We could still add new flags, but for now I
> > > > don't see use cases.  This patch series is not meant to handle all
> > > > possible "trust checks", only executable code, which makes sense for the
> > > > kernel.
> > > >
> > > I guess the "configfile" discussion is where I get confused, at one
> > > point, I think this would become a generic "trust checks" api for
> > > everything related to "generating executable code", e.g. javascript,
> > > java code, and more.
> > > We will want to clearly define the scope of execveat(AT_CHECK)
> >
> > The line between data and code is blurry.  For instance, a configuration
> > file can impact the execution flow of a program.  So, where to draw the
> > line?
> >
> > It might makes sense to follow the kernel and interpreter semantic: if a
> > file can be executed by the kernel (e.g. ELF binary, file containing a
> > shebang, or just configured with binfmt_misc), then this should be
> > considered as executable code.  This applies to Bash, Python,
> > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > executable with binfmt_misc.  So, again, where to draw the line?
> >
> > I'd recommend to think about interaction with the outside, through
> > function calls, IPCs, syscalls...  For instance, "running" an image
> > should not lead to reading or writing to arbitrary files, or accessing
> > the network, but in practice it is legitimate for some file formats...
> > PostScript is a programming language, but mostly used to draw pictures.
> > So, again, where to draw the line?
> >
> The javascript is run by browser and java code by java runtime, do
> they meet the criteria? they do not interact with the kernel directly,
> however they might have the same "executable" characteristics and the
> app might not want them to be put into non-exec mount.
> 
> If the answer is yes, they can also use execveat(AT_CHECK),  the next
> question is: does it make sense for javacript/java code to go through
> execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> it is)

Java and NodeJS can do arbitrary syscalls (through their runtime) and
they can access arbitrary files, so according to my below comment, yes
they should be managed as potentially dangerous executable code.

The question should be: is this code trusted? Most of the time it is
not, hence the security model of web browsers and their heavy use of
sandboxing.  So no, I don't think it would make sense to check this kind
of code more than what the browser already does.

I'll talk about this use case in the next patch series.

> 
> > We should follow the principle of least astonishment.  What most users
> > would expect?  This should follow the *common usage* of executable
> > files.  At the end, the script interpreters will be patched by security
> > folks for security reasons.  I think the right question to ask should
> > be: could this file format be (ab)used to leak or modify arbitrary
> > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > should be checked for executability.  Of course, this excludes bugs
> > exploited in the file format parser.
> >
> > I'll extend the next patch series with this rationale.
> >
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 15:31             ` Mickaël Salaün
@ 2024-07-19 17:36               ` Jeff Xu
  2024-07-23 13:15                 ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-07-19 17:36 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > executable code, scripts could also use this check [1].
> > > > > > >
> > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > real execution, user space gets the same error codes.
> > > > > > >
> > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > >
> > > > > "config file" that contains executable code.
> > > > >
> > > > Is seccomp config  considered as "contains executable code", seccomp
> > > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > > the kernel.
> > >
> > > Because seccomp filters alter syscalls, they are similar to code
> > > injection.
> > >
> > > >
> > > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > > different use cases:
> > > > > >
> > > > > > execveat clearly has less code change, but that also means: we can't
> > > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > > config) for this part (from do_execveat_common to
> > > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > > agreement/sign-off, I'm not sure from whom.
> > > > >
> > > > > I'm not sure to follow. We could still add new flags, but for now I
> > > > > don't see use cases.  This patch series is not meant to handle all
> > > > > possible "trust checks", only executable code, which makes sense for the
> > > > > kernel.
> > > > >
> > > > I guess the "configfile" discussion is where I get confused, at one
> > > > point, I think this would become a generic "trust checks" api for
> > > > everything related to "generating executable code", e.g. javascript,
> > > > java code, and more.
> > > > We will want to clearly define the scope of execveat(AT_CHECK)
> > >
> > > The line between data and code is blurry.  For instance, a configuration
> > > file can impact the execution flow of a program.  So, where to draw the
> > > line?
> > >
> > > It might makes sense to follow the kernel and interpreter semantic: if a
> > > file can be executed by the kernel (e.g. ELF binary, file containing a
> > > shebang, or just configured with binfmt_misc), then this should be
> > > considered as executable code.  This applies to Bash, Python,
> > > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > > executable with binfmt_misc.  So, again, where to draw the line?
> > >
> > > I'd recommend to think about interaction with the outside, through
> > > function calls, IPCs, syscalls...  For instance, "running" an image
> > > should not lead to reading or writing to arbitrary files, or accessing
> > > the network, but in practice it is legitimate for some file formats...
> > > PostScript is a programming language, but mostly used to draw pictures.
> > > So, again, where to draw the line?
> > >
> > The javascript is run by browser and java code by java runtime, do
> > they meet the criteria? they do not interact with the kernel directly,
> > however they might have the same "executable" characteristics and the
> > app might not want them to be put into non-exec mount.
> >
> > If the answer is yes, they can also use execveat(AT_CHECK),  the next
> > question is: does it make sense for javacript/java code to go through
> > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> > it is)
>
> Java and NodeJS can do arbitrary syscalls (through their runtime) and
> they can access arbitrary files, so according to my below comment, yes
> they should be managed as potentially dangerous executable code.
>
> The question should be: is this code trusted? Most of the time it is
> not, hence the security model of web browsers and their heavy use of
> sandboxing.  So no, I don't think it would make sense to check this kind
> of code more than what the browser already does.
>

If I understand you correctly, Java/NodeJS won't use
execveat(AT_CHECK); we will leave that work to the web browser/java
runtime's sandboxer.
This is good because the scope is narrower and clearer.

Thanks
-Jeff

> I'll talk about this use case in the next patch series.
>
> >
> > > We should follow the principle of least astonishment.  What most users
> > > would expect?  This should follow the *common usage* of executable
> > > files.  At the end, the script interpreters will be patched by security
> > > folks for security reasons.  I think the right question to ask should
> > > be: could this file format be (ab)used to leak or modify arbitrary
> > > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > > should be checked for executability.  Of course, this excludes bugs
> > > exploited in the file format parser.
> > >
> > > I'll extend the next patch series with this rationale.
> > >
> >

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-18 12:22           ` Mickaël Salaün
@ 2024-07-20  1:59             ` Andy Lutomirski
  2024-07-20 11:43               ` Jarkko Sakkinen
  2024-07-23 13:16               ` Mickaël Salaün
  0 siblings, 2 replies; 103+ messages in thread
From: Andy Lutomirski @ 2024-07-20  1:59 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Steve Dower, Jeff Xu, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

> On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
>>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
>>>>> On 17/07/2024 07:33, Jeff Xu wrote:
>>>>> Consider those cases: I think:
>>>>> a> relying purely on userspace for enforcement does't seem to be
>>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
>>>>> executable memory.
>>>>
>>>> If there's a way to do this without running executable code that had to pass
>>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
>>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
>>>>
>>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
>>>> code is being checked itself, it's allowed to do things that would bypass
>>>> later checks (and it's up to whoever audited it in the first place to
>>>> prevent this by not giving it the special mark that allows it to pass the
>>>> check).
>>>
>>> Exactly.  As explained in the patches, one crucial prerequisite is that
>>> the executable code is trusted, and the system must provide integrity
>>> guarantees.  We cannot do anything without that.  This patches series is
>>> a building block to fix a blind spot on Linux systems to be able to
>>> fully control executability.
>>
>> Circling back to my previous comment (did that ever get noticed?), I
>
> Yes, I replied to your comments.  Did I miss something?

I missed that email in the pile, sorry. I’ll reply separately.

>
>> don’t think this is quite right:
>>
>> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
>>
>> On a basic system configuration, a given path either may or may not be
>> executed. And maybe that path has some integrity check (dm-verity,
>> etc).  So the kernel should tell the interpreter/loader whether the
>> target may be executed. All fine.
>>
>> But I think the more complex cases are more interesting, and the
>> “execute a program” process IS NOT BINARY.  An attempt to execute can
>> be rejected outright, or it can be allowed *with a change to creds or
>> security context*.  It would be entirely reasonable to have a policy
>> that allows execution of non-integrity-checked files but in a very
>> locked down context only.
>
> I guess you mean to transition to a sandbox when executing an untrusted
> file.  This is a good idea.  I talked about role transition in the
> patch's description:
>
> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
> This patch series brings the minimal building blocks to have a
> consistent execution environment.  Role transitions for script execution
> are left to LSMs.  For instance, we could extend Landlock to
> automatically sandbox untrusted scripts.

I’m not really convinced.  There’s more to building an API that
enables LSM hooks than merely sticking the hook somewhere in kernel
code. It needs to be a defined API. If you call an operation “check”,
then people will expect it to check, not to change the caller’s
credentials.  And people will mess it up in both directions (e.g.
callers will call it and then try to load some library that they
should have loaded first, or callers will call it and forget to close
fds first).

And there should probably be some interaction with dumpable as well.
If I “check” a file for executability, that should not suddenly allow
someone to ptrace me?

And callers need to know to exit on failure, not carry on.


More concretely, a runtime that fully opts in to this may well "check"
multiple things.  For example, if I do:

$ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)

then ld.so will load several things:

~/.local/bin/some_program
libc.so
other random DSOs, some of which may well be in my home directory

And for all ld.so knows, some_program is actually an interpreter and
will "check" something else.  And the LSMs have absolutely no clue
what's what.  So I think for this to work right, the APIs need to be a
lot more expressive and explicit:

check_library(fd to libc.so);  <-- does not transition or otherwise drop privs
check_transition_main_program(fd to ~/.local/bin/some_program);  <-- may drop privs

and if some_program is really an interpreter, then it will do:

check_library(fd to some thing imported by the script);
check_transition_main_program(fd to the actual script);

And maybe that takes a parameter that gets run eval-style:

check_unsafe_user_script("actual contents of snippet");

The actual spelling of all this doesn't matter so much.  But the user
code and the kernel code need to be on the same page as to what the
user program is doing and what it's asking the kernel program to do.
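
To make the ordering concern above concrete, here is a minimal user-space
sketch.  It is not the proposed kernel interface: MockKernel, CheckDenied and
load_program are hypothetical names invented for illustration, and only
check_library / check_transition_main_program are borrowed from the spellings
above.  It assumes libraries are checked before the one call that may drop
privileges, and that any failure aborts the load instead of carrying on.

```python
from dataclasses import dataclass, field


class CheckDenied(Exception):
    """Raised when the (mock) kernel refuses a check."""


@dataclass
class MockKernel:
    """Hypothetical stand-in for the kernel; records what user space asked."""
    denied: set = field(default_factory=set)
    log: list = field(default_factory=list)
    transitioned: bool = False

    def check_library(self, path):
        # Pure check: never changes credentials or security context.
        self.log.append(("library", path))
        if path in self.denied:
            raise CheckDenied(path)

    def check_transition_main_program(self, path):
        # May drop privileges, so it comes after every library check.
        self.log.append(("main", path))
        if path in self.denied:
            raise CheckDenied(path)
        self.transitioned = True


def load_program(kernel, main_program, libraries):
    """Loader sketch: check libraries first, then the main program."""
    for lib in libraries:
        kernel.check_library(lib)
    kernel.check_transition_main_program(main_program)
    return True
```

Because a denial is raised as an exception, a caller that does not catch it
stops rather than continuing with a partially checked program, which is the
"exit on failure" behavior asked for above.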

--Andy

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
  2024-07-05  0:18   ` Kees Cook
  2024-07-08 16:17   ` Jeff Xu
@ 2024-07-20  2:06   ` Andy Lutomirski
  2024-07-23 13:15     ` Mickaël Salaün
  2 siblings, 1 reply; 103+ messages in thread
From: Andy Lutomirski @ 2024-07-20  2:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, Jul 5, 2024 at 3:02 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> their *_LOCKED counterparts are designed to be set by processes setting
> up an execution environment, such as a user session, a container, or a
> security sandbox.  Like seccomp filters or Landlock domains, the
> securebits are inherited across proceses.
>
> When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> check executable resources with execveat(2) + AT_CHECK (see previous
> patch).
>
> When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).

I read this twice, slept on it, read them again, and I *still* can't
understand it.  See below...

> The only restriction enforced by the kernel is the right to ptrace
> another process.  Processes are denied to ptrace less restricted ones,
> unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> avoid trivial privilege escalations e.g., by a debugging process being
> abused with a confused deputy attack.

What's the actual issue?  And why can't I, as root, do, in a carefully
checked, CHECK'd and RESTRICT'd environment, # gdb -p <pid>?  Adding
weird restrictions to ptrace can substantially *weaken* security
because it forces people to do utterly daft things to work around the
restrictions.

...

> +/*
> + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> + * files with execveat(2) + AT_CHECK.  However, such check should only be
> + * performed if all to-be-executed code only comes from regular files.  For
> + * instance, if a script interpreter is called with both a script snipped as

s/snipped/snippet/

> + * argument and a regular file, the interpreter should not check any file.
> + * Doing otherwise would mislead the kernel to think that only the script file
> + * is being executed, which could for instance lead to unexpected permission
> + * change and break current use cases.

This is IMO not nearly clear enough to result in multiple user
implementations and a kernel implementation and multiple LSM
implementations and LSM policy authors actually agreeing as to what
this means.

I also think it's wrong to give user code instructions about what
kernel checks it should do.  Have the user code call the kernel and
have the kernel implement the policy.

> +/*
> + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
> + * instance, script interpreters called with a script snippet as argument
> + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> + * However, if a script interpreter is called with both
> + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> + * interpret the provided script files if no unchecked code is also provided
> + * (e.g. directly as argument).

I think you're trying to say that this is like (the inverse of)
Content-Security-Policy: unsafe-inline.  In other words, you're saying
that, if RESTRICT is set, then programs should not execute code-like
text that didn't come from a file.  Is that right?
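
One possible reading of the two quoted comments, written out as a decision
function.  This is only a sketch of that reading, not the patch's semantics:
interpreter_decision and its parameters are hypothetical names, and
file_is_executable stands in for the result of execveat(2) + AT_CHECK.  It
assumes that a failed check means "deny", and that RESTRICT without CHECK
leaves script files unchecked.

```python
def interpreter_decision(check, restrict, has_snippet, has_file,
                         file_is_executable):
    """Return "run" or "deny" under one reading of the quoted comments."""
    # RESTRICT: code that did not come from a file is always refused.
    if restrict and has_snippet:
        return "deny"
    # CHECK: only applies when all code comes from regular files; with a
    # snippet present the interpreter should not check any file.
    if check and has_file and not has_snippet:
        # Assumption: a failed AT_CHECK means the interpreter refuses to run.
        return "run" if file_is_executable else "deny"
    # No applicable restriction: behave as interpreters do today.
    return "run"
```

Under this reading, CHECK alone does not block a non-executable file once a
snippet is also passed, which may be part of why the wording is hard to
follow.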

I feel like it would be worth looking at the state of the art of
Content-Security-Policy and all the lessons people have learned from
it.  Whatever the result is should be at least as comprehensible and
at least as carefully engineered as Content-Security-Policy.

--Andy

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-20  1:59             ` Andy Lutomirski
@ 2024-07-20 11:43               ` Jarkko Sakkinen
  2024-07-23 13:16                 ` Mickaël Salaün
  2024-07-23 13:16               ` Mickaël Salaün
  1 sibling, 1 reply; 103+ messages in thread
From: Jarkko Sakkinen @ 2024-07-20 11:43 UTC (permalink / raw)
  To: Andy Lutomirski, Mickaël Salaün
  Cc: Steve Dower, Jeff Xu, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote:
> > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >>>
> >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>>>> Consider those cases: I think:
> >>>>> a> relying purely on userspace for enforcement does't seem to be
> >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>>>> executable memory.
> >>>>
> >>>> If there's a way to do this without running executable code that had to pass
> >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>>>
> >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >>>> code is being checked itself, it's allowed to do things that would bypass
> >>>> later checks (and it's up to whoever audited it in the first place to
> >>>> prevent this by not giving it the special mark that allows it to pass the
> >>>> check).
> >>>
> >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> >>> the executable code is trusted, and the system must provide integrity
> >>> guarantees.  We cannot do anything without that.  This patches series is
> >>> a building block to fix a blind spot on Linux systems to be able to
> >>> fully control executability.
> >>
> >> Circling back to my previous comment (did that ever get noticed?), I
> >
> > Yes, I replied to your comments.  Did I miss something?
>
> I missed that email in the pile, sorry. I’ll reply separately.
>
> >
> >> don’t think this is quite right:
> >>
> >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> >>
> >> On a basic system configuration, a given path either may or may not be
> >> executed. And maybe that path has some integrity check (dm-verity,
> >> etc).  So the kernel should tell the interpreter/loader whether the
> >> target may be executed. All fine.
> >>
> >> But I think the more complex cases are more interesting, and the
> >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> >> be rejected outright, or it can be allowed *with a change to creds or
> >> security context*.  It would be entirely reasonable to have a policy
> >> that allows execution of non-integrity-checked files but in a very
> >> locked down context only.
> >
> > I guess you mean to transition to a sandbox when executing an untrusted
> > file.  This is a good idea.  I talked about role transition in the
> > patch's description:
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
>
> …
>
> > This patch series brings the minimal building blocks to have a
> > consistent execution environment.  Role transitions for script execution
> > are left to LSMs.  For instance, we could extend Landlock to
> > automatically sandbox untrusted scripts.
>
> I’m not really convinced.  There’s more to building an API that
> enables LSM hooks than merely sticking the hook somewhere in kernel
> code. It needs to be a defined API. If you call an operation “check”,
> then people will expect it to check, not to change the caller’s
> credentials.  And people will mess it up in both directions (e.g.
> callers will call it and then try to load some library that they
> should have loaded first, or callers will call it and forget to close
> fds first).
>
> And there should probably be some interaction with dumpable as well.
> If I “check” a file for executability, that should not suddenly allow
> someone to ptrace me?
>
> And callers need to know to exit on failure, not carry on.
>
>
> More concretely, a runtime that fully opts in to this may well "check"
> multiple things.  For example, if I do:
>
> $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
>
> then ld.so will load several things:
>
> ~/.local/bin/some_program
> libc.so
> other random DSOs, some of which may well be in my home directory

What would really help to comprehend this patch set would be a set of
test scripts, preferably something that you can run easily with
BuildRoot or similar.

Scripts would demonstrate the use cases for the patch set. Then it
would be easier to develop scripts that would underline the corner
cases. I would keep all this out of kselftest shenanigans for now.

I feel that the patch set is hovering in abstractions with examples
that you cannot execute.

I added the patches to my standard test CI hack:

https://codeberg.org/jarkko/linux-tpmdd-test

But after I booted up a kernel I had no idea what to do with it. And
all this lengthy discussion makes it even more confusing.

Please find some connection to the real world before sending any new
version of this (e.g. via test scripts). I think this should not be
pulled before almost anyone doing kernel dev can comprehend the "gist"
at least at some reasonable level.

BR, Jarkko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits
  2024-07-20  2:06   ` Andy Lutomirski
@ 2024-07-23 13:15     ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-23 13:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Sat, Jul 20, 2024 at 10:06:28AM +0800, Andy Lutomirski wrote:
> On Fri, Jul 5, 2024 at 3:02 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> > their *_LOCKED counterparts are designed to be set by processes setting
> > up an execution environment, such as a user session, a container, or a
> > security sandbox.  Like seccomp filters or Landlock domains, the
> > securebits are inherited across proceses.
> >
> > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> > check executable resources with execveat(2) + AT_CHECK (see previous
> > patch).
> >
> > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
> 
> I read this twice, slept on it, read them again, and I *still* can't
> understand it.  See below...

There is a new proposal:
https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/
The new securebits will be SECBIT_EXEC_RESTRICT_FILE and
SECBIT_EXEC_DENY_INTERACTIVE.  I'll send a new patch series with that.

> 
> > The only restriction enforced by the kernel is the right to ptrace
> > another process.  Processes are denied to ptrace less restricted ones,
> > unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> > avoid trivial privilege escalations e.g., by a debugging process being
> > abused with a confused deputy attack.
> 
> What's the actual issue?  And why can't I, as root, do, in a carefully
> checked, CHECK'd and RESTRICT'd environment, # gdb -p <pid>?  Adding
> weird restrictions to ptrace can substantially *weaken* security
> because it forces people to do utterly daft things to work around the
> restrictions.

Restricting ptrace was a cautious approach, but I get your point and I
agree.  I'll remove the ptrace restrictions in the next patch series.

> 
> ...
> 
> > +/*
> > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> > + * files with execveat(2) + AT_CHECK.  However, such check should only be
> > + * performed if all to-be-executed code only comes from regular files.  For
> > + * instance, if a script interpreter is called with both a script snipped as
> 
> s/snipped/snippet/
> 
> > + * argument and a regular file, the interpreter should not check any file.
> > + * Doing otherwise would mislead the kernel to think that only the script file
> > + * is being executed, which could for instance lead to unexpected permission
> > + * change and break current use cases.
> 
> This is IMO not nearly clear enough to result in multiple user
> implementations and a kernel implementation and multiple LSM
> implementations and LSM policy authors actually agreeing as to what
> this means.

Right, no kernel parts (e.g. LSMs) should try to infer anything other
than an executability check.  We should handle things such as role
transitions with something else (e.g. a complementary dedicated flag),
and that should be decorrelated from this patch series.

> 
> I also think it's wrong to give user code instructions about what
> kernel checks it should do.  Have the user code call the kernel and
> have the kernel implement the policy.

Call the kernel for what?  Script interpreter is a user space thing, and
restrictions enforced on interpreters need to be a user space thing.
The kernel cannot restrict user space according to a semantic only
defined by user space, such as Python interpretation, CLI arguments,
content of environment variables...  If a process wants to interpret
some data and turn that into code, there is no way for the kernel to
know about that.

> 
> > +/*
> > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
> > + * instance, script interpreters called with a script snippet as argument
> > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> > + * However, if a script interpreter is called with both
> > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> > + * interpret the provided script files if no unchecked code is also provided
> > + * (e.g. directly as argument).
> 
> I think you're trying to say that this is like (the inverse of)
> Content-Security-Policy: unsafe-inline.  In other words, you're saying
> that, if RESTRICT is set, then programs should not execute code-like
> text that didn't come from a file.  Is that right?

That is the definition of the new SECBIT_EXEC_DENY_INTERACTIVE, which
should be clearer.
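
As a rough illustration of how an interpreter might combine the two
proposed securebits, here is a minimal Python sketch.  The ExecBits enum,
its bit values, and may_interpret() are hypothetical names invented for
this example.  Reading SECBIT_EXEC_RESTRICT_FILE as "run a file only if
the execveat(2) + AT_CHECK check passed" is an assumption carried over
from the SECBIT_SHOULD_EXEC_CHECK description quoted above, not a final
definition.

```python
from enum import Flag, auto


class ExecBits(Flag):
    """Hypothetical stand-ins for the proposed securebits (values are made up)."""
    NONE = 0
    RESTRICT_FILE = auto()     # models SECBIT_EXEC_RESTRICT_FILE (assumed semantics)
    DENY_INTERACTIVE = auto()  # models SECBIT_EXEC_DENY_INTERACTIVE


def may_interpret(bits, from_file, check_passed=False):
    """Decide whether an interpreter should run a piece of code.

    bits:         ExecBits currently set for the process
    from_file:    True if the code comes from a regular file, False for
                  snippets passed as arguments, stdin, an interactive prompt...
    check_passed: result of the (assumed) execveat + AT_CHECK call on that file
    """
    if from_file:
        # File-backed code: only gated when RESTRICT_FILE is set.
        if ExecBits.RESTRICT_FILE in bits:
            return bool(check_passed)
        return True
    # Code that did not come from a file: denied when DENY_INTERACTIVE is set.
    return ExecBits.DENY_INTERACTIVE not in bits
```

With both bits set, the sketch only lets file-backed code that passed the
check run, which is the "no unsafe-inline" behaviour discussed above.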

> 
> I feel like it would be worth looking at the state of the art of
> Content-Security-Policy and all the lessons people have learned from
> it.  Whatever the result is should be at least as comprehensible and
> at least as carefully engineered as Content-Security-Policy.

That's a good idea, but I guess Content-Security-Policy cannot be
directly applied here.  My understanding is that CSP enables web servers
to request restrictions on code they provide.  In the
AT_CHECK+securebits case, the policy is defined and enforced by the
interpreter, not necessarily the script provider. One big difference is
that web servers (should) know the scripts they provide, and can then
request the browser to ensure that they do what they should do, while
the script interpreter trusts the kernel to check security properties of
a script.  In other words, something like CSP could be implemented with
AT_CHECK+securebits and an LSM policy (e.g. according to a file's xattr).

> 
> --Andy


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 15:27                   ` Jeff Xu
@ 2024-07-23 13:15                     ` Mickaël Salaün
  2024-08-05 18:35                       ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-23 13:15 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > >
> > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > >
> > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > >
> > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > >
> > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > >
> > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > performs the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > Did I miss something?
> > > > > > > >
> > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > userspace only is not enough.
> > > > > >
> > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > >
> > > > > > This might not be entirely true. iiuc, the kernel calls open_exec()
> > > > > > for the interpreter, but not for all its dependencies (e.g. libc.so.6)
> > > >
> > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > be enlightened with execveat+AT_CHECK and securebits checks.
> > > >
> > > > > load_elf_binary() {
> > > > >    interpreter = open_exec(elf_interpreter);
> > > > > }
> > > > >
> > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > so the call sequence is:
> > > > >  execve(a.out)
> > > > >   - open exec(a.out)
> > > > >   - security_bprm_creds(a.out)
> > > > >   - open the exec(ld.so)
> > > > > >   - call open_exec() for interpreter (ld.so)
> > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > the same check and code path as libc.so below ?
> > > >
> > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > request.
> > > >
> > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > as other .so.
> >
> > Indeed, but...
> >
> My point is: we will want all the .so going through the same code
> path, so  security_ functions are called consistently across all the
> objects, And in the future, if we want to develop additional LSM
> functionality based on AT_CHECK, it will be applied to all objects.

I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
already is the common security check for all executable dependencies.
As extra information, they can get explicit requests by looking at
execveat+AT_CHECK calls.

> 
> Another thing to consider is:  we are asking userspace to make
> additional syscall before  loading the file into memory/get executed,
> there is a possibility for future expansion of the mechanism, without
> asking user space to add another syscall again.

AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
LSM policies) could enforce other restrictions following the same
semantic.  We need to keep in mind backward compatibility.

> 
> I'm still not convinced that execveat(AT_CHECK) fits better than
> faccessat(AT_CHECK)

faccessat2(2) is dedicated to file permission/attribute check.
execveat(2) is dedicated to execution, which is a superset of file
permission for executability, plus other checks (e.g. noexec).

> 
> 
> > >
> > > As in my previous email, the ChromeOS LSM restricts executable memfd
> > > through security_bprm_creds(), the end result is that ld.so can still
> > > be executable memfd, but not other .so.
> >
> > The chromeOS LSM can check that with the security_file_open() hook and
> > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > should be the only hook implementation that chromeOS LSM needs to add.
> >
> > >
> > > One way to address this is to refactor the necessary code from the
> > > execveat() code path, and make it available to call from both kernel
> > > and execveat() code paths, but if we do that, we might as well use
> > > faccessat2(AT_CHECK)
> >
> > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > information.
> >
> > >
> > >
> > > > > >   - transfer the control to ld.so
> > > > >   - ld.so open (libc.so)
> > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > require dynamic linker change.
> > > > >   - ld.so mmap(libc.so,rx)
> > > >
> > > > Explaining these steps is useful. I'll include that in the next patch
> > > > series.
> 
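
The call sequence quoted above can be summarised as a small table of
"which component checks which file".  The Python sketch below only
models that split; check_plan() and its argument names are hypothetical,
and the mechanism strings paraphrase the discussion rather than quote
kernel code.

```python
def check_plan(program, elf_interpreter, needed_libs):
    """Return (file, checker, mechanism) tuples for one program launch.

    program:         the file passed to execve(2), e.g. "a.out"
    elf_interpreter: the dynamic linker named by the ELF binary, e.g. "ld.so"
    needed_libs:     shared objects later opened by the dynamic linker
    """
    plan = [
        # The kernel checks the main program on the regular execve(2) path.
        (program, "kernel", "execve(2) path, including the bprm LSM hooks"),
        # The kernel also opens the ELF interpreter itself, with open_exec().
        (elf_interpreter, "kernel", "open_exec() with __FMODE_EXEC"),
    ]
    # Everything else is opened and mapped from user space, so the dynamic
    # linker has to request the check explicitly.
    plan += [
        (lib, "dynamic linker", "execveat(2) + AT_CHECK before mmap()")
        for lib in needed_libs
    ]
    return plan


for entry in check_plan("a.out", "ld.so", ["libc.so.6"]):
    print(entry)
```

The asymmetry Jeff points at is visible in the second column: ld.so is
checked by the kernel, while libc.so.6 depends on a patched dynamic
linker.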


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-19 17:36               ` Jeff Xu
@ 2024-07-23 13:15                 ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-23 13:15 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Jul 19, 2024 at 10:36:01AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > >
> > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > real execution, user space gets the same error codes.
> > > > > > > >
> > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > >
> > > > > > "config file" that contains executable code.
> > > > > >
> > > > > Is seccomp config  considered as "contains executable code", seccomp
> > > > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > > > the kernel.
> > > >
> > > > Because seccomp filters alter syscalls, they are similar to code
> > > > injection.
> > > >
> > > > >
> > > > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > > > different use cases:
> > > > > > >
> > > > > > > execveat clearly has less code change, but that also means: we can't
> > > > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > > > config) for this part (from do_execveat_common to
> > > > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > > > agreement/sign-off, I'm not sure from whom.
> > > > > >
> > > > > > I'm not sure I follow. We could still add new flags, but for now I
> > > > > > don't see use cases.  This patch series is not meant to handle all
> > > > > > possible "trust checks", only executable code, which makes sense for the
> > > > > > kernel.
> > > > > >
> > > > > I guess the "configfile" discussion is where I get confused, at one
> > > > > point, I think this would become a generic "trust checks" api for
> > > > > everything related to "generating executable code", e.g. javascript,
> > > > > java code, and more.
> > > > > We will want to clearly define the scope of execveat(AT_CHECK)
> > > >
> > > > The line between data and code is blurry.  For instance, a configuration
> > > > file can impact the execution flow of a program.  So, where to draw the
> > > > line?
> > > >
> > > > It might make sense to follow the kernel and interpreter semantic: if a
> > > > file can be executed by the kernel (e.g. ELF binary, file containing a
> > > > shebang, or just configured with binfmt_misc), then this should be
> > > > considered as executable code.  This applies to Bash, Python,
> > > > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > > > executable with binfmt_misc.  So, again, where to draw the line?
> > > >
> > > > I'd recommend to think about interaction with the outside, through
> > > > function calls, IPCs, syscalls...  For instance, "running" an image
> > > > should not lead to reading or writing to arbitrary files, or accessing
> > > > the network, but in practice it is legitimate for some file formats...
> > > > PostScript is a programming language, but mostly used to draw pictures.
> > > > So, again, where to draw the line?
> > > >
> > > The javascript is run by browser and java code by java runtime, do
> > > they meet the criteria? they do not interact with the kernel directly,
> > > however they might have the same "executable" characteristics and the
> > > app might not want them to be put into non-exec mount.
> > >
> > > If the answer is yes, they can also use execveat(AT_CHECK),  the next
> > > question is: does it make sense for javascript/java code to go through
> > > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> > > it is)
> >
> > Java and NodeJS can do arbitrary syscalls (through their runtime) and
> > they can access arbitrary files, so according to my below comment, yes
> > they should be managed as potentially dangerous executable code.
> >
> > The question should be: is this code trusted? Most of the time it is
> > not, hence the security model of web browsers and their heavy use of
> > sandboxing.  So no, I don't think it would make sense to check this kind
> > of code more than what the browser already does.
> >
> 
> If I understand you correctly, Java/NodeJS won't use
> execveat(AT_CHECK), we will leave that work to the web browser/java
> runtime's sandboxer.
> This is good because the scope is more narrow/clear.

Yes for browser's sandboxes because the code comes from the network
(i.e. not authenticated at the kernel level, and mostly untrusted).

For standalone Java applications (stored in the filesystem), the Java
runtime(s) should be patched as other script interpreters.

> 
> Thanks
> -Jeff
> 
> > I'll talk about this use case in the next patch series.
> >
> > >
> > > > We should follow the principle of least astonishment.  What would most
> > > > users expect?  This should follow the *common usage* of executable
> > > > files.  At the end, the script interpreters will be patched by security
> > > > folks for security reasons.  I think the right question to ask should
> > > > be: could this file format be (ab)used to leak or modify arbitrary
> > > > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > > > should be checked for executability.  Of course, this excludes bugs
> > > > exploited in the file format parser.
> > > >
> > > > I'll extend the next patch series with this rationale.
> > > >
> > >


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-20  1:59             ` Andy Lutomirski
  2024-07-20 11:43               ` Jarkko Sakkinen
@ 2024-07-23 13:16               ` Mickaël Salaün
  1 sibling, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-23 13:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steve Dower, Jeff Xu, Al Viro, Christian Brauner, Kees Cook,
	Linus Torvalds, Paul Moore, Theodore Ts'o, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Luca Boccassi,
	Luis Chamberlain, Madhavan T . Venkataraman, Matt Bobrowski,
	Matthew Garrett, Matthew Wilcox, Miklos Szeredi, Mimi Zohar,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Grubb, Thibaut Sautereau, Vincent Strubel, Xiaoming Ni,
	Yin Fengwei, kernel-hardening, linux-api, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module,
	Elliott Hughes

On Sat, Jul 20, 2024 at 09:59:33AM +0800, Andy Lutomirski wrote:
> > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >>>
> >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>>>> Consider those cases: I think:
> >>>>> a> relying purely on userspace for enforcement doesn't seem to be
> >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>>>> executable memory.
> >>>>
> >>>> If there's a way to do this without running executable code that had to pass
> >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>>>
> >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >>>> code is being checked itself, it's allowed to do things that would bypass
> >>>> later checks (and it's up to whoever audited it in the first place to
> >>>> prevent this by not giving it the special mark that allows it to pass the
> >>>> check).
> >>>
> >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> >>> the executable code is trusted, and the system must provide integrity
> >>> guarantees.  We cannot do anything without that.  This patch series is
> >>> a building block to fix a blind spot on Linux systems to be able to
> >>> fully control executability.
> >>
> >> Circling back to my previous comment (did that ever get noticed?), I
> >
> > Yes, I replied to your comments.  Did I miss something?
> 
> I missed that email in the pile, sorry. I’ll reply separately.
> 
> >
> >> don’t think this is quite right:
> >>
> >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> >>
> >> On a basic system configuration, a given path either may or may not be
> >> executed. And maybe that path has some integrity check (dm-verity,
> >> etc).  So the kernel should tell the interpreter/loader whether the
> >> target may be executed. All fine.
> >>
> >> But I think the more complex cases are more interesting, and the
> >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> >> be rejected outright, or it can be allowed *with a change to creds or
> >> security context*.  It would be entirely reasonable to have a policy
> >> that allows execution of non-integrity-checked files but in a very
> >> locked down context only.
> >
> > I guess you mean to transition to a sandbox when executing an untrusted
> > file.  This is a good idea.  I talked about role transition in the
> > patch's description:
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> 
> …
> 
> > This patch series brings the minimal building blocks to have a
> > consistent execution environment.  Role transitions for script execution
> > are left to LSMs.  For instance, we could extend Landlock to
> > automatically sandbox untrusted scripts.
> 
> I’m not really convinced.  There’s more to building an API that
> enables LSM hooks than merely sticking the hook somewhere in kernel
> code. It needs to be a defined API. If you call an operation “check”,
> then people will expect it to check, not to change the caller’s
> credentials.  And people will mess it up in both directions (e.g.
> callers will call it and then try to load some library that they
> should have loaded first, or callers will call it and forget to close
> fds first).
> 
> And there should probably be some interaction with dumpable as well.
> If I “check” a file for executability, that should not suddenly allow
> someone to ptrace me?
> 
> And callers need to know to exit on failure, not carry on.
> 
> 
> More concretely, a runtime that fully opts in to this may well "check"
> multiple things.  For example, if I do:
> 
> $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
> 
> then ld.so will load several things:
> 
> ~/.local/bin/some_program
> libc.so
> other random DSOs, some of which may well be in my home directory
> 
> And for all ld.so knows, some_program is actually an interpreter and
> will "check" something else.  And the LSMs have absolutely no clue
> what's what.  So I think for this to work right, the APIs need to be a
> lot more expressive and explicit:
> 
> check_library(fd to libc.so);  <-- does not transition or otherwise drop privs
> check_transition_main_program(fd to ~/.local/bin/some_program);  <--
> may drop privs
> 
> and if some_program is really an interpreter, then it will do:
> 
> check_library(fd to some thing imported by the script);
> check_transition_main_program(fd to the actual script);
> 
> And maybe that takes a parameter that gets run eval-style:
> 
> check_unsafe_user_script("actual contents of snippet");
> 
> The actual spelling of all this doesn't matter so much.  But the user
> code and the kernel code need to be on the same page as to what the
> user program is doing and what it's asking the kernel program to do.

I agree.  I'll remove any references to "role transition".  This kind of
feature should come with something like getpeercon/setexeccon(3).
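
To make the ordering concern above concrete, here is a minimal Python
sketch of the more explicit API that Andy outlines.  None of these
functions exist: check_library() and check_transition_main_program()
are taken from his example and stubbed with a recorder, and CheckLog
and load_program() are hypothetical helpers added for illustration.

```python
class CheckLog:
    """Records the order of the hypothetical check calls made by a loader."""

    def __init__(self):
        self.calls = []

    def check_library(self, path):
        # In the proposal, this call never transitions or drops privileges.
        self.calls.append(("library", path))

    def check_transition_main_program(self, path):
        # In the proposal, this call may drop privileges, so nothing that
        # needs the old privileges should be loaded after it.
        self.calls.append(("main", path))


def load_program(log, main_program, libraries):
    """Check every library first, then the main program, and return the log."""
    for lib in libraries:
        log.check_library(lib)
    log.check_transition_main_program(main_program)
    return log.calls


calls = load_program(CheckLog(), "~/.local/bin/some_program", ["libc.so"])
print(calls)
```

Keeping the privilege-changing call last is one way to avoid the mistake
described above, where a caller checks the main program and only then
tries to load a library it should have loaded first.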

> 
> --Andy
> 


* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-20 11:43               ` Jarkko Sakkinen
@ 2024-07-23 13:16                 ` Mickaël Salaün
  0 siblings, 0 replies; 103+ messages in thread
From: Mickaël Salaün @ 2024-07-23 13:16 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Steve Dower, Jeff Xu, Al Viro, Christian Brauner,
	Kees Cook, Linus Torvalds, Paul Moore, Theodore Ts'o,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Grubb, Thibaut Sautereau, Vincent Strubel,
	Xiaoming Ni, Yin Fengwei, kernel-hardening, linux-api,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Sat, Jul 20, 2024 at 02:43:41PM +0300, Jarkko Sakkinen wrote:
> On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote:
> > > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> > >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> > >>>
> > >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> > >>>>> Consider those cases: I think:
> > >>>>> a> relying purely on userspace for enforcement doesn't seem to be
> > >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> > >>>>> executable memory.
> > >>>>
> > >>>> If there's a way to do this without running executable code that had to pass
> > >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > >>>>
> > >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> > >>>> code is being checked itself, it's allowed to do things that would bypass
> > >>>> later checks (and it's up to whoever audited it in the first place to
> > >>>> prevent this by not giving it the special mark that allows it to pass the
> > >>>> check).
> > >>>
> > >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> > >>> the executable code is trusted, and the system must provide integrity
> > >>> guarantees.  We cannot do anything without that.  This patch series is
> > >>> a building block to fix a blind spot on Linux systems to be able to
> > >>> fully control executability.
> > >>
> > >> Circling back to my previous comment (did that ever get noticed?), I
> > >
> > > Yes, I replied to your comments.  Did I miss something?
> >
> > I missed that email in the pile, sorry. I’ll reply separately.
> >
> > >
> > >> don’t think this is quite right:
> > >>
> > >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> > >>
> > >> On a basic system configuration, a given path either may or may not be
> > >> executed. And maybe that path has some integrity check (dm-verity,
> > >> etc).  So the kernel should tell the interpreter/loader whether the
> > >> target may be executed. All fine.
> > >>
> > >> But I think the more complex cases are more interesting, and the
> > >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> > >> be rejected outright, or it can be allowed *with a change to creds or
> > >> security context*.  It would be entirely reasonable to have a policy
> > >> that allows execution of non-integrity-checked files but in a very
> > >> locked down context only.
> > >
> > > I guess you mean to transition to a sandbox when executing an untrusted
> > > file.  This is a good idea.  I talked about role transition in the
> > > patch's description:
> > >
> > > With the information that a script interpreter is about to interpret a
> > > script, an LSM security policy can adjust caller's access rights or log
> > > execution request as for native script execution (e.g. role transition).
> > > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > …
> >
> > > This patch series brings the minimal building blocks to have a
> > > consistent execution environment.  Role transitions for script execution
> > > are left to LSMs.  For instance, we could extend Landlock to
> > > automatically sandbox untrusted scripts.
> >
> > I’m not really convinced.  There’s more to building an API that
> > enables LSM hooks than merely sticking the hook somewhere in kernel
> > code. It needs to be a defined API. If you call an operation “check”,
> > then people will expect it to check, not to change the caller’s
> > credentials.  And people will mess it up in both directions (e.g.
> > callers will call it and then try to load some library that they
> > should have loaded first, or callers will call it and forget to close
> > fds first).
> >
> > And there should probably be some interaction with dumpable as well.
> > If I “check” a file for executability, that should not suddenly allow
> > someone to ptrace me?
> >
> > And callers need to know to exit on failure, not carry on.
> >
> >
> > More concretely, a runtime that fully opts in to this may well "check"
> > multiple things.  For example, if I do:
> >
> > $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
> >
> > then ld.so will load several things:
> >
> > ~/.local/bin/some_program
> > libc.so
> > other random DSOs, some of which may well be in my home directory
> 
> What would really help to comprehend this patch set would be a set of
> test scripts, preferably something that you can run easily with
> BuildRoot or similar.
> 
> Scripts would demonstrate the use cases for the patch set. Then it
> would be easier to develop scripts that would underline the corner
> cases. I would keep all this out of kselftest shenanigans for now.

I'll include a toy script interpreter with the next patch series.  This
one was an RFC.

> 
> I feel that the patch set is hovering in abstractions with examples
> that you cannot execute.
> 
> I added the patches to standard test CI hack:
> 
> https://codeberg.org/jarkko/linux-tpmdd-test
> 
> But after I booted up a kernel I had no idea what to do with it. And
> all this lengthy discussion makes it even more confusing.

You can run the tests in the CI.

> 
> Please find some connection to the real world before sending any new
> version of this (e.g. via test scripts). I think this should not be
> pulled before almost anyone doing kernel dev can comprehend the "gist"
> at least in some reasonable level.

You'll find in this patch series (cover letter, patch description, and
comments) connection to the real world. :)
The next patch series should take into account the current discussions.

> 
> BR, Jarkko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-07-23 13:15                     ` Mickaël Salaün
@ 2024-08-05 18:35                       ` Jeff Xu
  2024-08-09  8:45                         ` Mickaël Salaün
  0 siblings, 1 reply; 103+ messages in thread
From: Jeff Xu @ 2024-08-05 18:35 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > >
> > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > >
> > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > >
> > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > >
> > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > Did I miss something?
> > > > > > > > >
> > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > userspace only is not enough.
> > > > > > >
> > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > >
> > > > > > This might not be entirely true. IIUC, the kernel calls open_exec() for
> > > > > > the interpreter, but not for all its dependencies (e.g. libc.so.6)
> > > > >
> > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > be enlightened with execveat+AT_CHECK and securebits checks.
> > > > >
> > > > > > load_elf_binary() {
> > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > }
> > > > > >
> > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > so the call sequence is:
> > > > > >  execve(a.out)
> > > > > >   - open exec(a.out)
> > > > > >   - security_bprm_creds(a.out)
> > > > > >   - open the exec(ld.so)
> > > > > >   - call open_exec() for interpreter (ld.so)
> > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > the same check and code path as libc.so below ?
> > > > >
> > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > request.
> > > > >
> > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > as other .so.
> > >
> > > Indeed, but...
> > >
> > My point is: we will want all the .so going through the same code
> > path, so  security_ functions are called consistently across all the
> > objects, And in the future, if we want to develop additional LSM
> > functionality based on AT_CHECK, it will be applied to all objects.
>
> I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> already is the common security check for all executable dependencies.
> As extra information, they can get explicit requests by looking at
> execveat+AT_CHECK call.
>
I agree that security_file_open + __FMODE_EXEC for checking all
the .so (e.g for executable memfd) is a better option  than checking at
security_bprm_creds_for_exec.

But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
See below call graph:

do_execveat_common (AT_CHECK)
-> alloc_bprm
->->do_open_execat
->->-> do_filp_open (__FMODE_EXEC)
->->->->->->> security_file_open
-> bprm_execve
->-> prepare_exec_creds
->->-> prepare_creds
->->->-> security_prepare_creds
->-> security_bprm_creds_for_exec

What is the consideration to mark the end at
security_bprm_creds_for_exec ? i.e. including bprm_execve,
prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.

Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
won't go through those  security_prepare_creds and
security_bprm_creds_for_exec checks like other .so do.

> >
> > Another thing to consider is:  we are asking userspace to make
> > additional syscall before  loading the file into memory/get executed,
> > there is a possibility for future expansion of the mechanism, without
> > asking user space to add another syscall again.
>
> AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> LSM policies) could enforce other restrictions following the same
> semantic.  We need to keep in mind backward compatibility.
>
> >
> > I'm still not convinced that execveat(AT_CHECK) is a better fit than
> > faccessat(AT_CHECK)
>
> faccessat2(2) is dedicated to file permission/attribute check.
> execveat(2) is dedicated to execution, which is a superset of file
> permission for executability, plus other checks (e.g. noexec).
>
That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
made for the executability.

I think the decision might depend on what this PATCH intended to
check, i.e. where we draw the line.

do_open_execat() seems to cover lots of checks for executability, if
we are ok with the thing that do_open_execat() checks, then
faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
copy argc/env .

However, you mentioned superset of file permission for executability,
can you elaborate on that ? Is there something not included in
do_open_execat() but still necessary for execveat(AT_CHECK)? maybe
security_bprm_creds_for_exec? (this goes back to my  question above)

Thanks
Best regards,
-Jeff

> >
> >
> > > >
> > > > As my previous email, the ChromeOS LSM restricts executable mfd
> > > > through security_bprm_creds(), the end result is that ld.so can still
> > > > be executable memfd, but not other .so.
> > >
> > > The chromeOS LSM can check that with the security_file_open() hook and
> > > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > > should be the only hook implementation that chromeOS LSM needs to add.
> > >
> > > >
> > > > One way to address this is to refactor the necessary code from
> > > > execveat() code path, and make it available to call from both kernel
> > > > and execveat() code paths, but if we do that, we might as well use
> > > > faccessat2(AT_CHECK)
> > >
> > > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > > information.
> > >
> > > >
> > > >
> > > > > >   - transfer the control to ld.so)
> > > > > >   - ld.so open (libc.so)
> > > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > > require dynamic linker change.
> > > > > >   - ld.so mmap(libc.so,rx)
> > > > >
> > > > > Explaining these steps is useful. I'll include that in the next patch
> > > > > series.
> >

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-08-05 18:35                       ` Jeff Xu
@ 2024-08-09  8:45                         ` Mickaël Salaün
  2024-08-09 16:15                           ` Jeff Xu
  0 siblings, 1 reply; 103+ messages in thread
From: Mickaël Salaün @ 2024-08-09  8:45 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote:
> On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > > >
> > > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > > >
> > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > >
> > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > > >
> > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > > Did I miss something?
> > > > > > > > > >
> > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > > userspace only is not enough.
> > > > > > > >
> > > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > > >
> > > > > > > This might not be entirely true. IIUC, the kernel calls open_exec() for
> > > > > > > the interpreter, but not for all its dependencies (e.g. libc.so.6)
> > > > > >
> > > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > > be enlightened with execveat+AT_CHECK and securebits checks.
> > > > > >
> > > > > > > load_elf_binary() {
> > > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > > }
> > > > > > >
> > > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > > so the call sequence is:
> > > > > > >  execve(a.out)
> > > > > > >   - open exec(a.out)
> > > > > > >   - security_bprm_creds(a.out)
> > > > > > >   - open the exec(ld.so)
> > > > > > >   - call open_exec() for interpreter (ld.so)
> > > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > > the same check and code path as libc.so below ?
> > > > > >
> > > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > > request.
> > > > > >
> > > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > > as other .so.
> > > >
> > > > Indeed, but...
> > > >
> > > My point is: we will want all the .so going through the same code
> > > path, so  security_ functions are called consistently across all the
> > > objects, And in the future, if we want to develop additional LSM
> > > functionality based on AT_CHECK, it will be applied to all objects.
> >
> > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> > already is the common security check for all executable dependencies.
> > As extra information, they can get explicit requests by looking at
> > execveat+AT_CHECK call.
> >
> I agree that security_file_open + __FMODE_EXEC for checking all
> the .so (e.g for executable memfd) is a better option  than checking at
> security_bprm_creds_for_exec.
> 
> But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
> See below call graph:
> 
> do_execveat_common (AT_CHECK)
> -> alloc_bprm
> ->->do_open_execat
> ->->-> do_filp_open (__FMODE_EXEC)
> ->->->->->->> security_file_open
> -> bprm_execve
> ->-> prepare_exec_creds
> ->->-> prepare_creds
> ->->->-> security_prepare_creds
> ->-> security_bprm_creds_for_exec
> 
> What is the consideration to mark the end at
> security_bprm_creds_for_exec ? i.e. including bprm_execve,
> prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.

This enables LSMs to know/log an explicit execution request, including
context with argv and envp.

> 
> Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
> won't go through those  security_prepare_creds and
> security_bprm_creds_for_exec checks like other .so do.

Yes, but this is not an issue nor an explicit request. ld.so is only one
case of this patch series.

> 
> > >
> > > Another thing to consider is:  we are asking userspace to make
> > > additional syscall before  loading the file into memory/get executed,
> > > there is a possibility for future expansion of the mechanism, without
> > > asking user space to add another syscall again.
> >
> > AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> > LSM policies) could enforce other restrictions following the same
> > semantic.  We need to keep in mind backward compatibility.
> >
> > >
> > > I'm still not convinced that execveat(AT_CHECK) is a better fit than
> > > faccessat(AT_CHECK)
> >
> > faccessat2(2) is dedicated to file permission/attribute check.
> > execveat(2) is dedicated to execution, which is a superset of file
> > permission for executability, plus other checks (e.g. noexec).
> >
> That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
> execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
> made for the executability.

AT_CHECK, like any other syscall flag, changes the behavior of execveat,
but the overall semantic is clearly defined.

Again, faccessat2 is only dedicated to file attributes/permissions, not
file executability.

> 
> I think the decision might depend on what this PATCH intended to
> check, i.e. where we draw the line.

The goal is clearly defined in the cover letter and patches: make it
possible to control (or log) script execution.

> 
> do_open_execat() seems to cover lots of checks for executability, if
> we are ok with the thing that do_open_execat() checks, then
> faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
> have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
> copy argc/env .

I don't think there are any unrelated calls in the execve path; quite the
contrary, it follows the same semantic as for a full execution, and
that's another argument to use the execveat interface.  Otherwise, we
couldn't argue that `./script.sh` can be the same as `sh script.sh`.

The only difference is that user space is in charge of parsing and
interpreting the file's content.
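
As a rough illustration of that split of responsibilities, here is a
minimal user-space sketch of how an interpreter might ask the kernel
before interpreting a script.  It is not taken from the patch series:
SKETCH_AT_CHECK, check_exec_fd() and run_script() are hypothetical
names, and the flag value is only a placeholder because AT_CHECK is not
part of released uapi headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * ASSUMPTION: AT_CHECK comes from this RFC and is not in released uapi
 * headers; the value below is a placeholder for illustration only.
 */
#define SKETCH_AT_CHECK 0x10000

/* Ask the kernel whether fd would be allowed to execute: 0 or -errno. */
int check_exec_fd(int fd, char *const argv[], char *const envp[])
{
	if (syscall(SYS_execveat, fd, "", argv, envp,
		    AT_EMPTY_PATH | SKETCH_AT_CHECK) == 0)
		return 0;
	return -errno;
}

/* Hypothetical interpreter entry point: check first, then interpret. */
int run_script(const char *path, char *const argv[], char *const envp[])
{
	int fd, ret;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno;

	ret = check_exec_fd(fd, argv, envp);
	if (ret == 0) {
		/* Parse and interpret from fd here, not by re-opening path. */
	}
	/* -EINVAL means the running kernel does not know the flag. */
	close(fd);
	return ret;
}
```

Checking the already-opened fd, and then reading the script from that
same fd, avoids a window between the check and the use of the file.
What to do when the kernel rejects the flag is a policy decision that
this sketch leaves to the caller.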

> 
> However, you mentioned superset of file permission for executability,
> can you elaborate on that ? Is there something not included in
> do_open_execat() but still necessary for execveat(AT_CHECK)? maybe
> security_bprm_creds_for_exec? (this goes back to my  question above)

As explained above, the goal is to have the same semantic as a full
execveat call, taking into account all the checks (e.g. stack limit,
argv/envp...).

> 
> Thanks
> Best regards,
> -Jeff
> 
> > >
> > >
> > > > >
> > > > > As my previous email, the ChromeOS LSM restricts executable mfd
> > > > > through security_bprm_creds(), the end result is that ld.so can still
> > > > > be executable memfd, but not other .so.
> > > >
> > > > The chromeOS LSM can check that with the security_file_open() hook and
> > > > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > > > should be the only hook implementation that chromeOS LSM needs to add.
> > > >
> > > > >
> > > > > One way to address this is to refactor the necessary code from
> > > > > > execveat() code path, and make it available to call from both kernel
> > > > > > and execveat() code paths, but if we do that, we might as well use
> > > > > faccessat2(AT_CHECK)
> > > >
> > > > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > > > information.
> > > >
> > > > >
> > > > >
> > > > > > >   - transfer the control to ld.so)
> > > > > > >   - ld.so open (libc.so)
> > > > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > > > require dynamic linker change.
> > > > > > >   - ld.so mmap(libc.so,rx)
> > > > > >
> > > > > > Explaining these steps is useful. I'll include that in the next patch
> > > > > > series.
> > >
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)
  2024-08-09  8:45                         ` Mickaël Salaün
@ 2024-08-09 16:15                           ` Jeff Xu
  0 siblings, 0 replies; 103+ messages in thread
From: Jeff Xu @ 2024-08-09 16:15 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Linus Torvalds, Paul Moore,
	Theodore Ts'o, Alejandro Colomar, Aleksa Sarai, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Casey Schaufler, Christian Heimes,
	Dmitry Vyukov, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Elliott Hughes

On Fri, Aug 9, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote:
> > On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > > > >
> > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > > >
> > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > > > >
> > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > > > Did I miss something?
> > > > > > > > > > >
> > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > > > userspace only is not enough.
> > > > > > > > >
> > > > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > > > >
> > > > > > > > This might not be entirely true. IIUC, the kernel calls open_exec() for
> > > > > > > > the interpreter, but not for all its dependencies (e.g. libc.so.6)
> > > > > > >
> > > > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > > > be enlightened with execveat+AT_CHECK and securebits checks.
> > > > > > >
> > > > > > > > load_elf_binary() {
> > > > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > > > }
> > > > > > > >
> > > > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > > > so the call sequence is:
> > > > > > > >  execve(a.out)
> > > > > > > >   - open exec(a.out)
> > > > > > > >   - security_bprm_creds(a.out)
> > > > > > > >   - open the exec(ld.so)
> > > > > > > >   - call open_exec() for interpreter (ld.so)
> > > > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > > > the same check and code path as libc.so below ?
> > > > > > >
> > > > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > > > request.
> > > > > > >
> > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > > > as other .so.
> > > > >
> > > > > Indeed, but...
> > > > >
> > > > My point is: we will want all the .so going through the same code
> > > > path, so  security_ functions are called consistently across all the
> > > > objects, And in the future, if we want to develop additional LSM
> > > > functionality based on AT_CHECK, it will be applied to all objects.
> > >
> > > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> > > already is the common security check for all executable dependencies.
> > > As extra information, they can get explicit requests by looking at
> > > execveat+AT_CHECK call.
> > >
> > I agree that security_file_open + __FMODE_EXEC for checking all
> > the .so (e.g for executable memfd) is a better option  than checking at
> > security_bprm_creds_for_exec.
> >
> > But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
> > See below call graph:
> >
> > do_execveat_common (AT_CHECK)
> > -> alloc_bprm
> > ->->do_open_execat
> > ->->-> do_filp_open (__FMODE_EXEC)
> > ->->->->->->> security_file_open
> > -> bprm_execve
> > ->-> prepare_exec_creds
> > ->->-> prepare_creds
> > ->->->-> security_prepare_creds
> > ->-> security_bprm_creds_for_exec
> >
> > What is the consideration to mark the end at
> > security_bprm_creds_for_exec ? i.e. including bprm_execve,
> > prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.
>
> This enables LSMs to know/log an explicit execution request, including
> context with argv and envp.
>
> >
> > Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
> > won't go through those  security_prepare_creds and
> > security_bprm_creds_for_exec checks like other .so do.
>
> Yes, but this is not an issue nor an explicit request. ld.so is only one
> case of this patch series.
>
> >
> > > >
> > > > Another thing to consider is:  we are asking userspace to make
> > > > additional syscall before  loading the file into memory/get executed,
> > > > there is a possibility for future expansion of the mechanism, without
> > > > asking user space to add another syscall again.
> > >
> > > AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> > > LSM policies) could enforce other restrictions following the same
> > > semantic.  We need to keep in mind backward compatibility.
> > >
> > > >
> > > > I'm still not convinced that execveat(AT_CHECK) is a better fit than
> > > > faccessat(AT_CHECK)
> > >
> > > faccessat2(2) is dedicated to file permission/attribute check.
> > > execveat(2) is dedicated to execution, which is a superset of file
> > > permission for executability, plus other checks (e.g. noexec).
> > >
> > That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
> > execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
> > made for the executability.
>
> AT_CHECK, like any other syscall flag, changes the behavior of execveat,
> but the overall semantic is clearly defined.
>
> Again, faccessat2 is only dedicated to file attributes/permissions, not
> file executability.
>
> >
> > I think the decision might depend on what this PATCH intended to
> > check, i.e. where we draw the line.
>
> The goal is clearly defined in the cover letter and patches: make it
> possible to control (or log) script execution.
>
> >
> > do_open_execat() seems to cover lots of checks for executability, if
> > we are ok with the thing that do_open_execat() checks, then
> > faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
> > have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
> > copy argc/env .
>
> I don't think there are any unrelated calls in the execve path; quite the
> contrary, it follows the same semantic as for a full execution, and
> that's another argument to use the execveat interface.  Otherwise, we
> couldn't argue that `./script.sh` can be the same as `sh script.sh`.
>
It is a good point from the "script.sh/exec" perspective that we want
it to go through the same execve path.
The reasoning is not obvious for the ".so" case, which doesn't go through
the stack/env checks.
Since execveat(AT_CHECK) wants to cover both cases, it is fine.

> The only difference is that user space is in charge of parsing and
> interpreting the file's content.
>
> >
> > > However, you mentioned a superset of file permission for executability,
> > > can you elaborate on that? Is there something not included in
> > > do_open_execat() but still necessary for execveat(AT_CHECK)? Maybe
> > > security_bprm_creds_for_exec? (this goes back to my question above)
>
> As explained above, the goal is to have the same semantic as a full
> execveat call, taking into account all the checks (e.g. stack limit,
> argv/envp...).
>
I'm fine with this, thanks for taking the time to explain the design.

Regarding a future LSM based on this patch series:
For .so, security_file_open is the recommended hook for the LSM.
For scripts/exec (which need the full exec code path),
security_file_open and security_bprm_creds_for_exec can both be used.

Thanks
Best regards,
-Jeff

Thread overview: 103+ messages
2024-07-04 19:01 [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mickaël Salaün
2024-07-04 19:01 ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Mickaël Salaün
2024-07-05  0:04   ` Kees Cook
2024-07-05 17:53     ` Mickaël Salaün
2024-07-08 19:38       ` Kees Cook
2024-07-05 18:03   ` Florian Weimer
2024-07-06 14:55     ` Mickaël Salaün
2024-07-06 15:32       ` Florian Weimer
2024-07-08  8:56         ` Mickaël Salaün
2024-07-08 16:37           ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Florian Weimer
2024-07-08 17:34             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC Eric W. Biederman
2024-07-08 17:59               ` Florian Weimer
2024-07-10 10:05             ` [PATCH] binfmt_elf: Fail execution of shared objects with ELIBEXEC (was: Re: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2)) Mickaël Salaün
2024-07-08 16:08     ` [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Jeff Xu
2024-07-08 16:25       ` Florian Weimer
2024-07-08 16:40         ` Jeff Xu
2024-07-08 17:05           ` Mickaël Salaün
2024-07-08 17:33           ` Florian Weimer
2024-07-08 17:52             ` Jeff Xu
2024-07-09  9:18               ` Mickaël Salaün
2024-07-09 10:05                 ` Florian Weimer
2024-07-09 20:42                   ` Mickaël Salaün
2024-07-09 18:57                 ` Jeff Xu
2024-07-09 20:41                   ` Mickaël Salaün
2024-07-06  8:52   ` Andy Lutomirski
2024-07-07  9:01     ` Mickaël Salaün
2024-07-17  6:33   ` Jeff Xu
2024-07-17  8:26     ` Steve Dower
2024-07-17 10:00       ` Mickaël Salaün
2024-07-18  1:02         ` Andy Lutomirski
2024-07-18 12:22           ` Mickaël Salaün
2024-07-20  1:59             ` Andy Lutomirski
2024-07-20 11:43               ` Jarkko Sakkinen
2024-07-23 13:16                 ` Mickaël Salaün
2024-07-23 13:16               ` Mickaël Salaün
2024-07-18  1:51         ` Jeff Xu
2024-07-18 12:23           ` Mickaël Salaün
2024-07-18 22:54             ` Jeff Xu
2024-07-17 10:01     ` Mickaël Salaün
2024-07-18  2:08       ` Jeff Xu
2024-07-18 12:24         ` Mickaël Salaün
2024-07-18 13:03           ` James Bottomley
2024-07-18 15:35             ` Mickaël Salaün
2024-07-19  1:29           ` Jeff Xu
2024-07-19  8:44             ` Mickaël Salaün
2024-07-19 14:16               ` Jeff Xu
2024-07-19 15:04                 ` Mickaël Salaün
2024-07-19 15:27                   ` Jeff Xu
2024-07-23 13:15                     ` Mickaël Salaün
2024-08-05 18:35                       ` Jeff Xu
2024-08-09  8:45                         ` Mickaël Salaün
2024-08-09 16:15                           ` Jeff Xu
2024-07-19 15:12           ` Jeff Xu
2024-07-19 15:31             ` Mickaël Salaün
2024-07-19 17:36               ` Jeff Xu
2024-07-23 13:15                 ` Mickaël Salaün
2024-07-18 14:46         ` enh
2024-07-18 15:35           ` Mickaël Salaün
2024-07-04 19:01 ` [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits Mickaël Salaün
2024-07-05  0:18   ` Kees Cook
2024-07-05 17:54     ` Mickaël Salaün
2024-07-05 21:44       ` Kees Cook
2024-07-05 22:22         ` Jarkko Sakkinen
2024-07-06 14:56           ` Mickaël Salaün
2024-07-06 17:28             ` Jarkko Sakkinen
2024-07-06 14:56         ` Mickaël Salaün
2024-07-18 14:16           ` Roberto Sassu
2024-07-18 16:20             ` Mickaël Salaün
2024-07-08 16:17   ` Jeff Xu
2024-07-08 17:53     ` Jeff Xu
2024-07-08 18:48       ` Mickaël Salaün
2024-07-08 21:15         ` Jeff Xu
2024-07-08 21:25           ` Steve Dower
2024-07-08 22:07             ` Jeff Xu
2024-07-09 20:42               ` Mickaël Salaün
2024-07-09 21:57                 ` Jeff Xu
2024-07-10  9:58                   ` Mickaël Salaün
2024-07-10 16:26                     ` Kees Cook
2024-07-11  8:57                       ` Mickaël Salaün
2024-07-16 15:02                         ` Jeff Xu
2024-07-16 15:10                           ` Steve Dower
2024-07-16 15:15                           ` Mickaël Salaün
2024-07-16 15:18                             ` Jeff Xu
2024-07-10 16:32                     ` Steve Dower
2024-07-20  2:06   ` Andy Lutomirski
2024-07-23 13:15     ` Mickaël Salaün
2024-07-04 19:01 ` [RFC PATCH v19 3/5] selftests/exec: Add tests for AT_CHECK and related securebits Mickaël Salaün
2024-07-04 19:01 ` [RFC PATCH v19 4/5] selftests/landlock: Add tests for execveat + AT_CHECK Mickaël Salaün
2024-07-04 19:01 ` [RFC PATCH v19 5/5] samples/should-exec: Add set-should-exec Mickaël Salaün
2024-07-08 19:40   ` Mimi Zohar
2024-07-09 20:42     ` Mickaël Salaün
2024-07-08 20:35 ` [RFC PATCH v19 0/5] Script execution control (was O_MAYEXEC) Mimi Zohar
2024-07-09 20:43   ` Mickaël Salaün
2024-07-16 15:57     ` Roberto Sassu
2024-07-16 16:12       ` James Bottomley
2024-07-16 17:29         ` Boris Lukashev
2024-07-16 17:47           ` Mickaël Salaün
2024-07-17 17:59             ` Boris Lukashev
2024-07-18 13:00               ` Mickaël Salaün
2024-07-16 17:31         ` Mickaël Salaün
2024-07-18 16:21           ` Mickaël Salaün
2024-07-15 20:16 ` Jonathan Corbet
2024-07-16  7:13   ` Mickaël Salaün