[PATCH v21 0/6] Script execution control (was O

kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed

* [PATCH v21 0/6] Script execution control (was O_MAYEXEC)
@ 2024-11-12 19:18 Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2) Mickaël Salaün
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Hi,

The goal of this patch series is to be able to ensure that direct file
execution (e.g. ./script.sh) and indirect file execution (e.g. sh
script.sh) lead to the same result, especially from a security point of
view.

The main changes from the previous version are the rename of AT_CHECK to
AT_EXECVE_CHECK and the conversion of UAPI comments to a proper RST
documentation.

The current status is summarized in this article:
https://lwn.net/Articles/982085/
I also gave a talk at LPC last month:
https://lpc.events/event/18/contributions/1692/
And here is a proof of concept for Python (for now, for the previous
version: v19): https://github.com/zooba/spython/pull/12

Kees, would you like to take this series in your tree?

Overview
--------

This patch series is a new approach of the initial O_MAYEXEC feature,
and a revamp of the previous patch series.  Taking into account the last
reviews [1], we now stick to the kernel semantic for file executability.
One major change is the clear split between access check and policy
management.

The first patch brings the AT_EXECVE_CHECK flag to execveat(2).  The
goal is to enable user space to check if a file could be executed (by
the kernel).  Unlike stat(2) that only checks file permissions,
execveat2(2) + AT_EXECVE_CHECK take into account the full context,
including mount points (noexec), caller's limits, and all potential LSM
extra checks (e.g. argv, envp, credentials).

The second patch brings two new securebits used to set or get a security
policy for a set of processes.  For this to be meaningful, all
executable code needs to be trusted.  In practice, this means that
(malicious) users can be restricted to only run scripts provided (and
trusted) by the system.

[1] https://lore.kernel.org/r/CAHk-=wjPGNLyzeBMWdQu+kUdQLHQugznwY7CvWjmvNW47D5sog@mail.gmail.com

Script execution
----------------

One important thing to keep in mind is that the goal of this patch
series is to get the same security restrictions with these commands:
* ./script.py
* python script.py
* python < script.py
* python -m script.py

However, on secure systems, we should be able to forbid these commands
because there is no way to reliably identify the origin of the script:
* xargs -a script.py -d '\r' -- python -c
* cat script.py | python
* python

Background
----------

Compared to the previous patch series, there is no more dedicated
syscall nor sysctl configuration.  This new patch series only add new
flags: one for execveat(2) and four for prctl(2).

This kind of script interpreter restriction may already be used in
hardened systems, which may need to fork interpreters and install
different versions of the binaries.  This mechanism should enable to
avoid the use of duplicate binaries (and potential forked source code)
for secure interpreters (e.g. secure Python [2]) by making it possible
to dynamically enforce restrictions or not.

The ability to control script execution is also required to close a
major IMA measurement/appraisal interpreter integrity [3].

This new execveat + AT_EXECVE_CHECK should not be confused with the
O_EXEC flag (for open) which is intended for execute-only, which
obviously doesn't work for scripts.

I gave a talk about controlling script execution where I explain the
previous approaches [4].  The design of the WIP RFC I talked about
changed quite a bit since then.

[2] https://github.com/zooba/spython
[3] https://lore.kernel.org/lkml/20211014130125.6991-1-zohar@linux.ibm.com/
[4] https://lssna2023.sched.com/event/1K7bO

Execution policy
----------------

The "execution" usage means that the content of the file descriptor is
trusted according to the system policy to be executed by user space,
which means that it interprets the content or (try to) maps it as
executable memory.

It is important to note that this can only enable to extend access
control managed by the kernel.  Hence it enables current access control
mechanism to be extended and become a superset of what they can
currently control.  Indeed, the security policy could also be delegated
to an LSM, either a MAC system or an integrity system.

Complementary W^X protections can be brought by SELinux or IPE [5].

Being able to restrict execution also enables to protect the kernel by
restricting arbitrary syscalls that an attacker could perform with a
crafted binary or certain script languages.  It also improves multilevel
isolation by reducing the ability of an attacker to use side channels
with specific code.  These restrictions can natively be enforced for ELF
binaries (with the noexec mount option) but require this kernel
extension to properly handle scripts (e.g. Python, Perl).  To get a
consistent execution policy, additional memory restrictions should also
be enforced (e.g. thanks to SELinux).

[5] https://lore.kernel.org/lkml/1716583609-21790-1-git-send-email-wufan@linux.microsoft.com/

Prerequisite for security use
-----------------------------

Because scripts might not currently have the executable permission and
still run well as is, or because we might want specific users to be
allowed to run arbitrary scripts, we also need a configuration
mechanism.

According to the threat model, to get a secure execution environment on
top of these changes, it might be required to configure and enable
existing security mechanisms such as secure boot, restrictive mount
points (e.g. with rw AND noexec), correct file permissions (including
executable libraries), IMA/EVM, SELinux policy...

The first thing to patch is the libc to check loaded libraries (e.g. see
chromeOS changes).  The second thing to patch are the script
interpreters by checking direct scripts executability and by checking
their own libraries (e.g. Python's imported files or argument-passed
modules).  For instance, the PEP 578 [6] (Runtime Audit Hooks) enables
Python 3.8 to be extended with policy enforcement points related to code
interpretation, which can be used to align with the PowerShell audit
features.  Additional Python security improvements (e.g. a limited
interpreter without -c, stdin piping of code) are developed [2] [7].

[6] https://www.python.org/dev/peps/pep-0578/
[7] https://lore.kernel.org/lkml/0c70debd-e79e-d514-06c6-4cd1e021fa8b@python.org/

libc patch
----------

Dynamic linking needs still need to check the libraries the same way
interpreters need to check scripts.

chromeOS patches glibc with a fstatvfs check [8] [9]. This enables to
check against noexec mount points, which is OK but doesn't fit with
execve semantics.  Moreover, the kernel is not aware of such check, so
all access control checks are not performed (e.g. file permission, LSMs
security policies, integrity and authenticity checks), it is not handled
with audit, and more importantly this would not work on generic
distributions because of the strict requirement and chromeOS-specific
assumptions.

[8] https://issuetracker.google.com/issues/40054993
[9] https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/6abfc9e327241a5f684b8b941c899b7ca8b6dbc1/sys-libs/glibc/files/local/glibc-2.37/0007-Deny-LD_PRELOAD-of-files-in-NOEXEC-mount.patch

Examples
--------

The initial idea comes from CLIP OS 4 and the original implementation
has been used for more than a decade:
https://github.com/clipos-archive/clipos4_doc
Chrome OS has a similar approach:
https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/

User space patches can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
There is more than the O_MAYEXEC changes (which matches this search)
e.g., to prevent Python interactive execution. There are patches for
Bash, Wine, Java (Icedtea), Busybox's ash, Perl and Python. There are
also some related patches which do not directly rely on O_MAYEXEC but
which restrict the use of browser plugins and extensions, which may be
seen as scripts too:
https://github.com/clipos-archive/clipos4_portage-overlay/tree/master/www-client

Past talks and articles
-----------------------

Closing the script execution control gap at Linux Plumbers Conference
2024: https://lpc.events/event/18/contributions/1692/

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s

The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s

LWN articles:
* https://lwn.net/Articles/982085/
* https://lwn.net/Articles/832959/
* https://lwn.net/Articles/820000/

FAQ
Link: https://lore.kernel.org/r/20241112191858.162021-1-mic@digikod.net
---

Q: Why not extend open(2) or openat2(2) with a new flag like O_MAYEXEC?
A: Because it is not flexible enough:
https://lore.kernel.org/r/CAG48ez0NAV5gPgmbDaSjo=zzE=FgnYz=-OHuXwu0Vts=B5gesA@mail.gmail.com

Q: Why not only allowing file descriptor to avoid TOCTOU?
A: Because there are different use cases:
https://lore.kernel.org/r/CAHk-=whb=XuU=LGKnJWaa7LOYQz9VwHs8SLfgLbT5sf2VAbX1A@mail.gmail.com

Q: We can copy a script into a memfd and use it as an executable FD.
   Wouldn't that bypass the purpose of this patch series?
A: If an attacker can create a memfd it means that a
   malicious/compromised code is already running and it's too late for
   script execution control to help.  This patch series makes it more
   difficult for an attacker to execute arbitrary code on a trusted
   system in the first place:
https://lore.kernel.org/all/20240717.AGh2shahc9ee@digikod.net/

Q: What about ROP?
A: See previous answer. If ROP is exploited then the attacker already
   controls some code:
https://lore.kernel.org/all/20240718.ahph4che5Shi@digikod.net/

Q: What about LD_PRELOAD environment variable?
A: The dynamic linker should be enlighten to check if libraries are
   allowed to be loaded.

Q: What about The PATH environment variable?
A: All programs allowed to be executed are deemed trusted.

Q: Should we check seccomp filters too?
A: Yes, they should be considered as executable code because they can
   change the behavior of processes, similarly to code injection:
https://lore.kernel.org/all/20240705.IeTheequ7Ooj@digikod.net/

Q: Could that be used for role transition?
A: That would be risky and difficult to implement correctly:
https://lore.kernel.org/all/20240723.Tae5oovie2ah@digikod.net/

Previous versions
-----------------

v20: https://lore.kernel.org/r/20241011184422.977903-1-mic@digikod.net
v19: https://lore.kernel.org/r/20240704190137.696169-1-mic@digikod.net
v18: https://lore.kernel.org/r/20220104155024.48023-1-mic@digikod.net
v17: https://lore.kernel.org/r/20211115185304.198460-1-mic@digikod.net
v16: https://lore.kernel.org/r/20211110190626.257017-1-mic@digikod.net
v15: https://lore.kernel.org/r/20211012192410.2356090-1-mic@digikod.net
v14: https://lore.kernel.org/r/20211008104840.1733385-1-mic@digikod.net
v13: https://lore.kernel.org/r/20211007182321.872075-1-mic@digikod.net
v12: https://lore.kernel.org/r/20201203173118.379271-1-mic@digikod.net
v11: https://lore.kernel.org/r/20201019164932.1430614-1-mic@digikod.net
v10: https://lore.kernel.org/r/20200924153228.387737-1-mic@digikod.net
v9: https://lore.kernel.org/r/20200910164612.114215-1-mic@digikod.net
v8: https://lore.kernel.org/r/20200908075956.1069018-1-mic@digikod.net
v7: https://lore.kernel.org/r/20200723171227.446711-1-mic@digikod.net
v6: https://lore.kernel.org/r/20200714181638.45751-1-mic@digikod.net
v5: https://lore.kernel.org/r/20200505153156.925111-1-mic@digikod.net
v4: https://lore.kernel.org/r/20200430132320.699508-1-mic@digikod.net
v3: https://lore.kernel.org/r/20200428175129.634352-1-mic@digikod.net
v2: https://lore.kernel.org/r/20190906152455.22757-1-mic@digikod.net
v1: https://lore.kernel.org/r/20181212081712.32347-1-mic@digikod.net

Regards,

Mickaël Salaün (6):
  exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
  selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
  samples/check-exec: Add set-exec
  samples/check-exec: Add an enlighten "inc" interpreter and 28 tests

 Documentation/userspace-api/check_exec.rst    | 131 +++++
 Documentation/userspace-api/index.rst         |   1 +
 fs/exec.c                                     |  20 +-
 include/linux/binfmts.h                       |   7 +-
 include/uapi/linux/fcntl.h                    |   4 +
 include/uapi/linux/securebits.h               |  24 +-
 kernel/audit.h                                |   1 +
 kernel/auditsc.c                              |   1 +
 samples/Kconfig                               |   9 +
 samples/Makefile                              |   1 +
 samples/check-exec/.gitignore                 |   2 +
 samples/check-exec/Makefile                   |  15 +
 samples/check-exec/inc.c                      | 205 ++++++++
 samples/check-exec/run-script-ask.inc         |   8 +
 samples/check-exec/script-ask.inc             |   4 +
 samples/check-exec/script-exec.inc            |   3 +
 samples/check-exec/script-noexec.inc          |   3 +
 samples/check-exec/set-exec.c                 |  85 ++++
 security/commoncap.c                          |  29 +-
 security/security.c                           |  10 +
 tools/testing/selftests/exec/.gitignore       |   4 +
 tools/testing/selftests/exec/Makefile         |  19 +-
 .../selftests/exec/check-exec-tests.sh        | 205 ++++++++
 tools/testing/selftests/exec/check-exec.c     | 448 ++++++++++++++++++
 tools/testing/selftests/exec/config           |   2 +
 tools/testing/selftests/exec/false.c          |   5 +
 .../selftests/kselftest/ktap_helpers.sh       |   2 +-
 tools/testing/selftests/landlock/fs_test.c    |  27 ++
 28 files changed, 1263 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/userspace-api/check_exec.rst
 create mode 100644 samples/check-exec/.gitignore
 create mode 100644 samples/check-exec/Makefile
 create mode 100644 samples/check-exec/inc.c
 create mode 100755 samples/check-exec/run-script-ask.inc
 create mode 100755 samples/check-exec/script-ask.inc
 create mode 100755 samples/check-exec/script-exec.inc
 create mode 100644 samples/check-exec/script-noexec.inc
 create mode 100644 samples/check-exec/set-exec.c
 create mode 100755 tools/testing/selftests/exec/check-exec-tests.sh
 create mode 100644 tools/testing/selftests/exec/check-exec.c
 create mode 100644 tools/testing/selftests/exec/config
 create mode 100644 tools/testing/selftests/exec/false.c

base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
-- 
2.47.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-20  1:17   ` Jeff Xu
  2024-11-12 19:18 ` [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits Mickaël Salaün
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
be allowed for execution.  The main use case is for script interpreters
and dynamic linkers to check execution permission according to the
kernel's security policy. Another use case is to add context to access
logs e.g., which script (instead of interpreter) accessed a file.  As
any executable code, scripts could also use this check [1].

This is different from faccessat(2) + X_OK which only checks a subset of
access rights (i.e. inode permission and mount options for regular
files), but not the full context (e.g. all LSM access checks).  The main
use case for access(2) is for SUID processes to (partially) check access
on behalf of their caller.  The main use case for execveat(2) +
AT_EXECVE_CHECK is to check if a script execution would be allowed,
according to all the different restrictions in place.  Because the use
of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
execution, user space gets the same error codes.

An interesting point of using execveat(2) instead of openat2(2) is that
it decouples the check from the enforcement.  Indeed, the security check
can be logged (e.g. with audit) without blocking an execution
environment not yet ready to enforce a strict security policy.

LSMs can control or log execution requests with
security_bprm_creds_for_exec().  However, to enforce a consistent and
complete access control (e.g. on binary's dependencies) LSMs should
restrict file executability, or mesure executed files, with
security_file_open() by checking file->f_flags & __FMODE_EXEC.

Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
doesn't make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_EXECVE_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
being executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
SECBIT_EXEC_DENY_INTERACTIVE securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
---

Changes since v20:
* Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
  Serge Hallyn.
* Move the UAPI documentation to a dedicated RST file.
* Add Reviewed-by: Serge Hallyn

Changes since v19:
* Remove mention of "role transition" as suggested by Andy.
* Highlight the difference between security_bprm_creds_for_exec() and
  the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
  discussed with Jeff.
* Improve documentation both in UAPI comments and kernel comments
  (requested by Kees).

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
 Documentation/userspace-api/index.rst      |  1 +
 fs/exec.c                                  | 20 +++++++++++--
 include/linux/binfmts.h                    |  7 ++++-
 include/uapi/linux/fcntl.h                 |  4 +++
 kernel/audit.h                             |  1 +
 kernel/auditsc.c                           |  1 +
 security/security.c                        | 10 +++++++
 8 files changed, 75 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/userspace-api/check_exec.rst

diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
new file mode 100644
index 000000000000..ad1aeaa5f6c0
--- /dev/null
+++ b/Documentation/userspace-api/check_exec.rst
@@ -0,0 +1,34 @@
+===================
+Executability check
+===================
+
+AT_EXECVE_CHECK
+===============
+
+Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
+check on a regular file and returns 0 if execution of this file would be
+allowed, ignoring the file format and then the related interpreter dependencies
+(e.g. ELF libraries, script's shebang).
+
+Programs should always perform this check to apply kernel-level checks against
+files that are not directly executed by the kernel but passed to a user space
+interpreter instead.  All files that contain executable code, from the point of
+view of the interpreter, should be checked.  However the result of this check
+should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
+``SECBIT_EXEC_DENY_INTERACTIVE.``.
+
+The main purpose of this flag is to improve the security and consistency of an
+execution environment to ensure that direct file execution (e.g.
+``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
+the same result.  For instance, this can be used to check if a file is
+trustworthy according to the caller's environment.
+
+In a secure environment, libraries and any executable dependencies should also
+be checked.  For instance, dynamic linking should make sure that all libraries
+are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
+For such secure execution environment to make sense, only trusted code should
+be executable, which also requires integrity guarantees.
+
+To avoid race conditions leading to time-of-check to time-of-use issues,
+``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
+file descriptor instead of a path.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 274cc7546efc..6272bcf11296 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -35,6 +35,7 @@ Security-related interfaces
    mfd_noexec
    spec_ctrl
    tee
+   check_exec
 
 Devices and I/O
 ===============
diff --git a/fs/exec.c b/fs/exec.c
index 6c53920795c2..bb83b6a39530 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+	if ((flags &
+	     ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
 		return ERR_PTR(-EINVAL);
 	if (flags & AT_SYMLINK_NOFOLLOW)
 		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
@@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
 	}
 	bprm->interp = bprm->filename;
 
+	/*
+	 * At this point, security_file_open() has already been called (with
+	 * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
+	 * stop just after the security_bprm_creds_for_exec() call in
+	 * bprm_execve().  Indeed, the kernel should not try to parse the
+	 * content of the file with exec_binprm() nor change the calling
+	 * thread, which means that the following security functions will be
+	 * not called:
+	 * - security_bprm_check()
+	 * - security_bprm_creds_from_file()
+	 * - security_bprm_committing_creds()
+	 * - security_bprm_committed_creds()
+	 */
+	bprm->is_check = !!(flags & AT_EXECVE_CHECK);
+
 	retval = bprm_mm_init(bprm);
 	if (!retval)
 		return bprm;
@@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
 
 	/* Set the unchanging part of bprm->cred */
 	retval = security_bprm_creds_for_exec(bprm);
-	if (retval)
+	if (retval || bprm->is_check)
 		goto out;
 
 	retval = exec_binprm(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index e6c00e860951..8ff0eb3644a1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -42,7 +42,12 @@ struct linux_binprm {
 		 * Set when errors can no longer be returned to the
 		 * original userspace.
 		 */
-		point_of_no_return:1;
+		point_of_no_return:1,
+		/*
+		 * Set by user space to check executability according to the
+		 * caller's environment.
+		 */
+		is_check:1;
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
 	struct file *file;
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 87e2dec79fea..2e87f2e3a79f 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -154,6 +154,10 @@
 					   usable with open_by_handle_at(2). */
 #define AT_HANDLE_MNT_ID_UNIQUE	0x001	/* Return the u64 unique mount ID. */
 
+/* Flags for execveat2(2). */
+#define AT_EXECVE_CHECK		0x10000	/* Only perform a check if execution
+					   would be allowed. */
+
 #if defined(__KERNEL__)
 #define AT_GETATTR_NOSEC	0x80000000
 #endif
diff --git a/kernel/audit.h b/kernel/audit.h
index a60d2840559e..8ebdabd2ab81 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -197,6 +197,7 @@ struct audit_context {
 		struct open_how openat2;
 		struct {
 			int			argc;
+			bool			is_check;
 		} execve;
 		struct {
 			char			*name;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index cd57053b4a69..8d9ba5600cf2 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
 
 	context->type = AUDIT_EXECVE;
 	context->execve.argc = bprm->argc;
+	context->execve.is_check = bprm->is_check;
 }
 
 
diff --git a/security/security.c b/security/security.c
index c5981e558bc2..456361ec249d 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
  * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
  * contains the linux_binprm structure.
  *
+ * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
+ * set.  The result must be the same as without this flag even if the execution
+ * will never really happen and @bprm will always be dropped.
+ *
+ * This hook must not change current->cred, only @bprm->cred.
+ *
  * Return: Returns 0 if the hook is successful and permission is granted.
  */
 int security_bprm_creds_for_exec(struct linux_binprm *bprm)
@@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
  * Save open-time permission checking state for later use upon file_permission,
  * and recheck access if anything has changed since inode_permission.
  *
+ * We can check if a file is opened for execution (e.g. execve(2) call), either
+ * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
+ * __FMODE_EXEC .
+ *
  * Return: Returns 0 if permission is granted.
  */
 int security_file_open(struct file *file)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2) Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-20  1:30   ` Jeff Xu
  2024-11-12 19:18 ` [PATCH v21 3/6] selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits Mickaël Salaün
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Andy Lutomirski

The new SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and
their *_LOCKED counterparts are designed to be set by processes setting
up an execution environment, such as a user session, a container, or a
security sandbox.  Unlike other securebits, these ones can be set by
unprivileged processes.  Like seccomp filters or Landlock domains, the
securebits are inherited across processes.

When SECBIT_EXEC_RESTRICT_FILE is set, programs interpreting code should
control executable resources according to execveat(2) + AT_EXECVE_CHECK
(see previous commit).

When SECBIT_EXEC_DENY_INTERACTIVE is set, a process should deny
execution of user interactive commands (which excludes executable
regular files).

Being able to configure each of these securebits enables system
administrators or owner of image containers to gradually validate the
related changes and to identify potential issues (e.g. with interpreter
or audit logs).

It should be noted that unlike other security bits, the
SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE bits are
dedicated to user space willing to restrict itself.  Because of that,
they only make sense in the context of a trusted environment (e.g.
sandbox, container, user session, full system) where the process
changing its behavior (according to these bits) and all its parent
processes are trusted.  Otherwise, any parent process could just execute
its own malicious code (interpreting a script or not), or even enforce a
seccomp filter to mask these bits.

Such a secure environment can be achieved with an appropriate access
control (e.g. mount's noexec option, file access rights, LSM policy) and
an enlighten ld.so checking that libraries are allowed for execution
e.g., to protect against illegitimate use of LD_PRELOAD.

Ptrace restrictions according to these securebits would not make sense
because of the processes' trust assumption.

Scripts may need some changes to deal with untrusted data (e.g. stdin,
environment variables), but that is outside the scope of the kernel.

See chromeOS's documentation about script execution control and the
related threat model:
https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-3-mic@digikod.net
---

Changes since v20:
* Move UAPI documentation to a dedicated RST file and format it.

Changes since v19:
* Replace SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT with
  SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE:
  https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/
* Remove the ptrace restrictions, suggested by Andy.
* Improve documentation according to the discussion with Jeff.

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 Documentation/userspace-api/check_exec.rst | 97 ++++++++++++++++++++++
 include/uapi/linux/securebits.h            | 24 +++++-
 security/commoncap.c                       | 29 +++++--
 3 files changed, 143 insertions(+), 7 deletions(-)

diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
index ad1aeaa5f6c0..1df5c7534af9 100644
--- a/Documentation/userspace-api/check_exec.rst
+++ b/Documentation/userspace-api/check_exec.rst
@@ -2,6 +2,21 @@
 Executability check
 ===================
 
+The ``AT_EXECVE_CHECK`` :manpage:`execveat(2)` flag, and the
+``SECBIT_EXEC_RESTRICT_FILE`` and ``SECBIT_EXEC_DENY_INTERACTIVE`` securebits
+are intended for script interpreters and dynamic linkers to enforce a
+consistent execution security policy handled by the kernel.  See the
+`samples/check-exec/inc.c`_ example.
+
+Whether an interpreter should check these securebits or not depends on the
+security risk of running malicious scripts with respect to the execution
+environment, and whether the kernel can check if a script is trustworthy or
+not.  For instance, Python scripts running on a server can use arbitrary
+syscalls and access arbitrary files.  Such interpreters should then be
+enlighten to use these securebits and let users define their security policy.
+However, a JavaScript engine running in a web browser should already be
+sandboxed and then should not be able to harm the user's environment.
+
 AT_EXECVE_CHECK
 ===============
 
@@ -32,3 +47,85 @@ be executable, which also requires integrity guarantees.
 To avoid race conditions leading to time-of-check to time-of-use issues,
 ``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
 file descriptor instead of a path.
+
+SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE
+==========================================================
+
+When ``SECBIT_EXEC_RESTRICT_FILE`` is set, a process should only interpret or
+execute a file if a call to :manpage:`execveat(2)` with the related file
+descriptor and the ``AT_EXECVE_CHECK`` flag succeed.
+
+This secure bit may be set by user session managers, service managers,
+container runtimes, sandboxer tools...  Except for test environments, the
+related ``SECBIT_EXEC_RESTRICT_FILE_LOCKED`` bit should also be set.
+
+Programs should only enforce consistent restrictions according to the
+securebits but without relying on any other user-controlled configuration.
+Indeed, the use case for these securebits is to only trust executable code
+vetted by the system configuration (through the kernel), so we should be
+careful to not let untrusted users control this configuration.
+
+However, script interpreters may still use user configuration such as
+environment variables as long as it is not a way to disable the securebits
+checks.  For instance, the ``PATH`` and ``LD_PRELOAD`` variables can be set by
+a script's caller.  Changing these variables may lead to unintended code
+executions, but only from vetted executable programs, which is OK.  For this to
+make sense, the system should provide a consistent security policy to avoid
+arbitrary code execution e.g., by enforcing a write xor execute policy.
+
+When ``SECBIT_EXEC_DENY_INTERACTIVE`` is set, a process should never interpret
+interactive user commands (e.g. scripts).  However, if such commands are passed
+through a file descriptor (e.g. stdin), its content should be interpreted if a
+call to :manpage:`execveat(2)` with the related file descriptor and the
+``AT_EXECVE_CHECK`` flag succeed.
+
+For instance, script interpreters called with a script snippet as argument
+should always deny such execution if ``SECBIT_EXEC_DENY_INTERACTIVE`` is set.
+
+This secure bit may be set by user session managers, service managers,
+container runtimes, sandboxer tools...  Except for test environments, the
+related ``SECBIT_EXEC_DENY_INTERACTIVE_LOCKED`` bit should also be set.
+
+Here is the expected behavior for a script interpreter according to combination
+of any exec securebits:
+
+1. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
+
+   Always interpret scripts, and allow arbitrary user commands (default).
+
+   No threat, everyone and everything is trusted, but we can get ahead of
+   potential issues thanks to the call to :manpage:`execveat(2)` with
+   ``AT_EXECVE_CHECK`` which should always be performed but ignored by the
+   script interpreter.  Indeed, this check is still important to enable systems
+   administrators to verify requests (e.g. with audit) and prepare for
+   migration to a secure mode.
+
+2. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
+
+   Deny script interpretation if they are not executable, but allow
+   arbitrary user commands.
+
+   The threat is (potential) malicious scripts run by trusted (and not fooled)
+   users.  That can protect against unintended script executions (e.g. ``sh
+   /tmp/*.sh``).  This makes sense for (semi-restricted) user sessions.
+
+3. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
+
+   Always interpret scripts, but deny arbitrary user commands.
+
+   This use case may be useful for secure services (i.e. without interactive
+   user session) where scripts' integrity is verified (e.g.  with IMA/EVM or
+   dm-verity/IPE) but where access rights might not be ready yet.  Indeed,
+   arbitrary interactive commands would be much more difficult to check.
+
+4. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
+
+   Deny script interpretation if they are not executable, and also deny
+   any arbitrary user commands.
+
+   The threat is malicious scripts run by untrusted users (but trusted code).
+   This makes sense for system services that may only execute trusted scripts.
+
+.. Links
+.. _samples/check-exec/inc.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/check-exec/inc.c
diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
index d6d98877ff1a..3fba30dbd68b 100644
--- a/include/uapi/linux/securebits.h
+++ b/include/uapi/linux/securebits.h
@@ -52,10 +52,32 @@
 #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
 			(issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
 
+/* See Documentation/userspace-api/check_exec.rst */
+#define SECURE_EXEC_RESTRICT_FILE		8
+#define SECURE_EXEC_RESTRICT_FILE_LOCKED	9  /* make bit-8 immutable */
+
+#define SECBIT_EXEC_RESTRICT_FILE (issecure_mask(SECURE_EXEC_RESTRICT_FILE))
+#define SECBIT_EXEC_RESTRICT_FILE_LOCKED \
+			(issecure_mask(SECURE_EXEC_RESTRICT_FILE_LOCKED))
+
+/* See Documentation/userspace-api/check_exec.rst */
+#define SECURE_EXEC_DENY_INTERACTIVE		10
+#define SECURE_EXEC_DENY_INTERACTIVE_LOCKED	11  /* make bit-10 immutable */
+
+#define SECBIT_EXEC_DENY_INTERACTIVE \
+			(issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
+#define SECBIT_EXEC_DENY_INTERACTIVE_LOCKED \
+			(issecure_mask(SECURE_EXEC_DENY_INTERACTIVE_LOCKED))
+
 #define SECURE_ALL_BITS		(issecure_mask(SECURE_NOROOT) | \
 				 issecure_mask(SECURE_NO_SETUID_FIXUP) | \
 				 issecure_mask(SECURE_KEEP_CAPS) | \
-				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
+				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
+				 issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
+				 issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
 #define SECURE_ALL_LOCKS	(SECURE_ALL_BITS << 1)
 
+#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
+				 issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
+
 #endif /* _UAPI_LINUX_SECUREBITS_H */
diff --git a/security/commoncap.c b/security/commoncap.c
index cefad323a0b1..52ea01acb453 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1302,21 +1302,38 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 		     & (old->securebits ^ arg2))			/*[1]*/
 		    || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))	/*[2]*/
 		    || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))	/*[3]*/
-		    || (cap_capable(current_cred(),
-				    current_cred()->user_ns,
-				    CAP_SETPCAP,
-				    CAP_OPT_NONE) != 0)			/*[4]*/
 			/*
 			 * [1] no changing of bits that are locked
 			 * [2] no unlocking of locks
 			 * [3] no setting of unsupported bits
-			 * [4] doing anything requires privilege (go read about
-			 *     the "sendmail capabilities bug")
 			 */
 		    )
 			/* cannot change a locked bit */
 			return -EPERM;
 
+		/*
+		 * Doing anything requires privilege (go read about the
+		 * "sendmail capabilities bug"), except for unprivileged bits.
+		 * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
+		 * restrictions enforced by the kernel but by user space on
+		 * itself.
+		 */
+		if (cap_capable(current_cred(), current_cred()->user_ns,
+				CAP_SETPCAP, CAP_OPT_NONE) != 0) {
+			const unsigned long unpriv_and_locks =
+				SECURE_ALL_UNPRIVILEGED |
+				SECURE_ALL_UNPRIVILEGED << 1;
+			const unsigned long changed = old->securebits ^ arg2;
+
+			/* For legacy reason, denies non-change. */
+			if (!changed)
+				return -EPERM;
+
+			/* Denies privileged changes. */
+			if (changed & ~unpriv_and_locks)
+				return -EPERM;
+		}
+
 		new = prepare_creds();
 		if (!new)
 			return -ENOMEM;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v21 3/6] selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2) Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 4/6] selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK Mickaël Salaün
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Test that checks performed by execveat(..., AT_EXECVE_CHECK) are
consistent with noexec mount points and file execute permissions.

Test that SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE are
inherited by child processes and that they can be pinned with the
appropriate SECBIT_EXEC_RESTRICT_FILE_LOCKED and
SECBIT_EXEC_DENY_INTERACTIVE_LOCKED bits.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-4-mic@digikod.net
---

Changes since v20:
* Rename AT_CHECK to AT_EXECVE_CHECK.

Changes since v19:
* Rename securebits.
* Rename test file.

Changes since v18:
* Rewrite tests with the new design: execveat/AT_CHECK and securebits.
* Simplify the capability dropping and improve it with the NOROOT
  securebits.
* Replace most ASSERT with EXPECT.
* Fix NULL execve's argv to avoid kernel warning.
* Move tests to exec/
* Build a "false" static binary to test full execution path.

Changes since v14:
* Add Reviewed-by Kees Cook.

Changes since v13:
* Move -I to CFLAGS (suggested by Kees Cook).
* Update sysctl name.

Changes since v12:
* Fix Makefile's license.

Changes since v10:
* Update selftest Makefile.

Changes since v9:
* Rename the syscall and the sysctl.
* Update tests for enum trusted_for_usage

Changes since v8:
* Update with the dedicated syscall introspect_access(2) and the renamed
  fs.introspection_policy sysctl.
* Remove check symlink which can't be use as is anymore.
* Use socketpair(2) to test UNIX socket.

Changes since v7:
* Update tests with faccessat2/AT_INTERPRETED, including new ones to
  check that setting R_OK or W_OK returns EINVAL.
* Add tests for memfd, pipefs and nsfs.
* Rename and move back tests to a standalone directory.

Changes since v6:
* Add full combination tests for all file types, including block
  devices, character devices, fifos, sockets and symlinks.
* Properly save and restore initial sysctl value for all tests.

Changes since v5:
* Refactor with FIXTURE_VARIANT, which make the tests much more easy to
  read and maintain.
* Save and restore initial sysctl value (suggested by Kees Cook).
* Test with a sysctl value of 0.
* Check errno in sysctl_access_write test.
* Update tests for the CAP_SYS_ADMIN switch.
* Update tests to check -EISDIR (replacing -EACCES).
* Replace FIXTURE_DATA() with FIXTURE() (spotted by Kees Cook).
* Use global const strings.

Changes since v3:
* Replace RESOLVE_MAYEXEC with O_MAYEXEC.
* Add tests to check that O_MAYEXEC is ignored by open(2) and openat(2).

Changes since v2:
* Move tests from exec/ to openat2/ .
* Replace O_MAYEXEC with RESOLVE_MAYEXEC from openat2(2).
* Cleanup tests.

Changes since v1:
* Move tests from yama/ to exec/ .
* Fix _GNU_SOURCE in kselftest_harness.h .
* Add a new test sysctl_access_write to check if CAP_MAC_ADMIN is taken
  into account.
* Test directory execution which is always forbidden since commit
  73601ea5b7b1 ("fs/open.c: allow opening only regular files during
  execve()"), and also check that even the root user can not bypass file
  execution checks.
* Make sure delete_workspace() always as enough right to succeed.
* Cosmetic cleanup.
---
 tools/testing/selftests/exec/.gitignore   |   2 +
 tools/testing/selftests/exec/Makefile     |   7 +
 tools/testing/selftests/exec/check-exec.c | 448 ++++++++++++++++++++++
 tools/testing/selftests/exec/config       |   2 +
 tools/testing/selftests/exec/false.c      |   5 +
 5 files changed, 464 insertions(+)
 create mode 100644 tools/testing/selftests/exec/check-exec.c
 create mode 100644 tools/testing/selftests/exec/config
 create mode 100644 tools/testing/selftests/exec/false.c

diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
index a0dc5d4bf733..a32c63bb4df1 100644
--- a/tools/testing/selftests/exec/.gitignore
+++ b/tools/testing/selftests/exec/.gitignore
@@ -9,6 +9,8 @@ execveat.ephemeral
 execveat.denatured
 non-regular
 null-argv
+/check-exec
+/false
 /load_address.*
 !load_address.c
 /recursion-depth
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index ba012bc5aab9..8713d1c862ae 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 CFLAGS = -Wall
 CFLAGS += -Wno-nonnull
+CFLAGS += $(KHDR_INCLUDES)
+
+LDLIBS += -lcap
 
 ALIGNS := 0x1000 0x200000 0x1000000
 ALIGN_PIES        := $(patsubst %,load_address.%,$(ALIGNS))
@@ -9,12 +12,14 @@ ALIGNMENT_TESTS   := $(ALIGN_PIES) $(ALIGN_STATIC_PIES)
 
 TEST_PROGS := binfmt_script.py
 TEST_GEN_PROGS := execveat non-regular $(ALIGNMENT_TESTS)
+TEST_GEN_PROGS_EXTENDED := false
 TEST_GEN_FILES := execveat.symlink execveat.denatured script subdir
 # Makefile is a run-time dependency, since it's accessed by the execveat test
 TEST_FILES := Makefile
 
 TEST_GEN_PROGS += recursion-depth
 TEST_GEN_PROGS += null-argv
+TEST_GEN_PROGS += check-exec
 
 EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*	\
 	       $(OUTPUT)/S_I*.test
@@ -38,3 +43,5 @@ $(OUTPUT)/load_address.0x%: load_address.c
 $(OUTPUT)/load_address.static.0x%: load_address.c
 	$(CC) $(CFLAGS) $(LDFLAGS) -Wl,-z,max-page-size=$(lastword $(subst ., ,$@)) \
 		-fPIE -static-pie $< -o $@
+$(OUTPUT)/false: false.c
+	$(CC) $(CFLAGS) $(LDFLAGS) -static $< -o $@
diff --git a/tools/testing/selftests/exec/check-exec.c b/tools/testing/selftests/exec/check-exec.c
new file mode 100644
index 000000000000..c3aa046d8d68
--- /dev/null
+++ b/tools/testing/selftests/exec/check-exec.c
@@ -0,0 +1,448 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test execveat(2) with AT_EXECVE_CHECK, and prctl(2) with
+ * SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and their locked
+ * counterparts.
+ *
+ * Copyright © 2018-2020 ANSSI
+ * Copyright © 2024 Microsoft Corporation
+ *
+ * Author: Mickaël Salaün <mic@digikod.net>
+ */
+
+#include <asm-generic/unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/prctl.h>
+#include <linux/securebits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/capability.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+
+/* Defines AT_EXECVE_CHECK without type conflicts. */
+#define _ASM_GENERIC_FCNTL_H
+#include <linux/fcntl.h>
+
+#include "../kselftest_harness.h"
+
+static void drop_privileges(struct __test_metadata *const _metadata)
+{
+	const unsigned int noroot = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;
+	cap_t cap_p;
+
+	if ((cap_get_secbits() & noroot) != noroot)
+		EXPECT_EQ(0, cap_set_secbits(noroot));
+
+	cap_p = cap_get_proc();
+	EXPECT_NE(NULL, cap_p);
+	EXPECT_NE(-1, cap_clear(cap_p));
+
+	/*
+	 * Drops everything, especially CAP_SETPCAP, CAP_DAC_OVERRIDE, and
+	 * CAP_DAC_READ_SEARCH.
+	 */
+	EXPECT_NE(-1, cap_set_proc(cap_p));
+	EXPECT_NE(-1, cap_free(cap_p));
+}
+
+static int test_secbits_set(const unsigned int secbits)
+{
+	int err;
+
+	err = prctl(PR_SET_SECUREBITS, secbits);
+	if (err)
+		return errno;
+	return 0;
+}
+
+FIXTURE(access)
+{
+	int memfd, pipefd;
+	int pipe_fds[2], socket_fds[2];
+};
+
+FIXTURE_VARIANT(access)
+{
+	const bool mount_exec;
+	const bool file_exec;
+};
+
+FIXTURE_VARIANT_ADD(access, mount_exec_file_exec){
+	.mount_exec = true,
+	.file_exec = true,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_exec_file_noexec){
+	.mount_exec = true,
+	.file_exec = false,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_noexec_file_exec){
+	.mount_exec = false,
+	.file_exec = true,
+};
+
+FIXTURE_VARIANT_ADD(access, mount_noexec_file_noexec){
+	.mount_exec = false,
+	.file_exec = false,
+};
+
+static const char binary_path[] = "./false";
+static const char workdir_path[] = "./test-mount";
+static const char reg_file_path[] = "./test-mount/regular_file";
+static const char dir_path[] = "./test-mount/directory";
+static const char block_dev_path[] = "./test-mount/block_device";
+static const char char_dev_path[] = "./test-mount/character_device";
+static const char fifo_path[] = "./test-mount/fifo";
+
+FIXTURE_SETUP(access)
+{
+	int procfd_path_size;
+	static const char path_template[] = "/proc/self/fd/%d";
+	char procfd_path[sizeof(path_template) + 10];
+
+	/* Makes sure we are not already restricted nor locked. */
+	EXPECT_EQ(0, test_secbits_set(0));
+
+	/*
+	 * Cleans previous workspace if any error previously happened (don't
+	 * check errors).
+	 */
+	umount(workdir_path);
+	rmdir(workdir_path);
+
+	/* Creates a clean mount point. */
+	ASSERT_EQ(0, mkdir(workdir_path, 00700));
+	ASSERT_EQ(0, mount("test", workdir_path, "tmpfs",
+			   MS_MGC_VAL | (variant->mount_exec ? 0 : MS_NOEXEC),
+			   "mode=0700,size=9m"));
+
+	/* Creates a regular file. */
+	ASSERT_EQ(0, mknod(reg_file_path,
+			   S_IFREG | (variant->file_exec ? 0700 : 0600), 0));
+	/* Creates a directory. */
+	ASSERT_EQ(0, mkdir(dir_path, variant->file_exec ? 0700 : 0600));
+	/* Creates a character device: /dev/null. */
+	ASSERT_EQ(0, mknod(char_dev_path, S_IFCHR | 0400, makedev(1, 3)));
+	/* Creates a block device: /dev/loop0 */
+	ASSERT_EQ(0, mknod(block_dev_path, S_IFBLK | 0400, makedev(7, 0)));
+	/* Creates a fifo. */
+	ASSERT_EQ(0, mknod(fifo_path, S_IFIFO | 0600, 0));
+
+	/* Creates a regular file without user mount point. */
+	self->memfd = memfd_create("test-exec-probe", MFD_CLOEXEC);
+	ASSERT_LE(0, self->memfd);
+	/* Sets mode, which must be ignored by the exec check. */
+	ASSERT_EQ(0, fchmod(self->memfd, variant->file_exec ? 0700 : 0600));
+
+	/* Creates a pipefs file descriptor. */
+	ASSERT_EQ(0, pipe(self->pipe_fds));
+	procfd_path_size = snprintf(procfd_path, sizeof(procfd_path),
+				    path_template, self->pipe_fds[0]);
+	ASSERT_LT(procfd_path_size, sizeof(procfd_path));
+	self->pipefd = open(procfd_path, O_RDWR | O_CLOEXEC);
+	ASSERT_LE(0, self->pipefd);
+	ASSERT_EQ(0, fchmod(self->pipefd, variant->file_exec ? 0700 : 0600));
+
+	/* Creates a socket file descriptor. */
+	ASSERT_EQ(0, socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0,
+				self->socket_fds));
+}
+
+FIXTURE_TEARDOWN_PARENT(access)
+{
+	/* There is no need to unlink the test files. */
+	EXPECT_EQ(0, umount(workdir_path));
+	EXPECT_EQ(0, rmdir(workdir_path));
+}
+
+static void fill_exec_fd(struct __test_metadata *_metadata, const int fd_out)
+{
+	char buf[1024];
+	size_t len;
+	int fd_in;
+
+	fd_in = open(binary_path, O_CLOEXEC | O_RDONLY);
+	ASSERT_LE(0, fd_in);
+	/* Cannot use copy_file_range(2) because of EXDEV. */
+	len = read(fd_in, buf, sizeof(buf));
+	EXPECT_LE(0, len);
+	while (len > 0) {
+		EXPECT_EQ(len, write(fd_out, buf, len))
+		{
+			TH_LOG("Failed to write: %s (%d)", strerror(errno),
+			       errno);
+		}
+		len = read(fd_in, buf, sizeof(buf));
+		EXPECT_LE(0, len);
+	}
+	EXPECT_EQ(0, close(fd_in));
+}
+
+static void fill_exec_path(struct __test_metadata *_metadata,
+			   const char *const path)
+{
+	int fd_out;
+
+	fd_out = open(path, O_CLOEXEC | O_WRONLY);
+	ASSERT_LE(0, fd_out)
+	{
+		TH_LOG("Failed to open %s: %s", path, strerror(errno));
+	}
+	fill_exec_fd(_metadata, fd_out);
+	EXPECT_EQ(0, close(fd_out));
+}
+
+static void test_exec_fd(struct __test_metadata *_metadata, const int fd,
+			 const int err_code)
+{
+	char *const argv[] = { "", NULL };
+	int access_ret, access_errno;
+
+	/*
+	 * If we really execute fd, filled with the "false" binary, the current
+	 * thread will exits with an error, which will be interpreted by the
+	 * test framework as an error.  With AT_EXECVE_CHECK, we only check a
+	 * potential successful execution.
+	 */
+	access_ret =
+		execveat(fd, "", argv, NULL, AT_EMPTY_PATH | AT_EXECVE_CHECK);
+	access_errno = errno;
+	if (err_code) {
+		EXPECT_EQ(-1, access_ret);
+		EXPECT_EQ(err_code, access_errno)
+		{
+			TH_LOG("Wrong error for execveat(2): %s (%d)",
+			       strerror(access_errno), errno);
+		}
+	} else {
+		EXPECT_EQ(0, access_ret)
+		{
+			TH_LOG("Access denied: %s", strerror(access_errno));
+		}
+	}
+}
+
+static void test_exec_path(struct __test_metadata *_metadata,
+			   const char *const path, const int err_code)
+{
+	int flags = O_CLOEXEC;
+	int fd;
+
+	/* Do not block on pipes. */
+	if (path == fifo_path)
+		flags |= O_NONBLOCK;
+
+	fd = open(path, flags | O_RDONLY);
+	ASSERT_LE(0, fd)
+	{
+		TH_LOG("Failed to open %s: %s", path, strerror(errno));
+	}
+	test_exec_fd(_metadata, fd, err_code);
+	EXPECT_EQ(0, close(fd));
+}
+
+/* Tests that we don't get ENOEXEC. */
+TEST_F(access, regular_file_empty)
+{
+	const int exec = variant->mount_exec && variant->file_exec;
+
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+}
+
+TEST_F(access, regular_file_elf)
+{
+	const int exec = variant->mount_exec && variant->file_exec;
+
+	fill_exec_path(_metadata, reg_file_path);
+
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_path(_metadata, reg_file_path, exec ? 0 : EACCES);
+}
+
+/* Tests that we don't get ENOEXEC. */
+TEST_F(access, memfd_empty)
+{
+	const int exec = variant->file_exec;
+
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+}
+
+TEST_F(access, memfd_elf)
+{
+	const int exec = variant->file_exec;
+
+	fill_exec_fd(_metadata, self->memfd);
+
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+
+	drop_privileges(_metadata);
+	test_exec_fd(_metadata, self->memfd, exec ? 0 : EACCES);
+}
+
+TEST_F(access, non_regular_files)
+{
+	test_exec_path(_metadata, dir_path, EACCES);
+	test_exec_path(_metadata, block_dev_path, EACCES);
+	test_exec_path(_metadata, char_dev_path, EACCES);
+	test_exec_path(_metadata, fifo_path, EACCES);
+	test_exec_fd(_metadata, self->socket_fds[0], EACCES);
+	test_exec_fd(_metadata, self->pipefd, EACCES);
+}
+
+/* clang-format off */
+FIXTURE(secbits) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(secbits)
+{
+	const bool is_privileged;
+	const int error;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(secbits, priv) {
+	/* clang-format on */
+	.is_privileged = true,
+	.error = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(secbits, unpriv) {
+	/* clang-format on */
+	.is_privileged = false,
+	.error = EPERM,
+};
+
+FIXTURE_SETUP(secbits)
+{
+	/* Makes sure no exec bits are set. */
+	EXPECT_EQ(0, test_secbits_set(0));
+	EXPECT_EQ(0, prctl(PR_GET_SECUREBITS));
+
+	if (!variant->is_privileged)
+		drop_privileges(_metadata);
+}
+
+FIXTURE_TEARDOWN(secbits)
+{
+}
+
+TEST_F(secbits, legacy)
+{
+	EXPECT_EQ(variant->error, test_secbits_set(0));
+}
+
+#define CHILD(...)                     \
+	do {                           \
+		pid_t child = vfork(); \
+		EXPECT_LE(0, child);   \
+		if (child == 0) {      \
+			__VA_ARGS__;   \
+			_exit(0);      \
+		}                      \
+	} while (0)
+
+TEST_F(secbits, exec)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_EXEC_RESTRICT_FILE;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+
+	secbits |= SECBIT_EXEC_DENY_INTERACTIVE;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+
+	secbits &= ~(SECBIT_EXEC_RESTRICT_FILE | SECBIT_EXEC_DENY_INTERACTIVE);
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS));
+	CHILD(EXPECT_EQ(secbits, prctl(PR_GET_SECUREBITS)));
+}
+
+TEST_F(secbits, check_locked_set)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_EXEC_RESTRICT_FILE;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	secbits |= SECBIT_EXEC_RESTRICT_FILE_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock set but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_EXEC_RESTRICT_FILE;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, check_locked_unset)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_EXEC_RESTRICT_FILE_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock unset but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_EXEC_RESTRICT_FILE;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, restrict_locked_set)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_EXEC_DENY_INTERACTIVE;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+	secbits |= SECBIT_EXEC_DENY_INTERACTIVE_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock set but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_EXEC_DENY_INTERACTIVE;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_F(secbits, restrict_locked_unset)
+{
+	unsigned int secbits = prctl(PR_GET_SECUREBITS);
+
+	secbits |= SECBIT_EXEC_DENY_INTERACTIVE_LOCKED;
+	EXPECT_EQ(0, test_secbits_set(secbits));
+
+	/* Checks lock unset but unchanged. */
+	EXPECT_EQ(variant->error, test_secbits_set(secbits));
+	CHILD(EXPECT_EQ(variant->error, test_secbits_set(secbits)));
+
+	secbits &= ~SECBIT_EXEC_DENY_INTERACTIVE;
+	EXPECT_EQ(EPERM, test_secbits_set(0));
+	CHILD(EXPECT_EQ(EPERM, test_secbits_set(0)));
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/exec/config b/tools/testing/selftests/exec/config
new file mode 100644
index 000000000000..c308079867b3
--- /dev/null
+++ b/tools/testing/selftests/exec/config
@@ -0,0 +1,2 @@
+CONFIG_BLK_DEV=y
+CONFIG_BLK_DEV_LOOP=y
diff --git a/tools/testing/selftests/exec/false.c b/tools/testing/selftests/exec/false.c
new file mode 100644
index 000000000000..104383ec3a79
--- /dev/null
+++ b/tools/testing/selftests/exec/false.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+int main(void)
+{
+	return 1;
+}
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v21 4/6] selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (2 preceding siblings ...)
  2024-11-12 19:18 ` [PATCH v21 3/6] selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 5/6] samples/check-exec: Add set-exec Mickaël Salaün
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Günther Noack

Extend layout1.execute with the new AT_EXECVE_CHECK flag.  The semantic
with AT_EXECVE_CHECK is the same as with a simple execve(2),
LANDLOCK_ACCESS_FS_EXECUTE is enforced the same way.

Cc: Günther Noack <gnoack@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-5-mic@digikod.net
---

Changes since v20:
* Rename AT_CHECK to AT_EXECVE_CHECK.
---
 tools/testing/selftests/landlock/fs_test.c | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 6788762188fe..cd66901be612 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -37,6 +37,10 @@
 #include <linux/fs.h>
 #include <linux/mount.h>
 
+/* Defines AT_EXECVE_CHECK without type conflicts. */
+#define _ASM_GENERIC_FCNTL_H
+#include <linux/fcntl.h>
+
 #include "common.h"
 
 #ifndef renameat2
@@ -2008,6 +2012,22 @@ static void test_execute(struct __test_metadata *const _metadata, const int err,
 	};
 }
 
+static void test_check_exec(struct __test_metadata *const _metadata,
+			    const int err, const char *const path)
+{
+	int ret;
+	char *const argv[] = { (char *)path, NULL };
+
+	ret = execveat(AT_FDCWD, path, argv, NULL,
+		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
+	if (err) {
+		EXPECT_EQ(-1, ret);
+		EXPECT_EQ(errno, err);
+	} else {
+		EXPECT_EQ(0, ret);
+	}
+}
+
 TEST_F_FORK(layout1, execute)
 {
 	const struct rule rules[] = {
@@ -2025,20 +2045,27 @@ TEST_F_FORK(layout1, execute)
 	copy_binary(_metadata, file1_s1d2);
 	copy_binary(_metadata, file1_s1d3);
 
+	/* Checks before file1_s1d1 being denied. */
+	test_execute(_metadata, 0, file1_s1d1);
+	test_check_exec(_metadata, 0, file1_s1d1);
+
 	enforce_ruleset(_metadata, ruleset_fd);
 	ASSERT_EQ(0, close(ruleset_fd));
 
 	ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d1, O_RDONLY));
 	test_execute(_metadata, EACCES, file1_s1d1);
+	test_check_exec(_metadata, EACCES, file1_s1d1);
 
 	ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
 	test_execute(_metadata, 0, file1_s1d2);
+	test_check_exec(_metadata, 0, file1_s1d2);
 
 	ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY));
 	ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
 	test_execute(_metadata, 0, file1_s1d3);
+	test_check_exec(_metadata, 0, file1_s1d3);
 }
 
 TEST_F_FORK(layout1, link)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v21 5/6] samples/check-exec: Add set-exec
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (3 preceding siblings ...)
  2024-11-12 19:18 ` [PATCH v21 4/6] selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-12 19:18 ` [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests Mickaël Salaün
  2024-11-21  4:58 ` [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Kees Cook
  6 siblings, 0 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Add a simple tool to set SECBIT_EXEC_RESTRICT_FILE or
SECBIT_EXEC_DENY_INTERACTIVE before executing a command.  This is useful
to easily test against enlighten script interpreters.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-6-mic@digikod.net
---

Changes since v19:
* Rename file and directory.
* Update securebits and related arguments.
* Remove useless call to prctl() when securebits are unchanged.
---
 samples/Kconfig               |  7 +++
 samples/Makefile              |  1 +
 samples/check-exec/.gitignore |  1 +
 samples/check-exec/Makefile   | 14 ++++++
 samples/check-exec/set-exec.c | 85 +++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+)
 create mode 100644 samples/check-exec/.gitignore
 create mode 100644 samples/check-exec/Makefile
 create mode 100644 samples/check-exec/set-exec.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..efa28ceadc42 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -291,6 +291,13 @@ config SAMPLE_CGROUP
 	help
 	  Build samples that demonstrate the usage of the cgroup API.
 
+config SAMPLE_CHECK_EXEC
+	bool "Exec secure bits examples"
+	depends on CC_CAN_LINK && HEADERS_INSTALL
+	help
+	  Build a tool to easily configure SECBIT_EXEC_RESTRICT_FILE and
+	  SECBIT_EXEC_DENY_INTERACTIVE.
+
 source "samples/rust/Kconfig"
 
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..f988202f3a30 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -3,6 +3,7 @@
 
 subdir-$(CONFIG_SAMPLE_AUXDISPLAY)	+= auxdisplay
 subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
+subdir-$(CONFIG_SAMPLE_CHECK_EXEC)	+= check-exec
 subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
 obj-$(CONFIG_SAMPLE_CONFIGFS)		+= configfs/
 obj-$(CONFIG_SAMPLE_CONNECTOR)		+= connector/
diff --git a/samples/check-exec/.gitignore b/samples/check-exec/.gitignore
new file mode 100644
index 000000000000..3f8119112ccf
--- /dev/null
+++ b/samples/check-exec/.gitignore
@@ -0,0 +1 @@
+/set-exec
diff --git a/samples/check-exec/Makefile b/samples/check-exec/Makefile
new file mode 100644
index 000000000000..d9f976e3ff98
--- /dev/null
+++ b/samples/check-exec/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+userprogs-always-y := \
+	set-exec
+
+userccflags += -I usr/include
+
+.PHONY: all clean
+
+all:
+	$(MAKE) -C ../.. samples/check-exec/
+
+clean:
+	$(MAKE) -C ../.. M=samples/check-exec/ clean
diff --git a/samples/check-exec/set-exec.c b/samples/check-exec/set-exec.c
new file mode 100644
index 000000000000..ba86a60a20dd
--- /dev/null
+++ b/samples/check-exec/set-exec.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple tool to set SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE,
+ * before executing a command.
+ *
+ * Copyright © 2024 Microsoft Corporation
+ */
+
+#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__
+#include <errno.h>
+#include <linux/prctl.h>
+#include <linux/securebits.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+static void print_usage(const char *argv0)
+{
+	fprintf(stderr, "usage: %s -f|-i -- <cmd> [args]...\n\n", argv0);
+	fprintf(stderr, "Execute a command with\n");
+	fprintf(stderr, "- SECBIT_EXEC_RESTRICT_FILE set: -f\n");
+	fprintf(stderr, "- SECBIT_EXEC_DENY_INTERACTIVE set: -i\n");
+}
+
+int main(const int argc, char *const argv[], char *const *const envp)
+{
+	const char *cmd_path;
+	char *const *cmd_argv;
+	int opt, secbits_cur, secbits_new;
+	bool has_policy = false;
+
+	secbits_cur = prctl(PR_GET_SECUREBITS);
+	if (secbits_cur == -1) {
+		/*
+		 * This should never happen, except with a buggy seccomp
+		 * filter.
+		 */
+		perror("ERROR: Failed to get securebits");
+		return 1;
+	}
+
+	secbits_new = secbits_cur;
+	while ((opt = getopt(argc, argv, "fi")) != -1) {
+		switch (opt) {
+		case 'f':
+			secbits_new |= SECBIT_EXEC_RESTRICT_FILE |
+				       SECBIT_EXEC_RESTRICT_FILE_LOCKED;
+			has_policy = true;
+			break;
+		case 'i':
+			secbits_new |= SECBIT_EXEC_DENY_INTERACTIVE |
+				       SECBIT_EXEC_DENY_INTERACTIVE_LOCKED;
+			has_policy = true;
+			break;
+		default:
+			print_usage(argv[0]);
+			return 1;
+		}
+	}
+
+	if (!argv[optind] || !has_policy) {
+		print_usage(argv[0]);
+		return 1;
+	}
+
+	if (secbits_cur != secbits_new &&
+	    prctl(PR_SET_SECUREBITS, secbits_new)) {
+		perror("Failed to set secure bit(s).");
+		fprintf(stderr,
+			"Hint: The running kernel may not support this feature.\n");
+		return 1;
+	}
+
+	cmd_path = argv[optind];
+	cmd_argv = argv + optind;
+	fprintf(stderr, "Executing command...\n");
+	execvpe(cmd_path, cmd_argv, envp);
+	fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
+		strerror(errno));
+	return 1;
+}
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (4 preceding siblings ...)
  2024-11-12 19:18 ` [PATCH v21 5/6] samples/check-exec: Add set-exec Mickaël Salaün
@ 2024-11-12 19:18 ` Mickaël Salaün
  2024-11-21 20:34   ` Mimi Zohar
  2024-11-21  4:58 ` [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Kees Cook
  6 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-12 19:18 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn
  Cc: Mickaël Salaün, Adhemerval Zanella Netto,
	Alejandro Colomar, Aleksa Sarai, Andrew Morton, Andy Lutomirski,
	Arnd Bergmann, Casey Schaufler, Christian Heimes, Dmitry Vyukov,
	Elliott Hughes, Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn, Jeff Xu,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Add a very simple script interpreter called "inc" that can evaluate two
different commands (one per line):
- "?" to initialize a counter from user's input;
- "+" to increment the counter (which is set to 0 by default).

It is enlighten to only interpret executable files according to
AT_EXECVE_CHECK and the related securebits:

  # Executing a script with RESTRICT_FILE is only allowed if the script
  # is executable:
  ./set-exec -f -- ./inc script-exec.inc # Allowed
  ./set-exec -f -- ./inc script-noexec.inc # Denied

  # Executing stdin with DENY_INTERACTIVE is only allowed if stdin is an
  # executable regular file:
  ./set-exec -i -- ./inc -i < script-exec.inc # Allowed
  ./set-exec -i -- ./inc -i < script-noexec.inc # Denied

  # However, a pipe is not executable and it is then denied:
  cat script-noexec.inc | ./set-exec -i -- ./inc -i # Denied

  # Executing raw data (e.g. command argument) with DENY_INTERACTIVE is
  # always denied.
  ./set-exec -i -- ./inc -c "+" # Denied
  ./inc -c "$(<script-ask.inc)" # Allowed

  # To directly execute a script, we can update $PATH (used by `env`):
  PATH="${PATH}:." ./script-exec.inc

  # To execute several commands passed as argument:

Add a complete test suite to check the script interpreter against all
possible execution cases:

  make TARGETS=exec kselftest-install
  ./tools/testing/selftests/kselftest_install/run_kselftest.sh

Fix ktap_helpers.sh to gracefully ignore optional argument.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-7-mic@digikod.net
---

Changes since v20:
* Rename AT_CHECK to AT_EXECVE_CHECK.

Changes since v19:
* New patch.
---
 samples/Kconfig                               |   4 +-
 samples/check-exec/.gitignore                 |   1 +
 samples/check-exec/Makefile                   |   1 +
 samples/check-exec/inc.c                      | 205 ++++++++++++++++++
 samples/check-exec/run-script-ask.inc         |   8 +
 samples/check-exec/script-ask.inc             |   4 +
 samples/check-exec/script-exec.inc            |   3 +
 samples/check-exec/script-noexec.inc          |   3 +
 tools/testing/selftests/exec/.gitignore       |   2 +
 tools/testing/selftests/exec/Makefile         |  14 +-
 .../selftests/exec/check-exec-tests.sh        | 205 ++++++++++++++++++
 .../selftests/kselftest/ktap_helpers.sh       |   2 +-
 12 files changed, 448 insertions(+), 4 deletions(-)
 create mode 100644 samples/check-exec/inc.c
 create mode 100755 samples/check-exec/run-script-ask.inc
 create mode 100755 samples/check-exec/script-ask.inc
 create mode 100755 samples/check-exec/script-exec.inc
 create mode 100644 samples/check-exec/script-noexec.inc
 create mode 100755 tools/testing/selftests/exec/check-exec-tests.sh

diff --git a/samples/Kconfig b/samples/Kconfig
index efa28ceadc42..84a9d4e8d947 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -296,7 +296,9 @@ config SAMPLE_CHECK_EXEC
 	depends on CC_CAN_LINK && HEADERS_INSTALL
 	help
 	  Build a tool to easily configure SECBIT_EXEC_RESTRICT_FILE and
-	  SECBIT_EXEC_DENY_INTERACTIVE.
+	  SECBIT_EXEC_DENY_INTERACTIVE, and a simple script interpreter to
+	  demonstrate how they should be used with execveat(2) +
+	  AT_EXECVE_CHECK.
 
 source "samples/rust/Kconfig"
 
diff --git a/samples/check-exec/.gitignore b/samples/check-exec/.gitignore
index 3f8119112ccf..cd759a19dacd 100644
--- a/samples/check-exec/.gitignore
+++ b/samples/check-exec/.gitignore
@@ -1 +1,2 @@
+/inc
 /set-exec
diff --git a/samples/check-exec/Makefile b/samples/check-exec/Makefile
index d9f976e3ff98..c4f08ad0f8e3 100644
--- a/samples/check-exec/Makefile
+++ b/samples/check-exec/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 
 userprogs-always-y := \
+	inc \
 	set-exec
 
 userccflags += -I usr/include
diff --git a/samples/check-exec/inc.c b/samples/check-exec/inc.c
new file mode 100644
index 000000000000..94b87569d2a2
--- /dev/null
+++ b/samples/check-exec/inc.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Very simple script interpreter that can evaluate two different commands (one
+ * per line):
+ * - "?" to initialize a counter from user's input;
+ * - "+" to increment the counter (which is set to 0 by default).
+ *
+ * See tools/testing/selftests/exec/check-exec-tests.sh and
+ * Documentation/userspace-api/check_exec.rst
+ *
+ * Copyright © 2024 Microsoft Corporation
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/fcntl.h>
+#include <linux/prctl.h>
+#include <linux/securebits.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+/* Returns 1 on error, 0 otherwise. */
+static int interpret_buffer(char *buffer, size_t buffer_size)
+{
+	char *line, *saveptr = NULL;
+	long long number = 0;
+
+	/* Each command is the first character of a line. */
+	saveptr = NULL;
+	line = strtok_r(buffer, "\n", &saveptr);
+	while (line) {
+		if (*line != '#' && strlen(line) != 1) {
+			fprintf(stderr, "# ERROR: Unknown string\n");
+			return 1;
+		}
+		switch (*line) {
+		case '#':
+			/* Skips shebang and comments. */
+			break;
+		case '+':
+			/* Increments and prints the number. */
+			number++;
+			printf("%lld\n", number);
+			break;
+		case '?':
+			/* Reads integer from stdin. */
+			fprintf(stderr, "> Enter new number: \n");
+			if (scanf("%lld", &number) != 1) {
+				fprintf(stderr,
+					"# WARNING: Failed to read number from stdin\n");
+			}
+			break;
+		default:
+			fprintf(stderr, "# ERROR: Unknown character '%c'\n",
+				*line);
+			return 1;
+		}
+		line = strtok_r(NULL, "\n", &saveptr);
+	}
+	return 0;
+}
+
+/* Returns 1 on error, 0 otherwise. */
+static int interpret_stream(FILE *script, char *const script_name,
+			    char *const *const envp, const bool restrict_stream)
+{
+	int err;
+	char *const script_argv[] = { script_name, NULL };
+	char buf[128] = {};
+	size_t buf_size = sizeof(buf);
+
+	/*
+	 * We pass a valid argv and envp to the kernel to emulate a native
+	 * script execution.  We must use the script file descriptor instead of
+	 * the script path name to avoid race conditions.
+	 */
+	err = execveat(fileno(script), "", script_argv, envp,
+		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
+	if (err && restrict_stream) {
+		perror("ERROR: Script execution check");
+		return 1;
+	}
+
+	/* Reads script. */
+	buf_size = fread(buf, 1, buf_size - 1, script);
+	return interpret_buffer(buf, buf_size);
+}
+
+static void print_usage(const char *argv0)
+{
+	fprintf(stderr, "usage: %s <script.inc> | -i | -c <command>\n\n",
+		argv0);
+	fprintf(stderr, "Example:\n");
+	fprintf(stderr, "  ./set-exec -fi -- ./inc -i < script-exec.inc\n");
+}
+
+int main(const int argc, char *const argv[], char *const *const envp)
+{
+	int opt;
+	char *cmd = NULL;
+	char *script_name = NULL;
+	bool interpret_stdin = false;
+	FILE *script_file = NULL;
+	int secbits;
+	bool deny_interactive, restrict_file;
+	size_t arg_nb;
+
+	secbits = prctl(PR_GET_SECUREBITS);
+	if (secbits == -1) {
+		/*
+		 * This should never happen, except with a buggy seccomp
+		 * filter.
+		 */
+		perror("ERROR: Failed to get securebits");
+		return 1;
+	}
+
+	deny_interactive = !!(secbits & SECBIT_EXEC_DENY_INTERACTIVE);
+	restrict_file = !!(secbits & SECBIT_EXEC_RESTRICT_FILE);
+
+	while ((opt = getopt(argc, argv, "c:i")) != -1) {
+		switch (opt) {
+		case 'c':
+			if (cmd) {
+				fprintf(stderr, "ERROR: Command already set");
+				return 1;
+			}
+			cmd = optarg;
+			break;
+		case 'i':
+			interpret_stdin = true;
+			break;
+		default:
+			print_usage(argv[0]);
+			return 1;
+		}
+	}
+
+	/* Checks that only one argument is used, or read stdin. */
+	arg_nb = !!cmd + !!interpret_stdin;
+	if (arg_nb == 0 && argc == 2) {
+		script_name = argv[1];
+	} else if (arg_nb != 1) {
+		print_usage(argv[0]);
+		return 1;
+	}
+
+	if (cmd) {
+		/*
+		 * Other kind of interactive interpretations should be denied
+		 * as well (e.g. CLI arguments passing script snippets,
+		 * environment variables interpreted as script).  However, any
+		 * way to pass script files should only be restricted according
+		 * to restrict_file.
+		 */
+		if (deny_interactive) {
+			fprintf(stderr,
+				"ERROR: Interactive interpretation denied.\n");
+			return 1;
+		}
+
+		return interpret_buffer(cmd, strlen(cmd));
+	}
+
+	if (interpret_stdin && !script_name) {
+		script_file = stdin;
+		/*
+		 * As for any execve(2) call, this path may be logged by the
+		 * kernel.
+		 */
+		script_name = "/proc/self/fd/0";
+		/*
+		 * When stdin is used, it can point to a regular file or a
+		 * pipe.  Restrict stdin execution according to
+		 * SECBIT_EXEC_DENY_INTERACTIVE but always allow executable
+		 * files (which are not considered as interactive inputs).
+		 */
+		return interpret_stream(script_file, script_name, envp,
+					deny_interactive);
+	} else if (script_name && !interpret_stdin) {
+		/*
+		 * In this sample, we don't pass any argument to scripts, but
+		 * otherwise we would have to forge an argv with such
+		 * arguments.
+		 */
+		script_file = fopen(script_name, "r");
+		if (!script_file) {
+			perror("ERROR: Failed to open script");
+			return 1;
+		}
+		/*
+		 * Restricts file execution according to
+		 * SECBIT_EXEC_RESTRICT_FILE.
+		 */
+		return interpret_stream(script_file, script_name, envp,
+					restrict_file);
+	}
+
+	print_usage(argv[0]);
+	return 1;
+}
diff --git a/samples/check-exec/run-script-ask.inc b/samples/check-exec/run-script-ask.inc
new file mode 100755
index 000000000000..3ea3e15fbd5a
--- /dev/null
+++ b/samples/check-exec/run-script-ask.inc
@@ -0,0 +1,8 @@
+#!/usr/bin/env sh
+
+DIR="$(dirname -- "$0")"
+
+PATH="${PATH}:${DIR}"
+
+set -x
+"${DIR}/script-ask.inc"
diff --git a/samples/check-exec/script-ask.inc b/samples/check-exec/script-ask.inc
new file mode 100755
index 000000000000..f48252ab07c1
--- /dev/null
+++ b/samples/check-exec/script-ask.inc
@@ -0,0 +1,4 @@
+#!/usr/bin/env inc
+
+?
++
diff --git a/samples/check-exec/script-exec.inc b/samples/check-exec/script-exec.inc
new file mode 100755
index 000000000000..525e958e1c20
--- /dev/null
+++ b/samples/check-exec/script-exec.inc
@@ -0,0 +1,3 @@
+#!/usr/bin/env inc
+
++
diff --git a/samples/check-exec/script-noexec.inc b/samples/check-exec/script-noexec.inc
new file mode 100644
index 000000000000..525e958e1c20
--- /dev/null
+++ b/samples/check-exec/script-noexec.inc
@@ -0,0 +1,3 @@
+#!/usr/bin/env inc
+
++
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
index a32c63bb4df1..7f3d1ae762ec 100644
--- a/tools/testing/selftests/exec/.gitignore
+++ b/tools/testing/selftests/exec/.gitignore
@@ -11,9 +11,11 @@ non-regular
 null-argv
 /check-exec
 /false
+/inc
 /load_address.*
 !load_address.c
 /recursion-depth
+/set-exec
 xxxxxxxx*
 pipe
 S_I*.test
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index 8713d1c862ae..45a3cfc435cf 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -10,9 +10,9 @@ ALIGN_PIES        := $(patsubst %,load_address.%,$(ALIGNS))
 ALIGN_STATIC_PIES := $(patsubst %,load_address.static.%,$(ALIGNS))
 ALIGNMENT_TESTS   := $(ALIGN_PIES) $(ALIGN_STATIC_PIES)
 
-TEST_PROGS := binfmt_script.py
+TEST_PROGS := binfmt_script.py check-exec-tests.sh
 TEST_GEN_PROGS := execveat non-regular $(ALIGNMENT_TESTS)
-TEST_GEN_PROGS_EXTENDED := false
+TEST_GEN_PROGS_EXTENDED := false inc set-exec script-exec.inc script-noexec.inc
 TEST_GEN_FILES := execveat.symlink execveat.denatured script subdir
 # Makefile is a run-time dependency, since it's accessed by the execveat test
 TEST_FILES := Makefile
@@ -26,6 +26,8 @@ EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*
 
 include ../lib.mk
 
+CHECK_EXEC_SAMPLES := $(top_srcdir)/samples/check-exec
+
 $(OUTPUT)/subdir:
 	mkdir -p $@
 $(OUTPUT)/script: Makefile
@@ -45,3 +47,11 @@ $(OUTPUT)/load_address.static.0x%: load_address.c
 		-fPIE -static-pie $< -o $@
 $(OUTPUT)/false: false.c
 	$(CC) $(CFLAGS) $(LDFLAGS) -static $< -o $@
+$(OUTPUT)/inc: $(CHECK_EXEC_SAMPLES)/inc.c
+	$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+$(OUTPUT)/set-exec: $(CHECK_EXEC_SAMPLES)/set-exec.c
+	$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+$(OUTPUT)/script-exec.inc: $(CHECK_EXEC_SAMPLES)/script-exec.inc
+	cp $< $@
+$(OUTPUT)/script-noexec.inc: $(CHECK_EXEC_SAMPLES)/script-noexec.inc
+	cp $< $@
diff --git a/tools/testing/selftests/exec/check-exec-tests.sh b/tools/testing/selftests/exec/check-exec-tests.sh
new file mode 100755
index 000000000000..87102906ae3c
--- /dev/null
+++ b/tools/testing/selftests/exec/check-exec-tests.sh
@@ -0,0 +1,205 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test the "inc" interpreter.
+#
+# See include/uapi/linux/securebits.h, include/uapi/linux/fcntl.h and
+# samples/check-exec/inc.c
+#
+# Copyright © 2024 Microsoft Corporation
+
+set -u -e -o pipefail
+
+EXPECTED_OUTPUT="1"
+exec 2>/dev/null
+
+DIR="$(dirname $(readlink -f "$0"))"
+source "${DIR}"/../kselftest/ktap_helpers.sh
+
+exec_direct() {
+	local expect="$1"
+	local script="$2"
+	shift 2
+	local ret=0
+	local out
+
+	# Updates PATH for `env` to execute the `inc` interpreter.
+	out="$(PATH="." "$@" "${script}")" || ret=$?
+
+	if [[ ${ret} -ne ${expect} ]]; then
+		echo "ERROR: Wrong expectation for direct file execution: ${ret}"
+		return 1
+	fi
+	if [[ ${ret} -eq 0 && "${out}" != "${EXPECTED_OUTPUT}" ]]; then
+		echo "ERROR: Wrong output for direct file execution: ${out}"
+		return 1
+	fi
+}
+
+exec_indirect() {
+	local expect="$1"
+	local script="$2"
+	shift 2
+	local ret=0
+	local out
+
+	# Script passed as argument.
+	out="$("$@" ./inc "${script}")" || ret=$?
+
+	if [[ ${ret} -ne ${expect} ]]; then
+		echo "ERROR: Wrong expectation for indirect file execution: ${ret}"
+		return 1
+	fi
+	if [[ ${ret} -eq 0 && "${out}" != "${EXPECTED_OUTPUT}" ]]; then
+		echo "ERROR: Wrong output for indirect file execution: ${out}"
+		return 1
+	fi
+}
+
+exec_stdin_reg() {
+	local expect="$1"
+	local script="$2"
+	shift 2
+	local ret=0
+	local out
+
+	# Executing stdin must be allowed if the related file is executable.
+	out="$("$@" ./inc -i < "${script}")" || ret=$?
+
+	if [[ ${ret} -ne ${expect} ]]; then
+		echo "ERROR: Wrong expectation for stdin regular file execution: ${ret}"
+		return 1
+	fi
+	if [[ ${ret} -eq 0 && "${out}" != "${EXPECTED_OUTPUT}" ]]; then
+		echo "ERROR: Wrong output for stdin regular file execution: ${out}"
+		return 1
+	fi
+}
+
+exec_stdin_pipe() {
+	local expect="$1"
+	shift
+	local ret=0
+	local out
+
+	# A pipe is not executable.
+	out="$(cat script-exec.inc | "$@" ./inc -i)" || ret=$?
+
+	if [[ ${ret} -ne ${expect} ]]; then
+		echo "ERROR: Wrong expectation for stdin pipe execution: ${ret}"
+		return 1
+	fi
+}
+
+exec_argument() {
+	local expect="$1"
+	local ret=0
+	shift
+	local out
+
+	# Script not coming from a file must not be executed.
+	out="$("$@" ./inc -c "$(< script-exec.inc)")" || ret=$?
+
+	if [[ ${ret} -ne ${expect} ]]; then
+		echo "ERROR: Wrong expectation for arbitrary argument execution: ${ret}"
+		return 1
+	fi
+	if [[ ${ret} -eq 0 && "${out}" != "${EXPECTED_OUTPUT}" ]]; then
+		echo "ERROR: Wrong output for arbitrary argument execution: ${out}"
+		return 1
+	fi
+}
+
+exec_interactive() {
+	exec_stdin_pipe "$@"
+	exec_argument "$@"
+}
+
+ktap_test() {
+	ktap_test_result "$*" "$@"
+}
+
+ktap_print_header
+ktap_set_plan 28
+
+# Without secbit configuration, nothing is changed.
+
+ktap_print_msg "By default, executable scripts are allowed to be interpreted and executed."
+ktap_test exec_direct 0 script-exec.inc
+ktap_test exec_indirect 0 script-exec.inc
+
+ktap_print_msg "By default, executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-exec.inc
+
+ktap_print_msg "By default, non-executable scripts are allowed to be interpreted, but not directly executed."
+# We get 126 because of direct execution by Bash.
+ktap_test exec_direct 126 script-noexec.inc
+ktap_test exec_indirect 0 script-noexec.inc
+
+ktap_print_msg "By default, non-executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-noexec.inc
+
+ktap_print_msg "By default, interactive commands are allowed to be interpreted."
+ktap_test exec_interactive 0
+
+# With only file restriction: protect non-malicious users from inadvertent errors (e.g. python ~/Downloads/*.py).
+
+ktap_print_msg "With -f, executable scripts are allowed to be interpreted and executed."
+ktap_test exec_direct 0 script-exec.inc ./set-exec -f --
+ktap_test exec_indirect 0 script-exec.inc ./set-exec -f --
+
+ktap_print_msg "With -f, executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-exec.inc ./set-exec -f --
+
+ktap_print_msg "With -f, non-executable scripts are not allowed to be executed nor interpreted."
+# Direct execution of non-executable script is alwayse denied by the kernel.
+ktap_test exec_direct 1 script-noexec.inc ./set-exec -f --
+ktap_test exec_indirect 1 script-noexec.inc ./set-exec -f --
+
+ktap_print_msg "With -f, non-executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-noexec.inc ./set-exec -f --
+
+ktap_print_msg "With -f, interactive commands are allowed to be interpreted."
+ktap_test exec_interactive 0 ./set-exec -f --
+
+# With only denied interactive commands: check or monitor script content (e.g. with LSM).
+
+ktap_print_msg "With -i, executable scripts are allowed to be interpreted and executed."
+ktap_test exec_direct 0 script-exec.inc ./set-exec -i --
+ktap_test exec_indirect 0 script-exec.inc ./set-exec -i --
+
+ktap_print_msg "With -i, executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-exec.inc ./set-exec -i --
+
+ktap_print_msg "With -i, non-executable scripts are allowed to be interpreted, but not directly executed."
+# Direct execution of non-executable script is alwayse denied by the kernel.
+ktap_test exec_direct 1 script-noexec.inc ./set-exec -i --
+ktap_test exec_indirect 0 script-noexec.inc ./set-exec -i --
+
+ktap_print_msg "With -i, non-executable stdin is not allowed to be interpreted."
+ktap_test exec_stdin_reg 1 script-noexec.inc ./set-exec -i --
+
+ktap_print_msg "With -i, interactive commands are not allowed to be interpreted."
+ktap_test exec_interactive 1 ./set-exec -i --
+
+# With both file restriction and denied interactive commands: only allow executable scripts.
+
+ktap_print_msg "With -fi, executable scripts are allowed to be interpreted and executed."
+ktap_test exec_direct 0 script-exec.inc ./set-exec -fi --
+ktap_test exec_indirect 0 script-exec.inc ./set-exec -fi --
+
+ktap_print_msg "With -fi, executable stdin is allowed to be interpreted."
+ktap_test exec_stdin_reg 0 script-exec.inc ./set-exec -fi --
+
+ktap_print_msg "With -fi, non-executable scripts are not allowed to be interpreted nor executed."
+# Direct execution of non-executable script is alwayse denied by the kernel.
+ktap_test exec_direct 1 script-noexec.inc ./set-exec -fi --
+ktap_test exec_indirect 1 script-noexec.inc ./set-exec -fi --
+
+ktap_print_msg "With -fi, non-executable stdin is not allowed to be interpreted."
+ktap_test exec_stdin_reg 1 script-noexec.inc ./set-exec -fi --
+
+ktap_print_msg "With -fi, interactive commands are not allowed to be interpreted."
+ktap_test exec_interactive 1 ./set-exec -fi --
+
+ktap_finished
diff --git a/tools/testing/selftests/kselftest/ktap_helpers.sh b/tools/testing/selftests/kselftest/ktap_helpers.sh
index 79a125eb24c2..14e7f3ec3f84 100644
--- a/tools/testing/selftests/kselftest/ktap_helpers.sh
+++ b/tools/testing/selftests/kselftest/ktap_helpers.sh
@@ -40,7 +40,7 @@ ktap_skip_all() {
 __ktap_test() {
 	result="$1"
 	description="$2"
-	directive="$3" # optional
+	directive="${3:-}" # optional
 
 	local directive_str=
 	[ ! -z "$directive" ] && directive_str="# $directive"
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-12 19:18 ` [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2) Mickaël Salaün
@ 2024-11-20  1:17   ` Jeff Xu
  2024-11-20  9:42     ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2024-11-20  1:17 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> be allowed for execution.  The main use case is for script interpreters
> and dynamic linkers to check execution permission according to the
> kernel's security policy. Another use case is to add context to access
> logs e.g., which script (instead of interpreter) accessed a file.  As
> any executable code, scripts could also use this check [1].
>
> This is different from faccessat(2) + X_OK which only checks a subset of
> access rights (i.e. inode permission and mount options for regular
> files), but not the full context (e.g. all LSM access checks).  The main
> use case for access(2) is for SUID processes to (partially) check access
> on behalf of their caller.  The main use case for execveat(2) +
> AT_EXECVE_CHECK is to check if a script execution would be allowed,
> according to all the different restrictions in place.  Because the use
> of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> execution, user space gets the same error codes.
>
> An interesting point of using execveat(2) instead of openat2(2) is that
> it decouples the check from the enforcement.  Indeed, the security check
> can be logged (e.g. with audit) without blocking an execution
> environment not yet ready to enforce a strict security policy.
>
> LSMs can control or log execution requests with
> security_bprm_creds_for_exec().  However, to enforce a consistent and
> complete access control (e.g. on binary's dependencies) LSMs should
> restrict file executability, or mesure executed files, with
> security_file_open() by checking file->f_flags & __FMODE_EXEC.
>
> Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> doesn't make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_EXECVE_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> being executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> SECBIT_EXEC_DENY_INTERACTIVE securebits.
>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Reviewed-by: Serge Hallyn <serge@hallyn.com>
> Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> ---
>
> Changes since v20:
> * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
>   Serge Hallyn.
> * Move the UAPI documentation to a dedicated RST file.
> * Add Reviewed-by: Serge Hallyn
>
> Changes since v19:
> * Remove mention of "role transition" as suggested by Andy.
> * Highlight the difference between security_bprm_creds_for_exec() and
>   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
>   discussed with Jeff.
> * Improve documentation both in UAPI comments and kernel comments
>   (requested by Kees).
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
>  Documentation/userspace-api/index.rst      |  1 +
>  fs/exec.c                                  | 20 +++++++++++--
>  include/linux/binfmts.h                    |  7 ++++-
>  include/uapi/linux/fcntl.h                 |  4 +++
>  kernel/audit.h                             |  1 +
>  kernel/auditsc.c                           |  1 +
>  security/security.c                        | 10 +++++++
>  8 files changed, 75 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/userspace-api/check_exec.rst
>
> diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> new file mode 100644
> index 000000000000..ad1aeaa5f6c0
> --- /dev/null
> +++ b/Documentation/userspace-api/check_exec.rst
> @@ -0,0 +1,34 @@
> +===================
> +Executability check
> +===================
> +
> +AT_EXECVE_CHECK
> +===============
> +
> +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> +check on a regular file and returns 0 if execution of this file would be
> +allowed, ignoring the file format and then the related interpreter dependencies
> +(e.g. ELF libraries, script's shebang).
> +
> +Programs should always perform this check to apply kernel-level checks against
> +files that are not directly executed by the kernel but passed to a user space
> +interpreter instead.  All files that contain executable code, from the point of
> +view of the interpreter, should be checked.  However the result of this check
> +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> +``SECBIT_EXEC_DENY_INTERACTIVE.``.
Regarding "should only"
Userspace (e.g. libc) could decide to enforce even when
SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
doesn't make sense.
When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.

> +
> +The main purpose of this flag is to improve the security and consistency of an
> +execution environment to ensure that direct file execution (e.g.
> +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> +the same result.  For instance, this can be used to check if a file is
> +trustworthy according to the caller's environment.
> +
> +In a secure environment, libraries and any executable dependencies should also
> +be checked.  For instance, dynamic linking should make sure that all libraries
> +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> +For such secure execution environment to make sense, only trusted code should
> +be executable, which also requires integrity guarantees.
> +
> +To avoid race conditions leading to time-of-check to time-of-use issues,
> +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> +file descriptor instead of a path.
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 274cc7546efc..6272bcf11296 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -35,6 +35,7 @@ Security-related interfaces
>     mfd_noexec
>     spec_ctrl
>     tee
> +   check_exec
>
>  Devices and I/O
>  ===============
> diff --git a/fs/exec.c b/fs/exec.c
> index 6c53920795c2..bb83b6a39530 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>                 .lookup_flags = LOOKUP_FOLLOW,
>         };
>
> -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +       if ((flags &
> +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
>                 return ERR_PTR(-EINVAL);
>         if (flags & AT_SYMLINK_NOFOLLOW)
>                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
>         }
>         bprm->interp = bprm->filename;
>
> +       /*
> +        * At this point, security_file_open() has already been called (with
> +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> +        * stop just after the security_bprm_creds_for_exec() call in
> +        * bprm_execve().  Indeed, the kernel should not try to parse the
> +        * content of the file with exec_binprm() nor change the calling
> +        * thread, which means that the following security functions will be
> +        * not called:
> +        * - security_bprm_check()
> +        * - security_bprm_creds_from_file()
> +        * - security_bprm_committing_creds()
> +        * - security_bprm_committed_creds()
> +        */
> +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> +
>         retval = bprm_mm_init(bprm);
>         if (!retval)
>                 return bprm;
> @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
>
>         /* Set the unchanging part of bprm->cred */
>         retval = security_bprm_creds_for_exec(bprm);
> -       if (retval)
> +       if (retval || bprm->is_check)
>                 goto out;
>
>         retval = exec_binprm(bprm);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index e6c00e860951..8ff0eb3644a1 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -42,7 +42,12 @@ struct linux_binprm {
>                  * Set when errors can no longer be returned to the
>                  * original userspace.
>                  */
> -               point_of_no_return:1;
> +               point_of_no_return:1,
> +               /*
> +                * Set by user space to check executability according to the
> +                * caller's environment.
> +                */
> +               is_check:1;
>         struct file *executable; /* Executable to pass to the interpreter */
>         struct file *interpreter;
>         struct file *file;
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 87e2dec79fea..2e87f2e3a79f 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -154,6 +154,10 @@
>                                            usable with open_by_handle_at(2). */
>  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
>
> +/* Flags for execveat2(2). */
> +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> +                                          would be allowed. */
> +
>  #if defined(__KERNEL__)
>  #define AT_GETATTR_NOSEC       0x80000000
>  #endif
> diff --git a/kernel/audit.h b/kernel/audit.h
> index a60d2840559e..8ebdabd2ab81 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -197,6 +197,7 @@ struct audit_context {
>                 struct open_how openat2;
>                 struct {
>                         int                     argc;
> +                       bool                    is_check;
>                 } execve;
>                 struct {
>                         char                    *name;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index cd57053b4a69..8d9ba5600cf2 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
>
>         context->type = AUDIT_EXECVE;
>         context->execve.argc = bprm->argc;
> +       context->execve.is_check = bprm->is_check;
Where is execve.is_check used ?


>  }
>
>
> diff --git a/security/security.c b/security/security.c
> index c5981e558bc2..456361ec249d 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
>   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
>   * contains the linux_binprm structure.
>   *
> + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> + * set.  The result must be the same as without this flag even if the execution
> + * will never really happen and @bprm will always be dropped.
> + *
> + * This hook must not change current->cred, only @bprm->cred.
> + *
>   * Return: Returns 0 if the hook is successful and permission is granted.
>   */
>  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
>   * Save open-time permission checking state for later use upon file_permission,
>   * and recheck access if anything has changed since inode_permission.
>   *
> + * We can check if a file is opened for execution (e.g. execve(2) call), either
> + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> + * __FMODE_EXEC .
> + *
>   * Return: Returns 0 if permission is granted.
>   */
>  int security_file_open(struct file *file)
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  2024-11-12 19:18 ` [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits Mickaël Salaün
@ 2024-11-20  1:30   ` Jeff Xu
  2024-11-20  9:42     ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2024-11-20  1:30 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Andy Lutomirski

On Tue, Nov 12, 2024 at 12:06 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> The new SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and
> their *_LOCKED counterparts are designed to be set by processes setting
> up an execution environment, such as a user session, a container, or a
> security sandbox.  Unlike other securebits, these ones can be set by
> unprivileged processes.  Like seccomp filters or Landlock domains, the
> securebits are inherited across processes.
>
> When SECBIT_EXEC_RESTRICT_FILE is set, programs interpreting code should
> control executable resources according to execveat(2) + AT_EXECVE_CHECK
> (see previous commit).
>
> When SECBIT_EXEC_DENY_INTERACTIVE is set, a process should deny
> execution of user interactive commands (which excludes executable
> regular files).
>
> Being able to configure each of these securebits enables system
> administrators or owner of image containers to gradually validate the
> related changes and to identify potential issues (e.g. with interpreter
> or audit logs).
>
> It should be noted that unlike other security bits, the
> SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE bits are
> dedicated to user space willing to restrict itself.  Because of that,
> they only make sense in the context of a trusted environment (e.g.
> sandbox, container, user session, full system) where the process
> changing its behavior (according to these bits) and all its parent
> processes are trusted.  Otherwise, any parent process could just execute
> its own malicious code (interpreting a script or not), or even enforce a
> seccomp filter to mask these bits.
>
> Such a secure environment can be achieved with an appropriate access
> control (e.g. mount's noexec option, file access rights, LSM policy) and
> an enlighten ld.so checking that libraries are allowed for execution
> e.g., to protect against illegitimate use of LD_PRELOAD.
>
> Ptrace restrictions according to these securebits would not make sense
> because of the processes' trust assumption.
>
> Scripts may need some changes to deal with untrusted data (e.g. stdin,
> environment variables), but that is outside the scope of the kernel.
>
> See chromeOS's documentation about script execution control and the
> related threat model:
> https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Reviewed-by: Serge Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Link: https://lore.kernel.org/r/20241112191858.162021-3-mic@digikod.net
> ---
>
> Changes since v20:
> * Move UAPI documentation to a dedicated RST file and format it.
>
> Changes since v19:
> * Replace SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT with
>   SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE:
>   https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/
> * Remove the ptrace restrictions, suggested by Andy.
> * Improve documentation according to the discussion with Jeff.
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  Documentation/userspace-api/check_exec.rst | 97 ++++++++++++++++++++++
>  include/uapi/linux/securebits.h            | 24 +++++-
>  security/commoncap.c                       | 29 +++++--
>  3 files changed, 143 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> index ad1aeaa5f6c0..1df5c7534af9 100644
> --- a/Documentation/userspace-api/check_exec.rst
> +++ b/Documentation/userspace-api/check_exec.rst
> @@ -2,6 +2,21 @@
>  Executability check
>  ===================
>
> +The ``AT_EXECVE_CHECK`` :manpage:`execveat(2)` flag, and the
> +``SECBIT_EXEC_RESTRICT_FILE`` and ``SECBIT_EXEC_DENY_INTERACTIVE`` securebits
> +are intended for script interpreters and dynamic linkers to enforce a
> +consistent execution security policy handled by the kernel.  See the
> +`samples/check-exec/inc.c`_ example.
> +
> +Whether an interpreter should check these securebits or not depends on the
> +security risk of running malicious scripts with respect to the execution
> +environment, and whether the kernel can check if a script is trustworthy or
> +not.  For instance, Python scripts running on a server can use arbitrary
> +syscalls and access arbitrary files.  Such interpreters should then be
> +enlighten to use these securebits and let users define their security policy.
> +However, a JavaScript engine running in a web browser should already be
> +sandboxed and then should not be able to harm the user's environment.
> +
>  AT_EXECVE_CHECK
>  ===============
>
> @@ -32,3 +47,85 @@ be executable, which also requires integrity guarantees.
>  To avoid race conditions leading to time-of-check to time-of-use issues,
>  ``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
>  file descriptor instead of a path.
> +
> +SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE
> +==========================================================
> +
> +When ``SECBIT_EXEC_RESTRICT_FILE`` is set, a process should only interpret or
> +execute a file if a call to :manpage:`execveat(2)` with the related file
> +descriptor and the ``AT_EXECVE_CHECK`` flag succeed.
> +
> +This secure bit may be set by user session managers, service managers,
> +container runtimes, sandboxer tools...  Except for test environments, the
> +related ``SECBIT_EXEC_RESTRICT_FILE_LOCKED`` bit should also be set.
> +
> +Programs should only enforce consistent restrictions according to the
> +securebits but without relying on any other user-controlled configuration.
> +Indeed, the use case for these securebits is to only trust executable code
> +vetted by the system configuration (through the kernel), so we should be
> +careful to not let untrusted users control this configuration.
> +
> +However, script interpreters may still use user configuration such as
> +environment variables as long as it is not a way to disable the securebits
> +checks.  For instance, the ``PATH`` and ``LD_PRELOAD`` variables can be set by
> +a script's caller.  Changing these variables may lead to unintended code
> +executions, but only from vetted executable programs, which is OK.  For this to
> +make sense, the system should provide a consistent security policy to avoid
> +arbitrary code execution e.g., by enforcing a write xor execute policy.
> +
> +When ``SECBIT_EXEC_DENY_INTERACTIVE`` is set, a process should never interpret
> +interactive user commands (e.g. scripts).  However, if such commands are passed
> +through a file descriptor (e.g. stdin), its content should be interpreted if a
> +call to :manpage:`execveat(2)` with the related file descriptor and the
> +``AT_EXECVE_CHECK`` flag succeed.
> +
> +For instance, script interpreters called with a script snippet as argument
> +should always deny such execution if ``SECBIT_EXEC_DENY_INTERACTIVE`` is set.
> +
> +This secure bit may be set by user session managers, service managers,
> +container runtimes, sandboxer tools...  Except for test environments, the
> +related ``SECBIT_EXEC_DENY_INTERACTIVE_LOCKED`` bit should also be set.
> +
> +Here is the expected behavior for a script interpreter according to combination
> +of any exec securebits:
> +
> +1. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
> +
> +   Always interpret scripts, and allow arbitrary user commands (default).
> +
> +   No threat, everyone and everything is trusted, but we can get ahead of
> +   potential issues thanks to the call to :manpage:`execveat(2)` with
> +   ``AT_EXECVE_CHECK`` which should always be performed but ignored by the
> +   script interpreter.  Indeed, this check is still important to enable systems
> +   administrators to verify requests (e.g. with audit) and prepare for
> +   migration to a secure mode.
> +
> +2. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
> +
> +   Deny script interpretation if they are not executable, but allow
> +   arbitrary user commands.
> +
> +   The threat is (potential) malicious scripts run by trusted (and not fooled)
> +   users.  That can protect against unintended script executions (e.g. ``sh
> +   /tmp/*.sh``).  This makes sense for (semi-restricted) user sessions.
> +
> +3. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
> +
> +   Always interpret scripts, but deny arbitrary user commands.
> +
> +   This use case may be useful for secure services (i.e. without interactive
> +   user session) where scripts' integrity is verified (e.g.  with IMA/EVM or
> +   dm-verity/IPE) but where access rights might not be ready yet.  Indeed,
> +   arbitrary interactive commands would be much more difficult to check.
> +
> +4. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
> +
> +   Deny script interpretation if they are not executable, and also deny
> +   any arbitrary user commands.
> +
> +   The threat is malicious scripts run by untrusted users (but trusted code).
> +   This makes sense for system services that may only execute trusted scripts.
> +
> +.. Links
> +.. _samples/check-exec/inc.c:
> +   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/check-exec/inc.c
> diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> index d6d98877ff1a..3fba30dbd68b 100644
> --- a/include/uapi/linux/securebits.h
> +++ b/include/uapi/linux/securebits.h
> @@ -52,10 +52,32 @@
>  #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
>                         (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
>
> +/* See Documentation/userspace-api/check_exec.rst */
> +#define SECURE_EXEC_RESTRICT_FILE              8
> +#define SECURE_EXEC_RESTRICT_FILE_LOCKED       9  /* make bit-8 immutable */
> +
> +#define SECBIT_EXEC_RESTRICT_FILE (issecure_mask(SECURE_EXEC_RESTRICT_FILE))
> +#define SECBIT_EXEC_RESTRICT_FILE_LOCKED \
> +                       (issecure_mask(SECURE_EXEC_RESTRICT_FILE_LOCKED))
> +
> +/* See Documentation/userspace-api/check_exec.rst */
> +#define SECURE_EXEC_DENY_INTERACTIVE           10
> +#define SECURE_EXEC_DENY_INTERACTIVE_LOCKED    11  /* make bit-10 immutable */
> +
> +#define SECBIT_EXEC_DENY_INTERACTIVE \
> +                       (issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
> +#define SECBIT_EXEC_DENY_INTERACTIVE_LOCKED \
> +                       (issecure_mask(SECURE_EXEC_DENY_INTERACTIVE_LOCKED))
> +
>  #define SECURE_ALL_BITS                (issecure_mask(SECURE_NOROOT) | \
>                                  issecure_mask(SECURE_NO_SETUID_FIXUP) | \
>                                  issecure_mask(SECURE_KEEP_CAPS) | \
> -                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> +                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> +                                issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
> +                                issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
>  #define SECURE_ALL_LOCKS       (SECURE_ALL_BITS << 1)
>
> +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
> +                                issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
> +
>  #endif /* _UAPI_LINUX_SECUREBITS_H */
> diff --git a/security/commoncap.c b/security/commoncap.c
> index cefad323a0b1..52ea01acb453 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -1302,21 +1302,38 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>                      & (old->securebits ^ arg2))                        /*[1]*/
>                     || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))   /*[2]*/
>                     || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))   /*[3]*/
> -                   || (cap_capable(current_cred(),
> -                                   current_cred()->user_ns,
> -                                   CAP_SETPCAP,
> -                                   CAP_OPT_NONE) != 0)                 /*[4]*/
>                         /*
>                          * [1] no changing of bits that are locked
>                          * [2] no unlocking of locks
>                          * [3] no setting of unsupported bits
> -                        * [4] doing anything requires privilege (go read about
> -                        *     the "sendmail capabilities bug")
>                          */
>                     )
>                         /* cannot change a locked bit */
>                         return -EPERM;
>
> +               /*
> +                * Doing anything requires privilege (go read about the
> +                * "sendmail capabilities bug"), except for unprivileged bits.
> +                * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> +                * restrictions enforced by the kernel but by user space on
> +                * itself.
> +                */
> +               if (cap_capable(current_cred(), current_cred()->user_ns,
> +                               CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> +                       const unsigned long unpriv_and_locks =
> +                               SECURE_ALL_UNPRIVILEGED |
> +                               SECURE_ALL_UNPRIVILEGED << 1;
> +                       const unsigned long changed = old->securebits ^ arg2;
> +
> +                       /* For legacy reason, denies non-change. */
> +                       if (!changed)
> +                               return -EPERM;
> +
> +                       /* Denies privileged changes. */
> +                       if (changed & ~unpriv_and_locks)
> +                               return -EPERM;
> +               }
> +
Is above a refactor (without functional change) or a bug fix ?
maybe a separate commit with description ?

>                 new = prepare_creds();
>                 if (!new)
>                         return -ENOMEM;
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-20  1:17   ` Jeff Xu
@ 2024-11-20  9:42     ` Mickaël Salaün
  2024-11-20 16:06       ` Jeff Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-20  9:42 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > be allowed for execution.  The main use case is for script interpreters
> > and dynamic linkers to check execution permission according to the
> > kernel's security policy. Another use case is to add context to access
> > logs e.g., which script (instead of interpreter) accessed a file.  As
> > any executable code, scripts could also use this check [1].
> >
> > This is different from faccessat(2) + X_OK which only checks a subset of
> > access rights (i.e. inode permission and mount options for regular
> > files), but not the full context (e.g. all LSM access checks).  The main
> > use case for access(2) is for SUID processes to (partially) check access
> > on behalf of their caller.  The main use case for execveat(2) +
> > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > according to all the different restrictions in place.  Because the use
> > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > execution, user space gets the same error codes.
> >
> > An interesting point of using execveat(2) instead of openat2(2) is that
> > it decouples the check from the enforcement.  Indeed, the security check
> > can be logged (e.g. with audit) without blocking an execution
> > environment not yet ready to enforce a strict security policy.
> >
> > LSMs can control or log execution requests with
> > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > complete access control (e.g. on binary's dependencies) LSMs should
> > restrict file executability, or mesure executed files, with
> > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> >
> > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > doesn't make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_EXECVE_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > being executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > ---
> >
> > Changes since v20:
> > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> >   Serge Hallyn.
> > * Move the UAPI documentation to a dedicated RST file.
> > * Add Reviewed-by: Serge Hallyn
> >
> > Changes since v19:
> > * Remove mention of "role transition" as suggested by Andy.
> > * Highlight the difference between security_bprm_creds_for_exec() and
> >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> >   discussed with Jeff.
> > * Improve documentation both in UAPI comments and kernel comments
> >   (requested by Kees).
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> >  Documentation/userspace-api/index.rst      |  1 +
> >  fs/exec.c                                  | 20 +++++++++++--
> >  include/linux/binfmts.h                    |  7 ++++-
> >  include/uapi/linux/fcntl.h                 |  4 +++
> >  kernel/audit.h                             |  1 +
> >  kernel/auditsc.c                           |  1 +
> >  security/security.c                        | 10 +++++++
> >  8 files changed, 75 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/userspace-api/check_exec.rst
> >
> > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > new file mode 100644
> > index 000000000000..ad1aeaa5f6c0
> > --- /dev/null
> > +++ b/Documentation/userspace-api/check_exec.rst
> > @@ -0,0 +1,34 @@
> > +===================
> > +Executability check
> > +===================
> > +
> > +AT_EXECVE_CHECK
> > +===============
> > +
> > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > +check on a regular file and returns 0 if execution of this file would be
> > +allowed, ignoring the file format and then the related interpreter dependencies
> > +(e.g. ELF libraries, script's shebang).
> > +
> > +Programs should always perform this check to apply kernel-level checks against
> > +files that are not directly executed by the kernel but passed to a user space
> > +interpreter instead.  All files that contain executable code, from the point of
> > +view of the interpreter, should be checked.  However the result of this check
> > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> Regarding "should only"
> Userspace (e.g. libc) could decide to enforce even when
> SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> doesn't make sense.

User space is always in control, but I don't think it would be wise to
not follow the configuration securebits (in a generic system) because
this could result to unattended behaviors (I don't have a specific one
in mind but...).  That being said, configuration and checks are
standalones and specific/tailored systems are free to do the checks they
want.

> When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> 
> > +
> > +The main purpose of this flag is to improve the security and consistency of an
> > +execution environment to ensure that direct file execution (e.g.
> > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > +the same result.  For instance, this can be used to check if a file is
> > +trustworthy according to the caller's environment.
> > +
> > +In a secure environment, libraries and any executable dependencies should also
> > +be checked.  For instance, dynamic linking should make sure that all libraries
> > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > +For such secure execution environment to make sense, only trusted code should
> > +be executable, which also requires integrity guarantees.
> > +
> > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > +file descriptor instead of a path.
> > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > index 274cc7546efc..6272bcf11296 100644
> > --- a/Documentation/userspace-api/index.rst
> > +++ b/Documentation/userspace-api/index.rst
> > @@ -35,6 +35,7 @@ Security-related interfaces
> >     mfd_noexec
> >     spec_ctrl
> >     tee
> > +   check_exec
> >
> >  Devices and I/O
> >  ===============
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 6c53920795c2..bb83b6a39530 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >                 .lookup_flags = LOOKUP_FOLLOW,
> >         };
> >
> > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +       if ((flags &
> > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> >                 return ERR_PTR(-EINVAL);
> >         if (flags & AT_SYMLINK_NOFOLLOW)
> >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> >         }
> >         bprm->interp = bprm->filename;
> >
> > +       /*
> > +        * At this point, security_file_open() has already been called (with
> > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > +        * stop just after the security_bprm_creds_for_exec() call in
> > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > +        * content of the file with exec_binprm() nor change the calling
> > +        * thread, which means that the following security functions will be
> > +        * not called:
> > +        * - security_bprm_check()
> > +        * - security_bprm_creds_from_file()
> > +        * - security_bprm_committing_creds()
> > +        * - security_bprm_committed_creds()
> > +        */
> > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > +
> >         retval = bprm_mm_init(bprm);
> >         if (!retval)
> >                 return bprm;
> > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> >
> >         /* Set the unchanging part of bprm->cred */
> >         retval = security_bprm_creds_for_exec(bprm);
> > -       if (retval)
> > +       if (retval || bprm->is_check)
> >                 goto out;
> >
> >         retval = exec_binprm(bprm);
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index e6c00e860951..8ff0eb3644a1 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -42,7 +42,12 @@ struct linux_binprm {
> >                  * Set when errors can no longer be returned to the
> >                  * original userspace.
> >                  */
> > -               point_of_no_return:1;
> > +               point_of_no_return:1,
> > +               /*
> > +                * Set by user space to check executability according to the
> > +                * caller's environment.
> > +                */
> > +               is_check:1;
> >         struct file *executable; /* Executable to pass to the interpreter */
> >         struct file *interpreter;
> >         struct file *file;
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 87e2dec79fea..2e87f2e3a79f 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -154,6 +154,10 @@
> >                                            usable with open_by_handle_at(2). */
> >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> >
> > +/* Flags for execveat2(2). */
> > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > +                                          would be allowed. */
> > +
> >  #if defined(__KERNEL__)
> >  #define AT_GETATTR_NOSEC       0x80000000
> >  #endif
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index a60d2840559e..8ebdabd2ab81 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -197,6 +197,7 @@ struct audit_context {
> >                 struct open_how openat2;
> >                 struct {
> >                         int                     argc;
> > +                       bool                    is_check;
> >                 } execve;
> >                 struct {
> >                         char                    *name;
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index cd57053b4a69..8d9ba5600cf2 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> >
> >         context->type = AUDIT_EXECVE;
> >         context->execve.argc = bprm->argc;
> > +       context->execve.is_check = bprm->is_check;
> Where is execve.is_check used ?

It is used in bprm_execve(), exposed to the audit framework, and
potentially used by LSMs.

> 
> 
> >  }
> >
> >
> > diff --git a/security/security.c b/security/security.c
> > index c5981e558bc2..456361ec249d 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> >   * contains the linux_binprm structure.
> >   *
> > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > + * set.  The result must be the same as without this flag even if the execution
> > + * will never really happen and @bprm will always be dropped.
> > + *
> > + * This hook must not change current->cred, only @bprm->cred.
> > + *
> >   * Return: Returns 0 if the hook is successful and permission is granted.
> >   */
> >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> >   * Save open-time permission checking state for later use upon file_permission,
> >   * and recheck access if anything has changed since inode_permission.
> >   *
> > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > + * __FMODE_EXEC .
> > + *
> >   * Return: Returns 0 if permission is granted.
> >   */
> >  int security_file_open(struct file *file)
> > --
> > 2.47.0
> >
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  2024-11-20  1:30   ` Jeff Xu
@ 2024-11-20  9:42     ` Mickaël Salaün
  0 siblings, 0 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-20  9:42 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Andy Lutomirski

On Tue, Nov 19, 2024 at 05:30:13PM -0800, Jeff Xu wrote:
> On Tue, Nov 12, 2024 at 12:06 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > The new SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and
> > their *_LOCKED counterparts are designed to be set by processes setting
> > up an execution environment, such as a user session, a container, or a
> > security sandbox.  Unlike other securebits, these ones can be set by
> > unprivileged processes.  Like seccomp filters or Landlock domains, the
> > securebits are inherited across processes.
> >
> > When SECBIT_EXEC_RESTRICT_FILE is set, programs interpreting code should
> > control executable resources according to execveat(2) + AT_EXECVE_CHECK
> > (see previous commit).
> >
> > When SECBIT_EXEC_DENY_INTERACTIVE is set, a process should deny
> > execution of user interactive commands (which excludes executable
> > regular files).
> >
> > Being able to configure each of these securebits enables system
> > administrators or owner of image containers to gradually validate the
> > related changes and to identify potential issues (e.g. with interpreter
> > or audit logs).
> >
> > It should be noted that unlike other security bits, the
> > SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE bits are
> > dedicated to user space willing to restrict itself.  Because of that,
> > they only make sense in the context of a trusted environment (e.g.
> > sandbox, container, user session, full system) where the process
> > changing its behavior (according to these bits) and all its parent
> > processes are trusted.  Otherwise, any parent process could just execute
> > its own malicious code (interpreting a script or not), or even enforce a
> > seccomp filter to mask these bits.
> >
> > Such a secure environment can be achieved with an appropriate access
> > control (e.g. mount's noexec option, file access rights, LSM policy) and
> > an enlighten ld.so checking that libraries are allowed for execution
> > e.g., to protect against illegitimate use of LD_PRELOAD.
> >
> > Ptrace restrictions according to these securebits would not make sense
> > because of the processes' trust assumption.
> >
> > Scripts may need some changes to deal with untrusted data (e.g. stdin,
> > environment variables), but that is outside the scope of the kernel.
> >
> > See chromeOS's documentation about script execution control and the
> > related threat model:
> > https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > Link: https://lore.kernel.org/r/20241112191858.162021-3-mic@digikod.net
> > ---
> >
> > Changes since v20:
> > * Move UAPI documentation to a dedicated RST file and format it.
> >
> > Changes since v19:
> > * Replace SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT with
> >   SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE:
> >   https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/
> > * Remove the ptrace restrictions, suggested by Andy.
> > * Improve documentation according to the discussion with Jeff.
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  Documentation/userspace-api/check_exec.rst | 97 ++++++++++++++++++++++
> >  include/uapi/linux/securebits.h            | 24 +++++-
> >  security/commoncap.c                       | 29 +++++--
> >  3 files changed, 143 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > index ad1aeaa5f6c0..1df5c7534af9 100644
> > --- a/Documentation/userspace-api/check_exec.rst
> > +++ b/Documentation/userspace-api/check_exec.rst
> > @@ -2,6 +2,21 @@
> >  Executability check
> >  ===================
> >
> > +The ``AT_EXECVE_CHECK`` :manpage:`execveat(2)` flag, and the
> > +``SECBIT_EXEC_RESTRICT_FILE`` and ``SECBIT_EXEC_DENY_INTERACTIVE`` securebits
> > +are intended for script interpreters and dynamic linkers to enforce a
> > +consistent execution security policy handled by the kernel.  See the
> > +`samples/check-exec/inc.c`_ example.
> > +
> > +Whether an interpreter should check these securebits or not depends on the
> > +security risk of running malicious scripts with respect to the execution
> > +environment, and whether the kernel can check if a script is trustworthy or
> > +not.  For instance, Python scripts running on a server can use arbitrary
> > +syscalls and access arbitrary files.  Such interpreters should then be
> > +enlighten to use these securebits and let users define their security policy.
> > +However, a JavaScript engine running in a web browser should already be
> > +sandboxed and then should not be able to harm the user's environment.
> > +
> >  AT_EXECVE_CHECK
> >  ===============
> >
> > @@ -32,3 +47,85 @@ be executable, which also requires integrity guarantees.
> >  To avoid race conditions leading to time-of-check to time-of-use issues,
> >  ``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> >  file descriptor instead of a path.
> > +
> > +SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE
> > +==========================================================
> > +
> > +When ``SECBIT_EXEC_RESTRICT_FILE`` is set, a process should only interpret or
> > +execute a file if a call to :manpage:`execveat(2)` with the related file
> > +descriptor and the ``AT_EXECVE_CHECK`` flag succeed.
> > +
> > +This secure bit may be set by user session managers, service managers,
> > +container runtimes, sandboxer tools...  Except for test environments, the
> > +related ``SECBIT_EXEC_RESTRICT_FILE_LOCKED`` bit should also be set.
> > +
> > +Programs should only enforce consistent restrictions according to the
> > +securebits but without relying on any other user-controlled configuration.
> > +Indeed, the use case for these securebits is to only trust executable code
> > +vetted by the system configuration (through the kernel), so we should be
> > +careful to not let untrusted users control this configuration.
> > +
> > +However, script interpreters may still use user configuration such as
> > +environment variables as long as it is not a way to disable the securebits
> > +checks.  For instance, the ``PATH`` and ``LD_PRELOAD`` variables can be set by
> > +a script's caller.  Changing these variables may lead to unintended code
> > +executions, but only from vetted executable programs, which is OK.  For this to
> > +make sense, the system should provide a consistent security policy to avoid
> > +arbitrary code execution e.g., by enforcing a write xor execute policy.
> > +
> > +When ``SECBIT_EXEC_DENY_INTERACTIVE`` is set, a process should never interpret
> > +interactive user commands (e.g. scripts).  However, if such commands are passed
> > +through a file descriptor (e.g. stdin), its content should be interpreted if a
> > +call to :manpage:`execveat(2)` with the related file descriptor and the
> > +``AT_EXECVE_CHECK`` flag succeed.
> > +
> > +For instance, script interpreters called with a script snippet as argument
> > +should always deny such execution if ``SECBIT_EXEC_DENY_INTERACTIVE`` is set.
> > +
> > +This secure bit may be set by user session managers, service managers,
> > +container runtimes, sandboxer tools...  Except for test environments, the
> > +related ``SECBIT_EXEC_DENY_INTERACTIVE_LOCKED`` bit should also be set.
> > +
> > +Here is the expected behavior for a script interpreter according to combination
> > +of any exec securebits:
> > +
> > +1. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
> > +
> > +   Always interpret scripts, and allow arbitrary user commands (default).
> > +
> > +   No threat, everyone and everything is trusted, but we can get ahead of
> > +   potential issues thanks to the call to :manpage:`execveat(2)` with
> > +   ``AT_EXECVE_CHECK`` which should always be performed but ignored by the
> > +   script interpreter.  Indeed, this check is still important to enable systems
> > +   administrators to verify requests (e.g. with audit) and prepare for
> > +   migration to a secure mode.
> > +
> > +2. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
> > +
> > +   Deny script interpretation if they are not executable, but allow
> > +   arbitrary user commands.
> > +
> > +   The threat is (potential) malicious scripts run by trusted (and not fooled)
> > +   users.  That can protect against unintended script executions (e.g. ``sh
> > +   /tmp/*.sh``).  This makes sense for (semi-restricted) user sessions.
> > +
> > +3. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
> > +
> > +   Always interpret scripts, but deny arbitrary user commands.
> > +
> > +   This use case may be useful for secure services (i.e. without interactive
> > +   user session) where scripts' integrity is verified (e.g.  with IMA/EVM or
> > +   dm-verity/IPE) but where access rights might not be ready yet.  Indeed,
> > +   arbitrary interactive commands would be much more difficult to check.
> > +
> > +4. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
> > +
> > +   Deny script interpretation if they are not executable, and also deny
> > +   any arbitrary user commands.
> > +
> > +   The threat is malicious scripts run by untrusted users (but trusted code).
> > +   This makes sense for system services that may only execute trusted scripts.
> > +
> > +.. Links
> > +.. _samples/check-exec/inc.c:
> > +   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/check-exec/inc.c
> > diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> > index d6d98877ff1a..3fba30dbd68b 100644
> > --- a/include/uapi/linux/securebits.h
> > +++ b/include/uapi/linux/securebits.h
> > @@ -52,10 +52,32 @@
> >  #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
> >                         (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
> >
> > +/* See Documentation/userspace-api/check_exec.rst */
> > +#define SECURE_EXEC_RESTRICT_FILE              8
> > +#define SECURE_EXEC_RESTRICT_FILE_LOCKED       9  /* make bit-8 immutable */
> > +
> > +#define SECBIT_EXEC_RESTRICT_FILE (issecure_mask(SECURE_EXEC_RESTRICT_FILE))
> > +#define SECBIT_EXEC_RESTRICT_FILE_LOCKED \
> > +                       (issecure_mask(SECURE_EXEC_RESTRICT_FILE_LOCKED))
> > +
> > +/* See Documentation/userspace-api/check_exec.rst */
> > +#define SECURE_EXEC_DENY_INTERACTIVE           10
> > +#define SECURE_EXEC_DENY_INTERACTIVE_LOCKED    11  /* make bit-10 immutable */
> > +
> > +#define SECBIT_EXEC_DENY_INTERACTIVE \
> > +                       (issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
> > +#define SECBIT_EXEC_DENY_INTERACTIVE_LOCKED \
> > +                       (issecure_mask(SECURE_EXEC_DENY_INTERACTIVE_LOCKED))
> > +
> >  #define SECURE_ALL_BITS                (issecure_mask(SECURE_NOROOT) | \
> >                                  issecure_mask(SECURE_NO_SETUID_FIXUP) | \
> >                                  issecure_mask(SECURE_KEEP_CAPS) | \
> > -                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> > +                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> > +                                issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
> > +                                issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
> >  #define SECURE_ALL_LOCKS       (SECURE_ALL_BITS << 1)
> >
> > +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_EXEC_RESTRICT_FILE) | \
> > +                                issecure_mask(SECURE_EXEC_DENY_INTERACTIVE))
> > +
> >  #endif /* _UAPI_LINUX_SECUREBITS_H */
> > diff --git a/security/commoncap.c b/security/commoncap.c
> > index cefad323a0b1..52ea01acb453 100644
> > --- a/security/commoncap.c
> > +++ b/security/commoncap.c
> > @@ -1302,21 +1302,38 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> >                      & (old->securebits ^ arg2))                        /*[1]*/
> >                     || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))   /*[2]*/
> >                     || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))   /*[3]*/
> > -                   || (cap_capable(current_cred(),
> > -                                   current_cred()->user_ns,
> > -                                   CAP_SETPCAP,
> > -                                   CAP_OPT_NONE) != 0)                 /*[4]*/
> >                         /*
> >                          * [1] no changing of bits that are locked
> >                          * [2] no unlocking of locks
> >                          * [3] no setting of unsupported bits
> > -                        * [4] doing anything requires privilege (go read about
> > -                        *     the "sendmail capabilities bug")
> >                          */
> >                     )
> >                         /* cannot change a locked bit */
> >                         return -EPERM;
> >
> > +               /*
> > +                * Doing anything requires privilege (go read about the
> > +                * "sendmail capabilities bug"), except for unprivileged bits.
> > +                * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> > +                * restrictions enforced by the kernel but by user space on
> > +                * itself.
> > +                */
> > +               if (cap_capable(current_cred(), current_cred()->user_ns,
> > +                               CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> > +                       const unsigned long unpriv_and_locks =
> > +                               SECURE_ALL_UNPRIVILEGED |
> > +                               SECURE_ALL_UNPRIVILEGED << 1;
> > +                       const unsigned long changed = old->securebits ^ arg2;
> > +
> > +                       /* For legacy reason, denies non-change. */
> > +                       if (!changed)
> > +                               return -EPERM;
> > +
> > +                       /* Denies privileged changes. */
> > +                       if (changed & ~unpriv_and_locks)
> > +                               return -EPERM;
> > +               }
> > +
> Is above a refactor (without functional change) or a bug fix ?
> maybe a separate commit with description ?

As explained in the comments this is a change to allow unprivileged
securebits to be set, which is related to the CAP_SETPCAP check and
required by this patch.

> 
> >                 new = prepare_creds();
> >                 if (!new)
> >                         return -ENOMEM;
> > --
> > 2.47.0
> >
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-20  9:42     ` Mickaël Salaün
@ 2024-11-20 16:06       ` Jeff Xu
  2024-11-21 13:39         ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2024-11-20 16:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > be allowed for execution.  The main use case is for script interpreters
> > > and dynamic linkers to check execution permission according to the
> > > kernel's security policy. Another use case is to add context to access
> > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > any executable code, scripts could also use this check [1].
> > >
> > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > access rights (i.e. inode permission and mount options for regular
> > > files), but not the full context (e.g. all LSM access checks).  The main
> > > use case for access(2) is for SUID processes to (partially) check access
> > > on behalf of their caller.  The main use case for execveat(2) +
> > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > according to all the different restrictions in place.  Because the use
> > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > execution, user space gets the same error codes.
> > >
> > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > it decouples the check from the enforcement.  Indeed, the security check
> > > can be logged (e.g. with audit) without blocking an execution
> > > environment not yet ready to enforce a strict security policy.
> > >
> > > LSMs can control or log execution requests with
> > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > complete access control (e.g. on binary's dependencies) LSMs should
> > > restrict file executability, or mesure executed files, with
> > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > >
> > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > doesn't make sense for the kernel to parse the checked files, look for
> > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > if the format is unknown.  Because of that, security_bprm_check() is
> > > never called when AT_EXECVE_CHECK is used.
> > >
> > > It should be noted that script interpreters cannot directly use
> > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > being executed to interpret the script.  Unlike the kernel, script
> > > interpreters may just interpret the shebang as a simple comment, which
> > > should not change for backward compatibility reasons.
> > >
> > > Because scripts or libraries files might not currently have the
> > > executable permission set, or because we might want specific users to be
> > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > >
> > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > This patch has been used for more than a decade with customized script
> > > interpreters.  Some examples can be found here:
> > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > >
> > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > ---
> > >
> > > Changes since v20:
> > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > >   Serge Hallyn.
> > > * Move the UAPI documentation to a dedicated RST file.
> > > * Add Reviewed-by: Serge Hallyn
> > >
> > > Changes since v19:
> > > * Remove mention of "role transition" as suggested by Andy.
> > > * Highlight the difference between security_bprm_creds_for_exec() and
> > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > >   discussed with Jeff.
> > > * Improve documentation both in UAPI comments and kernel comments
> > >   (requested by Kees).
> > >
> > > New design since v18:
> > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > ---
> > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > >  Documentation/userspace-api/index.rst      |  1 +
> > >  fs/exec.c                                  | 20 +++++++++++--
> > >  include/linux/binfmts.h                    |  7 ++++-
> > >  include/uapi/linux/fcntl.h                 |  4 +++
> > >  kernel/audit.h                             |  1 +
> > >  kernel/auditsc.c                           |  1 +
> > >  security/security.c                        | 10 +++++++
> > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > >
> > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > new file mode 100644
> > > index 000000000000..ad1aeaa5f6c0
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/check_exec.rst
> > > @@ -0,0 +1,34 @@
> > > +===================
> > > +Executability check
> > > +===================
> > > +
> > > +AT_EXECVE_CHECK
> > > +===============
> > > +
> > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > +check on a regular file and returns 0 if execution of this file would be
> > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > +(e.g. ELF libraries, script's shebang).
> > > +
> > > +Programs should always perform this check to apply kernel-level checks against
> > > +files that are not directly executed by the kernel but passed to a user space
> > > +interpreter instead.  All files that contain executable code, from the point of
> > > +view of the interpreter, should be checked.  However the result of this check
> > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > Regarding "should only"
> > Userspace (e.g. libc) could decide to enforce even when
> > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > doesn't make sense.
>
> User space is always in control, but I don't think it would be wise to
> not follow the configuration securebits (in a generic system) because
> this could result to unattended behaviors (I don't have a specific one
> in mind but...).  That being said, configuration and checks are
> standalones and specific/tailored systems are free to do the checks they
> want.
>
In the case of dynamic linker, we can always enforce honoring the
execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
to,  the dynamic linker doesn't need to check the
SECBIT_EXEC_RESTRICT_FILE bit.

script interpreters need to check this though,  because the apps might
need to adjust/test the scripts they are calling, so
SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement.

> > When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> >
> > > +
> > > +The main purpose of this flag is to improve the security and consistency of an
> > > +execution environment to ensure that direct file execution (e.g.
> > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > > +the same result.  For instance, this can be used to check if a file is
> > > +trustworthy according to the caller's environment.
> > > +
> > > +In a secure environment, libraries and any executable dependencies should also
> > > +be checked.  For instance, dynamic linking should make sure that all libraries
> > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > > +For such secure execution environment to make sense, only trusted code should
> > > +be executable, which also requires integrity guarantees.
> > > +
> > > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > > +file descriptor instead of a path.
> > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > > index 274cc7546efc..6272bcf11296 100644
> > > --- a/Documentation/userspace-api/index.rst
> > > +++ b/Documentation/userspace-api/index.rst
> > > @@ -35,6 +35,7 @@ Security-related interfaces
> > >     mfd_noexec
> > >     spec_ctrl
> > >     tee
> > > +   check_exec
> > >
> > >  Devices and I/O
> > >  ===============
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index 6c53920795c2..bb83b6a39530 100644
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > >                 .lookup_flags = LOOKUP_FOLLOW,
> > >         };
> > >
> > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > +       if ((flags &
> > > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> > >                 return ERR_PTR(-EINVAL);
> > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > >         }
> > >         bprm->interp = bprm->filename;
> > >
> > > +       /*
> > > +        * At this point, security_file_open() has already been called (with
> > > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > > +        * stop just after the security_bprm_creds_for_exec() call in
> > > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > > +        * content of the file with exec_binprm() nor change the calling
> > > +        * thread, which means that the following security functions will be
> > > +        * not called:
> > > +        * - security_bprm_check()
> > > +        * - security_bprm_creds_from_file()
> > > +        * - security_bprm_committing_creds()
> > > +        * - security_bprm_committed_creds()
> > > +        */
> > > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > > +
> > >         retval = bprm_mm_init(bprm);
> > >         if (!retval)
> > >                 return bprm;
> > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > >
> > >         /* Set the unchanging part of bprm->cred */
> > >         retval = security_bprm_creds_for_exec(bprm);
> > > -       if (retval)
> > > +       if (retval || bprm->is_check)
> > >                 goto out;
> > >
> > >         retval = exec_binprm(bprm);
> > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > index e6c00e860951..8ff0eb3644a1 100644
> > > --- a/include/linux/binfmts.h
> > > +++ b/include/linux/binfmts.h
> > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > >                  * Set when errors can no longer be returned to the
> > >                  * original userspace.
> > >                  */
> > > -               point_of_no_return:1;
> > > +               point_of_no_return:1,
> > > +               /*
> > > +                * Set by user space to check executability according to the
> > > +                * caller's environment.
> > > +                */
> > > +               is_check:1;
> > >         struct file *executable; /* Executable to pass to the interpreter */
> > >         struct file *interpreter;
> > >         struct file *file;
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index 87e2dec79fea..2e87f2e3a79f 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -154,6 +154,10 @@
> > >                                            usable with open_by_handle_at(2). */
> > >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> > >
> > > +/* Flags for execveat2(2). */
> > > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > > +                                          would be allowed. */
> > > +
> > >  #if defined(__KERNEL__)
> > >  #define AT_GETATTR_NOSEC       0x80000000
> > >  #endif
> > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > index a60d2840559e..8ebdabd2ab81 100644
> > > --- a/kernel/audit.h
> > > +++ b/kernel/audit.h
> > > @@ -197,6 +197,7 @@ struct audit_context {
> > >                 struct open_how openat2;
> > >                 struct {
> > >                         int                     argc;
> > > +                       bool                    is_check;
> > >                 } execve;
> > >                 struct {
> > >                         char                    *name;
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index cd57053b4a69..8d9ba5600cf2 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > >
> > >         context->type = AUDIT_EXECVE;
> > >         context->execve.argc = bprm->argc;
> > > +       context->execve.is_check = bprm->is_check;
> > Where is execve.is_check used ?
>
> It is used in bprm_execve(), exposed to the audit framework, and
> potentially used by LSMs.
>
bprm_execve() uses bprm->is_check, not  the context->execve.is_check.


> >
> >
> > >  }
> > >
> > >
> > > diff --git a/security/security.c b/security/security.c
> > > index c5981e558bc2..456361ec249d 100644
> > > --- a/security/security.c
> > > +++ b/security/security.c
> > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> > >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> > >   * contains the linux_binprm structure.
> > >   *
> > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > > + * set.  The result must be the same as without this flag even if the execution
> > > + * will never really happen and @bprm will always be dropped.
> > > + *
> > > + * This hook must not change current->cred, only @bprm->cred.
> > > + *
> > >   * Return: Returns 0 if the hook is successful and permission is granted.
> > >   */
> > >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> > >   * Save open-time permission checking state for later use upon file_permission,
> > >   * and recheck access if anything has changed since inode_permission.
> > >   *
> > > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > > + * __FMODE_EXEC .
> > > + *
> > >   * Return: Returns 0 if permission is granted.
> > >   */
> > >  int security_file_open(struct file *file)
> > > --
> > > 2.47.0
> > >
> > >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 0/6] Script execution control (was O_MAYEXEC)
  2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
                   ` (5 preceding siblings ...)
  2024-11-12 19:18 ` [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests Mickaël Salaün
@ 2024-11-21  4:58 ` Kees Cook
  6 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2024-11-21  4:58 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module

On Tue, Nov 12, 2024 at 08:18:52PM +0100, Mickaël Salaün wrote:
> Kees, would you like to take this series in your tree?

Yeah, let's give it a shot for -next after the merge window is closed,
assuming review is clean.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-20 16:06       ` Jeff Xu
@ 2024-11-21 13:39         ` Mickaël Salaün
  2024-11-21 18:27           ` Jeff Xu
  2024-12-05  3:33           ` Paul Moore
  0 siblings, 2 replies; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-21 13:39 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Eric Paris, audit

On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > be allowed for execution.  The main use case is for script interpreters
> > > > and dynamic linkers to check execution permission according to the
> > > > kernel's security policy. Another use case is to add context to access
> > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > any executable code, scripts could also use this check [1].
> > > >
> > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > access rights (i.e. inode permission and mount options for regular
> > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > use case for access(2) is for SUID processes to (partially) check access
> > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > according to all the different restrictions in place.  Because the use
> > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > execution, user space gets the same error codes.
> > > >
> > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > can be logged (e.g. with audit) without blocking an execution
> > > > environment not yet ready to enforce a strict security policy.
> > > >
> > > > LSMs can control or log execution requests with
> > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > restrict file executability, or mesure executed files, with
> > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > >
> > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > never called when AT_EXECVE_CHECK is used.
> > > >
> > > > It should be noted that script interpreters cannot directly use
> > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > being executed to interpret the script.  Unlike the kernel, script
> > > > interpreters may just interpret the shebang as a simple comment, which
> > > > should not change for backward compatibility reasons.
> > > >
> > > > Because scripts or libraries files might not currently have the
> > > > executable permission set, or because we might want specific users to be
> > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > >
> > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > This patch has been used for more than a decade with customized script
> > > > interpreters.  Some examples can be found here:
> > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > ---
> > > >
> > > > Changes since v20:
> > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > >   Serge Hallyn.
> > > > * Move the UAPI documentation to a dedicated RST file.
> > > > * Add Reviewed-by: Serge Hallyn
> > > >
> > > > Changes since v19:
> > > > * Remove mention of "role transition" as suggested by Andy.
> > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > >   discussed with Jeff.
> > > > * Improve documentation both in UAPI comments and kernel comments
> > > >   (requested by Kees).
> > > >
> > > > New design since v18:
> > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > ---
> > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > >  Documentation/userspace-api/index.rst      |  1 +
> > > >  fs/exec.c                                  | 20 +++++++++++--
> > > >  include/linux/binfmts.h                    |  7 ++++-
> > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > >  kernel/audit.h                             |  1 +
> > > >  kernel/auditsc.c                           |  1 +
> > > >  security/security.c                        | 10 +++++++
> > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > >
> > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > new file mode 100644
> > > > index 000000000000..ad1aeaa5f6c0
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > @@ -0,0 +1,34 @@
> > > > +===================
> > > > +Executability check
> > > > +===================
> > > > +
> > > > +AT_EXECVE_CHECK
> > > > +===============
> > > > +
> > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > +check on a regular file and returns 0 if execution of this file would be
> > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > +(e.g. ELF libraries, script's shebang).
> > > > +
> > > > +Programs should always perform this check to apply kernel-level checks against
> > > > +files that are not directly executed by the kernel but passed to a user space
> > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > +view of the interpreter, should be checked.  However the result of this check
> > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > Regarding "should only"
> > > Userspace (e.g. libc) could decide to enforce even when
> > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > doesn't make sense.
> >
> > User space is always in control, but I don't think it would be wise to
> > not follow the configuration securebits (in a generic system) because
> > this could result to unattended behaviors (I don't have a specific one
> > in mind but...).  That being said, configuration and checks are
> > standalones and specific/tailored systems are free to do the checks they
> > want.
> >
> In the case of dynamic linker, we can always enforce honoring the
> execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> to,  the dynamic linker doesn't need to check the
> SECBIT_EXEC_RESTRICT_FILE bit.

If the library file is not allowed to be executed by *all* access
control systems (not just mount and file permission, but all LSMs), then
the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
requirement.  Relying on the securebits to know if this is a hard
requirement or not enables system administrator and distros to control
this potential behavior change.

> 
> script interpreters need to check this though,  because the apps might
> need to adjust/test the scripts they are calling, so
> SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement.
> 
> > > When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> > >
> > > > +
> > > > +The main purpose of this flag is to improve the security and consistency of an
> > > > +execution environment to ensure that direct file execution (e.g.
> > > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > > > +the same result.  For instance, this can be used to check if a file is
> > > > +trustworthy according to the caller's environment.
> > > > +
> > > > +In a secure environment, libraries and any executable dependencies should also
> > > > +be checked.  For instance, dynamic linking should make sure that all libraries
> > > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > > > +For such secure execution environment to make sense, only trusted code should
> > > > +be executable, which also requires integrity guarantees.
> > > > +
> > > > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > > > +file descriptor instead of a path.
> > > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > > > index 274cc7546efc..6272bcf11296 100644
> > > > --- a/Documentation/userspace-api/index.rst
> > > > +++ b/Documentation/userspace-api/index.rst
> > > > @@ -35,6 +35,7 @@ Security-related interfaces
> > > >     mfd_noexec
> > > >     spec_ctrl
> > > >     tee
> > > > +   check_exec
> > > >
> > > >  Devices and I/O
> > > >  ===============
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 6c53920795c2..bb83b6a39530 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > >         };
> > > >
> > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > +       if ((flags &
> > > > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> > > >                 return ERR_PTR(-EINVAL);
> > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > >         }
> > > >         bprm->interp = bprm->filename;
> > > >
> > > > +       /*
> > > > +        * At this point, security_file_open() has already been called (with
> > > > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > > > +        * stop just after the security_bprm_creds_for_exec() call in
> > > > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > > > +        * content of the file with exec_binprm() nor change the calling
> > > > +        * thread, which means that the following security functions will be
> > > > +        * not called:
> > > > +        * - security_bprm_check()
> > > > +        * - security_bprm_creds_from_file()
> > > > +        * - security_bprm_committing_creds()
> > > > +        * - security_bprm_committed_creds()
> > > > +        */
> > > > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > > > +
> > > >         retval = bprm_mm_init(bprm);
> > > >         if (!retval)
> > > >                 return bprm;
> > > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > >
> > > >         /* Set the unchanging part of bprm->cred */
> > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > -       if (retval)
> > > > +       if (retval || bprm->is_check)
> > > >                 goto out;
> > > >
> > > >         retval = exec_binprm(bprm);
> > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > index e6c00e860951..8ff0eb3644a1 100644
> > > > --- a/include/linux/binfmts.h
> > > > +++ b/include/linux/binfmts.h
> > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > >                  * Set when errors can no longer be returned to the
> > > >                  * original userspace.
> > > >                  */
> > > > -               point_of_no_return:1;
> > > > +               point_of_no_return:1,
> > > > +               /*
> > > > +                * Set by user space to check executability according to the
> > > > +                * caller's environment.
> > > > +                */
> > > > +               is_check:1;
> > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > >         struct file *interpreter;
> > > >         struct file *file;
> > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > index 87e2dec79fea..2e87f2e3a79f 100644
> > > > --- a/include/uapi/linux/fcntl.h
> > > > +++ b/include/uapi/linux/fcntl.h
> > > > @@ -154,6 +154,10 @@
> > > >                                            usable with open_by_handle_at(2). */
> > > >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> > > >
> > > > +/* Flags for execveat2(2). */
> > > > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > > > +                                          would be allowed. */
> > > > +
> > > >  #if defined(__KERNEL__)
> > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > >  #endif
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > >                 struct open_how openat2;
> > > >                 struct {
> > > >                         int                     argc;
> > > > +                       bool                    is_check;
> > > >                 } execve;
> > > >                 struct {
> > > >                         char                    *name;
> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index cd57053b4a69..8d9ba5600cf2 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > >
> > > >         context->type = AUDIT_EXECVE;
> > > >         context->execve.argc = bprm->argc;
> > > > +       context->execve.is_check = bprm->is_check;
> > > Where is execve.is_check used ?
> >
> > It is used in bprm_execve(), exposed to the audit framework, and
> > potentially used by LSMs.
> >
> bprm_execve() uses bprm->is_check, not  the context->execve.is_check.

Correct, this is only for audit but not used yet.

Paul, Eric, do you want me to remove this field, leave it, or extend
this patch like this?

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8d9ba5600cf2..12cf89fa224a 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1290,6 +1290,8 @@ static void audit_log_execve_info(struct audit_context *context,
                }
        } while (arg < context->execve.argc);

+       audit_log_format(*ab, " check=%d", context->execve.is_check);
+
        /* NOTE: the caller handles the final audit_log_end() call */

 out:

> 
> 
> > >
> > >
> > > >  }
> > > >
> > > >
> > > > diff --git a/security/security.c b/security/security.c
> > > > index c5981e558bc2..456361ec249d 100644
> > > > --- a/security/security.c
> > > > +++ b/security/security.c
> > > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> > > >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> > > >   * contains the linux_binprm structure.
> > > >   *
> > > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > > > + * set.  The result must be the same as without this flag even if the execution
> > > > + * will never really happen and @bprm will always be dropped.
> > > > + *
> > > > + * This hook must not change current->cred, only @bprm->cred.
> > > > + *
> > > >   * Return: Returns 0 if the hook is successful and permission is granted.
> > > >   */
> > > >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> > > >   * Save open-time permission checking state for later use upon file_permission,
> > > >   * and recheck access if anything has changed since inode_permission.
> > > >   *
> > > > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > > > + * __FMODE_EXEC .
> > > > + *
> > > >   * Return: Returns 0 if permission is granted.
> > > >   */
> > > >  int security_file_open(struct file *file)
> > > > --
> > > > 2.47.0
> > > >
> > > >
> 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-21 13:39         ` Mickaël Salaün
@ 2024-11-21 18:27           ` Jeff Xu
  2024-11-22 14:50             ` Mickaël Salaün
  2024-12-05  3:33           ` Paul Moore
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2024-11-21 18:27 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Eric Paris, audit

On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > > be allowed for execution.  The main use case is for script interpreters
> > > > > and dynamic linkers to check execution permission according to the
> > > > > kernel's security policy. Another use case is to add context to access
> > > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > > any executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > > access rights (i.e. inode permission and mount options for regular
> > > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > > use case for access(2) is for SUID processes to (partially) check access
> > > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > > according to all the different restrictions in place.  Because the use
> > > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > > execution, user space gets the same error codes.
> > > > >
> > > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > > can be logged (e.g. with audit) without blocking an execution
> > > > > environment not yet ready to enforce a strict security policy.
> > > > >
> > > > > LSMs can control or log execution requests with
> > > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > > restrict file executability, or mesure executed files, with
> > > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > > >
> > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > never called when AT_EXECVE_CHECK is used.
> > > > >
> > > > > It should be noted that script interpreters cannot directly use
> > > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > > being executed to interpret the script.  Unlike the kernel, script
> > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > should not change for backward compatibility reasons.
> > > > >
> > > > > Because scripts or libraries files might not currently have the
> > > > > executable permission set, or because we might want specific users to be
> > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > > >
> > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > This patch has been used for more than a decade with customized script
> > > > > interpreters.  Some examples can be found here:
> > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > >
> > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > > ---
> > > > >
> > > > > Changes since v20:
> > > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > > >   Serge Hallyn.
> > > > > * Move the UAPI documentation to a dedicated RST file.
> > > > > * Add Reviewed-by: Serge Hallyn
> > > > >
> > > > > Changes since v19:
> > > > > * Remove mention of "role transition" as suggested by Andy.
> > > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > > >   discussed with Jeff.
> > > > > * Improve documentation both in UAPI comments and kernel comments
> > > > >   (requested by Kees).
> > > > >
> > > > > New design since v18:
> > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > ---
> > > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > > >  Documentation/userspace-api/index.rst      |  1 +
> > > > >  fs/exec.c                                  | 20 +++++++++++--
> > > > >  include/linux/binfmts.h                    |  7 ++++-
> > > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > > >  kernel/audit.h                             |  1 +
> > > > >  kernel/auditsc.c                           |  1 +
> > > > >  security/security.c                        | 10 +++++++
> > > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > > >
> > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > > new file mode 100644
> > > > > index 000000000000..ad1aeaa5f6c0
> > > > > --- /dev/null
> > > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > > @@ -0,0 +1,34 @@
> > > > > +===================
> > > > > +Executability check
> > > > > +===================
> > > > > +
> > > > > +AT_EXECVE_CHECK
> > > > > +===============
> > > > > +
> > > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > > +check on a regular file and returns 0 if execution of this file would be
> > > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > > +(e.g. ELF libraries, script's shebang).
> > > > > +
> > > > > +Programs should always perform this check to apply kernel-level checks against
> > > > > +files that are not directly executed by the kernel but passed to a user space
> > > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > > +view of the interpreter, should be checked.  However the result of this check
> > > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > > Regarding "should only"
> > > > Userspace (e.g. libc) could decide to enforce even when
> > > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > > doesn't make sense.
> > >
> > > User space is always in control, but I don't think it would be wise to
> > > not follow the configuration securebits (in a generic system) because
> > > this could result to unattended behaviors (I don't have a specific one
> > > in mind but...).  That being said, configuration and checks are
> > > standalones and specific/tailored systems are free to do the checks they
> > > want.
> > >
> > In the case of dynamic linker, we can always enforce honoring the
> > execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> > to,  the dynamic linker doesn't need to check the
> > SECBIT_EXEC_RESTRICT_FILE bit.
>
> If the library file is not allowed to be executed by *all* access
> control systems (not just mount and file permission, but all LSMs), then
> the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
> requirement.
Yes. specifically for the library loading case, I can't think of a
case where we need to by-pass LSMs.  (letting user space to by-pass
LSM check seems questionable in concept, and should only be used when
there aren't other solutions). In the context of SELINUX enforcing
mode,  we will want to enforce it. In the context of process level LSM
such as landlock,  the process can already decide for itself by
selecting the policy for its own domain, it is unnecessary to use
another opt-out solution.

There is one case where I see a difference:
ld.so a.out (when a.out is on non-exec mount)

If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
above will always fail. But that is more of a bugfix.

>Relying on the securebits to know if this is a hard
> requirement or not enables system administrator and distros to control
> this potential behavior change.
>
I think, for the dynamic linker, it can be a hard requirement.

For scripts, the cases are more complicated and we can't just enforce
it,  therefore have to rely on security bits to give a pre-process
level control.

> >
> > script interpreters need to check this though,  because the apps might
> > need to adjust/test the scripts they are calling, so
> > SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement.
> >
> > > > When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> > > >
> > > > > +
> > > > > +The main purpose of this flag is to improve the security and consistency of an
> > > > > +execution environment to ensure that direct file execution (e.g.
> > > > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > > > > +the same result.  For instance, this can be used to check if a file is
> > > > > +trustworthy according to the caller's environment.
> > > > > +
> > > > > +In a secure environment, libraries and any executable dependencies should also
> > > > > +be checked.  For instance, dynamic linking should make sure that all libraries
> > > > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > > > > +For such secure execution environment to make sense, only trusted code should
> > > > > +be executable, which also requires integrity guarantees.
> > > > > +
> > > > > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > > > > +file descriptor instead of a path.
> > > > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > > > > index 274cc7546efc..6272bcf11296 100644
> > > > > --- a/Documentation/userspace-api/index.rst
> > > > > +++ b/Documentation/userspace-api/index.rst
> > > > > @@ -35,6 +35,7 @@ Security-related interfaces
> > > > >     mfd_noexec
> > > > >     spec_ctrl
> > > > >     tee
> > > > > +   check_exec
> > > > >
> > > > >  Devices and I/O
> > > > >  ===============
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 6c53920795c2..bb83b6a39530 100644
> > > > > --- a/fs/exec.c
> > > > > +++ b/fs/exec.c
> > > > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > >         };
> > > > >
> > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > +       if ((flags &
> > > > > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> > > > >                 return ERR_PTR(-EINVAL);
> > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > >         }
> > > > >         bprm->interp = bprm->filename;
> > > > >
> > > > > +       /*
> > > > > +        * At this point, security_file_open() has already been called (with
> > > > > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > > > > +        * stop just after the security_bprm_creds_for_exec() call in
> > > > > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > > > > +        * content of the file with exec_binprm() nor change the calling
> > > > > +        * thread, which means that the following security functions will be
> > > > > +        * not called:
> > > > > +        * - security_bprm_check()
> > > > > +        * - security_bprm_creds_from_file()
> > > > > +        * - security_bprm_committing_creds()
> > > > > +        * - security_bprm_committed_creds()
> > > > > +        */
> > > > > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > > > > +
> > > > >         retval = bprm_mm_init(bprm);
> > > > >         if (!retval)
> > > > >                 return bprm;
> > > > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > >
> > > > >         /* Set the unchanging part of bprm->cred */
> > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > -       if (retval)
> > > > > +       if (retval || bprm->is_check)
> > > > >                 goto out;
> > > > >
> > > > >         retval = exec_binprm(bprm);
> > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > index e6c00e860951..8ff0eb3644a1 100644
> > > > > --- a/include/linux/binfmts.h
> > > > > +++ b/include/linux/binfmts.h
> > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > >                  * Set when errors can no longer be returned to the
> > > > >                  * original userspace.
> > > > >                  */
> > > > > -               point_of_no_return:1;
> > > > > +               point_of_no_return:1,
> > > > > +               /*
> > > > > +                * Set by user space to check executability according to the
> > > > > +                * caller's environment.
> > > > > +                */
> > > > > +               is_check:1;
> > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > >         struct file *interpreter;
> > > > >         struct file *file;
> > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > index 87e2dec79fea..2e87f2e3a79f 100644
> > > > > --- a/include/uapi/linux/fcntl.h
> > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > @@ -154,6 +154,10 @@
> > > > >                                            usable with open_by_handle_at(2). */
> > > > >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> > > > >
> > > > > +/* Flags for execveat2(2). */
> > > > > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > > > > +                                          would be allowed. */
> > > > > +
> > > > >  #if defined(__KERNEL__)
> > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > >  #endif
> > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > --- a/kernel/audit.h
> > > > > +++ b/kernel/audit.h
> > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > >                 struct open_how openat2;
> > > > >                 struct {
> > > > >                         int                     argc;
> > > > > +                       bool                    is_check;
> > > > >                 } execve;
> > > > >                 struct {
> > > > >                         char                    *name;
> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index cd57053b4a69..8d9ba5600cf2 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > > Where is execve.is_check used ?
> > >
> > > It is used in bprm_execve(), exposed to the audit framework, and
> > > potentially used by LSMs.
> > >
> > bprm_execve() uses bprm->is_check, not  the context->execve.is_check.
>
> Correct, this is only for audit but not used yet.
>
> Paul, Eric, do you want me to remove this field, leave it, or extend
> this patch like this?
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 8d9ba5600cf2..12cf89fa224a 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1290,6 +1290,8 @@ static void audit_log_execve_info(struct audit_context *context,
>                 }
>         } while (arg < context->execve.argc);
>
> +       audit_log_format(*ab, " check=%d", context->execve.is_check);
> +
>         /* NOTE: the caller handles the final audit_log_end() call */
>
>  out:
>
> >
> >
> > > >
> > > >
> > > > >  }
> > > > >
> > > > >
> > > > > diff --git a/security/security.c b/security/security.c
> > > > > index c5981e558bc2..456361ec249d 100644
> > > > > --- a/security/security.c
> > > > > +++ b/security/security.c
> > > > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> > > > >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> > > > >   * contains the linux_binprm structure.
> > > > >   *
> > > > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > > > > + * set.  The result must be the same as without this flag even if the execution
> > > > > + * will never really happen and @bprm will always be dropped.
> > > > > + *
> > > > > + * This hook must not change current->cred, only @bprm->cred.
> > > > > + *
> > > > >   * Return: Returns 0 if the hook is successful and permission is granted.
> > > > >   */
> > > > >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > > > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> > > > >   * Save open-time permission checking state for later use upon file_permission,
> > > > >   * and recheck access if anything has changed since inode_permission.
> > > > >   *
> > > > > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > > > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > > > > + * __FMODE_EXEC .
> > > > > + *
> > > > >   * Return: Returns 0 if permission is granted.
> > > > >   */
> > > > >  int security_file_open(struct file *file)
> > > > > --
> > > > > 2.47.0
> > > > >
> > > > >
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-12 19:18 ` [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests Mickaël Salaün
@ 2024-11-21 20:34   ` Mimi Zohar
  2024-11-22 14:50     ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Mimi Zohar @ 2024-11-21 20:34 UTC (permalink / raw)
  To: Mickaël Salaün, Al Viro, Christian Brauner, Kees Cook,
	Paul Moore, Serge Hallyn
  Cc: Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

Hi Mickaël,

On Tue, 2024-11-12 at 20:18 +0100, Mickaël Salaün wrote:
> 
> +
> +/* Returns 1 on error, 0 otherwise. */
> +static int interpret_stream(FILE *script, char *const script_name,
> +			    char *const *const envp, const bool restrict_stream)
> +{
> +	int err;
> +	char *const script_argv[] = { script_name, NULL };
> +	char buf[128] = {};
> +	size_t buf_size = sizeof(buf);
> +
> +	/*
> +	 * We pass a valid argv and envp to the kernel to emulate a native
> +	 * script execution.  We must use the script file descriptor instead of
> +	 * the script path name to avoid race conditions.
> +	 */
> +	err = execveat(fileno(script), "", script_argv, envp,
> +		       AT_EMPTY_PATH | AT_EXECVE_CHECK);

At least with v20, the AT_CHECK always was being set, independent of whether
set-exec.c set it.  I'll re-test with v21.

thanks,

Mimi

> +	if (err && restrict_stream) {
> +		perror("ERROR: Script execution check");
> +		return 1;
> +	}
> +
> +	/* Reads script. */
> +	buf_size = fread(buf, 1, buf_size - 1, script);
> +	return interpret_buffer(buf, buf_size);
> +}
> +


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-21 18:27           ` Jeff Xu
@ 2024-11-22 14:50             ` Mickaël Salaün
  2024-11-25 17:38               ` Jeff Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-22 14:50 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Eric Paris, audit

On Thu, Nov 21, 2024 at 10:27:40AM -0800, Jeff Xu wrote:
> On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > > > be allowed for execution.  The main use case is for script interpreters
> > > > > > and dynamic linkers to check execution permission according to the
> > > > > > kernel's security policy. Another use case is to add context to access
> > > > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > > > any executable code, scripts could also use this check [1].
> > > > > >
> > > > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > > > access rights (i.e. inode permission and mount options for regular
> > > > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > > > use case for access(2) is for SUID processes to (partially) check access
> > > > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > > > according to all the different restrictions in place.  Because the use
> > > > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > > > execution, user space gets the same error codes.
> > > > > >
> > > > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > > > can be logged (e.g. with audit) without blocking an execution
> > > > > > environment not yet ready to enforce a strict security policy.
> > > > > >
> > > > > > LSMs can control or log execution requests with
> > > > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > > > restrict file executability, or mesure executed files, with
> > > > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > > > >
> > > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > > never called when AT_EXECVE_CHECK is used.
> > > > > >
> > > > > > It should be noted that script interpreters cannot directly use
> > > > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > > > being executed to interpret the script.  Unlike the kernel, script
> > > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > > should not change for backward compatibility reasons.
> > > > > >
> > > > > > Because scripts or libraries files might not currently have the
> > > > > > executable permission set, or because we might want specific users to be
> > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > > > >
> > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > > This patch has been used for more than a decade with customized script
> > > > > > interpreters.  Some examples can be found here:
> > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > > >
> > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > > > ---
> > > > > >
> > > > > > Changes since v20:
> > > > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > > > >   Serge Hallyn.
> > > > > > * Move the UAPI documentation to a dedicated RST file.
> > > > > > * Add Reviewed-by: Serge Hallyn
> > > > > >
> > > > > > Changes since v19:
> > > > > > * Remove mention of "role transition" as suggested by Andy.
> > > > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > > > >   discussed with Jeff.
> > > > > > * Improve documentation both in UAPI comments and kernel comments
> > > > > >   (requested by Kees).
> > > > > >
> > > > > > New design since v18:
> > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > > ---
> > > > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > > > >  Documentation/userspace-api/index.rst      |  1 +
> > > > > >  fs/exec.c                                  | 20 +++++++++++--
> > > > > >  include/linux/binfmts.h                    |  7 ++++-
> > > > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > > > >  kernel/audit.h                             |  1 +
> > > > > >  kernel/auditsc.c                           |  1 +
> > > > > >  security/security.c                        | 10 +++++++
> > > > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > > > >
> > > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..ad1aeaa5f6c0
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > > > @@ -0,0 +1,34 @@
> > > > > > +===================
> > > > > > +Executability check
> > > > > > +===================
> > > > > > +
> > > > > > +AT_EXECVE_CHECK
> > > > > > +===============
> > > > > > +
> > > > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > > > +check on a regular file and returns 0 if execution of this file would be
> > > > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > > > +(e.g. ELF libraries, script's shebang).
> > > > > > +
> > > > > > +Programs should always perform this check to apply kernel-level checks against
> > > > > > +files that are not directly executed by the kernel but passed to a user space
> > > > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > > > +view of the interpreter, should be checked.  However the result of this check
> > > > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > > > Regarding "should only"
> > > > > Userspace (e.g. libc) could decide to enforce even when
> > > > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > > > doesn't make sense.
> > > >
> > > > User space is always in control, but I don't think it would be wise to
> > > > not follow the configuration securebits (in a generic system) because
> > > > this could result to unattended behaviors (I don't have a specific one
> > > > in mind but...).  That being said, configuration and checks are
> > > > standalones and specific/tailored systems are free to do the checks they
> > > > want.
> > > >
> > > In the case of dynamic linker, we can always enforce honoring the
> > > execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> > > to,  the dynamic linker doesn't need to check the
> > > SECBIT_EXEC_RESTRICT_FILE bit.
> >
> > If the library file is not allowed to be executed by *all* access
> > control systems (not just mount and file permission, but all LSMs), then
> > the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
> > requirement.
> Yes. specifically for the library loading case, I can't think of a
> case where we need to by-pass LSMs.  (letting user space to by-pass
> LSM check seems questionable in concept, and should only be used when
> there aren't other solutions). In the context of SELINUX enforcing
> mode,  we will want to enforce it. In the context of process level LSM
> such as landlock,  the process can already decide for itself by
> selecting the policy for its own domain, it is unnecessary to use
> another opt-out solution.

My answer wasn't clear.  The execveat(AT_EXECVE_CHECK) can and should
always be done, but user space should only enforce restrictions
according to the securebits.

It doesn't make sense to talk about user space "bypassing" kernel
checks.  This patch series provides a feature to enable user space to
enforce (at its level) the same checks as the kernel.

There is no opt-out solution, but compatibility configuration bits
through securebits (which can also be set by LSMs).

To answer your question about the dynamic linker, there should be no
difference of behavior with a script interpreter.  Both should check
executability but only enforce restriction according to the securebits
(as explained in the documentation).  Doing otherwise on a generic
distro could lead to unexpected behaviors (e.g. if a user enforced a
specific SELinux policy that doesn't allow execution of library files).

> 
> There is one case where I see a difference:
> ld.so a.out (when a.out is on non-exec mount)
> 
> If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
> above will always fail. But that is more of a bugfix.

No, the dynamic linker should only enforce restrictions according to the
securebits, otherwise a user space update (e.g. with a new dynamic
linker ignoring the securebits) could break an existing system.

> 
> >Relying on the securebits to know if this is a hard
> > requirement or not enables system administrator and distros to control
> > this potential behavior change.
> >
> I think, for the dynamic linker, it can be a hard requirement.

Not on a generic distro.

> 
> For scripts, the cases are more complicated and we can't just enforce
> it,  therefore have to rely on security bits to give a pre-process
> level control.
> 
> > >
> > > script interpreters need to check this though,  because the apps might
> > > need to adjust/test the scripts they are calling, so
> > > SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement.
> > >
> > > > > When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> > > > >
> > > > > > +
> > > > > > +The main purpose of this flag is to improve the security and consistency of an
> > > > > > +execution environment to ensure that direct file execution (e.g.
> > > > > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > > > > > +the same result.  For instance, this can be used to check if a file is
> > > > > > +trustworthy according to the caller's environment.
> > > > > > +
> > > > > > +In a secure environment, libraries and any executable dependencies should also
> > > > > > +be checked.  For instance, dynamic linking should make sure that all libraries
> > > > > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > > > > > +For such secure execution environment to make sense, only trusted code should
> > > > > > +be executable, which also requires integrity guarantees.
> > > > > > +
> > > > > > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > > > > > +file descriptor instead of a path.
> > > > > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > > > > > index 274cc7546efc..6272bcf11296 100644
> > > > > > --- a/Documentation/userspace-api/index.rst
> > > > > > +++ b/Documentation/userspace-api/index.rst
> > > > > > @@ -35,6 +35,7 @@ Security-related interfaces
> > > > > >     mfd_noexec
> > > > > >     spec_ctrl
> > > > > >     tee
> > > > > > +   check_exec
> > > > > >
> > > > > >  Devices and I/O
> > > > > >  ===============
> > > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > > index 6c53920795c2..bb83b6a39530 100644
> > > > > > --- a/fs/exec.c
> > > > > > +++ b/fs/exec.c
> > > > > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > > >         };
> > > > > >
> > > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > > +       if ((flags &
> > > > > > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> > > > > >                 return ERR_PTR(-EINVAL);
> > > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > > >         }
> > > > > >         bprm->interp = bprm->filename;
> > > > > >
> > > > > > +       /*
> > > > > > +        * At this point, security_file_open() has already been called (with
> > > > > > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > > > > > +        * stop just after the security_bprm_creds_for_exec() call in
> > > > > > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > > > > > +        * content of the file with exec_binprm() nor change the calling
> > > > > > +        * thread, which means that the following security functions will be
> > > > > > +        * not called:
> > > > > > +        * - security_bprm_check()
> > > > > > +        * - security_bprm_creds_from_file()
> > > > > > +        * - security_bprm_committing_creds()
> > > > > > +        * - security_bprm_committed_creds()
> > > > > > +        */
> > > > > > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > > > > > +
> > > > > >         retval = bprm_mm_init(bprm);
> > > > > >         if (!retval)
> > > > > >                 return bprm;
> > > > > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > > >
> > > > > >         /* Set the unchanging part of bprm->cred */
> > > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > > -       if (retval)
> > > > > > +       if (retval || bprm->is_check)
> > > > > >                 goto out;
> > > > > >
> > > > > >         retval = exec_binprm(bprm);
> > > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > > index e6c00e860951..8ff0eb3644a1 100644
> > > > > > --- a/include/linux/binfmts.h
> > > > > > +++ b/include/linux/binfmts.h
> > > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > > >                  * Set when errors can no longer be returned to the
> > > > > >                  * original userspace.
> > > > > >                  */
> > > > > > -               point_of_no_return:1;
> > > > > > +               point_of_no_return:1,
> > > > > > +               /*
> > > > > > +                * Set by user space to check executability according to the
> > > > > > +                * caller's environment.
> > > > > > +                */
> > > > > > +               is_check:1;
> > > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > > >         struct file *interpreter;
> > > > > >         struct file *file;
> > > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > > index 87e2dec79fea..2e87f2e3a79f 100644
> > > > > > --- a/include/uapi/linux/fcntl.h
> > > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > > @@ -154,6 +154,10 @@
> > > > > >                                            usable with open_by_handle_at(2). */
> > > > > >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> > > > > >
> > > > > > +/* Flags for execveat2(2). */
> > > > > > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > > > > > +                                          would be allowed. */
> > > > > > +
> > > > > >  #if defined(__KERNEL__)
> > > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > > >  #endif
> > > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > > --- a/kernel/audit.h
> > > > > > +++ b/kernel/audit.h
> > > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > > >                 struct open_how openat2;
> > > > > >                 struct {
> > > > > >                         int                     argc;
> > > > > > +                       bool                    is_check;
> > > > > >                 } execve;
> > > > > >                 struct {
> > > > > >                         char                    *name;
> > > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > > index cd57053b4a69..8d9ba5600cf2 100644
> > > > > > --- a/kernel/auditsc.c
> > > > > > +++ b/kernel/auditsc.c
> > > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > > >
> > > > > >         context->type = AUDIT_EXECVE;
> > > > > >         context->execve.argc = bprm->argc;
> > > > > > +       context->execve.is_check = bprm->is_check;
> > > > > Where is execve.is_check used ?
> > > >
> > > > It is used in bprm_execve(), exposed to the audit framework, and
> > > > potentially used by LSMs.
> > > >
> > > bprm_execve() uses bprm->is_check, not  the context->execve.is_check.
> >
> > Correct, this is only for audit but not used yet.
> >
> > Paul, Eric, do you want me to remove this field, leave it, or extend
> > this patch like this?
> >
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 8d9ba5600cf2..12cf89fa224a 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1290,6 +1290,8 @@ static void audit_log_execve_info(struct audit_context *context,
> >                 }
> >         } while (arg < context->execve.argc);
> >
> > +       audit_log_format(*ab, " check=%d", context->execve.is_check);
> > +
> >         /* NOTE: the caller handles the final audit_log_end() call */
> >
> >  out:
> >
> > >
> > >
> > > > >
> > > > >
> > > > > >  }
> > > > > >
> > > > > >
> > > > > > diff --git a/security/security.c b/security/security.c
> > > > > > index c5981e558bc2..456361ec249d 100644
> > > > > > --- a/security/security.c
> > > > > > +++ b/security/security.c
> > > > > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> > > > > >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> > > > > >   * contains the linux_binprm structure.
> > > > > >   *
> > > > > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > > > > > + * set.  The result must be the same as without this flag even if the execution
> > > > > > + * will never really happen and @bprm will always be dropped.
> > > > > > + *
> > > > > > + * This hook must not change current->cred, only @bprm->cred.
> > > > > > + *
> > > > > >   * Return: Returns 0 if the hook is successful and permission is granted.
> > > > > >   */
> > > > > >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > > > > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> > > > > >   * Save open-time permission checking state for later use upon file_permission,
> > > > > >   * and recheck access if anything has changed since inode_permission.
> > > > > >   *
> > > > > > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > > > > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > > > > > + * __FMODE_EXEC .
> > > > > > + *
> > > > > >   * Return: Returns 0 if permission is granted.
> > > > > >   */
> > > > > >  int security_file_open(struct file *file)
> > > > > > --
> > > > > > 2.47.0
> > > > > >
> > > > > >
> > >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-21 20:34   ` Mimi Zohar
@ 2024-11-22 14:50     ` Mickaël Salaün
  2024-11-26 17:41       ` Mimi Zohar
  0 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-22 14:50 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Thu, Nov 21, 2024 at 03:34:47PM -0500, Mimi Zohar wrote:
> Hi Mickaël,
> 
> On Tue, 2024-11-12 at 20:18 +0100, Mickaël Salaün wrote:
> > 
> > +
> > +/* Returns 1 on error, 0 otherwise. */
> > +static int interpret_stream(FILE *script, char *const script_name,
> > +			    char *const *const envp, const bool restrict_stream)
> > +{
> > +	int err;
> > +	char *const script_argv[] = { script_name, NULL };
> > +	char buf[128] = {};
> > +	size_t buf_size = sizeof(buf);
> > +
> > +	/*
> > +	 * We pass a valid argv and envp to the kernel to emulate a native
> > +	 * script execution.  We must use the script file descriptor instead of
> > +	 * the script path name to avoid race conditions.
> > +	 */
> > +	err = execveat(fileno(script), "", script_argv, envp,
> > +		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
> 
> At least with v20, the AT_CHECK always was being set, independent of whether
> set-exec.c set it.  I'll re-test with v21.

AT_EXECVE_CEHCK should always be set, only the interpretation of the
result should be relative to securebits.  This is highlighted in the
documentation.

> 
> thanks,
> 
> Mimi
> 
> > +	if (err && restrict_stream) {
> > +		perror("ERROR: Script execution check");
> > +		return 1;
> > +	}
> > +
> > +	/* Reads script. */
> > +	buf_size = fread(buf, 1, buf_size - 1, script);
> > +	return interpret_buffer(buf, buf_size);
> > +}
> > +
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-22 14:50             ` Mickaël Salaün
@ 2024-11-25 17:38               ` Jeff Xu
  2024-11-27 12:07                 ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Xu @ 2024-11-25 17:38 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Paul Moore,
	Serge Hallyn, Adhemerval Zanella Netto, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Elliott Hughes,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Eric Paris, audit

On Fri, Nov 22, 2024 at 6:50 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Thu, Nov 21, 2024 at 10:27:40AM -0800, Jeff Xu wrote:
> > On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > > > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > > > > be allowed for execution.  The main use case is for script interpreters
> > > > > > > and dynamic linkers to check execution permission according to the
> > > > > > > kernel's security policy. Another use case is to add context to access
> > > > > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > > > > any executable code, scripts could also use this check [1].
> > > > > > >
> > > > > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > > > > access rights (i.e. inode permission and mount options for regular
> > > > > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > > > > use case for access(2) is for SUID processes to (partially) check access
> > > > > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > > > > according to all the different restrictions in place.  Because the use
> > > > > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > > > > execution, user space gets the same error codes.
> > > > > > >
> > > > > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > > > > can be logged (e.g. with audit) without blocking an execution
> > > > > > > environment not yet ready to enforce a strict security policy.
> > > > > > >
> > > > > > > LSMs can control or log execution requests with
> > > > > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > > > > restrict file executability, or mesure executed files, with
> > > > > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > > > > >
> > > > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > > > never called when AT_EXECVE_CHECK is used.
> > > > > > >
> > > > > > > It should be noted that script interpreters cannot directly use
> > > > > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > > > > being executed to interpret the script.  Unlike the kernel, script
> > > > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > > > should not change for backward compatibility reasons.
> > > > > > >
> > > > > > > Because scripts or libraries files might not currently have the
> > > > > > > executable permission set, or because we might want specific users to be
> > > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > > > > >
> > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > > > This patch has been used for more than a decade with customized script
> > > > > > > interpreters.  Some examples can be found here:
> > > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > > > >
> > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > > > > ---
> > > > > > >
> > > > > > > Changes since v20:
> > > > > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > > > > >   Serge Hallyn.
> > > > > > > * Move the UAPI documentation to a dedicated RST file.
> > > > > > > * Add Reviewed-by: Serge Hallyn
> > > > > > >
> > > > > > > Changes since v19:
> > > > > > > * Remove mention of "role transition" as suggested by Andy.
> > > > > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > > > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > > > > >   discussed with Jeff.
> > > > > > > * Improve documentation both in UAPI comments and kernel comments
> > > > > > >   (requested by Kees).
> > > > > > >
> > > > > > > New design since v18:
> > > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > > > ---
> > > > > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > > > > >  Documentation/userspace-api/index.rst      |  1 +
> > > > > > >  fs/exec.c                                  | 20 +++++++++++--
> > > > > > >  include/linux/binfmts.h                    |  7 ++++-
> > > > > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > > > > >  kernel/audit.h                             |  1 +
> > > > > > >  kernel/auditsc.c                           |  1 +
> > > > > > >  security/security.c                        | 10 +++++++
> > > > > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > > > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > > > > >
> > > > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..ad1aeaa5f6c0
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > > > > @@ -0,0 +1,34 @@
> > > > > > > +===================
> > > > > > > +Executability check
> > > > > > > +===================
> > > > > > > +
> > > > > > > +AT_EXECVE_CHECK
> > > > > > > +===============
> > > > > > > +
> > > > > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > > > > +check on a regular file and returns 0 if execution of this file would be
> > > > > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > > > > +(e.g. ELF libraries, script's shebang).
> > > > > > > +
> > > > > > > +Programs should always perform this check to apply kernel-level checks against
> > > > > > > +files that are not directly executed by the kernel but passed to a user space
> > > > > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > > > > +view of the interpreter, should be checked.  However the result of this check
> > > > > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > > > > Regarding "should only"
> > > > > > Userspace (e.g. libc) could decide to enforce even when
> > > > > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > > > > doesn't make sense.
> > > > >
> > > > > User space is always in control, but I don't think it would be wise to
> > > > > not follow the configuration securebits (in a generic system) because
> > > > > this could result to unattended behaviors (I don't have a specific one
> > > > > in mind but...).  That being said, configuration and checks are
> > > > > standalones and specific/tailored systems are free to do the checks they
> > > > > want.
> > > > >
> > > > In the case of dynamic linker, we can always enforce honoring the
> > > > execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> > > > to,  the dynamic linker doesn't need to check the
> > > > SECBIT_EXEC_RESTRICT_FILE bit.
> > >
> > > If the library file is not allowed to be executed by *all* access
> > > control systems (not just mount and file permission, but all LSMs), then
> > > the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
> > > requirement.
> > Yes. specifically for the library loading case, I can't think of a
> > case where we need to by-pass LSMs.  (letting user space to by-pass
> > LSM check seems questionable in concept, and should only be used when
> > there aren't other solutions). In the context of SELINUX enforcing
> > mode,  we will want to enforce it. In the context of process level LSM
> > such as landlock,  the process can already decide for itself by
> > selecting the policy for its own domain, it is unnecessary to use
> > another opt-out solution.
>
> My answer wasn't clear.  The execveat(AT_EXECVE_CHECK) can and should
> always be done, but user space should only enforce restrictions
> according to the securebits.
>
I knew this part (AT_EXESCVE_CHECK is called always)
Since the securebits are enforced by userspace, setting it to 0 is
equivalent to opt-out enforcement, that is what I meant by opt-out.

> It doesn't make sense to talk about user space "bypassing" kernel
> checks.  This patch series provides a feature to enable user space to
> enforce (at its level) the same checks as the kernel.
>
> There is no opt-out solution, but compatibility configuration bits
> through securebits (which can also be set by LSMs).
>
> To answer your question about the dynamic linker, there should be no
> difference of behavior with a script interpreter.  Both should check
> executability but only enforce restriction according to the securebits
> (as explained in the documentation).  Doing otherwise on a generic
> distro could lead to unexpected behaviors (e.g. if a user enforced a
> specific SELinux policy that doesn't allow execution of library files).
>
> >
> > There is one case where I see a difference:
> > ld.so a.out (when a.out is on non-exec mount)
> >
> > If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
> > above will always fail. But that is more of a bugfix.
>
> No, the dynamic linker should only enforce restrictions according to the
> securebits, otherwise a user space update (e.g. with a new dynamic
> linker ignoring the securebits) could break an existing system.
>
OK. upgrade is a valid concern. Previously, I was just thinking about
a new LSM based on this check, not existing LSM policies.
Do you happen to know which SELinux policy/LSM could break ? i.e. it
will be applied to libraries once we add AT_EXESCVE_CHECK in the
dynamic linker.
We could give heads up and prepare for that.

> >
> > >Relying on the securebits to know if this is a hard
> > > requirement or not enables system administrator and distros to control
> > > this potential behavior change.
> > >
> > I think, for the dynamic linker, it can be a hard requirement.
>
> Not on a generic distro.
>
Ok. Maybe this can be done through a configuration option for the
dynamic linker.

The consideration I have is: securebits is currently designed to
control both dynamic linker and shell scripts.
The case for dynamic linker is simpler than scripts cases, (non-exec
mount, and perhaps some LSM policies for libraries) and distributions
such as ChromeOS can enforce the dynamic linker case ahead of scripts
interrupter cases, i.e. without waiting for python/shell being
upgraded, that can take sometimes.

> >
> > For scripts, the cases are more complicated and we can't just enforce
> > it,  therefore have to rely on security bits to give a pre-process
> > level control.
> >
> > > >
> > > > script interpreters need to check this though,  because the apps might
> > > > need to adjust/test the scripts they are calling, so
> > > > SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement.
> > > >
> > > > > > When SECBIT_EXEC_RESTRICT_FILE=1,  userspace is bound to enforce.
> > > > > >
> > > > > > > +
> > > > > > > +The main purpose of this flag is to improve the security and consistency of an
> > > > > > > +execution environment to ensure that direct file execution (e.g.
> > > > > > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
> > > > > > > +the same result.  For instance, this can be used to check if a file is
> > > > > > > +trustworthy according to the caller's environment.
> > > > > > > +
> > > > > > > +In a secure environment, libraries and any executable dependencies should also
> > > > > > > +be checked.  For instance, dynamic linking should make sure that all libraries
> > > > > > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
> > > > > > > +For such secure execution environment to make sense, only trusted code should
> > > > > > > +be executable, which also requires integrity guarantees.
> > > > > > > +
> > > > > > > +To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
> > > > > > > +file descriptor instead of a path.
> > > > > > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > > > > > > index 274cc7546efc..6272bcf11296 100644
> > > > > > > --- a/Documentation/userspace-api/index.rst
> > > > > > > +++ b/Documentation/userspace-api/index.rst
> > > > > > > @@ -35,6 +35,7 @@ Security-related interfaces
> > > > > > >     mfd_noexec
> > > > > > >     spec_ctrl
> > > > > > >     tee
> > > > > > > +   check_exec
> > > > > > >
> > > > > > >  Devices and I/O
> > > > > > >  ===============
> > > > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > > > index 6c53920795c2..bb83b6a39530 100644
> > > > > > > --- a/fs/exec.c
> > > > > > > +++ b/fs/exec.c
> > > > > > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > > > >         };
> > > > > > >
> > > > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > > > +       if ((flags &
> > > > > > > +            ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0)
> > > > > > >                 return ERR_PTR(-EINVAL);
> > > > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > > > >         }
> > > > > > >         bprm->interp = bprm->filename;
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * At this point, security_file_open() has already been called (with
> > > > > > > +        * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will
> > > > > > > +        * stop just after the security_bprm_creds_for_exec() call in
> > > > > > > +        * bprm_execve().  Indeed, the kernel should not try to parse the
> > > > > > > +        * content of the file with exec_binprm() nor change the calling
> > > > > > > +        * thread, which means that the following security functions will be
> > > > > > > +        * not called:
> > > > > > > +        * - security_bprm_check()
> > > > > > > +        * - security_bprm_creds_from_file()
> > > > > > > +        * - security_bprm_committing_creds()
> > > > > > > +        * - security_bprm_committed_creds()
> > > > > > > +        */
> > > > > > > +       bprm->is_check = !!(flags & AT_EXECVE_CHECK);
> > > > > > > +
> > > > > > >         retval = bprm_mm_init(bprm);
> > > > > > >         if (!retval)
> > > > > > >                 return bprm;
> > > > > > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > > > >
> > > > > > >         /* Set the unchanging part of bprm->cred */
> > > > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > > > -       if (retval)
> > > > > > > +       if (retval || bprm->is_check)
> > > > > > >                 goto out;
> > > > > > >
> > > > > > >         retval = exec_binprm(bprm);
> > > > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > > > index e6c00e860951..8ff0eb3644a1 100644
> > > > > > > --- a/include/linux/binfmts.h
> > > > > > > +++ b/include/linux/binfmts.h
> > > > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > > > >                  * Set when errors can no longer be returned to the
> > > > > > >                  * original userspace.
> > > > > > >                  */
> > > > > > > -               point_of_no_return:1;
> > > > > > > +               point_of_no_return:1,
> > > > > > > +               /*
> > > > > > > +                * Set by user space to check executability according to the
> > > > > > > +                * caller's environment.
> > > > > > > +                */
> > > > > > > +               is_check:1;
> > > > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > > > >         struct file *interpreter;
> > > > > > >         struct file *file;
> > > > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > > > index 87e2dec79fea..2e87f2e3a79f 100644
> > > > > > > --- a/include/uapi/linux/fcntl.h
> > > > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > > > @@ -154,6 +154,10 @@
> > > > > > >                                            usable with open_by_handle_at(2). */
> > > > > > >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> > > > > > >
> > > > > > > +/* Flags for execveat2(2). */
> > > > > > > +#define AT_EXECVE_CHECK                0x10000 /* Only perform a check if execution
> > > > > > > +                                          would be allowed. */
> > > > > > > +
> > > > > > >  #if defined(__KERNEL__)
> > > > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > > > >  #endif
> > > > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > > > --- a/kernel/audit.h
> > > > > > > +++ b/kernel/audit.h
> > > > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > > > >                 struct open_how openat2;
> > > > > > >                 struct {
> > > > > > >                         int                     argc;
> > > > > > > +                       bool                    is_check;
> > > > > > >                 } execve;
> > > > > > >                 struct {
> > > > > > >                         char                    *name;
> > > > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > > > index cd57053b4a69..8d9ba5600cf2 100644
> > > > > > > --- a/kernel/auditsc.c
> > > > > > > +++ b/kernel/auditsc.c
> > > > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > > > >
> > > > > > >         context->type = AUDIT_EXECVE;
> > > > > > >         context->execve.argc = bprm->argc;
> > > > > > > +       context->execve.is_check = bprm->is_check;
> > > > > > Where is execve.is_check used ?
> > > > >
> > > > > It is used in bprm_execve(), exposed to the audit framework, and
> > > > > potentially used by LSMs.
> > > > >
> > > > bprm_execve() uses bprm->is_check, not  the context->execve.is_check.
> > >
> > > Correct, this is only for audit but not used yet.
> > >
> > > Paul, Eric, do you want me to remove this field, leave it, or extend
> > > this patch like this?
> > >
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 8d9ba5600cf2..12cf89fa224a 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -1290,6 +1290,8 @@ static void audit_log_execve_info(struct audit_context *context,
> > >                 }
> > >         } while (arg < context->execve.argc);
> > >
> > > +       audit_log_format(*ab, " check=%d", context->execve.is_check);
> > > +
> > >         /* NOTE: the caller handles the final audit_log_end() call */
> > >
> > >  out:
> > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > >  }
> > > > > > >
> > > > > > >
> > > > > > > diff --git a/security/security.c b/security/security.c
> > > > > > > index c5981e558bc2..456361ec249d 100644
> > > > > > > --- a/security/security.c
> > > > > > > +++ b/security/security.c
> > > > > > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> > > > > > >   * to 1 if AT_SECURE should be set to request libc enable secure mode.  @bprm
> > > > > > >   * contains the linux_binprm structure.
> > > > > > >   *
> > > > > > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is
> > > > > > > + * set.  The result must be the same as without this flag even if the execution
> > > > > > > + * will never really happen and @bprm will always be dropped.
> > > > > > > + *
> > > > > > > + * This hook must not change current->cred, only @bprm->cred.
> > > > > > > + *
> > > > > > >   * Return: Returns 0 if the hook is successful and permission is granted.
> > > > > > >   */
> > > > > > >  int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> > > > > > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file)
> > > > > > >   * Save open-time permission checking state for later use upon file_permission,
> > > > > > >   * and recheck access if anything has changed since inode_permission.
> > > > > > >   *
> > > > > > > + * We can check if a file is opened for execution (e.g. execve(2) call), either
> > > > > > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags &
> > > > > > > + * __FMODE_EXEC .
> > > > > > > + *
> > > > > > >   * Return: Returns 0 if permission is granted.
> > > > > > >   */
> > > > > > >  int security_file_open(struct file *file)
> > > > > > > --
> > > > > > > 2.47.0
> > > > > > >
> > > > > > >
> > > >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-22 14:50     ` Mickaël Salaün
@ 2024-11-26 17:41       ` Mimi Zohar
  2024-11-27 12:10         ` Mickaël Salaün
  0 siblings, 1 reply; 25+ messages in thread
From: Mimi Zohar @ 2024-11-26 17:41 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Fri, 2024-11-22 at 15:50 +0100, Mickaël Salaün wrote:
> On Thu, Nov 21, 2024 at 03:34:47PM -0500, Mimi Zohar wrote:
> > Hi Mickaël,
> > 
> > On Tue, 2024-11-12 at 20:18 +0100, Mickaël Salaün wrote:
> > > 
> > > +
> > > +/* Returns 1 on error, 0 otherwise. */
> > > +static int interpret_stream(FILE *script, char *const script_name,
> > > +			    char *const *const envp, const bool restrict_stream)
> > > +{
> > > +	int err;
> > > +	char *const script_argv[] = { script_name, NULL };
> > > +	char buf[128] = {};
> > > +	size_t buf_size = sizeof(buf);
> > > +
> > > +	/*
> > > +	 * We pass a valid argv and envp to the kernel to emulate a native
> > > +	 * script execution.  We must use the script file descriptor instead of
> > > +	 * the script path name to avoid race conditions.
> > > +	 */
> > > +	err = execveat(fileno(script), "", script_argv, envp,
> > > +		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
> > 
> > At least with v20, the AT_CHECK always was being set, independent of whether
> > set-exec.c set it.  I'll re-test with v21.
> 
> AT_EXECVE_CEHCK should always be set, only the interpretation of the
> result should be relative to securebits.  This is highlighted in the
> documentation.

Sure, that sounds correct.  With an IMA-appraisal policy, any unsigned script
with the is_check flag set now emits an "cause=IMA-signature-required" audit
message.  However since IMA-appraisal isn't enforcing file signatures, this
sounds wrong.

New audit messages like "IMA-signature-required-by-interpreter" and "IMA-
signature-not-required-by-interpreter" would need to be defined based on the
SECBIT_EXEC_RESTRICT_FILE.


> > 
> > > +	if (err && restrict_stream) {
> > > +		perror("ERROR: Script execution check");
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reads script. */
> > > +	buf_size = fread(buf, 1, buf_size - 1, script);
> > > +	return interpret_buffer(buf, buf_size);
> > > +}
> > > +
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-25 17:38               ` Jeff Xu
@ 2024-11-27 12:07                 ` Mickaël Salaün
  2024-11-27 15:14                   ` Jeff Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-27 12:07 UTC (permalink / raw)
  To: Jeff Xu
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Paul Moore,
	Serge Hallyn, Adhemerval Zanella Netto, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Elliott Hughes,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Eric Paris, audit

On Mon, Nov 25, 2024 at 09:38:51AM -0800, Jeff Xu wrote:
> On Fri, Nov 22, 2024 at 6:50 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Nov 21, 2024 at 10:27:40AM -0800, Jeff Xu wrote:
> > > On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > > > > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > > > > > be allowed for execution.  The main use case is for script interpreters
> > > > > > > > and dynamic linkers to check execution permission according to the
> > > > > > > > kernel's security policy. Another use case is to add context to access
> > > > > > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > > > > > any executable code, scripts could also use this check [1].
> > > > > > > >
> > > > > > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > > > > > access rights (i.e. inode permission and mount options for regular
> > > > > > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > > > > > use case for access(2) is for SUID processes to (partially) check access
> > > > > > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > > > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > > > > > according to all the different restrictions in place.  Because the use
> > > > > > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > > > > > execution, user space gets the same error codes.
> > > > > > > >
> > > > > > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > > > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > > > > > can be logged (e.g. with audit) without blocking an execution
> > > > > > > > environment not yet ready to enforce a strict security policy.
> > > > > > > >
> > > > > > > > LSMs can control or log execution requests with
> > > > > > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > > > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > > > > > restrict file executability, or mesure executed files, with
> > > > > > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > > > > > >
> > > > > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > > > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > > > > never called when AT_EXECVE_CHECK is used.
> > > > > > > >
> > > > > > > > It should be noted that script interpreters cannot directly use
> > > > > > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > > > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > > > > > being executed to interpret the script.  Unlike the kernel, script
> > > > > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > > > > should not change for backward compatibility reasons.
> > > > > > > >
> > > > > > > > Because scripts or libraries files might not currently have the
> > > > > > > > executable permission set, or because we might want specific users to be
> > > > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > > > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > > > > > >
> > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > > > > This patch has been used for more than a decade with customized script
> > > > > > > > interpreters.  Some examples can be found here:
> > > > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > > > > >
> > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Changes since v20:
> > > > > > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > > > > > >   Serge Hallyn.
> > > > > > > > * Move the UAPI documentation to a dedicated RST file.
> > > > > > > > * Add Reviewed-by: Serge Hallyn
> > > > > > > >
> > > > > > > > Changes since v19:
> > > > > > > > * Remove mention of "role transition" as suggested by Andy.
> > > > > > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > > > > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > > > > > >   discussed with Jeff.
> > > > > > > > * Improve documentation both in UAPI comments and kernel comments
> > > > > > > >   (requested by Kees).
> > > > > > > >
> > > > > > > > New design since v18:
> > > > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > > > > ---
> > > > > > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > > > > > >  Documentation/userspace-api/index.rst      |  1 +
> > > > > > > >  fs/exec.c                                  | 20 +++++++++++--
> > > > > > > >  include/linux/binfmts.h                    |  7 ++++-
> > > > > > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > > > > > >  kernel/audit.h                             |  1 +
> > > > > > > >  kernel/auditsc.c                           |  1 +
> > > > > > > >  security/security.c                        | 10 +++++++
> > > > > > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > > > > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > > > > > >
> > > > > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..ad1aeaa5f6c0
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > > > > > @@ -0,0 +1,34 @@
> > > > > > > > +===================
> > > > > > > > +Executability check
> > > > > > > > +===================
> > > > > > > > +
> > > > > > > > +AT_EXECVE_CHECK
> > > > > > > > +===============
> > > > > > > > +
> > > > > > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > > > > > +check on a regular file and returns 0 if execution of this file would be
> > > > > > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > > > > > +(e.g. ELF libraries, script's shebang).
> > > > > > > > +
> > > > > > > > +Programs should always perform this check to apply kernel-level checks against
> > > > > > > > +files that are not directly executed by the kernel but passed to a user space
> > > > > > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > > > > > +view of the interpreter, should be checked.  However the result of this check
> > > > > > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > > > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > > > > > Regarding "should only"
> > > > > > > Userspace (e.g. libc) could decide to enforce even when
> > > > > > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > > > > > doesn't make sense.
> > > > > >
> > > > > > User space is always in control, but I don't think it would be wise to
> > > > > > not follow the configuration securebits (in a generic system) because
> > > > > > this could result to unattended behaviors (I don't have a specific one
> > > > > > in mind but...).  That being said, configuration and checks are
> > > > > > standalones and specific/tailored systems are free to do the checks they
> > > > > > want.
> > > > > >
> > > > > In the case of dynamic linker, we can always enforce honoring the
> > > > > execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> > > > > to,  the dynamic linker doesn't need to check the
> > > > > SECBIT_EXEC_RESTRICT_FILE bit.
> > > >
> > > > If the library file is not allowed to be executed by *all* access
> > > > control systems (not just mount and file permission, but all LSMs), then
> > > > the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
> > > > requirement.
> > > Yes. specifically for the library loading case, I can't think of a
> > > case where we need to by-pass LSMs.  (letting user space to by-pass
> > > LSM check seems questionable in concept, and should only be used when
> > > there aren't other solutions). In the context of SELINUX enforcing
> > > mode,  we will want to enforce it. In the context of process level LSM
> > > such as landlock,  the process can already decide for itself by
> > > selecting the policy for its own domain, it is unnecessary to use
> > > another opt-out solution.
> >
> > My answer wasn't clear.  The execveat(AT_EXECVE_CHECK) can and should
> > always be done, but user space should only enforce restrictions
> > according to the securebits.
> >
> I knew this part (AT_EXESCVE_CHECK is called always)
> Since the securebits are enforced by userspace, setting it to 0 is
> equivalent to opt-out enforcement, that is what I meant by opt-out.

OK, that was confusing because these bits are set to 0 by default (for
compatibility reasons).

> 
> > It doesn't make sense to talk about user space "bypassing" kernel
> > checks.  This patch series provides a feature to enable user space to
> > enforce (at its level) the same checks as the kernel.
> >
> > There is no opt-out solution, but compatibility configuration bits
> > through securebits (which can also be set by LSMs).
> >
> > To answer your question about the dynamic linker, there should be no
> > difference of behavior with a script interpreter.  Both should check
> > executability but only enforce restriction according to the securebits
> > (as explained in the documentation).  Doing otherwise on a generic
> > distro could lead to unexpected behaviors (e.g. if a user enforced a
> > specific SELinux policy that doesn't allow execution of library files).
> >
> > >
> > > There is one case where I see a difference:
> > > ld.so a.out (when a.out is on non-exec mount)
> > >
> > > If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
> > > above will always fail. But that is more of a bugfix.
> >
> > No, the dynamic linker should only enforce restrictions according to the
> > securebits, otherwise a user space update (e.g. with a new dynamic
> > linker ignoring the securebits) could break an existing system.
> >
> OK. upgrade is a valid concern. Previously, I was just thinking about
> a new LSM based on this check, not existing LSM policies.
> Do you happen to know which SELinux policy/LSM could break ? i.e. it
> will be applied to libraries once we add AT_EXESCVE_CHECK in the
> dynamic linker.

We cannot assume anything about LSM policies because of custom and
private ones.

> We could give heads up and prepare for that.
> 
> > >
> > > >Relying on the securebits to know if this is a hard
> > > > requirement or not enables system administrator and distros to control
> > > > this potential behavior change.
> > > >
> > > I think, for the dynamic linker, it can be a hard requirement.
> >
> > Not on a generic distro.
> >
> Ok. Maybe this can be done through a configuration option for the
> dynamic linker.

Yes, we could have a built-time option (disabled by default) for the
dynamic linker to enforce that.

> 
> The consideration I have is: securebits is currently designed to
> control both dynamic linker and shell scripts.
> The case for dynamic linker is simpler than scripts cases, (non-exec
> mount, and perhaps some LSM policies for libraries) and distributions
> such as ChromeOS can enforce the dynamic linker case ahead of scripts
> interrupter cases, i.e. without waiting for python/shell being
> upgraded, that can take sometimes.

For secure systems, the end goal is to always enforce such restrictions,
so once interpretation/execution of a set of file types (e.g. ELF
libraries) are tested enough in such a system, we can remove the
securebits checks for the related library/executable (e.g. ld.so) and
consider that they are always set, independently of the current
user/credentials.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-26 17:41       ` Mimi Zohar
@ 2024-11-27 12:10         ` Mickaël Salaün
  2024-11-27 15:15           ` Mimi Zohar
  0 siblings, 1 reply; 25+ messages in thread
From: Mickaël Salaün @ 2024-11-27 12:10 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Tue, Nov 26, 2024 at 12:41:45PM -0500, Mimi Zohar wrote:
> On Fri, 2024-11-22 at 15:50 +0100, Mickaël Salaün wrote:
> > On Thu, Nov 21, 2024 at 03:34:47PM -0500, Mimi Zohar wrote:
> > > Hi Mickaël,
> > > 
> > > On Tue, 2024-11-12 at 20:18 +0100, Mickaël Salaün wrote:
> > > > 
> > > > +
> > > > +/* Returns 1 on error, 0 otherwise. */
> > > > +static int interpret_stream(FILE *script, char *const script_name,
> > > > +			    char *const *const envp, const bool restrict_stream)
> > > > +{
> > > > +	int err;
> > > > +	char *const script_argv[] = { script_name, NULL };
> > > > +	char buf[128] = {};
> > > > +	size_t buf_size = sizeof(buf);
> > > > +
> > > > +	/*
> > > > +	 * We pass a valid argv and envp to the kernel to emulate a native
> > > > +	 * script execution.  We must use the script file descriptor instead of
> > > > +	 * the script path name to avoid race conditions.
> > > > +	 */
> > > > +	err = execveat(fileno(script), "", script_argv, envp,
> > > > +		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
> > > 
> > > At least with v20, the AT_CHECK always was being set, independent of whether
> > > set-exec.c set it.  I'll re-test with v21.
> > 
> > AT_EXECVE_CEHCK should always be set, only the interpretation of the
> > result should be relative to securebits.  This is highlighted in the
> > documentation.
> 
> Sure, that sounds correct.  With an IMA-appraisal policy, any unsigned script
> with the is_check flag set now emits an "cause=IMA-signature-required" audit
> message.  However since IMA-appraisal isn't enforcing file signatures, this
> sounds wrong.
> 
> New audit messages like "IMA-signature-required-by-interpreter" and "IMA-
> signature-not-required-by-interpreter" would need to be defined based on the
> SECBIT_EXEC_RESTRICT_FILE.

It makes sense.  Could you please send a patch for these
IMA-*-interpreter changes?  I'll include it in the next series.

> 
> 
> > > 
> > > > +	if (err && restrict_stream) {
> > > > +		perror("ERROR: Script execution check");
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Reads script. */
> > > > +	buf_size = fread(buf, 1, buf_size - 1, script);
> > > > +	return interpret_buffer(buf, buf_size);
> > > > +}
> > > > +
> > > 
> > > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-27 12:07                 ` Mickaël Salaün
@ 2024-11-27 15:14                   ` Jeff Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Xu @ 2024-11-27 15:14 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Paul Moore,
	Serge Hallyn, Adhemerval Zanella Netto, Alejandro Colomar,
	Aleksa Sarai, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Heimes, Dmitry Vyukov, Elliott Hughes,
	Eric Biggers, Eric Chiang, Fan Wu, Florian Weimer,
	Geert Uytterhoeven, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Jordan R Abrahams, Lakshmi Ramasubramanian,
	Linus Torvalds, Luca Boccassi, Luis Chamberlain,
	Madhavan T . Venkataraman, Matt Bobrowski, Matthew Garrett,
	Matthew Wilcox, Miklos Szeredi, Mimi Zohar, Nicolas Bouchinet,
	Scott Shell, Shuah Khan, Stephen Rothwell, Steve Dower,
	Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, Eric Paris, audit

On Wed, Nov 27, 2024 at 4:07 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Nov 25, 2024 at 09:38:51AM -0800, Jeff Xu wrote:
> > On Fri, Nov 22, 2024 at 6:50 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Thu, Nov 21, 2024 at 10:27:40AM -0800, Jeff Xu wrote:
> > > > On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > > > > > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > > > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
> > > > > > > > > be allowed for execution.  The main use case is for script interpreters
> > > > > > > > > and dynamic linkers to check execution permission according to the
> > > > > > > > > kernel's security policy. Another use case is to add context to access
> > > > > > > > > logs e.g., which script (instead of interpreter) accessed a file.  As
> > > > > > > > > any executable code, scripts could also use this check [1].
> > > > > > > > >
> > > > > > > > > This is different from faccessat(2) + X_OK which only checks a subset of
> > > > > > > > > access rights (i.e. inode permission and mount options for regular
> > > > > > > > > files), but not the full context (e.g. all LSM access checks).  The main
> > > > > > > > > use case for access(2) is for SUID processes to (partially) check access
> > > > > > > > > on behalf of their caller.  The main use case for execveat(2) +
> > > > > > > > > AT_EXECVE_CHECK is to check if a script execution would be allowed,
> > > > > > > > > according to all the different restrictions in place.  Because the use
> > > > > > > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
> > > > > > > > > execution, user space gets the same error codes.
> > > > > > > > >
> > > > > > > > > An interesting point of using execveat(2) instead of openat2(2) is that
> > > > > > > > > it decouples the check from the enforcement.  Indeed, the security check
> > > > > > > > > can be logged (e.g. with audit) without blocking an execution
> > > > > > > > > environment not yet ready to enforce a strict security policy.
> > > > > > > > >
> > > > > > > > > LSMs can control or log execution requests with
> > > > > > > > > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > > > > > > > > complete access control (e.g. on binary's dependencies) LSMs should
> > > > > > > > > restrict file executability, or mesure executed files, with
> > > > > > > > > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> > > > > > > > >
> > > > > > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
> > > > > > > > > doesn't make sense for the kernel to parse the checked files, look for
> > > > > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > > > > > never called when AT_EXECVE_CHECK is used.
> > > > > > > > >
> > > > > > > > > It should be noted that script interpreters cannot directly use
> > > > > > > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
> > > > > > > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
> > > > > > > > > being executed to interpret the script.  Unlike the kernel, script
> > > > > > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > > > > > should not change for backward compatibility reasons.
> > > > > > > > >
> > > > > > > > > Because scripts or libraries files might not currently have the
> > > > > > > > > executable permission set, or because we might want specific users to be
> > > > > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > > > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > > > > > > > > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> > > > > > > > >
> > > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > > > > > This patch has been used for more than a decade with customized script
> > > > > > > > > interpreters.  Some examples can be found here:
> > > > > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > > > > > >
> > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > > > > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > > > > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > > > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Changes since v20:
> > > > > > > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
> > > > > > > > >   Serge Hallyn.
> > > > > > > > > * Move the UAPI documentation to a dedicated RST file.
> > > > > > > > > * Add Reviewed-by: Serge Hallyn
> > > > > > > > >
> > > > > > > > > Changes since v19:
> > > > > > > > > * Remove mention of "role transition" as suggested by Andy.
> > > > > > > > > * Highlight the difference between security_bprm_creds_for_exec() and
> > > > > > > > >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> > > > > > > > >   discussed with Jeff.
> > > > > > > > > * Improve documentation both in UAPI comments and kernel comments
> > > > > > > > >   (requested by Kees).
> > > > > > > > >
> > > > > > > > > New design since v18:
> > > > > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > > > > > ---
> > > > > > > > >  Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
> > > > > > > > >  Documentation/userspace-api/index.rst      |  1 +
> > > > > > > > >  fs/exec.c                                  | 20 +++++++++++--
> > > > > > > > >  include/linux/binfmts.h                    |  7 ++++-
> > > > > > > > >  include/uapi/linux/fcntl.h                 |  4 +++
> > > > > > > > >  kernel/audit.h                             |  1 +
> > > > > > > > >  kernel/auditsc.c                           |  1 +
> > > > > > > > >  security/security.c                        | 10 +++++++
> > > > > > > > >  8 files changed, 75 insertions(+), 3 deletions(-)
> > > > > > > > >  create mode 100644 Documentation/userspace-api/check_exec.rst
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..ad1aeaa5f6c0
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/userspace-api/check_exec.rst
> > > > > > > > > @@ -0,0 +1,34 @@
> > > > > > > > > +===================
> > > > > > > > > +Executability check
> > > > > > > > > +===================
> > > > > > > > > +
> > > > > > > > > +AT_EXECVE_CHECK
> > > > > > > > > +===============
> > > > > > > > > +
> > > > > > > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
> > > > > > > > > +check on a regular file and returns 0 if execution of this file would be
> > > > > > > > > +allowed, ignoring the file format and then the related interpreter dependencies
> > > > > > > > > +(e.g. ELF libraries, script's shebang).
> > > > > > > > > +
> > > > > > > > > +Programs should always perform this check to apply kernel-level checks against
> > > > > > > > > +files that are not directly executed by the kernel but passed to a user space
> > > > > > > > > +interpreter instead.  All files that contain executable code, from the point of
> > > > > > > > > +view of the interpreter, should be checked.  However the result of this check
> > > > > > > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
> > > > > > > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``.
> > > > > > > > Regarding "should only"
> > > > > > > > Userspace (e.g. libc) could decide to enforce even when
> > > > > > > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
> > > > > > > > doesn't make sense.
> > > > > > >
> > > > > > > User space is always in control, but I don't think it would be wise to
> > > > > > > not follow the configuration securebits (in a generic system) because
> > > > > > > this could result to unattended behaviors (I don't have a specific one
> > > > > > > in mind but...).  That being said, configuration and checks are
> > > > > > > standalones and specific/tailored systems are free to do the checks they
> > > > > > > want.
> > > > > > >
> > > > > > In the case of dynamic linker, we can always enforce honoring the
> > > > > > execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
> > > > > > to,  the dynamic linker doesn't need to check the
> > > > > > SECBIT_EXEC_RESTRICT_FILE bit.
> > > > >
> > > > > If the library file is not allowed to be executed by *all* access
> > > > > control systems (not just mount and file permission, but all LSMs), then
> > > > > the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
> > > > > requirement.
> > > > Yes. specifically for the library loading case, I can't think of a
> > > > case where we need to by-pass LSMs.  (letting user space to by-pass
> > > > LSM check seems questionable in concept, and should only be used when
> > > > there aren't other solutions). In the context of SELINUX enforcing
> > > > mode,  we will want to enforce it. In the context of process level LSM
> > > > such as landlock,  the process can already decide for itself by
> > > > selecting the policy for its own domain, it is unnecessary to use
> > > > another opt-out solution.
> > >
> > > My answer wasn't clear.  The execveat(AT_EXECVE_CHECK) can and should
> > > always be done, but user space should only enforce restrictions
> > > according to the securebits.
> > >
> > I knew this part (AT_EXESCVE_CHECK is called always)
> > Since the securebits are enforced by userspace, setting it to 0 is
> > equivalent to opt-out enforcement, that is what I meant by opt-out.
>
> OK, that was confusing because these bits are set to 0 by default (for
> compatibility reasons).
>
> >
> > > It doesn't make sense to talk about user space "bypassing" kernel
> > > checks.  This patch series provides a feature to enable user space to
> > > enforce (at its level) the same checks as the kernel.
> > >
> > > There is no opt-out solution, but compatibility configuration bits
> > > through securebits (which can also be set by LSMs).
> > >
> > > To answer your question about the dynamic linker, there should be no
> > > difference of behavior with a script interpreter.  Both should check
> > > executability but only enforce restriction according to the securebits
> > > (as explained in the documentation).  Doing otherwise on a generic
> > > distro could lead to unexpected behaviors (e.g. if a user enforced a
> > > specific SELinux policy that doesn't allow execution of library files).
> > >
> > > >
> > > > There is one case where I see a difference:
> > > > ld.so a.out (when a.out is on non-exec mount)
> > > >
> > > > If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
> > > > above will always fail. But that is more of a bugfix.
> > >
> > > No, the dynamic linker should only enforce restrictions according to the
> > > securebits, otherwise a user space update (e.g. with a new dynamic
> > > linker ignoring the securebits) could break an existing system.
> > >
> > OK. upgrade is a valid concern. Previously, I was just thinking about
> > a new LSM based on this check, not existing LSM policies.
> > Do you happen to know which SELinux policy/LSM could break ? i.e. it
> > will be applied to libraries once we add AT_EXESCVE_CHECK in the
> > dynamic linker.
>
> We cannot assume anything about LSM policies because of custom and
> private ones.
>
That is a good point.

> > We could give heads up and prepare for that.
> >
> > > >
> > > > >Relying on the securebits to know if this is a hard
> > > > > requirement or not enables system administrator and distros to control
> > > > > this potential behavior change.
> > > > >
> > > > I think, for the dynamic linker, it can be a hard requirement.
> > >
> > > Not on a generic distro.
> > >
> > Ok. Maybe this can be done through a configuration option for the
> > dynamic linker.
>
> Yes, we could have a built-time option (disabled by default) for the
> dynamic linker to enforce that.
>
> >
> > The consideration I have is: securebits is currently designed to
> > control both dynamic linker and shell scripts.
> > The case for dynamic linker is simpler than scripts cases, (non-exec
> > mount, and perhaps some LSM policies for libraries) and distributions
> > such as ChromeOS can enforce the dynamic linker case ahead of scripts
> > interrupter cases, i.e. without waiting for python/shell being
> > upgraded, that can take sometimes.
>
> For secure systems, the end goal is to always enforce such restrictions,
> so once interpretation/execution of a set of file types (e.g. ELF
> libraries) are tested enough in such a system, we can remove the
> securebits checks for the related library/executable (e.g. ld.so) and
> consider that they are always set, independently of the current
> user/credentials.
Great! I agree with that.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  2024-11-27 12:10         ` Mickaël Salaün
@ 2024-11-27 15:15           ` Mimi Zohar
  0 siblings, 0 replies; 25+ messages in thread
From: Mimi Zohar @ 2024-11-27 15:15 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Al Viro, Christian Brauner, Kees Cook, Paul Moore, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Nicolas Bouchinet, Scott Shell, Shuah Khan, Stephen Rothwell,
	Steve Dower, Steve Grubb, Theodore Ts'o, Thibaut Sautereau,
	Vincent Strubel, Xiaoming Ni, Yin Fengwei, kernel-hardening,
	linux-api, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module

On Wed, 2024-11-27 at 13:10 +0100, Mickaël Salaün wrote:
> On Tue, Nov 26, 2024 at 12:41:45PM -0500, Mimi Zohar wrote:
> > On Fri, 2024-11-22 at 15:50 +0100, Mickaël Salaün wrote:
> > > On Thu, Nov 21, 2024 at 03:34:47PM -0500, Mimi Zohar wrote:
> > > > Hi Mickaël,
> > > > 
> > > > On Tue, 2024-11-12 at 20:18 +0100, Mickaël Salaün wrote:
> > > > > 
> > > > > +
> > > > > +/* Returns 1 on error, 0 otherwise. */
> > > > > +static int interpret_stream(FILE *script, char *const script_name,
> > > > > +			    char *const *const envp, const bool restrict_stream)
> > > > > +{
> > > > > +	int err;
> > > > > +	char *const script_argv[] = { script_name, NULL };
> > > > > +	char buf[128] = {};
> > > > > +	size_t buf_size = sizeof(buf);
> > > > > +
> > > > > +	/*
> > > > > +	 * We pass a valid argv and envp to the kernel to emulate a native
> > > > > +	 * script execution.  We must use the script file descriptor instead of
> > > > > +	 * the script path name to avoid race conditions.
> > > > > +	 */
> > > > > +	err = execveat(fileno(script), "", script_argv, envp,
> > > > > +		       AT_EMPTY_PATH | AT_EXECVE_CHECK);
> > > > 
> > > > At least with v20, the AT_CHECK always was being set, independent of whether
> > > > set-exec.c set it.  I'll re-test with v21.
> > > 
> > > AT_EXECVE_CEHCK should always be set, only the interpretation of the
> > > result should be relative to securebits.  This is highlighted in the
> > > documentation.
> > 
> > Sure, that sounds correct.  With an IMA-appraisal policy, any unsigned script
> > with the is_check flag set now emits an "cause=IMA-signature-required" audit
> > message.  However since IMA-appraisal isn't enforcing file signatures, this
> > sounds wrong.
> > 
> > New audit messages like "IMA-signature-required-by-interpreter" and "IMA-
> > signature-not-required-by-interpreter" would need to be defined based on the
> > SECBIT_EXEC_RESTRICT_FILE.
> 
> It makes sense.  Could you please send a patch for these
> IMA-*-interpreter changes?  I'll include it in the next series.

Sent as an RFC.  The audit message is only updated for the missing signature
case.  However, all of the audit messages in ima_appraise_measurement() should
be updated.  The current method doesn't scale.

Mimi

> > 
> > 
> > > > 
> > > > > +	if (err && restrict_stream) {
> > > > > +		perror("ERROR: Script execution check");
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	/* Reads script. */
> > > > > +	buf_size = fread(buf, 1, buf_size - 1, script);
> > > > > +	return interpret_buffer(buf, buf_size);
> > > > > +}
> > > > > +
> > > > 
> > > > 
> > > 
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
  2024-11-21 13:39         ` Mickaël Salaün
  2024-11-21 18:27           ` Jeff Xu
@ 2024-12-05  3:33           ` Paul Moore
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Moore @ 2024-12-05  3:33 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jeff Xu, Al Viro, Christian Brauner, Kees Cook, Serge Hallyn,
	Adhemerval Zanella Netto, Alejandro Colomar, Aleksa Sarai,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Heimes, Dmitry Vyukov, Elliott Hughes, Eric Biggers,
	Eric Chiang, Fan Wu, Florian Weimer, Geert Uytterhoeven,
	James Morris, Jan Kara, Jann Horn, Jeff Xu, Jonathan Corbet,
	Jordan R Abrahams, Lakshmi Ramasubramanian, Linus Torvalds,
	Luca Boccassi, Luis Chamberlain, Madhavan T . Venkataraman,
	Matt Bobrowski, Matthew Garrett, Matthew Wilcox, Miklos Szeredi,
	Mimi Zohar, Nicolas Bouchinet, Scott Shell, Shuah Khan,
	Stephen Rothwell, Steve Dower, Steve Grubb, Theodore Ts'o,
	Thibaut Sautereau, Vincent Strubel, Xiaoming Ni, Yin Fengwei,
	kernel-hardening, linux-api, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, Eric Paris, audit

On Thu, Nov 21, 2024 at 8:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:
> > On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:
> > > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote:

...

> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index cd57053b4a69..8d9ba5600cf2 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > >
> > > > Where is execve.is_check used ?
> > >
> > > It is used in bprm_execve(), exposed to the audit framework, and
> > > potentially used by LSMs.
> > >
> > bprm_execve() uses bprm->is_check, not  the context->execve.is_check.
>
> Correct, this is only for audit but not used yet.
>
> Paul, Eric, do you want me to remove this field, leave it, or extend
> this patch like this?
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 8d9ba5600cf2..12cf89fa224a 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1290,6 +1290,8 @@ static void audit_log_execve_info(struct audit_context *context,
>                 }
>         } while (arg < context->execve.argc);
>
> +       audit_log_format(*ab, " check=%d", context->execve.is_check);
> +
>         /* NOTE: the caller handles the final audit_log_end() call */
>
>  out:

I would prefer to drop the audit changes rather than add a new field
to the audit record at this point in time.  Once we have a better
understanding of how things are actually being deployed by distros,
providers, and admins we can make a more reasoned decision on what we
should audit with respect to AT_EXECVE_CHECK.

Beyond that it looks okay to me from a LSM and audit perspective, so
feel free to add my ACK once you've dropped the audit bits.

Acked-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-12-05  3:34 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-12 19:18 [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Mickaël Salaün
2024-11-12 19:18 ` [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2) Mickaël Salaün
2024-11-20  1:17   ` Jeff Xu
2024-11-20  9:42     ` Mickaël Salaün
2024-11-20 16:06       ` Jeff Xu
2024-11-21 13:39         ` Mickaël Salaün
2024-11-21 18:27           ` Jeff Xu
2024-11-22 14:50             ` Mickaël Salaün
2024-11-25 17:38               ` Jeff Xu
2024-11-27 12:07                 ` Mickaël Salaün
2024-11-27 15:14                   ` Jeff Xu
2024-12-05  3:33           ` Paul Moore
2024-11-12 19:18 ` [PATCH v21 2/6] security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits Mickaël Salaün
2024-11-20  1:30   ` Jeff Xu
2024-11-20  9:42     ` Mickaël Salaün
2024-11-12 19:18 ` [PATCH v21 3/6] selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits Mickaël Salaün
2024-11-12 19:18 ` [PATCH v21 4/6] selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK Mickaël Salaün
2024-11-12 19:18 ` [PATCH v21 5/6] samples/check-exec: Add set-exec Mickaël Salaün
2024-11-12 19:18 ` [PATCH v21 6/6] samples/check-exec: Add an enlighten "inc" interpreter and 28 tests Mickaël Salaün
2024-11-21 20:34   ` Mimi Zohar
2024-11-22 14:50     ` Mickaël Salaün
2024-11-26 17:41       ` Mimi Zohar
2024-11-27 12:10         ` Mickaël Salaün
2024-11-27 15:15           ` Mimi Zohar
2024-11-21  4:58 ` [PATCH v21 0/6] Script execution control (was O_MAYEXEC) Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).