Linux Security Modules development
 help / color / mirror / Atom feed
* [PATCH v6 0/7] Add support for O_MAYEXEC
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel

Hi,

This sixth patch series mainly adds Kees Cook's file permission check
relocations which help to simplify and generalize the previous series.
I removed the MAY_EXECMOUNT flag patch which is not useful anymore with
this refactoring.  I also removed the static enforcement configuration
through Kconfig to make this series simpler and because it works in pair
with mount configurations (i.e. requires the same capability:
CAP_SYS_ADMIN).

As requested by Mimi Zohar, I completed the series with one of her
patches for IMA.  I also picked Kees Cook's patches to consolidate exec
permission checking into do_filp_open()'s flow:
https://lore.kernel.org/lkml/20200605160013.3954297-1-keescook@chromium.org/


# Goal of O_MAYEXEC

The goal of this patch series is to enable to control script execution
with interpreters help.  A new O_MAYEXEC flag, usable through
openat2(2), is added to enable userspace script interpreters to delegate
to the kernel (and thus the system security policy) the permission to
interpret/execute scripts or other files containing what can be seen as
commands.

A simple system-wide security policy can be enforced by the system
administrator through a sysctl configuration consistent with the mount
points or the file access rights.  The documentation patch explains the
prerequisites.

Furthermore, the security policy can also be delegated to an LSM, either
a MAC system or an integrity system.  For instance, the new kernel
MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter
integrity gap by bringing the ability to check the use of scripts [1].
Other uses are expected, such as for magic-links [2], SGX integration
[3], bpffs [4] or IPE [5].


# Prerequisite of its use

Userspace needs to adapt to take advantage of this new feature.  For
example, the PEP 578 [6] (Runtime Audit Hooks) enables Python 3.8 to be
extended with policy enforcement points related to code interpretation,
which can be used to align with the PowerShell audit features.
Additional Python security improvements (e.g. a limited interpreter
withou -c, stdin piping of code) are on their way [7].


# Examples

The initial idea comes from CLIP OS 4 and the original implementation
has been used for more than 12 years:
https://github.com/clipos-archive/clipos4_doc
Chrome OS has a similar approach:
https://chromium.googlesource.com/chromiumos/docs/+/master/security/noexec_shell_scripts.md

Userland patches can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
Actually, there is more than the O_MAYEXEC changes (which matches this search)
e.g., to prevent Python interactive execution. There are patches for
Bash, Wine, Java (Icedtea), Busybox's ash, Perl and Python. There are
also some related patches which do not directly rely on O_MAYEXEC but
which restrict the use of browser plugins and extensions, which may be
seen as scripts too:
https://github.com/clipos-archive/clipos4_portage-overlay/tree/master/www-client

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s
See also an overview article: https://lwn.net/Articles/820000/


This patch series can be applied on top of v5.8-rc5 .  This can be tested
with CONFIG_SYSCTL.  I would really appreciate constructive comments on
this patch series.

Previous version:
https://lore.kernel.org/lkml/20200505153156.925111-1-mic@digikod.net/


[1] https://lore.kernel.org/lkml/1544647356.4028.105.camel@linux.ibm.com/
[2] https://lore.kernel.org/lkml/20190904201933.10736-6-cyphar@cyphar.com/
[3] https://lore.kernel.org/lkml/CALCETrVovr8XNZSroey7pHF46O=kj_c5D9K8h=z2T_cNrpvMig@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CALCETrVeZ0eufFXwfhtaG_j+AdvbzEWE0M3wjXMWVEO7pj+xkw@mail.gmail.com/
[5] https://lore.kernel.org/lkml/20200406221439.1469862-12-deven.desai@linux.microsoft.com/
[6] https://www.python.org/dev/peps/pep-0578/
[7] https://lore.kernel.org/lkml/0c70debd-e79e-d514-06c6-4cd1e021fa8b@python.org/

Regards,

Kees Cook (3):
  exec: Change uselib(2) IS_SREG() failure to EACCES
  exec: Move S_ISREG() check earlier
  exec: Move path_noexec() check earlier

Mickaël Salaün (3):
  fs: Introduce O_MAYEXEC flag for openat2(2)
  fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC
  selftest/openat2: Add tests for O_MAYEXEC enforcing

Mimi Zohar (1):
  ima: add policy support for the new file open MAY_OPENEXEC flag

 Documentation/ABI/testing/ima_policy          |   2 +-
 Documentation/admin-guide/sysctl/fs.rst       |  45 +++
 fs/exec.c                                     |  23 +-
 fs/fcntl.c                                    |   2 +-
 fs/namei.c                                    |  31 ++-
 fs/open.c                                     |  14 +-
 include/linux/fcntl.h                         |   2 +-
 include/linux/fs.h                            |   3 +
 include/uapi/asm-generic/fcntl.h              |   7 +
 kernel/sysctl.c                               |  12 +-
 security/integrity/ima/ima_main.c             |   3 +-
 security/integrity/ima/ima_policy.c           |  15 +-
 tools/testing/selftests/kselftest_harness.h   |   3 +
 tools/testing/selftests/openat2/Makefile      |   3 +-
 tools/testing/selftests/openat2/config        |   1 +
 tools/testing/selftests/openat2/helpers.h     |   1 +
 .../testing/selftests/openat2/omayexec_test.c | 262 ++++++++++++++++++
 17 files changed, 400 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/openat2/config
 create mode 100644 tools/testing/selftests/openat2/omayexec_test.c

-- 
2.27.0


^ permalink raw reply

* [PATCH v6 3/7] exec: Move path_noexec() check earlier
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

From: Kees Cook <keescook@chromium.org>

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s. Check it earlier as well
in may_open() and collect the redundant fs/exec.c path_noexec() test
under the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs path_noexec() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        security_file_open(f)
                        open()
    /* old location of path_noexec() test */

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org
---
 fs/exec.c  | 12 ++++--------
 fs/namei.c |  4 ++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index bdc6a6eb5dce..4eea20c27b01 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -147,10 +147,8 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
 	 * and check again at the very end too.
 	 */
 	error = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
-		goto exit;
-
-	if (path_noexec(&file->f_path))
+	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+			 path_noexec(&file->f_path)))
 		goto exit;
 
 	fsnotify_open(file);
@@ -897,10 +895,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 	 * and check again at the very end too.
 	 */
 	err = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
-		goto exit;
-
-	if (path_noexec(&file->f_path))
+	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+			 path_noexec(&file->f_path)))
 		goto exit;
 
 	err = deny_write_access(file);
diff --git a/fs/namei.c b/fs/namei.c
index a559ad943970..ddc9b25540fe 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2863,6 +2863,10 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 			return -EACCES;
 		flag &= ~O_TRUNC;
 		break;
+	case S_IFREG:
+		if ((acc_mode & MAY_EXEC) && path_noexec(path))
+			return -EACCES;
+		break;
 	}
 
 	error = inode_permission(inode, MAY_OPEN | acc_mode);
-- 
2.27.0


^ permalink raw reply related

* [PATCH v6 2/7] exec: Move S_ISREG() check earlier
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

From: Kees Cook <keescook@chromium.org>

The execve(2)/uselib(2) syscalls have always rejected non-regular
files. Recently, it was noticed that a deadlock was introduced when trying
to execute pipes, as the S_ISREG() test was happening too late. This was
fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look
for other pathological conditions[1]). Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the
test to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
		    /* new location of MAY_EXEC vs S_ISREG() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        /* old location of FMODE_EXEC vs S_ISREG() test */
                        security_file_open(f)
                        open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20200605160013.3954297-3-keescook@chromium.org
---
 fs/exec.c  | 14 ++++++++++++--
 fs/namei.c |  6 ++++--
 fs/open.c  |  6 ------
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d7c937044d10..bdc6a6eb5dce 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -141,8 +141,13 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
 	if (IS_ERR(file))
 		goto out;
 
+	/*
+	 * may_open() has already checked for this, so it should be
+	 * impossible to trip now. But we need to be extra cautious
+	 * and check again at the very end too.
+	 */
 	error = -EACCES;
-	if (!S_ISREG(file_inode(file)->i_mode))
+	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
 		goto exit;
 
 	if (path_noexec(&file->f_path))
@@ -886,8 +891,13 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 	if (IS_ERR(file))
 		goto out;
 
+	/*
+	 * may_open() has already checked for this, so it should be
+	 * impossible to trip now. But we need to be extra cautious
+	 * and check again at the very end too.
+	 */
 	err = -EACCES;
-	if (!S_ISREG(file_inode(file)->i_mode))
+	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
 		goto exit;
 
 	if (path_noexec(&file->f_path))
diff --git a/fs/namei.c b/fs/namei.c
index 72d4219c93ac..a559ad943970 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2849,16 +2849,18 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 	case S_IFLNK:
 		return -ELOOP;
 	case S_IFDIR:
-		if (acc_mode & MAY_WRITE)
+		if (acc_mode & (MAY_WRITE | MAY_EXEC))
 			return -EISDIR;
 		break;
 	case S_IFBLK:
 	case S_IFCHR:
 		if (!may_open_dev(path))
 			return -EACCES;
-		/*FALLTHRU*/
+		fallthrough;
 	case S_IFIFO:
 	case S_IFSOCK:
+		if (acc_mode & MAY_EXEC)
+			return -EACCES;
 		flag &= ~O_TRUNC;
 		break;
 	}
diff --git a/fs/open.c b/fs/open.c
index 6cd48a61cda3..623b7506a6db 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -784,12 +784,6 @@ static int do_dentry_open(struct file *f,
 		return 0;
 	}
 
-	/* Any file opened for execve()/uselib() has to be a regular file. */
-	if (unlikely(f->f_flags & FMODE_EXEC && !S_ISREG(inode->i_mode))) {
-		error = -EACCES;
-		goto cleanup_file;
-	}
-
 	if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) {
 		error = get_write_access(inode);
 		if (unlikely(error))
-- 
2.27.0


^ permalink raw reply related

* [PATCH v6 5/7] fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

Allow for the enforcement of the O_MAYEXEC openat2(2) flag.  Thanks to
the noexec option from the underlying VFS mount, or to the file execute
permission, userspace can enforce these execution policies.  This may
allow script interpreters to check execution permission before reading
commands from a file, or dynamic linkers to allow shared object loading.

Add a new sysctl fs.open_mayexec_enforce to enable system administrators
to enforce two complementary security policies according to the
installed system: enforce the noexec mount option, and enforce
executable file permission.  Indeed, because of compatibility with
installed systems, only system administrators are able to check that
this new enforcement is in line with the system mount points and file
permissions.  A following patch adds documentation.

Being able to restrict execution also enables to protect the kernel by
restricting arbitrary syscalls that an attacker could perform with a
crafted binary or certain script languages.  It also improves multilevel
isolation by reducing the ability of an attacker to use side channels
with specific code.  These restrictions can natively be enforced for ELF
binaries (with the noexec mount option) but require this kernel
extension to properly handle scripts (e.g., Python, Perl).  To get a
consistent execution policy, additional memory restrictions should also
be enforced (e.g. thanks to SELinux).

Because the O_MAYEXEC flag is a meant to enforce a system-wide security
policy (but not application-centric policies), it does not make sense
for userland to check the sysctl value.  Indeed, this new flag only
enables to extend the system ability to enforce a policy thanks to (some
trusted) userland collaboration.  Moreover, additional security policies
could be managed by LSMs.  This is a best-effort approach from the
application developer point of view:
https://lore.kernel.org/lkml/1477d3d7-4b36-afad-7077-a38f42322238@digikod.net/

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
---

Changes since v5:
* Remove the static enforcement configuration through Kconfig because it
  makes the code more simple like this, and because the current sysctl
  configuration can only be set with CAP_SYS_ADMIN, the same way mount
  options (i.e. noexec) can be set.  If an harden distro wants to
  enforce a configuration, it should restrict capabilities or sysctl
  configuration.  Furthermore, an LSM can easily leverage O_MAYEXEC to
  fit its need.
* Move checks from inode_permission() to may_open() and make the error
  codes more consistent according to file types (in line with a previous
  commit): opening a directory with O_MAYEXEC returns EISDIR and other
  non-regular file types may return EACCES.
* In may_open(), when OMAYEXEC_ENFORCE_FILE is set, replace explicit
  call to generic_permission() with an artificial MAY_EXEC to avoid
  double calls.  This makes sense especially when an LSM policy forbids
  execution of a file.
* Replace the custom proc_omayexec() with
  proc_dointvec_minmax_sysadmin(), and then replace the CAP_MAC_ADMIN
  check with a CAP_SYS_ADMIN one (suggested by Kees Cook and Stephen
  Smalley).
* Use BIT() (suggested by Kees Cook).
* Rename variables (suggested by Kees Cook).
* Reword the kconfig help.
* Import the documentation patch (suggested by Kees Cook):
  https://lore.kernel.org/lkml/20200505153156.925111-6-mic@digikod.net/
* Update documentation and add LWN.net article.

Changes since v4:
* Add kernel configuration options to enforce O_MAYEXEC at build time,
  and disable the sysctl in such case (requested by James Morris).
* Reword commit message.

Changes since v3:
* Update comment with O_MAYEXEC.

Changes since v2:
* Cosmetic changes.

Changes since v1:
* Move code from Yama to the FS subsystem (suggested by Kees Cook).
* Make omayexec_inode_permission() static (suggested by Jann Horn).
* Use mode 0600 for the sysctl.
* Only match regular files (not directories nor other types), which
  follows the same semantic as commit 73601ea5b7b1 ("fs/open.c: allow
  opening only regular files during execve()").
---
 Documentation/admin-guide/sysctl/fs.rst | 45 +++++++++++++++++++++++++
 fs/namei.c                              | 29 +++++++++++++---
 include/linux/fs.h                      |  1 +
 kernel/sysctl.c                         | 12 +++++--
 4 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 2a45119e3331..02ec384b8bbf 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
 - inode-nr
 - inode-state
 - nr_open
+- open_mayexec_enforce
 - overflowuid
 - overflowgid
 - pipe-user-pages-hard
@@ -165,6 +166,50 @@ system needs to prune the inode list instead of allocating
 more.
 
 
+open_mayexec_enforce
+--------------------
+
+While being ignored by :manpage:`open(2)` and :manpage:`openat(2)`, the
+``O_MAYEXEC`` flag can be passed to :manpage:`openat2(2)` to only open regular
+files that are expected to be executable.  If the file is not identified as
+executable, then the syscall returns -EACCES.  This may allow a script
+interpreter to check executable permission before reading commands from a file,
+or a dynamic linker to only load executable shared objects.  One interesting
+use case is to enforce a "write xor execute" policy through interpreters.
+
+The ability to restrict code execution must be thought as a system-wide policy,
+which first starts by restricting mount points with the ``noexec`` option.
+This option is also automatically applied to special filesystems such as /proc
+.  This prevents files on such mount points to be directly executed by the
+kernel or mapped as executable memory (e.g. libraries).  With script
+interpreters using the ``O_MAYEXEC`` flag, the executable permission can then
+be checked before reading commands from files. This makes it possible to
+enforce the ``noexec`` at the interpreter level, and thus propagates this
+security policy to scripts.  To be fully effective, these interpreters also
+need to handle the other ways to execute code: command line parameters (e.g.,
+option ``-e`` for Perl), module loading (e.g., option ``-m`` for Python),
+stdin, file sourcing, environment variables, configuration files, etc.
+According to the threat model, it may be acceptable to allow some script
+interpreters (e.g. Bash) to interpret commands from stdin, may it be a TTY or a
+pipe, because it may not be enough to (directly) perform syscalls.
+
+There are two complementary security policies: enforce the ``noexec`` mount
+option, and enforce executable file permission.  These policies are handled by
+the ``fs.open_mayexec_enforce`` sysctl (writable only with ``CAP_SYS_ADMIN``)
+as a bitmask:
+
+1 - Mount restriction: checks that the mount options for the underlying VFS
+    mount do not prevent execution.
+
+2 - File permission restriction: checks that the to-be-opened file is marked as
+    executable for the current process (e.g., POSIX permissions).
+
+Code samples can be found in tools/testing/selftests/openat2/omayexec_test.c
+and interpreter patches (for the original O_MAYEXEC version) may be found at
+https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC .
+See also an overview article: https://lwn.net/Articles/820000/ .
+
+
 overflowgid & overflowuid
 -------------------------
 
diff --git a/fs/namei.c b/fs/namei.c
index ddc9b25540fe..9a9166e5ddd3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -39,6 +39,7 @@
 #include <linux/bitops.h>
 #include <linux/init_task.h>
 #include <linux/uaccess.h>
+#include <linux/sysctl.h>
 
 #include "internal.h"
 #include "mount.h"
@@ -425,10 +426,15 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
 	return 0;
 }
 
+#define OPEN_MAYEXEC_ENFORCE_MOUNT	BIT(0)
+#define OPEN_MAYEXEC_ENFORCE_FILE	BIT(1)
+
+int sysctl_open_mayexec_enforce __read_mostly;
+
 /**
  * inode_permission - Check for access rights to a given inode
  * @inode: Inode to check permission on
- * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC, %MAY_OPENEXEC)
  *
  * Check for read/write/execute permissions on an inode.  We use fs[ug]id for
  * this, letting us set arbitrary permissions for filesystem access without
@@ -2849,7 +2855,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 	case S_IFLNK:
 		return -ELOOP;
 	case S_IFDIR:
-		if (acc_mode & (MAY_WRITE | MAY_EXEC))
+		if (acc_mode & (MAY_WRITE | MAY_EXEC | MAY_OPENEXEC))
 			return -EISDIR;
 		break;
 	case S_IFBLK:
@@ -2859,13 +2865,26 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 		fallthrough;
 	case S_IFIFO:
 	case S_IFSOCK:
-		if (acc_mode & MAY_EXEC)
+		if (acc_mode & (MAY_EXEC | MAY_OPENEXEC))
 			return -EACCES;
 		flag &= ~O_TRUNC;
 		break;
 	case S_IFREG:
-		if ((acc_mode & MAY_EXEC) && path_noexec(path))
-			return -EACCES;
+		if (path_noexec(path)) {
+			if (acc_mode & MAY_EXEC)
+				return -EACCES;
+			if ((acc_mode & MAY_OPENEXEC) &&
+					(sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_MOUNT))
+				return -EACCES;
+		}
+		if ((acc_mode & MAY_OPENEXEC) &&
+				(sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_FILE))
+			/*
+			 * Because acc_mode may change here, the next and only
+			 * use of acc_mode should then be by the following call
+			 * to inode_permission().
+			 */
+			acc_mode |= MAY_EXEC;
 		break;
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 56f835c9a87a..071f37707ccc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,6 +83,7 @@ extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
 extern int sysctl_protected_fifos;
 extern int sysctl_protected_regular;
+extern int sysctl_open_mayexec_enforce;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index db1ce7af2563..5008a2566e79 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -113,6 +113,7 @@ static int sixty = 60;
 
 static int __maybe_unused neg_one = -1;
 static int __maybe_unused two = 2;
+static int __maybe_unused three = 3;
 static int __maybe_unused four = 4;
 static unsigned long zero_ul;
 static unsigned long one_ul = 1;
@@ -888,7 +889,6 @@ static int proc_taint(struct ctl_table *table, int write,
 	return err;
 }
 
-#ifdef CONFIG_PRINTK
 static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 				void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -897,7 +897,6 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 
 	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 }
-#endif
 
 /**
  * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure
@@ -3264,6 +3263,15 @@ static struct ctl_table fs_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname       = "open_mayexec_enforce",
+		.data           = &sysctl_open_mayexec_enforce,
+		.maxlen         = sizeof(int),
+		.mode           = 0600,
+		.proc_handler	= proc_dointvec_minmax_sysadmin,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &three,
+	},
 #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
 	{
 		.procname	= "binfmt_misc",
-- 
2.27.0


^ permalink raw reply related

* [PATCH v6 6/7] selftest/openat2: Add tests for O_MAYEXEC enforcing
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

Test propagation of noexec mount points or file executability through
files open with or without O_MAYEXEC, thanks to the
fs.open_mayexec_enforce sysctl.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
---

Changes since v5:
* Refactor with FIXTURE_VARIANT, which make the tests much more easy to
  read and maintain.
* Save and restore initial sysctl value (suggested by Kees Cook).
* Test with a sysctl value of 0.
* Check errno in sysctl_access_write test.
* Update tests for the CAP_SYS_ADMIN switch.
* Update tests to check -EISDIR (replacing -EACCES).
* Replace FIXTURE_DATA() with FIXTURE() (spotted by Kees Cook).
* Use global const strings.

Changes since v3:
* Replace RESOLVE_MAYEXEC with O_MAYEXEC.
* Add tests to check that O_MAYEXEC is ignored by open(2) and openat(2).

Changes since v2:
* Move tests from exec/ to openat2/ .
* Replace O_MAYEXEC with RESOLVE_MAYEXEC from openat2(2).
* Cleanup tests.

Changes since v1:
* Move tests from yama/ to exec/ .
* Fix _GNU_SOURCE in kselftest_harness.h .
* Add a new test sysctl_access_write to check if CAP_MAC_ADMIN is taken
  into account.
* Test directory execution which is always forbidden since commit
  73601ea5b7b1 ("fs/open.c: allow opening only regular files during
  execve()"), and also check that even the root user can not bypass file
  execution checks.
* Make sure delete_workspace() always as enough right to succeed.
* Cosmetic cleanup.
---
 tools/testing/selftests/kselftest_harness.h   |   3 +
 tools/testing/selftests/openat2/Makefile      |   3 +-
 tools/testing/selftests/openat2/config        |   1 +
 tools/testing/selftests/openat2/helpers.h     |   1 +
 .../testing/selftests/openat2/omayexec_test.c | 262 ++++++++++++++++++
 5 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/openat2/config
 create mode 100644 tools/testing/selftests/openat2/omayexec_test.c

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index c9f03ef93338..68a0acd9ea1e 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -50,7 +50,10 @@
 #ifndef __KSELFTEST_HARNESS_H
 #define __KSELFTEST_HARNESS_H
 
+#ifndef _GNU_SOURCE
 #define _GNU_SOURCE
+#endif
+
 #include <asm/types.h>
 #include <errno.h>
 #include <stdbool.h>
diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile
index 4b93b1417b86..cb98bdb4d5b1 100644
--- a/tools/testing/selftests/openat2/Makefile
+++ b/tools/testing/selftests/openat2/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
 CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
-TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
+LDLIBS += -lcap
+TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test omayexec_test
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/openat2/config b/tools/testing/selftests/openat2/config
new file mode 100644
index 000000000000..dd53c266bf52
--- /dev/null
+++ b/tools/testing/selftests/openat2/config
@@ -0,0 +1 @@
+CONFIG_SYSCTL=y
diff --git a/tools/testing/selftests/openat2/helpers.h b/tools/testing/selftests/openat2/helpers.h
index a6ea27344db2..1dcd3e1e2f38 100644
--- a/tools/testing/selftests/openat2/helpers.h
+++ b/tools/testing/selftests/openat2/helpers.h
@@ -9,6 +9,7 @@
 
 #define _GNU_SOURCE
 #include <stdint.h>
+#include <stdbool.h>
 #include <errno.h>
 #include <linux/types.h>
 #include "../kselftest.h"
diff --git a/tools/testing/selftests/openat2/omayexec_test.c b/tools/testing/selftests/openat2/omayexec_test.c
new file mode 100644
index 000000000000..a33f31e59045
--- /dev/null
+++ b/tools/testing/selftests/openat2/omayexec_test.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test O_MAYEXEC
+ *
+ * Copyright © 2018-2020 ANSSI
+ *
+ * Author: Mickaël Salaün <mic@digikod.net>
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/capability.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "helpers.h"
+#include "../kselftest_harness.h"
+
+#ifndef O_MAYEXEC
+#define O_MAYEXEC		040000000
+#endif
+
+static const char sysctl_path[] = "/proc/sys/fs/open_mayexec_enforce";
+
+static const char workdir_path[] = "./test-mount";
+static const char file_path[] = "./test-mount/file";
+static const char dir_path[] = "./test-mount/directory";
+
+static void ignore_dac(struct __test_metadata *_metadata, int override)
+{
+	cap_t caps;
+	const cap_value_t cap_val[2] = {
+		CAP_DAC_OVERRIDE,
+		CAP_DAC_READ_SEARCH,
+	};
+
+	caps = cap_get_proc();
+	ASSERT_NE(NULL, caps);
+	ASSERT_EQ(0, cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_val,
+				override ? CAP_SET : CAP_CLEAR));
+	ASSERT_EQ(0, cap_set_proc(caps));
+	EXPECT_EQ(0, cap_free(caps));
+}
+
+static void ignore_sys_admin(struct __test_metadata *_metadata, int override)
+{
+	cap_t caps;
+	const cap_value_t cap_val[1] = {
+		CAP_SYS_ADMIN,
+	};
+
+	caps = cap_get_proc();
+	ASSERT_NE(NULL, caps);
+	ASSERT_EQ(0, cap_set_flag(caps, CAP_EFFECTIVE, 1, cap_val,
+				override ? CAP_SET : CAP_CLEAR));
+	ASSERT_EQ(0, cap_set_proc(caps));
+	EXPECT_EQ(0, cap_free(caps));
+}
+
+static void test_omx(struct __test_metadata *_metadata,
+		const char *const path, const int err_code)
+{
+	struct open_how how = {
+		.flags = O_RDONLY | O_CLOEXEC,
+	};
+	int fd;
+
+	/* Opens without O_MAYEXEC. */
+	fd = sys_openat2(AT_FDCWD, path, &how);
+	ASSERT_LE(0, fd);
+	EXPECT_EQ(0, close(fd));
+
+	how.flags |= O_MAYEXEC;
+
+	/* Checks that O_MAYEXEC is ignored with open(2). */
+	fd = open(path, how.flags);
+	ASSERT_LE(0, fd);
+	EXPECT_EQ(0, close(fd));
+
+	/* Checks that O_MAYEXEC is ignored with openat(2). */
+	fd = openat(AT_FDCWD, path, how.flags);
+	ASSERT_LE(0, fd);
+	EXPECT_EQ(0, close(fd));
+
+	/* Opens with O_MAYEXEC. */
+	fd = sys_openat2(AT_FDCWD, path, &how);
+	if (!err_code) {
+		ASSERT_LE(0, fd);
+		EXPECT_EQ(0, close(fd));
+	} else {
+		ASSERT_EQ(err_code, fd);
+	}
+}
+
+static void test_omx_dir_file(struct __test_metadata *_metadata, const int err_code)
+{
+	test_omx(_metadata, dir_path, -EISDIR);
+	test_omx(_metadata, file_path, err_code);
+}
+
+static void test_dir_file(struct __test_metadata *_metadata, const int err_code)
+{
+	/* Tests as root. */
+	ignore_dac(_metadata, 1);
+	test_omx_dir_file(_metadata, err_code);
+
+	/* Tests without bypass. */
+	ignore_dac(_metadata, 0);
+	test_omx_dir_file(_metadata, err_code);
+}
+
+static void sysctl_write_char(struct __test_metadata *_metadata, const char value)
+{
+	int fd;
+
+	fd = open(sysctl_path, O_WRONLY | O_CLOEXEC);
+	ASSERT_LE(0, fd);
+	ASSERT_EQ(1, write(fd, &value, 1));
+	EXPECT_EQ(0, close(fd));
+}
+
+static char sysctl_read_char(struct __test_metadata *_metadata)
+{
+	int fd;
+	char sysctl_value;
+
+	fd = open(sysctl_path, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, fd);
+	ASSERT_EQ(1, read(fd, &sysctl_value, 1));
+	EXPECT_EQ(0, close(fd));
+	return sysctl_value;
+}
+
+FIXTURE(omayexec) {
+	char initial_sysctl_value;
+};
+
+FIXTURE_VARIANT(omayexec) {
+	const bool mount_exec;
+	const bool file_exec;
+	const int sysctl_err_code[3];
+};
+
+FIXTURE_VARIANT_ADD(omayexec, mount_exec_file_exec) {
+	.mount_exec = true,
+	.file_exec = true,
+	.sysctl_err_code = {0, 0, 0},
+};
+
+FIXTURE_VARIANT_ADD(omayexec, mount_exec_file_noexec)
+{
+	.mount_exec = true,
+	.file_exec = false,
+	.sysctl_err_code = {0, -EACCES, -EACCES},
+};
+
+FIXTURE_VARIANT_ADD(omayexec, mount_noexec_file_exec)
+{
+	.mount_exec = false,
+	.file_exec = true,
+	.sysctl_err_code = {-EACCES, 0, -EACCES},
+};
+
+FIXTURE_VARIANT_ADD(omayexec, mount_noexec_file_noexec)
+{
+	.mount_exec = false,
+	.file_exec = false,
+	.sysctl_err_code = {-EACCES, -EACCES, -EACCES},
+};
+
+FIXTURE_SETUP(omayexec)
+{
+	int fd;
+
+	/*
+	 * Cleans previous workspace if any error previously happened (don't
+	 * check errors).
+	 */
+	umount(workdir_path);
+	rmdir(workdir_path);
+
+	/* Creates a clean mount point. */
+	ASSERT_EQ(0, mkdir(workdir_path, 00700));
+	ASSERT_EQ(0, mount("test", workdir_path, "tmpfs", MS_MGC_VAL |
+				(variant->mount_exec ? 0 : MS_NOEXEC),
+				"mode=0700,size=4k"));
+
+	/* Creates a test file. */
+	fd = open(file_path, O_CREAT | O_RDONLY | O_CLOEXEC,
+			variant->file_exec ? 00500 : 00400);
+	ASSERT_LE(0, fd);
+	EXPECT_EQ(0, close(fd));
+
+	/* Creates a test directory. */
+	ASSERT_EQ(0, mkdir(dir_path, variant->file_exec ? 00500 : 00400));
+
+	/* Saves initial sysctl value. */
+	self->initial_sysctl_value = sysctl_read_char(_metadata);
+
+	/* Prepares for sysctl writes. */
+	ignore_sys_admin(_metadata, 1);
+}
+
+FIXTURE_TEARDOWN(omayexec)
+{
+	/* Restores initial sysctl value. */
+	sysctl_write_char(_metadata, self->initial_sysctl_value);
+
+	/* There is no need to unlink file_path nor dir_path. */
+	ASSERT_EQ(0, umount(workdir_path));
+	ASSERT_EQ(0, rmdir(workdir_path));
+}
+
+TEST_F(omayexec, sysctl_0)
+{
+	/* Do not enforce anything. */
+	sysctl_write_char(_metadata, '0');
+	test_dir_file(_metadata, 0);
+}
+
+TEST_F(omayexec, sysctl_1)
+{
+	/* Enforces mount exec check. */
+	sysctl_write_char(_metadata, '1');
+	test_dir_file(_metadata, variant->sysctl_err_code[0]);
+}
+
+TEST_F(omayexec, sysctl_2)
+{
+	/* Enforces file exec check. */
+	sysctl_write_char(_metadata, '2');
+	test_dir_file(_metadata, variant->sysctl_err_code[1]);
+}
+
+TEST_F(omayexec, sysctl_3)
+{
+	/* Enforces mount and file exec check. */
+	sysctl_write_char(_metadata, '3');
+	test_dir_file(_metadata, variant->sysctl_err_code[2]);
+}
+
+TEST(sysctl_access_write)
+{
+	int fd;
+	ssize_t ret;
+
+	ignore_sys_admin(_metadata, 1);
+	sysctl_write_char(_metadata, '0');
+
+	ignore_sys_admin(_metadata, 0);
+	fd = open(sysctl_path, O_WRONLY | O_CLOEXEC);
+	ASSERT_LE(0, fd);
+	ret = write(fd, "0", 1);
+	ASSERT_EQ(-1, ret);
+	ASSERT_EQ(EPERM, errno);
+	EXPECT_EQ(0, close(fd));
+}
+
+TEST_HARNESS_MAIN
-- 
2.27.0


^ permalink raw reply related

* [PATCH v6 7/7] ima: add policy support for the new file open MAY_OPENEXEC flag
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

From: Mimi Zohar <zohar@linux.ibm.com>

The kernel has no way of differentiating between a file containing data
or code being opened by an interpreter.  The proposed O_MAYEXEC
openat2(2) flag bridges this gap by defining and enabling the
MAY_OPENEXEC flag.

This patch adds IMA policy support for the new MAY_OPENEXEC flag.

Example:
measure func=FILE_CHECK mask=^MAY_OPENEXEC
appraise func=FILE_CHECK appraise_type=imasig mask=^MAY_OPENEXEC

Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Acked-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/1588167523-7866-3-git-send-email-zohar@linux.ibm.com
---
 Documentation/ABI/testing/ima_policy |  2 +-
 security/integrity/ima/ima_main.c    |  3 ++-
 security/integrity/ima/ima_policy.c  | 15 +++++++++++----
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index cd572912c593..caca46125fe0 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -31,7 +31,7 @@ Description:
 				[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
 				[KEXEC_CMDLINE] [KEY_CHECK]
 			mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
-			       [[^]MAY_EXEC]
+			       [[^]MAY_EXEC] [[^]MAY_OPENEXEC]
 			fsmagic:= hex value
 			fsuuid:= file system UUID (e.g 8bcbe394-4f13-4144-be8e-5aa9ea2ce2f6)
 			uid:= decimal value
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index c1583d98c5e5..59fd1658a203 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -490,7 +490,8 @@ int ima_file_check(struct file *file, int mask)
 
 	security_task_getsecid(current, &secid);
 	return process_measurement(file, current_cred(), secid, NULL, 0,
-				   mask & (MAY_READ | MAY_WRITE | MAY_EXEC |
+				   mask & (MAY_READ | MAY_WRITE |
+					   MAY_EXEC | MAY_OPENEXEC |
 					   MAY_APPEND), FILE_CHECK);
 }
 EXPORT_SYMBOL_GPL(ima_file_check);
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index e493063a3c34..6487f0b2afdd 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -406,7 +406,8 @@ static bool ima_match_keyring(struct ima_rule_entry *rule,
  * @cred: a pointer to a credentials structure for user validation
  * @secid: the secid of the task to be validated
  * @func: LIM hook identifier
- * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC)
+ * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC |
+ *			    MAY_OPENEXEC)
  * @keyring: keyring name to check in policy for KEY_CHECK func
  *
  * Returns true on rule match, false on failure.
@@ -527,7 +528,8 @@ static int get_subaction(struct ima_rule_entry *rule, enum ima_hooks func)
  *        being made
  * @secid: LSM secid of the task to be validated
  * @func: IMA hook identifier
- * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC)
+ * @mask: requested action (MAY_READ | MAY_WRITE | MAY_APPEND | MAY_EXEC |
+ *			    MAY_OPENEXEC)
  * @pcr: set the pcr to extend
  * @template_desc: the template that should be used for this rule
  * @keyring: the keyring name, if given, to be used to check in the policy.
@@ -1091,6 +1093,8 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
 				entry->mask = MAY_READ;
 			else if (strcmp(from, "MAY_APPEND") == 0)
 				entry->mask = MAY_APPEND;
+			else if (strcmp(from, "MAY_OPENEXEC") == 0)
+				entry->mask = MAY_OPENEXEC;
 			else
 				result = -EINVAL;
 			if (!result)
@@ -1422,14 +1426,15 @@ const char *const func_tokens[] = {
 
 #ifdef	CONFIG_IMA_READ_POLICY
 enum {
-	mask_exec = 0, mask_write, mask_read, mask_append
+	mask_exec = 0, mask_write, mask_read, mask_append, mask_openexec
 };
 
 static const char *const mask_tokens[] = {
 	"^MAY_EXEC",
 	"^MAY_WRITE",
 	"^MAY_READ",
-	"^MAY_APPEND"
+	"^MAY_APPEND",
+	"^MAY_OPENEXEC"
 };
 
 void *ima_policy_start(struct seq_file *m, loff_t *pos)
@@ -1518,6 +1523,8 @@ int ima_policy_show(struct seq_file *m, void *v)
 			seq_printf(m, pt(Opt_mask), mt(mask_read) + offset);
 		if (entry->mask & MAY_APPEND)
 			seq_printf(m, pt(Opt_mask), mt(mask_append) + offset);
+		if (entry->mask & MAY_OPENEXEC)
+			seq_printf(m, pt(Opt_mask), mt(mask_openexec) + offset);
 		seq_puts(m, " ");
 	}
 
-- 
2.27.0


^ permalink raw reply related

* [PATCH v6 1/7] exec: Change uselib(2) IS_SREG() failure to EACCES
From: Mickaël Salaün @ 2020-07-14 18:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200714181638.45751-1-mic@digikod.net>

From: Kees Cook <keescook@chromium.org>

Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so
the behavior matches execve(2), and the seemingly documented value.
The "not a regular file" failure mode of execve(2) is explicitly
documented[1], but it is not mentioned in uselib(2)[2] which does,
however, say that open(2) and mmap(2) errors may apply. The documentation
for open(2) does not include a "not a regular file" error[3], but mmap(2)
does[4], and it is EACCES.

[1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS
[2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS
[3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS
[4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20200605160013.3954297-2-keescook@chromium.org
---
 fs/exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a70327..d7c937044d10 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -141,11 +141,10 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
 	if (IS_ERR(file))
 		goto out;
 
-	error = -EINVAL;
+	error = -EACCES;
 	if (!S_ISREG(file_inode(file)->i_mode))
 		goto exit;
 
-	error = -EACCES;
 	if (path_noexec(&file->f_path))
 		goto exit;
 
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Richard Guy Briggs @ 2020-07-14 17:43 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linux-Audit Mailing List, LKML, Linux Security Module list,
	Eric Paris, john.johansen
In-Reply-To: <CAHC9VhSyq7yKQqwvHL5syU9+TFki6-__WfCrvqewbnU3xpND4Q@mail.gmail.com>

On 2020-07-14 12:21, Paul Moore wrote:
> On Mon, Jul 13, 2020 at 3:52 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > audit_log_string() was inteded to be an internal audit function and
> > since there are only two internal uses, remove them.  Purge all external
> > uses of it by restructuring code to use an existing audit_log_format()
> > or using audit_log_format().
> >
> > Please see the upstream issue
> > https://github.com/linux-audit/audit-kernel/issues/84
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> > Passes audit-testsuite.
> >
> > Changelog:
> > v4
> > - use double quotes in all replaced audit_log_string() calls
> >
> > v3
> > - fix two warning: non-void function does not return a value in all control paths
> >         Reported-by: kernel test robot <lkp@intel.com>
> >
> > v2
> > - restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.
> >
> > v1 Vlad Dronov
> > - https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf
> >
> >  include/linux/audit.h     |  5 -----
> >  kernel/audit.c            |  4 ++--
> >  security/apparmor/audit.c | 10 ++++------
> >  security/apparmor/file.c  | 25 +++++++------------------
> >  security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
> >  security/apparmor/net.c   | 14 ++++++++------
> >  security/lsm_audit.c      |  4 ++--
> >  7 files changed, 46 insertions(+), 62 deletions(-)
> 
> Thanks for restoring the quotes, just one question below ...
> 
> > diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> > index 4ecedffbdd33..fe36d112aad9 100644
> > --- a/security/apparmor/ipc.c
> > +++ b/security/apparmor/ipc.c
> > @@ -20,25 +20,23 @@
> >
> >  /**
> >   * audit_ptrace_mask - convert mask to permission string
> > - * @buffer: buffer to write string to (NOT NULL)
> >   * @mask: permission mask to convert
> > + *
> > + * Returns: pointer to static string
> >   */
> > -static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
> > +static const char *audit_ptrace_mask(u32 mask)
> >  {
> >         switch (mask) {
> >         case MAY_READ:
> > -               audit_log_string(ab, "read");
> > -               break;
> > +               return "read";
> >         case MAY_WRITE:
> > -               audit_log_string(ab, "trace");
> > -               break;
> > +               return "trace";
> >         case AA_MAY_BE_READ:
> > -               audit_log_string(ab, "readby");
> > -               break;
> > +               return "readby";
> >         case AA_MAY_BE_TRACED:
> > -               audit_log_string(ab, "tracedby");
> > -               break;
> > +               return "tracedby";
> >         }
> > +       return "";
> 
> Are we okay with this returning an empty string ("") in this case?
> Should it be a question mark ("?")?
> 
> My guess is that userspace parsing should be okay since it still has
> quotes, I'm just not sure if we wanted to use a question mark as we do
> in other cases where the field value is empty/unknown.

Previously, it would have been an empty value, not even double quotes.
"?" might be an improvement.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply

* Re: [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Paul Moore @ 2020-07-14 16:21 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux-Audit Mailing List, LKML, Linux Security Module list,
	Eric Paris, john.johansen
In-Reply-To: <6effbbd4574407d6af21162e57d9102d5f8b02ed.1594664015.git.rgb@redhat.com>

On Mon, Jul 13, 2020 at 3:52 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> audit_log_string() was inteded to be an internal audit function and
> since there are only two internal uses, remove them.  Purge all external
> uses of it by restructuring code to use an existing audit_log_format()
> or using audit_log_format().
>
> Please see the upstream issue
> https://github.com/linux-audit/audit-kernel/issues/84
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> Passes audit-testsuite.
>
> Changelog:
> v4
> - use double quotes in all replaced audit_log_string() calls
>
> v3
> - fix two warning: non-void function does not return a value in all control paths
>         Reported-by: kernel test robot <lkp@intel.com>
>
> v2
> - restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.
>
> v1 Vlad Dronov
> - https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf
>
>  include/linux/audit.h     |  5 -----
>  kernel/audit.c            |  4 ++--
>  security/apparmor/audit.c | 10 ++++------
>  security/apparmor/file.c  | 25 +++++++------------------
>  security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
>  security/apparmor/net.c   | 14 ++++++++------
>  security/lsm_audit.c      |  4 ++--
>  7 files changed, 46 insertions(+), 62 deletions(-)

Thanks for restoring the quotes, just one question below ...

> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> index 4ecedffbdd33..fe36d112aad9 100644
> --- a/security/apparmor/ipc.c
> +++ b/security/apparmor/ipc.c
> @@ -20,25 +20,23 @@
>
>  /**
>   * audit_ptrace_mask - convert mask to permission string
> - * @buffer: buffer to write string to (NOT NULL)
>   * @mask: permission mask to convert
> + *
> + * Returns: pointer to static string
>   */
> -static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
> +static const char *audit_ptrace_mask(u32 mask)
>  {
>         switch (mask) {
>         case MAY_READ:
> -               audit_log_string(ab, "read");
> -               break;
> +               return "read";
>         case MAY_WRITE:
> -               audit_log_string(ab, "trace");
> -               break;
> +               return "trace";
>         case AA_MAY_BE_READ:
> -               audit_log_string(ab, "readby");
> -               break;
> +               return "readby";
>         case AA_MAY_BE_TRACED:
> -               audit_log_string(ab, "tracedby");
> -               break;
> +               return "tracedby";
>         }
> +       return "";

Are we okay with this returning an empty string ("") in this case?
Should it be a question mark ("?")?

My guess is that userspace parsing should be okay since it still has
quotes, I'm just not sure if we wanted to use a question mark as we do
in other cases where the field value is empty/unknown.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 0/7] Implementing kernel_execve
From: Linus Torvalds @ 2020-07-14 15:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Kees Cook, Andy Lutomirski,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Al Viro, Luis Chamberlain, linux-fsdevel, Tetsuo Handa, LSM List,
	Serge E. Hallyn, James Morris, Kentaro Takeda, Casey Schaufler,
	John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>

On Tue, Jul 14, 2020 at 6:30 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Please let me know if you see something wrong.

Ack, looks good to me.

                 Linus

^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo @ 2020-07-14 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexey Budankov, Ravi Bangoria, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <20200714105934.GU10769@hirez.programming.kicks-ass.net>

Em Tue, Jul 14, 2020 at 12:59:34PM +0200, Peter Zijlstra escreveu:
> On Mon, Jul 13, 2020 at 03:51:52PM -0300, Arnaldo Carvalho de Melo wrote:
> 
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 856d98c36f56..a2397f724c10 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
> > > >  		 * perf_event_exit_task() that could imply).
> > > >  		 */
> > > >  		err = -EACCES;
> > > > -		if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > > > +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > > >  			goto err_cred;
> > > >  	}

> > > >> makes monitoring simpler and even more secure to use since Perf tool need
> > > >> not to start/stop/single-step and read/write registers and memory and so on
> > > >> like a debugger or strace-like tool. What do you think?

> > > > I tend to agree, Peter?

> So this basically says that if CAP_PERFMON, we don't care about the
> ptrace() permissions? Just like how CAP_SYS_PTRACE would always allow
> the ptrace checks?

> I suppose that makes sense.

Yeah, it in fact addresses the comment right above it:

        if (task) {
                err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
                if (err)
                        goto err_task;

                /*
                 * Reuse ptrace permission checks for now.
                 *
                 * We must hold exec_update_mutex across this and any potential
                 * perf_install_in_context() call for this new event to
                 * serialize against exec() altering our credentials (and the
                 * perf_event_exit_task() that could imply).
                 */
                err = -EACCES;
                if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
                        goto err_cred;
        }


that "for now" part :-)

Idea is to not require CAP_PTRACE for that, i.e. the attack surface for the
perf binary is reduced.

- Arnaldo

^ permalink raw reply

* [PATCH 7/7] exec: Implement kernel_execve
From: Eric W. Biederman @ 2020-07-14 13:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


To allow the kernel not to play games with set_fs to call exec
implement kernel_execve.  The function kernel_execve takes pointers
into kernel memory and copies the values pointed to onto the new
userspace stack.

The calls with arguments from kernel space of do_execve are replaced
with calls to kernel_execve.

The calls do_execve and do_execveat are made static as there are now
no callers outside of exec.

The comments that mention do_execve are updated to refer to
kernel_execve or execve depending on the circumstances.  In addition
to correcting the comments, this makes it easy to grep for do_execve
and verify it is not used.

Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/entry/entry_32.S      |  2 +-
 arch/x86/entry/entry_64.S      |  2 +-
 arch/x86/kernel/unwind_frame.c |  2 +-
 fs/exec.c                      | 88 +++++++++++++++++++++++++++++++++-
 include/linux/binfmts.h        |  9 +---
 init/main.c                    |  4 +-
 kernel/umh.c                   |  6 +--
 security/tomoyo/common.h       |  2 +-
 security/tomoyo/domain.c       |  4 +-
 security/tomoyo/tomoyo.c       |  4 +-
 10 files changed, 100 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 024d7d276cd4..8f4e085ee06d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -854,7 +854,7 @@ SYM_CODE_START(ret_from_fork)
 	CALL_NOSPEC ebx
 	/*
 	 * A kernel thread is allowed to return here after successfully
-	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * calling kernel_execve().  Exit to userspace to complete the execve()
 	 * syscall.
 	 */
 	movl	$0, PT_EAX(%esp)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d2a00c97e53f..73c7e255256b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -293,7 +293,7 @@ SYM_CODE_START(ret_from_fork)
 	CALL_NOSPEC rbx
 	/*
 	 * A kernel thread is allowed to return here after successfully
-	 * calling do_execve().  Exit to userspace to complete the execve()
+	 * calling kernel_execve().  Exit to userspace to complete the execve()
 	 * syscall.
 	 */
 	movq	$0, RAX(%rsp)
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 722a85f3b2dd..e40b4942157f 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -275,7 +275,7 @@ bool unwind_next_frame(struct unwind_state *state)
 		 * This user_mode() check is slightly broader than a PF_KTHREAD
 		 * check because it also catches the awkward situation where a
 		 * newly forked kthread transitions into a user task by calling
-		 * do_execve(), which eventually clears PF_KTHREAD.
+		 * kernel_execve(), which eventually clears PF_KTHREAD.
 		 */
 		if (!user_mode(regs))
 			goto the_end;
diff --git a/fs/exec.c b/fs/exec.c
index f8135dc149b3..3698252719a3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -448,6 +448,23 @@ static int count(struct user_arg_ptr argv, int max)
 	return i;
 }
 
+static int count_strings_kernel(const char *const *argv)
+{
+	int i;
+
+	if (!argv)
+		return 0;
+
+	for (i = 0; argv[i]; ++i) {
+		if (i >= MAX_ARG_STRINGS)
+			return -E2BIG;
+		if (fatal_signal_pending(current))
+			return -ERESTARTNOHAND;
+		cond_resched();
+	}
+	return i;
+}
+
 static int bprm_stack_limits(struct linux_binprm *bprm)
 {
 	unsigned long limit, ptr_size;
@@ -624,6 +641,20 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm)
 }
 EXPORT_SYMBOL(copy_string_kernel);
 
+static int copy_strings_kernel(int argc, const char *const *argv,
+			       struct linux_binprm *bprm)
+{
+	while (argc-- > 0) {
+		int ret = copy_string_kernel(argv[argc], bprm);
+		if (ret < 0)
+			return ret;
+		if (fatal_signal_pending(current))
+			return -ERESTARTNOHAND;
+		cond_resched();
+	}
+	return 0;
+}
+
 #ifdef CONFIG_MMU
 
 /*
@@ -1991,7 +2022,60 @@ static int do_execveat_common(int fd, struct filename *filename,
 	return retval;
 }
 
-int do_execve(struct filename *filename,
+int kernel_execve(const char *kernel_filename,
+		  const char *const *argv, const char *const *envp)
+{
+	struct filename *filename;
+	struct linux_binprm *bprm;
+	int fd = AT_FDCWD;
+	int retval;
+
+	filename = getname_kernel(kernel_filename);
+	if (IS_ERR(filename))
+		return PTR_ERR(filename);
+
+	bprm = alloc_bprm(fd, filename);
+	if (IS_ERR(bprm)) {
+		retval = PTR_ERR(bprm);
+		goto out_ret;
+	}
+
+	retval = count_strings_kernel(argv);
+	if (retval < 0)
+		goto out_free;
+	bprm->argc = retval;
+
+	retval = count_strings_kernel(envp);
+	if (retval < 0)
+		goto out_free;
+	bprm->envc = retval;
+
+	retval = bprm_stack_limits(bprm);
+	if (retval < 0)
+		goto out_free;
+
+	retval = copy_string_kernel(bprm->filename, bprm);
+	if (retval < 0)
+		goto out_free;
+	bprm->exec = bprm->p;
+
+	retval = copy_strings_kernel(bprm->envc, envp, bprm);
+	if (retval < 0)
+		goto out_free;
+
+	retval = copy_strings_kernel(bprm->argc, argv, bprm);
+	if (retval < 0)
+		goto out_free;
+
+	retval = bprm_execve(bprm, fd, filename, 0);
+out_free:
+	free_bprm(bprm);
+out_ret:
+	putname(filename);
+	return retval;
+}
+
+static int do_execve(struct filename *filename,
 	const char __user *const __user *__argv,
 	const char __user *const __user *__envp)
 {
@@ -2000,7 +2084,7 @@ int do_execve(struct filename *filename,
 	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
 }
 
-int do_execveat(int fd, struct filename *filename,
+static int do_execveat(int fd, struct filename *filename,
 		const char __user *const __user *__argv,
 		const char __user *const __user *__envp,
 		int flags)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8e9e1b0c8eb8..0571701ab1c5 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -135,12 +135,7 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm);
 extern void set_binfmt(struct linux_binfmt *new);
 extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
 
-extern int do_execve(struct filename *,
-		     const char __user * const __user *,
-		     const char __user * const __user *);
-extern int do_execveat(int, struct filename *,
-		       const char __user * const __user *,
-		       const char __user * const __user *,
-		       int);
+int kernel_execve(const char *filename,
+		  const char *const *argv, const char *const *envp);
 
 #endif /* _LINUX_BINFMTS_H */
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..78ccec5c28f3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1329,9 +1329,7 @@ static int run_init_process(const char *init_filename)
 	pr_debug("  with environment:\n");
 	for (p = envp_init; *p; p++)
 		pr_debug("    %s\n", *p);
-	return do_execve(getname_kernel(init_filename),
-		(const char __user *const __user *)argv_init,
-		(const char __user *const __user *)envp_init);
+	return kernel_execve(init_filename, argv_init, envp_init);
 }
 
 static int try_to_run_init_process(const char *init_filename)
diff --git a/kernel/umh.c b/kernel/umh.c
index 6ca2096298b9..a25433f9cd9a 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -98,9 +98,9 @@ static int call_usermodehelper_exec_async(void *data)
 
 	commit_creds(new);
 
-	retval = do_execve(getname_kernel(sub_info->path),
-			   (const char __user *const __user *)sub_info->argv,
-			   (const char __user *const __user *)sub_info->envp);
+	retval = kernel_execve(sub_info->path,
+			       (const char *const *)sub_info->argv,
+			       (const char *const *)sub_info->envp);
 out:
 	sub_info->retval = retval;
 	/*
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index 050473df5809..85246b9df7ca 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -425,7 +425,7 @@ struct tomoyo_request_info {
 	struct tomoyo_obj_info *obj;
 	/*
 	 * For holding parameters specific to execve() request.
-	 * NULL if not dealing do_execve().
+	 * NULL if not dealing execve().
 	 */
 	struct tomoyo_execve *ee;
 	struct tomoyo_domain_info *domain;
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 7869d6a9980b..53b3e1f5f227 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -767,7 +767,7 @@ int tomoyo_find_next_domain(struct linux_binprm *bprm)
 
 	/*
 	 * Check for domain transition preference if "file execute" matched.
-	 * If preference is given, make do_execve() fail if domain transition
+	 * If preference is given, make execve() fail if domain transition
 	 * has failed, for domain transition preference should be used with
 	 * destination domain defined.
 	 */
@@ -810,7 +810,7 @@ int tomoyo_find_next_domain(struct linux_binprm *bprm)
 		snprintf(ee->tmp, TOMOYO_EXEC_TMPSIZE - 1, "<%s>",
 			 candidate->name);
 		/*
-		 * Make do_execve() fail if domain transition across namespaces
+		 * Make execve() fail if domain transition across namespaces
 		 * has failed.
 		 */
 		reject_on_transition_failure = true;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index f9adddc42ac8..1f3cd432d830 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -93,7 +93,7 @@ static int tomoyo_bprm_check_security(struct linux_binprm *bprm)
 	struct tomoyo_task *s = tomoyo_task(current);
 
 	/*
-	 * Execute permission is checked against pathname passed to do_execve()
+	 * Execute permission is checked against pathname passed to execve()
 	 * using current domain.
 	 */
 	if (!s->old_domain_info) {
@@ -307,7 +307,7 @@ static int tomoyo_file_fcntl(struct file *file, unsigned int cmd,
  */
 static int tomoyo_file_open(struct file *f)
 {
-	/* Don't check read permission here if called from do_execve(). */
+	/* Don't check read permission here if called from execve(). */
 	if (current->in_execve)
 		return 0;
 	return tomoyo_check_open_permission(tomoyo_domain(), &f->f_path,
-- 
2.25.0


^ permalink raw reply related

* [PATCH 6/7] exec: Factor bprm_stack_limits out of prepare_arg_pages
From: Eric W. Biederman @ 2020-07-14 13:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


In preparation for implementiong kernel_execve (which will take kernel
pointers not userspace pointers) factor out bprm_stack_limits out of
prepare_arg_pages.  This separates the counting which depends upon the
getting data from userspace from the calculations of the stack limits
which is usable in kernel_execve.

The remove prepare_args_pages and compute bprm->argc and bprm->envc
directly in do_execveat_common, before bprm_stack_limits is called.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 50508892fa71..f8135dc149b3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -448,19 +448,10 @@ static int count(struct user_arg_ptr argv, int max)
 	return i;
 }
 
-static int prepare_arg_pages(struct linux_binprm *bprm,
-			struct user_arg_ptr argv, struct user_arg_ptr envp)
+static int bprm_stack_limits(struct linux_binprm *bprm)
 {
 	unsigned long limit, ptr_size;
 
-	bprm->argc = count(argv, MAX_ARG_STRINGS);
-	if (bprm->argc < 0)
-		return bprm->argc;
-
-	bprm->envc = count(envp, MAX_ARG_STRINGS);
-	if (bprm->envc < 0)
-		return bprm->envc;
-
 	/*
 	 * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
 	 * (whichever is smaller) for the argv+env strings.
@@ -1964,7 +1955,17 @@ static int do_execveat_common(int fd, struct filename *filename,
 		goto out_ret;
 	}
 
-	retval = prepare_arg_pages(bprm, argv, envp);
+	retval = count(argv, MAX_ARG_STRINGS);
+	if (retval < 0)
+		goto out_free;
+	bprm->argc = retval;
+
+	retval = count(envp, MAX_ARG_STRINGS);
+	if (retval < 0)
+		goto out_free;
+	bprm->envc = retval;
+
+	retval = bprm_stack_limits(bprm);
 	if (retval < 0)
 		goto out_free;
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH 5/7] exec: Factor bprm_execve out of do_execve_common
From: Eric W. Biederman @ 2020-07-14 13:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Factor bprm_execve
out of do_execve_common to separate out the copying of arguments
to the newe stack, and the rest of exec.

In separating bprm_execve from do_execve_common the copying
of the arguments onto the new stack happens earlier.

As the copying of the arguments does not depend any security hooks,
files, the file table, current->in_execve, current->fs->in_exec,
bprm->unsafe, or creds this is safe.

Likewise the security hook security_creds_for_exec does not depend upon
preventing the argument copying from happening.

In addition to making it possible to implement kernel_execve that
performs the copying differently, this separation of bprm_execve from
do_execve_common makes for a nice separation of responsibilities making
the exec code easier to navigate.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 108 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index afb168bf5e23..50508892fa71 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1856,44 +1856,16 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execveat_common(int fd, struct filename *filename,
-			      struct user_arg_ptr argv,
-			      struct user_arg_ptr envp,
-			      int flags)
+static int bprm_execve(struct linux_binprm *bprm,
+		       int fd, struct filename *filename, int flags)
 {
-	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
 	int retval;
 
-	if (IS_ERR(filename))
-		return PTR_ERR(filename);
-
-	/*
-	 * We move the actual failure in case of RLIMIT_NPROC excess from
-	 * set*uid() to execve() because too many poorly written programs
-	 * don't check setuid() return code.  Here we additionally recheck
-	 * whether NPROC limit is still exceeded.
-	 */
-	if ((current->flags & PF_NPROC_EXCEEDED) &&
-	    atomic_read(&current_user()->processes) > rlimit(RLIMIT_NPROC)) {
-		retval = -EAGAIN;
-		goto out_ret;
-	}
-
-	/* We're below the limit (still or again), so we don't want to make
-	 * further execve() calls fail. */
-	current->flags &= ~PF_NPROC_EXCEEDED;
-
-	bprm = alloc_bprm(fd, filename);
-	if (IS_ERR(bprm)) {
-		retval = PTR_ERR(bprm);
-		goto out_ret;
-	}
-
 	retval = unshare_files(&displaced);
 	if (retval)
-		goto out_free;
+		return retval;
 
 	retval = prepare_bprm_creds(bprm);
 	if (retval)
@@ -1919,28 +1891,11 @@ static int do_execveat_common(int fd, struct filename *filename,
 	    close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
 		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
 
-	retval = prepare_arg_pages(bprm, argv, envp);
-	if (retval < 0)
-		goto out;
-
 	/* Set the unchanging part of bprm->cred */
 	retval = security_bprm_creds_for_exec(bprm);
 	if (retval)
 		goto out;
 
-	retval = copy_string_kernel(bprm->filename, bprm);
-	if (retval < 0)
-		goto out;
-
-	bprm->exec = bprm->p;
-	retval = copy_strings(bprm->envc, envp, bprm);
-	if (retval < 0)
-		goto out;
-
-	retval = copy_strings(bprm->argc, argv, bprm);
-	if (retval < 0)
-		goto out;
-
 	retval = exec_binprm(bprm);
 	if (retval < 0)
 		goto out;
@@ -1951,8 +1906,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 	rseq_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
-	free_bprm(bprm);
-	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
 	return retval;
@@ -1974,6 +1927,61 @@ static int do_execveat_common(int fd, struct filename *filename,
 out_files:
 	if (displaced)
 		reset_files_struct(displaced);
+
+	return retval;
+}
+
+static int do_execveat_common(int fd, struct filename *filename,
+			      struct user_arg_ptr argv,
+			      struct user_arg_ptr envp,
+			      int flags)
+{
+	struct linux_binprm *bprm;
+	int retval;
+
+	if (IS_ERR(filename))
+		return PTR_ERR(filename);
+
+	/*
+	 * We move the actual failure in case of RLIMIT_NPROC excess from
+	 * set*uid() to execve() because too many poorly written programs
+	 * don't check setuid() return code.  Here we additionally recheck
+	 * whether NPROC limit is still exceeded.
+	 */
+	if ((current->flags & PF_NPROC_EXCEEDED) &&
+	    atomic_read(&current_user()->processes) > rlimit(RLIMIT_NPROC)) {
+		retval = -EAGAIN;
+		goto out_ret;
+	}
+
+	/* We're below the limit (still or again), so we don't want to make
+	 * further execve() calls fail. */
+	current->flags &= ~PF_NPROC_EXCEEDED;
+
+	bprm = alloc_bprm(fd, filename);
+	if (IS_ERR(bprm)) {
+		retval = PTR_ERR(bprm);
+		goto out_ret;
+	}
+
+	retval = prepare_arg_pages(bprm, argv, envp);
+	if (retval < 0)
+		goto out_free;
+
+	retval = copy_string_kernel(bprm->filename, bprm);
+	if (retval < 0)
+		goto out_free;
+	bprm->exec = bprm->p;
+
+	retval = copy_strings(bprm->envc, envp, bprm);
+	if (retval < 0)
+		goto out_free;
+
+	retval = copy_strings(bprm->argc, argv, bprm);
+	if (retval < 0)
+		goto out_free;
+
+	retval = bprm_execve(bprm, fd, filename, flags);
 out_free:
 	free_bprm(bprm);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH 4/7] exec: Move bprm_mm_init into alloc_bprm
From: Eric W. Biederman @ 2020-07-14 13:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


Currently it is necessary for the usermode helper code and the code that
launches init to use set_fs so that pages coming from the kernel look like
they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument copying
from userspace needs to happen earlier.  Move the allocation and
initialization of bprm->mm into alloc_bprm so that the bprm->mm is
available early to store the new user stack into.  This is a prerequisite
for copying argv and envp into the new user stack early before ther rest of
exec.

To keep the things consistent the cleanup of bprm->mm is moved into
free_bprm.  So that bprm->mm will be cleaned up whenever bprm->mm is
allocated and free_bprm are called.

Moving bprm_mm_init earlier is safe as it does not depend on any files,
current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file
table is shared. (AKA bprm_mm_init does not depend on any of the code that
happens between alloc_bprm and where it was previously called.)

This moves bprm->mm cleanup after current->fs->in_exec is set to 0.  This
is safe because current->fs->in_exec is only used to preventy taking an
additional reference on the fs_struct.

This moves bprm->mm cleanup after current->in_execve is set to 0.  This is
safe because current->in_execve is only used by the lsms (apparmor and
tomoyou) and always for LSM specific functions, never for anything to do
with the mm.

This adds bprm->mm cleanup into the successful return path.  This is safe
because being on the successful return path implies that begin_new_exec
succeeded and set brpm->mm to NULL.  As bprm->mm is NULL bprm cleanup I am
moving into free_bprm will do nothing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 7e8af27dd199..afb168bf5e23 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1543,6 +1543,10 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
 
 static void free_bprm(struct linux_binprm *bprm)
 {
+	if (bprm->mm) {
+		acct_arg_size(bprm, 0);
+		mmput(bprm->mm);
+	}
 	free_arg_pages(bprm);
 	if (bprm->cred) {
 		mutex_unlock(&current->signal->cred_guard_mutex);
@@ -1582,6 +1586,10 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 		bprm->filename = bprm->fdpath;
 	}
 	bprm->interp = bprm->filename;
+
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_free;
 	return bprm;
 
 out_free:
@@ -1911,10 +1919,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 	    close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
 		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
 
-	retval = bprm_mm_init(bprm);
-	if (retval)
-		goto out_unmark;
-
 	retval = prepare_arg_pages(bprm, argv, envp);
 	if (retval < 0)
 		goto out;
@@ -1962,10 +1966,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 */
 	if (bprm->point_of_no_return && !fatal_signal_pending(current))
 		force_sigsegv(SIGSEGV);
-	if (bprm->mm) {
-		acct_arg_size(bprm, 0);
-		mmput(bprm->mm);
-	}
 
 out_unmark:
 	current->fs->in_exec = 0;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 3/7] exec: Move initialization of bprm->filename into alloc_bprm
From: Eric W. Biederman @ 2020-07-14 13:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Move the computation
of bprm->filename and possible allocation of a name in the case
of execveat into alloc_bprm to make that possible.

The exectuable name, the arguments, and the environment are
copied into the new usermode stack which is stored in bprm
until exec passes the point of no return.

As the executable name is copied first onto the usermode stack
it needs to be known.  As there are no dependencies to computing
the executable name, compute it early in alloc_bprm.

As an implementation detail if the filename needs to be generated
because it embeds a file descriptor store that filename in a new field
bprm->fdpath, and free it in free_bprm.  Previously this was done in
an independent variable pathbuf.  I have renamed pathbuf fdpath
because fdpath is more suggestive of what kind of path is in the
variable.  I moved fdpath into struct linux_binprm because it is
tightly tied to the other variables in struct linux_binprm, and as
such is needed to allow the call alloc_binprm to move.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c               | 61 ++++++++++++++++++++++-------------------
 include/linux/binfmts.h |  1 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 526156d6461d..7e8af27dd199 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1557,15 +1557,37 @@ static void free_bprm(struct linux_binprm *bprm)
 	/* If a binfmt changed the interp, free it. */
 	if (bprm->interp != bprm->filename)
 		kfree(bprm->interp);
+	kfree(bprm->fdpath);
 	kfree(bprm);
 }
 
-static struct linux_binprm *alloc_bprm(void)
+static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 {
 	struct linux_binprm *bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
+	int retval = -ENOMEM;
 	if (!bprm)
-		return ERR_PTR(-ENOMEM);
+		goto out;
+
+	if (fd == AT_FDCWD || filename->name[0] == '/') {
+		bprm->filename = filename->name;
+	} else {
+		if (filename->name[0] == '\0')
+			bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd);
+		else
+			bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d/%s",
+						  fd, filename->name);
+		if (!bprm->fdpath)
+			goto out_free;
+
+		bprm->filename = bprm->fdpath;
+	}
+	bprm->interp = bprm->filename;
 	return bprm;
+
+out_free:
+	free_bprm(bprm);
+out:
+	return ERR_PTR(retval);
 }
 
 int bprm_change_interp(const char *interp, struct linux_binprm *bprm)
@@ -1831,7 +1853,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 			      struct user_arg_ptr envp,
 			      int flags)
 {
-	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
@@ -1856,7 +1877,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	bprm = alloc_bprm();
+	bprm = alloc_bprm(fd, filename);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
 		goto out_ret;
@@ -1881,28 +1902,14 @@ static int do_execveat_common(int fd, struct filename *filename,
 	sched_exec();
 
 	bprm->file = file;
-	if (fd == AT_FDCWD || filename->name[0] == '/') {
-		bprm->filename = filename->name;
-	} else {
-		if (filename->name[0] == '\0')
-			pathbuf = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd);
-		else
-			pathbuf = kasprintf(GFP_KERNEL, "/dev/fd/%d/%s",
-					    fd, filename->name);
-		if (!pathbuf) {
-			retval = -ENOMEM;
-			goto out_unmark;
-		}
-		/*
-		 * Record that a name derived from an O_CLOEXEC fd will be
-		 * inaccessible after exec. Relies on having exclusive access to
-		 * current->files (due to unshare_files above).
-		 */
-		if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
-			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
-		bprm->filename = pathbuf;
-	}
-	bprm->interp = bprm->filename;
+	/*
+	 * Record that a name derived from an O_CLOEXEC fd will be
+	 * inaccessible after exec. Relies on having exclusive access to
+	 * current->files (due to unshare_files above).
+	 */
+	if (bprm->fdpath &&
+	    close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
+		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1941,7 +1948,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 	acct_update_integrals(current);
 	task_numa_free(current, false);
 	free_bprm(bprm);
-	kfree(pathbuf);
 	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
@@ -1970,7 +1976,6 @@ static int do_execveat_common(int fd, struct filename *filename,
 		reset_files_struct(displaced);
 out_free:
 	free_bprm(bprm);
-	kfree(pathbuf);
 
 out_ret:
 	putname(filename);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index eb5cb8df5485..8e9e1b0c8eb8 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -56,6 +56,7 @@ struct linux_binprm {
 	const char *interp;	/* Name of the binary really executed. Most
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
+	const char *fdpath;	/* generated filename for execveat */
 	unsigned interp_flags;
 	int execfd;		/* File descriptor of the executable */
 	unsigned long loader, exec;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 2/7] exec: Factor out alloc_bprm
From: Eric W. Biederman @ 2020-07-14 13:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Move the allocation
of the bprm into it's own function (alloc_bprm) and move the call of
alloc_bprm before unshare_files so that bprm can ultimately be
allocated, the arguments can be placed on the new stack, and then the
bprm can be passed into the core of exec.

Neither the allocation of struct binprm nor the unsharing depend upon each
other so swapping the order in which they are called is trivially safe.

To keep things consistent the order of cleanup at the end of
do_execve_common swapped to match the order of initialization.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 23dfbb820626..526156d6461d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1560,6 +1560,14 @@ static void free_bprm(struct linux_binprm *bprm)
 	kfree(bprm);
 }
 
+static struct linux_binprm *alloc_bprm(void)
+{
+	struct linux_binprm *bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
+	if (!bprm)
+		return ERR_PTR(-ENOMEM);
+	return bprm;
+}
+
 int bprm_change_interp(const char *interp, struct linux_binprm *bprm)
 {
 	/* If a binfmt changed the interp, free it first. */
@@ -1848,18 +1856,19 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	retval = unshare_files(&displaced);
-	if (retval)
+	bprm = alloc_bprm();
+	if (IS_ERR(bprm)) {
+		retval = PTR_ERR(bprm);
 		goto out_ret;
+	}
 
-	retval = -ENOMEM;
-	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
-	if (!bprm)
-		goto out_files;
+	retval = unshare_files(&displaced);
+	if (retval)
+		goto out_free;
 
 	retval = prepare_bprm_creds(bprm);
 	if (retval)
-		goto out_free;
+		goto out_files;
 
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
@@ -1956,13 +1965,13 @@ static int do_execveat_common(int fd, struct filename *filename,
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 
+out_files:
+	if (displaced)
+		reset_files_struct(displaced);
 out_free:
 	free_bprm(bprm);
 	kfree(pathbuf);
 
-out_files:
-	if (displaced)
-		reset_files_struct(displaced);
 out_ret:
 	putname(filename);
 	return retval;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 1/7] exec: Remove unnecessary spaces from binfmts.h
From: Eric W. Biederman @ 2020-07-14 13:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig
In-Reply-To: <871rle8bw2.fsf@x220.int.ebiederm.org>


The general convention in the linux kernel is to define a pointer
member as "type *name".  The declaration of struct linux_binprm has
several pointer defined as "type * name".  Update them to the
form of "type *name" for consistency.

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/binfmts.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 7c27d7b57871..eb5cb8df5485 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -45,15 +45,15 @@ struct linux_binprm {
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-	struct file * executable; /* Executable to pass to the interpreter */
-	struct file * interpreter;
-	struct file * file;
+	struct file *executable; /* Executable to pass to the interpreter */
+	struct file *interpreter;
+	struct file *file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
 	unsigned int per_clear;	/* bits to clear in current->personality */
 	int argc, envc;
-	const char * filename;	/* Name of binary as seen by procps */
-	const char * interp;	/* Name of the binary really executed. Most
+	const char *filename;	/* Name of binary as seen by procps */
+	const char *interp;	/* Name of the binary really executed. Most
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
 	unsigned interp_flags;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 0/7] Implementing kernel_execve
From: Eric W. Biederman @ 2020-07-14 13:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Kees Cook, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Al Viro,
	Luis Chamberlain, linux-fsdevel, Tetsuo Handa,
	linux-security-module, Serge E. Hallyn, James Morris,
	Kentaro Takeda, Casey Schaufler, John Johansen, Christoph Hellwig


This set of changes implements kernel_execve to remove the need for
kernel threads to pass in pointers to in-kernel data structures
to functions that take __user pointers.   Which is part of the
greater removal of set_fs work.

This set of changes makes do_execve static and so I have updated the
comments.  This affects the comments in the x86 entry point code
and the comments in tomoyo.  I believe I have updated them correctly.
If not please let me know.

I have moved the calls of copy_strings before the call of
security_bprm_creds_for_exec.  Which might be of interest to the
security folks.  I can't see that it matters but I have copied the
security folks just to be certain.

By moving the initialization of the new stack that copy_strings does
earlier it becomes possible to copy all of the parameters to exec before
anything else is done which makes it possible to have one function
kernel_execve that uncondtionally handles copying parameters from kernel
space, and another function do_execveat_common which handles copying
parameters from userspace.

This work was inspired by Christoph Hellwig's similar patchset, which my
earlier work to remove the file parameter to do_execveat_common
conflicted with.
https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/

I figured that after causing all of that trouble for the set_fs work
the least I could do is implement the change myself.

The big practical change from Christoph's work is that he did not
separate out the copying of parameters from the rest of the work of
exec, which did not help the maintainability of the code.

Please let me know if you see something wrong.

This set of changes is against my exec-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

Eric W. Biederman (7):
      exec: Remove unnecessary spaces from binfmts.h
      exec: Factor out alloc_bprm
      exec: Move initialization of bprm->filename into alloc_bprm
      exec: Move bprm_mm_init into alloc_bprm
      exec: Factor bprm_execve out of do_execve_common
      exec: Factor bprm_stack_limits out of prepare_arg_pages
      exec: Implement kernel_execve

 arch/x86/entry/entry_32.S      |   2 +-
 arch/x86/entry/entry_64.S      |   2 +-
 arch/x86/kernel/unwind_frame.c |   2 +-
 fs/exec.c                      | 301 ++++++++++++++++++++++++++++-------------
 include/linux/binfmts.h        |  20 ++-
 init/main.c                    |   4 +-
 kernel/umh.c                   |   6 +-
 security/tomoyo/common.h       |   2 +-
 security/tomoyo/domain.c       |   4 +-
 security/tomoyo/tomoyo.c       |   4 +-
 10 files changed, 224 insertions(+), 123 deletions(-)

Eric

^ permalink raw reply

* Re: [PATCH v9 2/2] tpm: Add support for event log pointer found in TPM2 ACPI table
From: Stefan Berger @ 2020-07-14 12:09 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Stefan Berger, linux-integrity, linux-kernel, linux-acpi,
	linux-security-module
In-Reply-To: <20200714112030.GA1448526@linux.intel.com>

On 7/14/20 7:20 AM, Jarkko Sakkinen wrote:
> On Wed, Jul 08, 2020 at 10:17:17AM -0400, Stefan Berger wrote:
>>> ❯ swtpm-mvo.swtpm socket --tpmstate dir=/tmp/mytpm1 \
>>>     --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock \
>>>     --log level=20
>>> swtpm: Could not open UnixIO socket: No such file or directory
>>
>> Did you create the directory '/tmp/mytpm1' ?
> Yes. It's the socket file that it is complain because it does
> not exist beforehand.


The socket file is created by the swtpm program.


>
> /Jarkko



^ permalink raw reply

* Re: [PATCH v9 2/2] tpm: Add support for event log pointer found in TPM2 ACPI table
From: Jarkko Sakkinen @ 2020-07-14 11:20 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Stefan Berger, linux-integrity, linux-kernel, linux-acpi,
	linux-security-module
In-Reply-To: <e42cb59d-6a3d-12be-bb51-88aa8c5dba23@linux.ibm.com>

On Wed, Jul 08, 2020 at 10:17:17AM -0400, Stefan Berger wrote:
> > ❯ swtpm-mvo.swtpm socket --tpmstate dir=/tmp/mytpm1 \
> >    --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock \
> >    --log level=20
> > swtpm: Could not open UnixIO socket: No such file or directory
> 
> 
> Did you create the directory '/tmp/mytpm1' ?

Yes. It's the socket file that it is complain because it does
not exist beforehand.

/Jarkko

^ permalink raw reply

* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Peter Zijlstra @ 2020-07-14 10:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexey Budankov, Ravi Bangoria, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <20200713185152.GA18094@kernel.org>

On Mon, Jul 13, 2020 at 03:51:52PM -0300, Arnaldo Carvalho de Melo wrote:

> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 856d98c36f56..a2397f724c10 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
> > >  		 * perf_event_exit_task() that could imply).
> > >  		 */
> > >  		err = -EACCES;
> > > -		if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > > +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > >  			goto err_cred;
> > >  	}
> > > 
> > >> makes monitoring simpler and even more secure to use since Perf tool need
> > >> not to start/stop/single-step and read/write registers and memory and so on
> > >> like a debugger or strace-like tool. What do you think?
> > > 
> > > I tend to agree, Peter?

So this basically says that if CAP_PERFMON, we don't care about the
ptrace() permissions? Just like how CAP_SYS_PTRACE would always allow
the ptrace checks?

I suppose that makes sense.

^ permalink raw reply

* Re: [V2 PATCH 2/3] dt-bindings: chosen: Document ima-kexec-buffer
From: Prakhar Srivastava @ 2020-07-13 20:32 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: linux-arm-kernel, linux-kernel, linuxppc-dev, devicetree,
	linux-integrity, linux-security-module, catalin.marinas, will,
	mpe, benh, paulus, robh+dt, frowand.list, zohar, dmitry.kasatkin,
	jmorris, serge, pasha.tatashin, allison, kstewart,
	takahiro.akashi, tglx, vincenzo.frascino, mark.rutland, masahiroy,
	james.morse, bhsharma, mbrugger, hsinyi, tao.li, christophe.leroy,
	gregkh, nramas, tusharsu, balajib
In-Reply-To: <87mu4yr2k2.fsf@morokweng.localdomain>



On 6/19/20 5:41 PM, Thiago Jung Bauermann wrote:
> 
> Prakhar Srivastava <prsriva@linux.microsoft.com> writes:
> 
>> Integrity measurement architecture(IMA) validates if files
>> have been accidentally or maliciously altered, both remotely and
>> locally, appraise a file's measurement against a "good" value stored
>> as an extended attribute, and enforce local file integrity.
>>
>> IMA also measures singatures of kernel and initrd during kexec along with
>> the command line used for kexec.
>> These measurements are critical to verify the seccurity posture of the OS.
>>
>> Resering memory and adding the memory information to a device tree node
>> acts as the mechanism to carry over IMA measurement logs.
>>
>> Update devicetree documentation to reflect the addition of new property
>> under the chosen node.
> 
> Thank you for writing this documentation patch. It's something I should
> have done when I added the powerpc IMA kexec support.
> 
> You addressed Rob Herring's comments regarding the commit message, but
> not the ones regarding the patch contents.
> 
> When posting a new version of the patches, make sure to address all
> comments made so far. Addressing a comment doesn't necessarily mean
> implementing the requested change. If you don't then you should at least
> explain why you chose a different path.
> 
> I mention it because this has occurred before with this patch series,
> and it's hard to make forward progress if review comments get ignored.
> 
>> ---
>>   Documentation/devicetree/bindings/chosen.txt | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
>> index 45e79172a646..a15f70c007ef 100644
>> --- a/Documentation/devicetree/bindings/chosen.txt
>> +++ b/Documentation/devicetree/bindings/chosen.txt
>> @@ -135,3 +135,20 @@ e.g.
>>   		linux,initrd-end = <0x82800000>;
>>   	};
>>   };
>> +
>> +linux,ima-kexec-buffer
>> +----------------------
>> +
>> +This property(currently used by powerpc, arm64) holds the memory range,
> 
> space before the parenthesis.
> 
>> +the address and the size, of the IMA measurement logs that are being carried
> 
> Maybe it's because English isn't my first language, but IMHO it's
> clearer if "the address and the size" is between parentheses rather than
> commas.
> 
>> +over to the kexec session.
> 
> I don't think there's a "kexec session", but I'm not sure what a good
> term would be. "linux,booted-from-kexec" uses "new kernel" so perhaps
> that's a good option to use instead of "kexec session".
> 
>> +
>> +/ {
>> +	chosen {
>> +		linux,ima-kexec-buffer = <0x9 0x82000000 0x0 0x00008000>;
>> +	};
>> +};
>> +
>> +This porperty does not represent real hardware, but the memory allocated for
>> +carrying the IMA measurement logs. The address and the suze are expressed in
>> +#address-cells and #size-cells, respectively of the root node.
> 
> 
I will update the descriptions and ack the comments/changes in the 
patches as well.

Thankyou,
Prakhar Srivastava
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 

^ permalink raw reply

* Re: [V2 PATCH 1/3] Refactoring powerpc code for carrying over IMA measurement logs, to move non architecture specific code to security/ima.
From: Prakhar Srivastava @ 2020-07-13 20:30 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: linux-arm-kernel, linux-kernel, linuxppc-dev, devicetree,
	linux-integrity, linux-security-module, catalin.marinas, will,
	mpe, benh, paulus, robh+dt, frowand.list, zohar, dmitry.kasatkin,
	jmorris, serge, pasha.tatashin, allison, kstewart,
	takahiro.akashi, tglx, vincenzo.frascino, mark.rutland, masahiroy,
	james.morse, bhsharma, mbrugger, hsinyi, tao.li, christophe.leroy,
	gregkh, nramas, tusharsu, balajib
In-Reply-To: <87o8per3m0.fsf@morokweng.localdomain>



On 6/19/20 5:19 PM, Thiago Jung Bauermann wrote:
> 
> Prakhar Srivastava <prsriva@linux.microsoft.com> writes:
> 
>> Powerpc has support to carry over the IMA measurement logs. Refatoring the
>> non-architecture specific code out of arch/powerpc and into security/ima.
>>
>> The code adds support for reserving and freeing up of memory for IMA measurement
>> logs.
> 
> Last week, Mimi provided this feedback:
> 
> "From your patch description, this patch should be broken up.  Moving
> the non-architecture specific code out of powerpc should be one patch.
>   Additional support should be in another patch.  After each patch, the
> code should work properly."
> 
> That's not what you do here. You move the code, but you also make other
> changes at the same time. This has two problems:
> 
> 1. It makes the patch harder to review, because it's very easy to miss a
>     change.
> 
> 2. If in the future a git bisect later points to this patch, it's not
>     clear whether the problem is because of the code movement, or because
>     of the other changes.
> 
> When you move code, ideally the patch should only make the changes
> necessary to make the code work at its new location. The patch which
> does code movement should not cause any change in behavior.
> 
> Other changes should go in separate patches, either before or after the
> one moving the code.
> 
> More comments below.
> 
Hi Thiago,

Apologies for the delayed response i was away for a few days.
I am working on breaking up the changes so that its easier to review and 
update as well.

Thanks,
Prakhar Srivastava

>>
>> ---
>>   arch/powerpc/include/asm/ima.h     |  10 ---
>>   arch/powerpc/kexec/ima.c           | 126 ++---------------------------
>>   security/integrity/ima/ima_kexec.c | 116 ++++++++++++++++++++++++++
>>   3 files changed, 124 insertions(+), 128 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/ima.h b/arch/powerpc/include/asm/ima.h
>> index ead488cf3981..c29ec86498f8 100644
>> --- a/arch/powerpc/include/asm/ima.h
>> +++ b/arch/powerpc/include/asm/ima.h
>> @@ -4,15 +4,6 @@
>>
>>   struct kimage;
>>
>> -int ima_get_kexec_buffer(void **addr, size_t *size);
>> -int ima_free_kexec_buffer(void);
>> -
>> -#ifdef CONFIG_IMA
>> -void remove_ima_buffer(void *fdt, int chosen_node);
>> -#else
>> -static inline void remove_ima_buffer(void *fdt, int chosen_node) {}
>> -#endif
>> -
>>   #ifdef CONFIG_IMA_KEXEC
>>   int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
>>   			      size_t size);
>> @@ -22,7 +13,6 @@ int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node);
>>   static inline int setup_ima_buffer(const struct kimage *image, void *fdt,
>>   				   int chosen_node)
>>   {
>> -	remove_ima_buffer(fdt, chosen_node);
>>   	return 0;
>>   }
> 
> This is wrong. Even if the currently running kernel doesn't have
> CONFIG_IMA_KEXEC, it should remove the IMA buffer property and memory
> reservation from the FDT that is being prepared for the next kernel.
> 
> This is because the IMA kexec buffer is useless for the next kernel,
> regardless of whether the current kernel supports CONFIG_IMA_KEXEC or
> not. Keeping it around would be a waste of memory.
> 
I will keep it in my next revision.
My understanding was the reserved memory is freed and property removed 
when IMA loads the logs on init. During setup_fdt in kexec, a duplicate 
copy of the dt is used, but memory still needs to be allocated, thus the 
property itself indicats presence of reserved memory.


>> @@ -179,13 +64,18 @@ int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
>>   	int ret, addr_cells, size_cells, entry_size;
>>   	u8 value[16];
>>
>> -	remove_ima_buffer(fdt, chosen_node);
> 
> This is wrong, for the same reason stated above.
> 
>>   	if (!image->arch.ima_buffer_size)
>>   		return 0;
>>
>> -	ret = get_addr_size_cells(&addr_cells, &size_cells);
>> -	if (ret)
>> +	ret = fdt_address_cells(fdt, chosen_node);
>> +	if (ret < 0)
>> +		return ret;
>> +	addr_cells = ret;
>> +
>> +	ret = fdt_size_cells(fdt, chosen_node);
>> +	if (ret < 0)
>>   		return ret;
>> +	size_cells = ret;
>>
>>   	entry_size = 4 * (addr_cells + size_cells);
>>
> 
> I liked this change. Thanks! I agree it's better to use
> fdt_address_cells() and fdt_size_cells() here.
> 
> But it should be in a separate patch. Either before or after the one
> moving the code.
> 
>> diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
>> index 121de3e04af2..e1e6d6154015 100644
>> --- a/security/integrity/ima/ima_kexec.c
>> +++ b/security/integrity/ima/ima_kexec.c
>> @@ -10,8 +10,124 @@
>>   #include <linux/seq_file.h>
>>   #include <linux/vmalloc.h>
>>   #include <linux/kexec.h>
>> +#include <linux/of.h>
>> +#include <linux/memblock.h>
>> +#include <linux/libfdt.h>
>>   #include "ima.h"
>>
>> +static int get_addr_size_cells(int *addr_cells, int *size_cells)
>> +{
>> +	struct device_node *root;
>> +
>> +	root = of_find_node_by_path("/");
>> +	if (!root)
>> +		return -EINVAL;
>> +
>> +	*addr_cells = of_n_addr_cells(root);
>> +	*size_cells = of_n_size_cells(root);
>> +
>> +	of_node_put(root);
>> +
>> +	return 0;
>> +}
>> +
>> +static int do_get_kexec_buffer(const void *prop, int len, unsigned long *addr,
>> +			       size_t *size)
>> +{
>> +	int ret, addr_cells, size_cells;
>> +
>> +	ret = get_addr_size_cells(&addr_cells, &size_cells);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (len < 4 * (addr_cells + size_cells))
>> +		return -ENOENT;
>> +
>> +	*addr = of_read_number(prop, addr_cells);
>> +	*size = of_read_number(prop + 4 * addr_cells, size_cells);
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * ima_get_kexec_buffer - get IMA buffer from the previous kernel
>> + * @addr:	On successful return, set to point to the buffer contents.
>> + * @size:	On successful return, set to the buffer size.
>> + *
>> + * Return: 0 on success, negative errno on error.
>> + */
>> +int ima_get_kexec_buffer(void **addr, size_t *size)
>> +{
>> +	int ret, len;
>> +	unsigned long tmp_addr;
>> +	size_t tmp_size;
>> +	const void *prop;
>> +
>> +	prop = of_get_property(of_chosen, "linux,ima-kexec-buffer", &len);
>> +	if (!prop)
>> +		return -ENOENT;
>> +
>> +	ret = do_get_kexec_buffer(prop, len, &tmp_addr, &tmp_size);
>> +	if (ret)
>> +		return ret;
>> +
>> +	*addr = __va(tmp_addr);
>> +	*size = tmp_size;
>> +
>> +	return 0;
>> +}
> 
> The functions above were moved without being changed. Good.
> 
>> +/**
>> + * ima_free_kexec_buffer - free memory used by the IMA buffer
>> + */
>> +int ima_free_kexec_buffer(void)
>> +{
>> +	int ret;
>> +	unsigned long addr;
>> +	size_t size;
>> +	struct property *prop;
>> +
>> +	prop = of_find_property(of_chosen, "linux,ima-kexec-buffer", NULL);
>> +	if (!prop)
>> +		return -ENOENT;
>> +
>> +	ret = do_get_kexec_buffer(prop->value, prop->length, &addr, &size);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = of_remove_property(of_chosen, prop);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return memblock_free(__pa(addr), size);
> 
> Here you added a __pa() call. Do you store a virtual address in
> linux,ima-kexec-buffer property? Doesn't it make more sense to store a
> physical address?
> 
trying to minimize the changes here as do_get_kexec_buffer return the va.
I will refactor this to remove the double translation.

> Even if making this change is the correct thing to do, it should be a
> separate patch, unless it can't be avoided. And if that is the case,
> then it should be explained in the commit message.
> 
>> +
>> +}
>> +
>> +/**
>> + * remove_ima_buffer - remove the IMA buffer property and reservation from @fdt
>> + *
>> + * The IMA measurement buffer is of no use to a subsequent kernel, so we always
>> + * remove it from the device tree.
>> + */
>> +void remove_ima_buffer(void *fdt, int chosen_node)
>> +{
>> +	int ret, len;
>> +	unsigned long addr;
>> +	size_t size;
>> +	const void *prop;
>> +
>> +	prop = fdt_getprop(fdt, chosen_node, "linux,ima-kexec-buffer", &len);
>> +	if (!prop)
>> +		return;
>> +
>> +	do_get_kexec_buffer(prop, len, &addr, &size);
>> +	ret = fdt_delprop(fdt, chosen_node, "linux,ima-kexec-buffer");
>> +	if (ret < 0)
>> +		return;
>> +
>> +	memblock_free(addr, size);
>> +}
> 
> Here is another function that changed when moved. This one I know to be
> wrong. You're confusing the purposes of remove_ima_buffer() and
> ima_free_kexec_buffer().
> 
> You did send me a question about them nearly three weeks ago which I
> only answered today, so I apologize. Also, their names could more
> clearly reflect their differences, so it's bad naming on my part.
> 
> With IMA kexec buffers, there are two kernels (and thus their two
> respective, separate device trees) to be concerned about:
> 
> 1. the currently running kernel, which uses the live device tree
> (accessed with the of_* functions) and the memblock subsystem;
> 
> 2. the kernel which is being loaded by kexec, which will use the FDT
> blob being passed around as argument to these functions, and the memory
> reservations in the memory reservation table of the FDT blob.
> 
> ima_free_kexec_buffer() is used by IMA in the currently running kernel.
> Therefore the device tree it is concerned about is the live one, and
> thus uses the of_* functions to access it. And uses memblock to change
> the memory reservation.
> 
> remove_ima_buffer() on the other hand is used by the kexec code to
> prepare an FDT blob for the kernel that is being loaded. It should not
> make any changes to live device tree, nor to memblock allocations. It
> should only make changes to the FDT blob.
> 
Thank you for this, greatly appreciate clearing my misunderstandings.
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 

^ permalink raw reply

* [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Richard Guy Briggs @ 2020-07-13 19:51 UTC (permalink / raw)
  To: Linux-Audit Mailing List, LKML, Linux Security Module list
  Cc: Paul Moore, eparis, john.johansen, Richard Guy Briggs

audit_log_string() was inteded to be an internal audit function and
since there are only two internal uses, remove them.  Purge all external
uses of it by restructuring code to use an existing audit_log_format()
or using audit_log_format().

Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/84

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
Passes audit-testsuite.

Changelog:
v4
- use double quotes in all replaced audit_log_string() calls

v3
- fix two warning: non-void function does not return a value in all control paths
	Reported-by: kernel test robot <lkp@intel.com>

v2
- restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.

v1 Vlad Dronov
- https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf

 include/linux/audit.h     |  5 -----
 kernel/audit.c            |  4 ++--
 security/apparmor/audit.c | 10 ++++------
 security/apparmor/file.c  | 25 +++++++------------------
 security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
 security/apparmor/net.c   | 14 ++++++++------
 security/lsm_audit.c      |  4 ++--
 7 files changed, 46 insertions(+), 62 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 523f77494847..b3d859831a31 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -694,9 +694,4 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
 	return uid_valid(audit_get_loginuid(tsk));
 }
 
-static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
-{
-	audit_log_n_string(ab, buf, strlen(buf));
-}
-
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 8c201f414226..a2f3e34aa724 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2080,13 +2080,13 @@ void audit_log_d_path(struct audit_buffer *ab, const char *prefix,
 	/* We will allow 11 spaces for ' (deleted)' to be appended */
 	pathname = kmalloc(PATH_MAX+11, ab->gfp_mask);
 	if (!pathname) {
-		audit_log_string(ab, "<no_memory>");
+		audit_log_format(ab, "\"<no_memory>\"");
 		return;
 	}
 	p = d_path(path, pathname, PATH_MAX+11);
 	if (IS_ERR(p)) { /* Should never happen since we send PATH_MAX */
 		/* FIXME: can we save some information here? */
-		audit_log_string(ab, "<too_long>");
+		audit_log_format(ab, "\"<too_long>\"");
 	} else
 		audit_log_untrustedstring(ab, p);
 	kfree(pathname);
diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index 597732503815..f7e97c7e80f3 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -57,18 +57,16 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
 	struct common_audit_data *sa = ca;
 
 	if (aa_g_audit_header) {
-		audit_log_format(ab, "apparmor=");
-		audit_log_string(ab, aa_audit_type[aad(sa)->type]);
+		audit_log_format(ab, "apparmor=\"%s\"",
+				 aa_audit_type[aad(sa)->type]);
 	}
 
 	if (aad(sa)->op) {
-		audit_log_format(ab, " operation=");
-		audit_log_string(ab, aad(sa)->op);
+		audit_log_format(ab, " operation=\"%s\"", aad(sa)->op);
 	}
 
 	if (aad(sa)->info) {
-		audit_log_format(ab, " info=");
-		audit_log_string(ab, aad(sa)->info);
+		audit_log_format(ab, " info=\"%s\"", aad(sa)->info);
 		if (aad(sa)->error)
 			audit_log_format(ab, " error=%d", aad(sa)->error);
 	}
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 9a2d14b7c9f8..92acf9a49405 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -35,20 +35,6 @@ static u32 map_mask_to_chr_mask(u32 mask)
 }
 
 /**
- * audit_file_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
- * @mask: permission mask to convert
- */
-static void audit_file_mask(struct audit_buffer *ab, u32 mask)
-{
-	char str[10];
-
-	aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
-			    map_mask_to_chr_mask(mask));
-	audit_log_string(ab, str);
-}
-
-/**
  * file_audit_cb - call back for file specific audit fields
  * @ab: audit_buffer  (NOT NULL)
  * @va: audit struct to audit values of  (NOT NULL)
@@ -57,14 +43,17 @@ static void file_audit_cb(struct audit_buffer *ab, void *va)
 {
 	struct common_audit_data *sa = va;
 	kuid_t fsuid = current_fsuid();
+	char str[10];
 
 	if (aad(sa)->request & AA_AUDIT_FILE_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_file_mask(ab, aad(sa)->request);
+		aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
+				    map_mask_to_chr_mask(aad(sa)->request));
+		audit_log_format(ab, " requested_mask=\"%s\"", str);
 	}
 	if (aad(sa)->denied & AA_AUDIT_FILE_MASK) {
-		audit_log_format(ab, " denied_mask=");
-		audit_file_mask(ab, aad(sa)->denied);
+		aa_perm_mask_to_str(str, sizeof(str), aa_file_perm_chrs,
+				    map_mask_to_chr_mask(aad(sa)->denied));
+		audit_log_format(ab, " denied_mask=\"%s\"", str);
 	}
 	if (aad(sa)->request & AA_AUDIT_FILE_MASK) {
 		audit_log_format(ab, " fsuid=%d",
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index 4ecedffbdd33..fe36d112aad9 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -20,25 +20,23 @@
 
 /**
  * audit_ptrace_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
  * @mask: permission mask to convert
+ *
+ * Returns: pointer to static string
  */
-static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
+static const char *audit_ptrace_mask(u32 mask)
 {
 	switch (mask) {
 	case MAY_READ:
-		audit_log_string(ab, "read");
-		break;
+		return "read";
 	case MAY_WRITE:
-		audit_log_string(ab, "trace");
-		break;
+		return "trace";
 	case AA_MAY_BE_READ:
-		audit_log_string(ab, "readby");
-		break;
+		return "readby";
 	case AA_MAY_BE_TRACED:
-		audit_log_string(ab, "tracedby");
-		break;
+		return "tracedby";
 	}
+	return "";
 }
 
 /* call back to audit ptrace fields */
@@ -47,12 +45,12 @@ static void audit_ptrace_cb(struct audit_buffer *ab, void *va)
 	struct common_audit_data *sa = va;
 
 	if (aad(sa)->request & AA_PTRACE_PERM_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_ptrace_mask(ab, aad(sa)->request);
+		audit_log_format(ab, " requested_mask=\"%s\"",
+				 audit_ptrace_mask(aad(sa)->request));
 
 		if (aad(sa)->denied & AA_PTRACE_PERM_MASK) {
-			audit_log_format(ab, " denied_mask=");
-			audit_ptrace_mask(ab, aad(sa)->denied);
+			audit_log_format(ab, " denied_mask=\"%s\"",
+					 audit_ptrace_mask(aad(sa)->denied));
 		}
 	}
 	audit_log_format(ab, " peer=");
@@ -142,16 +140,18 @@ static inline int map_signal_num(int sig)
 }
 
 /**
- * audit_file_mask - convert mask to permission string
- * @buffer: buffer to write string to (NOT NULL)
+ * audit_signal_mask - convert mask to permission string
  * @mask: permission mask to convert
+ *
+ * Returns: pointer to static string
  */
-static void audit_signal_mask(struct audit_buffer *ab, u32 mask)
+static const char *audit_signal_mask(u32 mask)
 {
 	if (mask & MAY_READ)
-		audit_log_string(ab, "receive");
+		return "receive";
 	if (mask & MAY_WRITE)
-		audit_log_string(ab, "send");
+		return "send";
+	return "";
 }
 
 /**
@@ -164,11 +164,11 @@ static void audit_signal_cb(struct audit_buffer *ab, void *va)
 	struct common_audit_data *sa = va;
 
 	if (aad(sa)->request & AA_SIGNAL_PERM_MASK) {
-		audit_log_format(ab, " requested_mask=");
-		audit_signal_mask(ab, aad(sa)->request);
+		audit_log_format(ab, " requested_mask=\"%s\"",
+				 audit_signal_mask(aad(sa)->request));
 		if (aad(sa)->denied & AA_SIGNAL_PERM_MASK) {
-			audit_log_format(ab, " denied_mask=");
-			audit_signal_mask(ab, aad(sa)->denied);
+			audit_log_format(ab, " denied_mask=\"%s\"",
+					 audit_signal_mask(aad(sa)->denied));
 		}
 	}
 	if (aad(sa)->signal == SIGUNKNOWN)
diff --git a/security/apparmor/net.c b/security/apparmor/net.c
index d8afc39f663a..fa0e85568450 100644
--- a/security/apparmor/net.c
+++ b/security/apparmor/net.c
@@ -72,16 +72,18 @@ void audit_net_cb(struct audit_buffer *ab, void *va)
 {
 	struct common_audit_data *sa = va;
 
-	audit_log_format(ab, " family=");
 	if (address_family_names[sa->u.net->family])
-		audit_log_string(ab, address_family_names[sa->u.net->family]);
+		audit_log_format(ab, " family=\"%s\"",
+				 address_family_names[sa->u.net->family]);
 	else
-		audit_log_format(ab, "\"unknown(%d)\"", sa->u.net->family);
-	audit_log_format(ab, " sock_type=");
+		audit_log_format(ab, " family=\"unknown(%d)\"",
+				 sa->u.net->family);
 	if (sock_type_names[aad(sa)->net.type])
-		audit_log_string(ab, sock_type_names[aad(sa)->net.type]);
+		audit_log_format(ab, " sock_type=\"%s\"",
+				 sock_type_names[aad(sa)->net.type]);
 	else
-		audit_log_format(ab, "\"unknown(%d)\"", aad(sa)->net.type);
+		audit_log_format(ab, " sock_type=\"unknown(%d)\"",
+				 aad(sa)->net.type);
 	audit_log_format(ab, " protocol=%d", aad(sa)->net.protocol);
 
 	if (aad(sa)->request & NET_PERMS_MASK) {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 7c555621c2bd..53d0d183db8f 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -432,8 +432,8 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 				 a->u.ibendport->port);
 		break;
 	case LSM_AUDIT_DATA_LOCKDOWN:
-		audit_log_format(ab, " lockdown_reason=");
-		audit_log_string(ab, lockdown_reasons[a->u.reason]);
+		audit_log_format(ab, " lockdown_reason=\"%s\"",
+				 lockdown_reasons[a->u.reason]);
 		break;
 	} /* switch (a->type) */
 }
-- 
1.8.3.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox