netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 next 0/5] Improve Module autoloading infrastructure
@ 2017-11-27 17:18 Djalal Harouni
  2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
                   ` (5 more replies)
  0 siblings, 6 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

Hi List,

This is v5 of automatic module load restriction series. I have renamed
it to 'Improve Module autoloading infrastructure' for abvious reasons:

* It is more of an infrastructure change now.
* Reduce the use of word 'restriction' and use 'improving' maybe easy
  to sell  ?

These patches are based on next-20171127

Credits:
========
The idea was inspired from grsecurity 'GRKERNSEC_MODHARDEN' config option.
However, upstream Linux implementation is more focused on the run-time
behavior with a three mode switch, plus upstream version solves Linux
usecases with a per process tree flag that can be used in containers,
sandboxes, etc to block direct implicit auto-load operations.
This implementation does not share anything with grsecurity.

The first RFC was as LSM, Kees Cook and others suggested that it can be
turned as a core kernel feature and after some months, they were right.


Previous versions:
==================

v1 RFC as an LSM:
http://www.openwall.com/lists/kernel-hardening/2017/02/02/21

v2 RFC as an LSM:
http://www.openwall.com/lists/kernel-hardening/2017/04/09/1

v3 core feature as requested by kernel maintainers:
https://lkml.org/lkml/2017/4/19/1086

v4:
https://lkml.org/lkml/2017/5/22/312

All previous feedback from: Andy Lutomirski, Solar Designer, Kees Cook,
Rusty Russell, Ben Hutchings and Serge Hallyn is fixed in this series.

Please check Changelog for details. Thank you for the feedback.


==============

Currently, an explicit call to load or unload kernel modules require
CAP_SYS_MODULE capability. However unprivileged users have always been
able to load some modules using the implicit auto-load operation. An
automatic module loading happens when programs request a kernel feature
from a module that is not loaded. In order to satisfy userspace, the
kernel then automatically load all these required modules.

Historically, the kernel was always able to automatically load modules
if they are not blacklisted. This is one of the most important and
transparent operations of Linux, it allows to provide numerous other
features as they are needed which is crucial for a better user experience.
However, as Linux is popular now and used for different appliances some
of these may need to control such operations. For such systems, recent
needs showed that in some cases allowing to control automatic module
loading is as important as the operation itself. Restricting unprivileged
programs or attackers that abuse this feature to load unused modules or
modules that contain bugs is a significant security measure.

This allows administrators or some special programs to have the
appropriate time to update and deny module autoloading in advance, then
blacklist the corresponding ones. Not doing so may affect the global state
of the machine, especially containers where some apps are moved from one
context to another and not having such mechanisms may allow to expose
and exploit the vulnerable parts to escape the container sandbox.

Embedded or IoT devices also started to ship as containers using generic
distros, some vendors do not have the appropriate time to make their own
OS, hence, using base images is getting popular. These setups may include
unnecessary modules that the final applications will not need. Untrusted
access may abuse the module auto-load feature to expose vulnerabilities.

As every code contains bugs or vulnerabilties, the following
vulnerabilities that affected some features that are often compiled as
modules could have been completely blocked, by a better mechanism that
handles module autoloading, especially if the system does not need them.

Past months:
* DCCP use after free CVE-2017-6074 [1] [2]
  Unprivileged to local root.

* XFRM framework CVE-2017-7184 [3]
  As advertised it seems it was used to break Ubuntu on a security
  contest.

* n_hldc CVE-2017-2636 [4] [5]
  Local privilege escalation. 

* L2TPv3 CVE-2016-10200

The list is longer.

To improve the current status, this series tries to re-work how module
autoloading is performed by adding two new properties:
"modules_autoload_mode" sysctl flag, and a per-task one.

The sysctl controls modules auto-load feature and complements
"modules_disabled" which apply to all modules operations. This new flag
allows to control only automatic module loading and if it is allowed or
not, aligning in the process the implicit operation with the explicit one
where both now are covered by capabilities checks.

The three modes that "modules_autoload_mode" support allow to provide
restrictions on automatic module loading without breaking user
experience.

The sysctl flag is available at "/proc/sys/kernel/modules_autoload_mode"

When modules_autoload_mode is set to (0), the default, there are no
restrictions.

When modules_autoload_mode is set to (1), processes must have
CAP_SYS_MODULE to be able to trigger a module auto-load operation,
or CAP_NET_ADMIN for modules with a 'netdev-%s' alias, or other
capabilities for specific aliased modules.

When modules_autoload_mode is set to (2), automatic module loading
is disabled for all.

Notes on relation between "modules_disabled=0" and
"modules_autoload_mode=2":
1) Once "modules_disabled=1" set, it needs a reboot to undo the
setting.

2) Restricting automatic module loading does not interfere with
explicit module load or unload operations.

3) New features provided by modules can be made available without
rebooting the system.

4) A bad version of a module can be unloaded and replaced with a
better one without rebooting the system.

==========================

The patches also support process trees, containers, and sandboxes by
providing an inherited per-task "modules_autoload_mode" flag that cannot be
re-enabled once disabled. This offers the following advantages:

1) Automatic module loading is still available to the rest of the
system.

2) It is easy to use in containers and sandboxes. DCCP example could
have been used to escape containers. The XFRM framework CVE-2017-7184
needs CAP_NET_ADMIN, but attackers may start to target CAP_NET_ADMIN,
a per-task flag will make it harder.

3) Suitable for desktop and more interactive Linux systems.

4) Will allow in future to implement a per user policy.
The user database format is old and not extensible, as discussed maybe
with a modern format we may achieve the following:

User=djalal
NewKernelFeatures=yes

Which means that that interactive user will be allowed to load extra
Linux features. Others, volatile accounts or guests can be easily
blocked from doing so.

5) CAP_NET_ADMIN is useful, it handles lot of operations, at same time it
started to look more like CAP_SYS_ADMIN which is overloaded. We need
CAP_NET_ADMIN, containers need it, but at same time maybe we do not
want programs running with it to load 'netdev-%s' modules. Having an
extra per-task flag allow to discharge a bit CAP_NET_ADMIN and clearly
target automatic module loading operations.


Usage:
        prctl(PR_SET_MODULES_AUTOLOAD_mode, value, 0, 0, 0).

The per-task "modules_autoload_mode" supports the following values:
0       There are no restrictions, usually the default unless set
        by parent.
1       The task must have CAP_SYS_MODULE to be able to trigger a
        module auto-load operation, or CAP_NET_ADMIN for modules with
        a 'netdev-%s' alias.
2       Automatic modules loading is disabled for the current task.

The mode may only be increased, never decreased, thus ensuring that once
applied, processes can never relax their setting. This make it easy for
developers and users to handle.

Note that even if the per-task "modules_autoload_mode" allows to auto-load
the corresponding modules, automatic module loading may still fail due to
the global sysctl "modules_autoload_mode". For more details please see
Documentation/sysctl/kernel.txt, section "modules_autoload_mode".

When a request to a kernel module is denied, the module name with the
corresponding process name and its pid are logged. Administrators can use
such information to explicitly load the appropriate modules.


# Testing:

##) Global sysctl "modules_autoload_mode"

Before patch:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip                   16384  0
tunnel4                16384  1 ipip
ip_tunnel              28672  1 ipip
$ cat /proc/sys/kernel/modules_autoload_mode
0

After patch:
$ lsmod | grep ipip -
# echo 2 > /proc/sys/kernel/modules_autoload_mode
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
$ dmesg
...
[ 1876.378389] module: automatic module loading of netdev-tunl0 by "ip"[1453] was denied
[ 1876.380994] module: automatic module loading of tunl0 by "ip"[1453] was denied
...
$ lsmod | grep ipip -


##) Per-task "modules_autoload_mode" flag

Here we use DCCP as an example since the public PoC was against it.

The following tool can be used to test the feature:
https://gist.githubusercontent.com/tixxdz/cf567e4275714199a32c4a80de4ea63a/raw/13e52ea0ee65772871bcf10fb6c94fedd349f5c1/pr_modules_autoload_mode_test.c

DCCP use after free CVE-2017-6074 (unprivileged to local root):
The code path can be triggered by unprivileged, using the trigger.c
program for DCCP use after free [2] and that was fixed by
commit 5edabca9d4cff7f "dccp: fix freeing skb too early for IPV6_RECVPKTINFO".

Before patch:
$ lsmod | grep dccp
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = 3
...
$ lsmod | grep dccp
dccp_ipv6              24576  5
dccp_ipv4              24576  5 dccp_ipv6
dccp                  102400  2 dccp_ipv6,dccp_ipv4
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0


After:
Set task "modules_autoload_mode" to 1, privileged mode.
$ lsmod | grep dccp
$ ./pr_set_no_new_privs
$ grep NoNewPrivs /proc/self/status
NoNewPrivs:     1
$ ./pr_modules_autoload_mode_test 1
$ grep Modules /proc/self/status
ModulesAutoloadMode:    1
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
$ lsmod | grep dccp
$ dmesg
...
[ 4662.171994] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.177284] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied
[ 4662.180181] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.181709] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied


Now task "modules_autoload_mode" to 2, disabled mode.
$ lsmod | grep dccp
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0
$ su - root
 # ./pr_modules_autoload_mode_test 2
 # grep Modules /proc/self/status
ModulesAutoloadMode:    2
 # strace ./dccp_trigger

...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
...
[ 5154.218740] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.219828] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied
[ 5154.221814] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.222731] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied


As showed, this blocks automatic module loading per-task. This allows to
provide a usable system, where only some sandboxed apps or containers will be
restricted to trigger automatic module loading, other parts of the
system can continue to use the feature as it is which is the case of the
desktop and userfriendly machines.

[1] http://www.openwall.com/lists/oss-security/2017/02/22/3
[2] https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-6074
[3] http://www.openwall.com/lists/oss-security/2017/03/29/2
[4] http://www.openwall.com/lists/oss-security/2017/03/07/6 
[5] https://a13xp0p0v.github.io/2017/03/24/CVE-2017-2636.html


Finally we already have a use case for the prctl() interface to enforce
some systemd services, in docker and other containers, also in some
sandboxes, etc.


# Changes since v4:
*) Removed the property that when the "modules_autoload_mode" sysctl is
   set to "2" disabled mode, then that value is pinned and we can not
   revert it. Now you can undo the value if you have the appropriate
   privileges as it was suggested.

   Suggested-by: Solar Designer <solar@openwall.com>
   Suggested-by: Andy Lutomirski <luto@kernel.org>
   https://lkml.org/lkml/2017/5/22/330

*) Added request_module_cap() to take '@required_cap' and '@prefix'
   arguments that will be used to check if module autoloading is allowed
   or not.

   Suggested-by: Kees Cook <keescook@chromium.org>

*) More cleanups and documentation.


# Changes since v3:
*) Renamed the sysctl from "modules_autoload" to "modules_autoload_mode"
   and the prctl() operation flag to "PR_{SET|GET}_MODULES_AUTOLOAD_MODE"
   as it was requested.

   Suggested-by: Ben Hutchings <ben.hutchings@codethink.co.uk>


*) Updated __request_module() to take the capability that may allow to
   auto-load a module with the appropriate alias. This way we never
   parse aliases as it was requested by Rusty Russell. Security and
   SELinux hooks were updated too.

   Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
   https://lkml.org/lkml/2017/4/24/7


*) Updated code to set prctl(PR_SET_MODULES_AUTOLOAD_MODE, 1, 0, 0, 0),
   the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) before or run with
   CAP_SYS_ADMIN privileges in its namespace. If these are not true,
   -EACCES will be returned.

   Suggested-by: Andy Lutomirski <luto@amacapital.net>
   https://lkml.org/lkml/2017/4/22/22


*) Remove task initialization logic and other cleanups
   Suggested-by: Kees Cook <keescook@chromium.org>


*) Other code and documentation cleanups.
   

# Changes since v2:
*) Implemented as a core kernel feature inside capabilities subsystem
*) Renamed sysctl to "modules_autoload" to align with "modules_disabled"

   Suggested-by: Kees Cook <keescook@chromium.org>

*) Improved documentation.
*) Removed unused code.


# Changes since v1:
*) Renamed module to ModAutoRestrict
*) Improved documentation to explicity refer to module autoloading.
*) Switched to use the new task_security_alloc() hook.
*) Switched from rhash tables to use task->security since it is in
   linux-security/next branch now.
*) Check all parameters passed to prctl() syscall.
*) Many other bug fixes and documentation improvements.


Patches (5) Djalal Harouni:
 (1/5) modules:capabilities: add request_module_cap()
 (2/5) modules:capabilities: add cap_kernel_module_request() permission check
 (3/5) modules:capabilities: automatic module loading restriction
 (4/5) modules:capabilities: add a per-task modules auto-load mode
 (5/5) net: modules: use request_module_cap() to load 'netdev-%s' modules

 Documentation/filesystems/proc.txt                 |   3 +
 Documentation/sysctl/kernel.txt                    |  54 ++++++++
 Documentation/userspace-api/index.rst              |   1 +
 .../userspace-api/modules_autoload_mode.rst        | 116 ++++++++++++++++
 fs/proc/array.c                                    |   6 +
 include/linux/init_task.h                          |   8 ++
 include/linux/kmod.h                               |  65 ++++++++-
 include/linux/lsm_hooks.h                          |   6 +-
 include/linux/module.h                             |  41 +++++-
 include/linux/sched.h                              |   5 +
 include/linux/security.h                           |  11 +-
 include/uapi/linux/prctl.h                         |   8 ++
 kernel/kmod.c                                      |  29 +++-
 kernel/module.c                                    | 153 +++++++++++++++++++++
 kernel/sysctl.c                                    |  28 ++++
 net/core/dev_ioctl.c                               |   4 +-
 security/commoncap.c                               |  62 +++++++++
 security/security.c                                |   6 +-
 security/selinux/hooks.c                           |   3 +-
 19 files changed, 587 insertions(+), 22 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
@ 2017-11-27 17:18 ` Djalal Harouni
  2017-11-27 18:48   ` Randy Dunlap
  2017-11-28 19:14   ` Luis R. Rodriguez
  2017-11-27 17:18 ` [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check Djalal Harouni
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

This is a preparation patch to improve the module auto-load
infrastructure.

We need this patch to have more control on module auto-load operations.
The operation by default is allowed unless enduser or the calling code
requests that we need to perform futher permission checks.

With this change subsystems will be able to decide if module auto-load
feature first will have to do a capability check and load the module if
the permission check succeeds or deny the operation.

As an example "netdev-%s" modules, they are allowed to be loaded if
CAP_NET_ADMIN is set. Therefore, in order to not break this assumption,
and allow userspace to load "netdev-%s" modules with CAP_NET_ADMIN,
we have added:

        request_module_cap(required_cap, prefix, fmt...)

This new function will take:
'@required_cap': Required capability to load the module
'@prefix': The module prefix if any, otherwise NULL
'@fmt': printf style format string for the name of the module with its
        arguments if any

ex:
        request_module_cap(CAP_NET_ADMIN, "netdev", "%s", mod);

After a discussion with Rusty Russell [1], the suggestion was to pass
the capability from request_module() to security_kernel_module_request()
for 'netdev-%s' modules that need CAP_NET_ADMIN, and after review from
Kees Cook [2] and experimenting with the code, the patch now does the
following:

* Adds the request_module_cap() function.
* Updates the __request_module() to take the "required_cap" argument
        with the "prefix".

This patch also updates SELinux which is currently the only user of
security_kernel_module_request(), the security hook now accepts
'required_cap' and 'prefix' as arguments.

Based on patch by Rusty Russell and discussion with Kees Cook:
[1] https://lkml.org/lkml/2017/4/26/735
[2] https://lkml.org/lkml/2017/5/23/775

Cc: Serge Hallyn <serge@hallyn.com>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 include/linux/kmod.h      | 65 ++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/lsm_hooks.h |  6 ++++-
 include/linux/security.h  |  7 +++--
 kernel/kmod.c             | 29 ++++++++++++++++-----
 security/security.c       |  6 +++--
 security/selinux/hooks.c  |  3 ++-
 6 files changed, 97 insertions(+), 19 deletions(-)

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 40c89ad..ccd6a1c 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -33,16 +33,67 @@
 extern char modprobe_path[]; /* for sysctl */
 /* modprobe exit status on success, -ve on error.  Return value
  * usually useless though. */
-extern __printf(2, 3)
-int __request_module(bool wait, const char *name, ...);
-#define request_module(mod...) __request_module(true, mod)
-#define request_module_nowait(mod...) __request_module(false, mod)
+extern __printf(4, 5)
+int __request_module(bool wait, int required_cap,
+		     const char *prefix, const char *name, ...);
 #define try_then_request_module(x, mod...) \
-	((x) ?: (__request_module(true, mod), (x)))
+	((x) ?: (__request_module(true, -1, NULL, mod), (x)))
 #else
-static inline int request_module(const char *name, ...) { return -ENOSYS; }
-static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
+static inline __printf(4, 5)
+int __request_module(bool wait, int required_cap,
+		     const char *prefix, const char *name, ...)
+{ return -ENOSYS; }
 #define try_then_request_module(x, mod...) (x)
 #endif
 
+/**
+ * request_module  Try to load a kernel module
+ *
+ * Automatically loads the request module.
+ *
+ * @mod...: The module name
+ */
+#define request_module(mod...) __request_module(true, -1, NULL, mod)
+
+#define request_module_nowait(mod...) __request_module(false, -1, NULL, mod)
+
+/**
+ * request_module_cap  Load kernel module only if the required capability is set
+ *
+ * Automatically load a module if the required capability is set and it
+ * corresponds to the appropriate subsystem that is indicated by prefix.
+ *
+ * This allows to load aliased modules like 'netdev-%s' with CAP_NET_ADMIN.
+ *
+ * ex:
+ *	request_module_cap(CAP_NET_ADMIN, "netdev", "%s", mod);
+ *
+ * @required_cap: Required capability to load the module
+ * @prefix: The module prefix if any, otherwise NULL
+ * @fmt: printf style format string for the name of the module with its
+ *       arguments if any
+ *
+ * If '@required_cap' is positive, the security subsystem will check if
+ * '@prefix' is set and if caller has the required capability then the
+ * operation is allowed.
+ * The security subsystem can not make assumption about the boundaries
+ * of other subsystems, it is their responsability to make a call with
+ * the right capability and module alias.
+ *
+ * If '@required_cap' is positive and '@prefix' is NULL then we assume
+ * that the '@required_cap' is CAP_SYS_MODULE.
+ *
+ * If '@required_cap' is negative then there are no permission checks, this
+ * is the equivalent to request_module() function.
+ *
+ * This function trust callers to pass the right capability with the
+ * appropriate prefix.
+ *
+ * Note: the permission checks may still fail, even if the required
+ * capability is negative, this is due to module loading restrictions
+ * that are controlled by the enduser.
+ */
+#define request_module_cap(required_cap, prefix, fmt...) \
+	__request_module(true, required_cap, prefix, fmt)
+
 #endif /* __LINUX_KMOD_H__ */
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7161d8e..d898bd0 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -571,6 +571,9 @@
  *	Ability to trigger the kernel to automatically upcall to userspace for
  *	userspace to load a kernel module with the given name.
  *	@kmod_name name of the module requested by the kernel
+ *	@required_cap If positive, the required capability to automatically load
+ *	the correspondig kernel module.
+ *	@prefix The prefix of the module if any. Can be NULL.
  *	Return 0 if successful.
  * @kernel_read_file:
  *	Read a file specified by userspace.
@@ -1543,7 +1546,8 @@ union security_list_options {
 	void (*cred_transfer)(struct cred *new, const struct cred *old);
 	int (*kernel_act_as)(struct cred *new, u32 secid);
 	int (*kernel_create_files_as)(struct cred *new, struct inode *inode);
-	int (*kernel_module_request)(char *kmod_name);
+	int (*kernel_module_request)(char *kmod_name, int required_cap,
+				     const char *prefix);
 	int (*kernel_read_file)(struct file *file, enum kernel_read_file_id id);
 	int (*kernel_post_read_file)(struct file *file, char *buf, loff_t size,
 				     enum kernel_read_file_id id);
diff --git a/include/linux/security.h b/include/linux/security.h
index 73f1ef6..41e700a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -326,7 +326,8 @@ int security_prepare_creds(struct cred *new, const struct cred *old, gfp_t gfp);
 void security_transfer_creds(struct cred *new, const struct cred *old);
 int security_kernel_act_as(struct cred *new, u32 secid);
 int security_kernel_create_files_as(struct cred *new, struct inode *inode);
-int security_kernel_module_request(char *kmod_name);
+int security_kernel_module_request(char *kmod_name, int required_cap,
+				   const char *prefix);
 int security_kernel_read_file(struct file *file, enum kernel_read_file_id id);
 int security_kernel_post_read_file(struct file *file, char *buf, loff_t size,
 				   enum kernel_read_file_id id);
@@ -919,7 +920,9 @@ static inline int security_kernel_create_files_as(struct cred *cred,
 	return 0;
 }
 
-static inline int security_kernel_module_request(char *kmod_name)
+static inline int security_kernel_module_request(char *kmod_name,
+						 int required_cap,
+						 const char *prefix)
 {
 	return 0;
 }
diff --git a/kernel/kmod.c b/kernel/kmod.c
index bc6addd..679d401 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -109,6 +109,8 @@ static int call_modprobe(char *module_name, int wait)
 /**
  * __request_module - try to load a kernel module
  * @wait: wait (or not) for the operation to complete
+ * @required_cap: required capability to load the module
+ * @prefix: module prefix if any otherwise NULL
  * @fmt: printf style format string for the name of the module
  * @...: arguments as specified in the format string
  *
@@ -119,14 +121,20 @@ static int call_modprobe(char *module_name, int wait)
  * must check that the service they requested is now available not blindly
  * invoke it.
  *
- * If module auto-loading support is disabled then this function
- * becomes a no-operation.
+ * If "required_cap" is positive, The security subsystem will trust the caller
+ * that if it has "required_cap" then it may allow to load some modules that
+ * have a specific alias.
+ *
+ * If module auto-loading support is disabled by clearing the modprobe path
+ * then this function becomes a no-operation.
  */
-int __request_module(bool wait, const char *fmt, ...)
+int __request_module(bool wait, int required_cap,
+		     const char *prefix, const char *fmt, ...)
 {
 	va_list args;
 	char module_name[MODULE_NAME_LEN];
 	int ret;
+	int len = 0;
 
 	/*
 	 * We don't allow synchronous module loading from async.  Module
@@ -139,13 +147,22 @@ int __request_module(bool wait, const char *fmt, ...)
 	if (!modprobe_path[0])
 		return 0;
 
+	/*
+	 * Lets attach the prefix to the module name
+	 */
+	if (prefix != NULL && *prefix != '\0') {
+		len += snprintf(module_name, MODULE_NAME_LEN, "%s-", prefix);
+		if (len >= MODULE_NAME_LEN)
+			return -ENAMETOOLONG;
+	}
+
 	va_start(args, fmt);
-	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
+	ret = vsnprintf(module_name + len, MODULE_NAME_LEN - len, fmt, args);
 	va_end(args);
-	if (ret >= MODULE_NAME_LEN)
+	if (ret >= MODULE_NAME_LEN - len)
 		return -ENAMETOOLONG;
 
-	ret = security_kernel_module_request(module_name);
+	ret = security_kernel_module_request(module_name, required_cap, prefix);
 	if (ret)
 		return ret;
 
diff --git a/security/security.c b/security/security.c
index 1cd8526..91ecebd 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1015,9 +1015,11 @@ int security_kernel_create_files_as(struct cred *new, struct inode *inode)
 	return call_int_hook(kernel_create_files_as, 0, new, inode);
 }
 
-int security_kernel_module_request(char *kmod_name)
+int security_kernel_module_request(char *kmod_name, int required_cap,
+				   const char *prefix)
 {
-	return call_int_hook(kernel_module_request, 0, kmod_name);
+	return call_int_hook(kernel_module_request, 0, kmod_name,
+			     required_cap, prefix);
 }
 
 int security_kernel_read_file(struct file *file, enum kernel_read_file_id id)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d07299d..69f25da 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3889,7 +3889,8 @@ static int selinux_kernel_create_files_as(struct cred *new, struct inode *inode)
 	return ret;
 }
 
-static int selinux_kernel_module_request(char *kmod_name)
+static int selinux_kernel_module_request(char *kmod_name, int required_cap,
+					 const char *prefix)
 {
 	struct common_audit_data ad;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
  2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
@ 2017-11-27 17:18 ` Djalal Harouni
  2017-11-30  2:05   ` Luis R. Rodriguez
  2017-11-27 17:18 ` [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction Djalal Harouni
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

This is a preparation patch to improve for the module auto-load
infrastrucutre.

With this change, subsystems that want to autoload modules and implement
onsite capability checks, can defer the checks to the capability
subsystem by passing the required capabilities with the appropriate
modules alias. The capability subsystem will trust callers about
the passed values and perform a capability check to either allow module
auto-loading or deny it.

This patch changes:
* Adds cap_kernel_module_request() capability hook.
* Adds an empty may_autoload_module() that will be updated in the next
  patch.

Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 include/linux/module.h   | 10 ++++++++++
 include/linux/security.h |  4 +++-
 kernel/module.c          | 23 +++++++++++++++++++++++
 security/commoncap.c     | 26 ++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index c69b49a..5cbb239 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -497,6 +497,10 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
 bool is_module_percpu_address(unsigned long addr);
 bool is_module_text_address(unsigned long addr);
 
+/* Determine whether a module auto-load operation is permitted. */
+int may_autoload_module(char *kmod_name, int required_cap,
+			const char *kmod_prefix);
+
 static inline bool within_module_core(unsigned long addr,
 				      const struct module *mod)
 {
@@ -643,6 +647,12 @@ bool is_module_sig_enforced(void);
 
 #else /* !CONFIG_MODULES... */
 
+static inline int may_autoload_module(char *kmod_name, int required_cap,
+				      const char *kmod_prefix)
+{
+	return -ENOSYS;
+}
+
 static inline struct module *__module_address(unsigned long addr)
 {
 	return NULL;
diff --git a/include/linux/security.h b/include/linux/security.h
index 41e700a..9bb53b5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -102,6 +102,8 @@ extern int cap_task_setscheduler(struct task_struct *p);
 extern int cap_task_setioprio(struct task_struct *p, int ioprio);
 extern int cap_task_setnice(struct task_struct *p, int nice);
 extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
+extern int cap_kernel_module_request(char *kmod_name, int required_cap,
+				     const char *kmod_prefix);
 
 struct msghdr;
 struct sk_buff;
@@ -924,7 +926,7 @@ static inline int security_kernel_module_request(char *kmod_name,
 						 int required_cap,
 						 const char *prefix)
 {
-	return 0;
+	return cap_kernel_module_request(kmod_name, required_cap, prefix);
 }
 
 static inline int security_kernel_read_file(struct file *file,
diff --git a/kernel/module.c b/kernel/module.c
index f0411a2..3380d39 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4340,6 +4340,29 @@ struct module *__module_text_address(unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(__module_text_address);
 
+/**
+ * may_autoload_module - Determine whether a module auto-load operation
+ * is permitted
+ * @kmod_name: The module name
+ * @required_cap: if positive, may allow to auto-load the module if this
+ *	capability is set
+ * @kmod_prefix: The module prefix if any, otherwise NULL
+ *
+ * Determine whether a module auto-load operation is allowed or not.
+ *
+ * This allows to have more control on automatic module loading, and align it
+ * with explicit load/unload module operations. The kernel contains several
+ * modules, some of them are not updated often and may contain bugs and
+ * vulnerabilities.
+ *
+ * Returns 0 if the module request is allowed or -EPERM if not.
+ */
+int may_autoload_module(char *kmod_name, int required_cap,
+			const char *kmod_prefix)
+{
+	return 0;
+}
+
 /* Don't grab lock, we're oopsing. */
 void print_modules(void)
 {
diff --git a/security/commoncap.c b/security/commoncap.c
index 4f8e093..236e573 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1340,6 +1340,31 @@ int cap_mmap_file(struct file *file, unsigned long reqprot,
 	return 0;
 }
 
+/**
+ * cap_kernel_module_request - Determine whether a module auto-load is permitted
+ * @kmod_name: The module name
+ * @required_cap: if positive, may allow to auto-load the module if this
+ *	capability is set
+ * @kmod_prefix: the module prefix if any, otherwise NULL
+ *
+ * Determine whether a module should be automatically loaded.
+ * Returns 0 if the module request should be allowed, -EPERM if not.
+ */
+int cap_kernel_module_request(char *kmod_name, int required_cap,
+			      const char *kmod_prefix)
+{
+	int ret;
+	char comm[sizeof(current->comm)];
+
+	ret = may_autoload_module(kmod_name, required_cap, kmod_prefix);
+	if (ret < 0)
+		pr_notice_ratelimited(
+			"module: automatic module loading of %.64s by \"%s\"[%d] was denied\n",
+			kmod_name, get_task_comm(comm, current), current->pid);
+
+	return ret;
+}
+
 #ifdef CONFIG_SECURITY
 
 struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
@@ -1361,6 +1386,7 @@ struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(task_setioprio, cap_task_setioprio),
 	LSM_HOOK_INIT(task_setnice, cap_task_setnice),
 	LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory),
+	LSM_HOOK_INIT(kernel_module_request, cap_kernel_module_request),
 };
 
 void __init capability_add_hooks(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
  2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
  2017-11-27 17:18 ` [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check Djalal Harouni
@ 2017-11-27 17:18 ` Djalal Harouni
  2017-11-30  1:23   ` Luis R. Rodriguez
  2017-11-27 17:18 ` [PATCH v5 next 4/5] modules:capabilities: add a per-task modules auto-load mode Djalal Harouni
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

Currently, an explicit call to load or unload kernel modules require
CAP_SYS_MODULE capability. However unprivileged users have always been
able to load some modules using the implicit auto-load operation. An
automatic module loading happens when programs request a kernel feature
from a module that is not loaded. In order to satisfy userspace, the
kernel then automatically load all these required modules.

Historically, the kernel was always able to automatically load modules
if they are not blacklisted. This is one of the most important and
transparent operations of Linux, it allows to provide numerous other
features as they are needed which is crucial for a better user experience.
However, as Linux is popular now and used for different appliances some
of these may need to control such operations. For such systems, recent
needs showed that in some cases allowing to control automatic module
loading is as important as the operation itself. Restricting unprivileged
programs or attackers that abuse this feature to load unused modules or
modules that contain bugs is a significant security measure.

This allows administrators or some special programs to have the
appropriate time to update and deny module autoloading in advance, then
blacklist the corresponding ones. Not doing so may affect the global state
of the machine, especially containers where some apps are moved from one
context to another and not having such mechanisms may allow to expose
and exploit the vulnerable parts to escape the container sandbox.

Embedded or IoT devices also started to ship as containers using generic
distros, some vendors do not have the appropriate time to make their own
OS, hence, using base images is getting popular. These setups may include
unnecessary modules that the final applications will not need. Untrusted
access may abuse the module auto-load feature to expose vulnerabilities.

As every code contains bugs or vulnerabilties, the following
vulnerabilities that affected some features that are often compiled as
modules could have been completely blocked, by restricting autoloading
modules if the system does not need them.

Past months:
* DCCP use after free CVE-2017-6074 [1] [2]
  Unprivileged to local root.

* XFRM framework CVE-2017-7184 [3]
  As advertised it seems it was used to break Ubuntu on a security
  contest.

* n_hldc CVE-2017-2636 [4] [5]
  Local privilege escalation.

* L2TPv3 CVE-2016-10200

The list is longer.

To improve the current status, this patch introduces "modules_autoload_mode"
kernel sysctl flag. The flag controls modules auto-load feature and
complements "modules_disabled" which apply to all modules operations.
This new flag allows to control only automatic module loading and if it is
allowed or not, aligning in the process the implicit operation with the
explicit one where both now are covered by capabilities checks.

The three modes that "modules_autoload_mode" support allow to provide
restrictions on automatic module loading without breaking user
experience.

The sysctl flag is available at "/proc/sys/kernel/modules_autoload_mode"

When modules_autoload_mode is set to (0), the default, there are no
restrictions.

When modules_autoload_mode is set to (1), processes must have
CAP_SYS_MODULE to be able to trigger a module auto-load operation,
or CAP_NET_ADMIN for modules with a 'netdev-%s' alias, or other
capabilities for specific aliased modules.

When modules_autoload_mode is set to (2), automatic module loading
is disabled for all.

Notes on relation between "modules_disabled=0" and
"modules_autoload_mode=2":
1) Once "modules_disabled=1" set, it needs a reboot to undo the
setting.

2) Restricting automatic module loading does not interfere with
explicit module load or unload operations.

3) New features provided by modules can be made available without
rebooting the system.

4) A bad version of a module can be unloaded and replaced with a
better one without rebooting the system.

The idea of module auto-load restriction was inspired from grsecurity
'GRKERNSEC_MODHARDEN' config option. Upstream Linux implementation is
more focused on the run-time behavior with a three mode switch.

Testing
-------

Example 1)

Before:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip                   16384  0
tunnel4                16384  1 ipip
ip_tunnel              28672  1 ipip
$ cat /proc/sys/kernel/modules_autoload_mode
0

After:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
$ dmesg
...
[ 1876.378389] module: automatic module loading of netdev-tunl0 by "ip"[1453] was denied
[ 1876.380994] module: automatic module loading of tunl0 by "ip"[1453] was denied
...
$ lsmod | grep ipip -
$

Example 2)

DCCP use after free CVE-2017-6074:
The code path can be triggered by unprivileged, using the trigger.c
program for DCCP use after free [2] and that was fixed by
commit 5edabca9d4cff7f "dccp: fix freeing skb too early for IPV6_RECVPKTINFO".

Before:
$ lsmod | grep dccp
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = 3
...
$ lsmod | grep dccp
dccp_ipv6              24576  5
dccp_ipv4              24576  5 dccp_ipv6
dccp                  102400  2 dccp_ipv6,dccp_ipv4

After:
Only privileged:
$ lsmod | grep dccp
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
$ lsmod | grep dccp
$ dmesg
...
[  175.945063] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1390] was denied
[  175.947952] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1390] was denied
[  175.956061] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1390] was denied
[  175.959733] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1390] was denied

$ sudo strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = 3
...
$ lsmod | grep dccp
dccp_ipv6              24576  6
dccp_ipv4              24576  5 dccp_ipv6
dccp                  102400  2 dccp_ipv6,dccp_ipv4

Disable automatic module loading:
$ lsmod | grep dccp
$ su - root
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
$ lsmod | grep dccp
$ dmesg
...
[  126.596545] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1291] was denied
[  126.598800] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1291] was denied
[  126.601264] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1291] was denied
[  126.602839] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1291] was denied

As an example, this blocks abuses, DCCP still can be explicilty loaded by
an administrator using modprobe, at same time automatic module loading is
disabled forever.

[1] http://www.openwall.com/lists/oss-security/2017/02/22/3
[2] https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-6074
[3] http://www.openwall.com/lists/oss-security/2017/03/29/2
[4] http://www.openwall.com/lists/oss-security/2017/03/07/6
[5] https://a13xp0p0v.github.io/2017/03/24/CVE-2017-2636.html

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Solar Designer <solar@openwall.com>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 Documentation/sysctl/kernel.txt | 54 +++++++++++++++++++++++++++
 include/linux/module.h          | 11 +++++-
 kernel/module.c                 | 81 ++++++++++++++++++++++++++++++++++++++++-
 kernel/sysctl.c                 | 28 ++++++++++++++
 4 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..dc44075 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -43,6 +43,7 @@ show up in /proc/sys/kernel:
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
 - modules_disabled
+- modules_autoload_mode
 - msg_next_id		      [ sysv ipc ]
 - msgmax
 - msgmnb
@@ -413,6 +414,59 @@ to false.  Generally used with the "kexec_load_disabled" toggle.
 
 ==============================================================
 
+modules_autoload_mode:
+
+A sysctl to control if modules auto-load feature is allowed or not.
+This sysctl complements "modules_disabled" which is for all module
+operations where this flag applies only to automatic module loading.
+Automatic module loading happens when programs request a kernel
+feature that is implemented by an unloaded module, the kernel
+automatically runs the program pointed by "modprobe" sysctl in order
+to load the corresponding module.
+
+Historically, the kernel was always able to automatically load modules
+if they are not blacklisted. This is one of the most important and
+transparent operations of Linux, it allows to provide numerous other
+features as they are needed which is crucial for a better user experience.
+However, as Linux is popular now and used for different appliances some
+of these may need to control such operations. For such systems, recent
+needs showed that in some cases allowing to control automatic module
+loading is as important as the operation itself. Restricting unprivileged
+programs or attackers that abuse this feature to load unused modules or
+modules that contain bugs is a significant security measure.
+
+The three modes that "modules_autoload_mode" support allow to provide
+restrictions on automatic module loading without breaking user
+experience.
+
+When modules_autoload_mode is set to (0), the default, there are no
+restrictions.
+
+When modules_autoload_mode is set to (1), processes must have
+CAP_SYS_MODULE to be able to trigger a module auto-load operation,
+CAP_NET_ADMIN for modules with a 'netdev-%s' alias, or other
+capabilities for specific aliased modules.
+
+When modules_autoload_mode is set to (2), automatic module loading
+is disabled for all.
+
+
+Notes on relation between "modules_disabled=0" and
+"modules_autoload_mode=2":
+1) Once "modules_disabled=1" set, it needs a reboot to undo the
+setting.
+2) Restricting automatic module loading does not interfere with
+explicit module load or unload operations.
+3) New features provided by modules can be made available without
+rebooting the system.
+4) A bad version of a module can still be unloaded and replaced with
+a better one without rebooting the system.
+
+The idea of module auto-load restriction was inspired from grsecurity
+'GRKERNSEC_MODHARDEN' config option.
+
+==============================================================
+
 msg_next_id, sem_next_id, and shm_next_id:
 
 These three toggles allows to specify desired id for next allocated IPC
diff --git a/include/linux/module.h b/include/linux/module.h
index 5cbb239..c36aed8 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -261,7 +261,16 @@ struct notifier_block;
 
 #ifdef CONFIG_MODULES
 
-extern int modules_disabled; /* for sysctl */
+enum {
+	MODULES_AUTOLOAD_ALLOWED	= 0,
+	MODULES_AUTOLOAD_PRIVILEGED	= 1,
+	MODULES_AUTOLOAD_DISABLED	= 2,
+};
+
+extern int modules_disabled; /* sysctl for explicit module load/unload */
+extern int modules_autoload_mode; /* sysctl for automatic module loading */
+extern const int modules_autoload_max; /* max value for modules_autoload_mode */
+
 /* Get/put a kernel symbol (calls must be symmetric) */
 void *__symbol_get(const char *symbol);
 void *__symbol_get_gpl(const char *symbol);
diff --git a/kernel/module.c b/kernel/module.c
index 3380d39..a7205fb 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -290,6 +290,8 @@ EXPORT_SYMBOL(is_module_sig_enforced);
 
 /* Block module loading/unloading? */
 int modules_disabled = 0;
+int modules_autoload_mode = MODULES_AUTOLOAD_ALLOWED;
+const int modules_autoload_max = MODULES_AUTOLOAD_DISABLED;
 core_param(nomodule, modules_disabled, bint, 0);
 
 /* Waiting for a module to finish initializing? */
@@ -4355,12 +4357,89 @@ EXPORT_SYMBOL_GPL(__module_text_address);
  * modules, some of them are not updated often and may contain bugs and
  * vulnerabilities.
  *
+ * If "@required_cap" is positive and a valid capability then it is checked
+ * together with the "@kmod_prefix" to either allow or deny automatic module
+ * loading.
+ *
+ * However even if the caller has the required capability, the operation can
+ * still be denied due to the global "modules_autoload_mode" sysctl mode. Unless
+ * set by enduser, the operation is always allowed which is the default.
+ *
+ * The permission check is performed in this order:
+ * 1) If the global sysctl "modules_autoload_mode" is set to 'disabled', then
+ *    operation is denied.
+ *
+ * 2) If the global sysctl "modules_autoload_mode" is set to 'privileged', then:
+ *
+ *   2.1) If "@required_cap" is positive and "@kmod_prefix" is set, then
+ *   if the caller has the capability, the operation is allowed.
+ *
+ *   2.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ *   fallback to check if caller has CAP_SYS_MODULE, if so, operation is
+ *   allowed.
+ *
+ *   2.3) If caller passes "@required_cap" as a negative then we fallback to
+ *   check if caller has CAP_SYS_MODULE, if so, operation is allowed.
+ *
+ *   We require capabilities to autoload modules here, and CAP_SYS_MODULE here is
+ *   the default.
+ *
+ *   2.4) Otherwise operation is denied.
+ *
+ * 3) If the global sysctl "modules_autoload_mode" is set to 'allowed' which is
+ *    the default, then:
+ *
+ *   3.1) If "@required_cap" is positive and "@kmod_prefix" is set, we check if
+ *   caller has the capability, if so, operation is allowed.
+ *   In this case the calling subsystem requires the capability to be set before
+ *   allowing modules autoload operations and we have to honor that.
+ *
+ *   3.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ *   fallback to check if caller has CAP_SYS_MODULE, if so, operation is
+ *   allowed.
+ *
+ *   3.3) If caller passes "@required_cap" as a negative then operation is
+ *   allowed. This is the most common case as it is used now by
+ *   request_module() function.
+ *
+ *   3.4) Otherwise operation is denied.
+ *
  * Returns 0 if the module request is allowed or -EPERM if not.
  */
 int may_autoload_module(char *kmod_name, int required_cap,
 			const char *kmod_prefix)
 {
-	return 0;
+	int module_require_cap = CAP_SYS_MODULE;
+	unsigned int autoload = modules_autoload_mode;
+
+	/* Short-cut for most use cases where kmod auto-loading is allowed */
+	if (autoload == MODULES_AUTOLOAD_ALLOWED && required_cap < 0)
+		return 0;
+
+	/* If autoload is disabled then fail here */
+	if (autoload == MODULES_AUTOLOAD_DISABLED)
+		return -EPERM;
+
+	/* If caller requires privileges */
+	if (required_cap > 0) {
+		/*
+		 * If '@kmod_prefix' is set then use the '@required_cap'.
+		 * This allows to cover 'netdev-%s' alias modules and others
+		 * with their corresponding capability
+		 */
+		if (kmod_prefix != NULL && *kmod_prefix != '\0')
+			module_require_cap = required_cap;
+	}
+
+	/*
+	 * We require privileges if '@required_cap' was set or if the
+	 * 'modules_autoload_mode' is set to 'privileged' mode.
+	 */
+	if (capable(module_require_cap))
+		return 0;
+
+	/* Otherwise fail */
+	return -EPERM;
 }
 
 /* Don't grab lock, we're oopsing. */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2fb4e27..0b6f0c8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -207,6 +207,11 @@ static int proc_taint(struct ctl_table *table, int write,
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
 #endif
 
+#ifdef CONFIG_MODULES
+static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
+				void __user *buffer, size_t *lenp, loff_t *ppos);
+#endif
+
 #ifdef CONFIG_PRINTK
 static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -683,6 +688,15 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &one,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "modules_autoload_mode",
+		.data		= &modules_autoload_mode,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= modules_autoload_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= (void *)&modules_autoload_max,
+	},
 #endif
 #ifdef CONFIG_UEVENT_HELPER
 	{
@@ -2499,6 +2513,20 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 }
 #endif
 
+#ifdef CONFIG_MODULES
+static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
+				void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	/*
+	 * Only CAP_SYS_MODULE in init user namespace are allowed to change this
+	 */
+	if (write && !capable(CAP_SYS_MODULE))
+		return -EPERM;
+
+	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+}
+#endif
+
 struct do_proc_dointvec_minmax_conv_param {
 	int *min;
 	int *max;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 next 4/5] modules:capabilities: add a per-task modules auto-load mode
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
                   ` (2 preceding siblings ...)
  2017-11-27 17:18 ` [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction Djalal Harouni
@ 2017-11-27 17:18 ` Djalal Harouni
  2017-11-27 17:18 ` [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules Djalal Harouni
  2017-11-27 18:41 ` [PATCH v5 next 0/5] Improve Module autoloading infrastructure Linus Torvalds
  5 siblings, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

Previous patches added the global sysctl "modules_autoload_mode". This patch
make it possible to support process trees, containers, and sandboxes by
providing an inherited per-task "modules_autoload_mode" flag that cannot be
re-enabled once disabled. This allows to improve automatic module loading
without affecting the rest of the system.

Why we need this ?

Usually a request to a kernel feature that is implemented by a module
that is not loaded may trigger automatic module loading feature,
allowing to transparently satisfy userspace, and provide numeours
features as they are needed. In this case an implicit kernel module load
operation happens.

In most cases to load or unload a kernel module, an explicit operation
happens where programs are required to have CAP_SYS_MODULE capability to
perform so. However, in general with implicit module loading, no
capabilities are required as automatic module loading is one of the most
important and transparent operations of Linux.

Recent vulnerabilities showed that automatic module loading can be
abused in order to expose more bugs. Some of these vulnerabilities are:

* DCCP use after free CVE-2017-6074 [1] [2]
  Unprivileged to local root PoC.

* XFRM framework CVE-2017-7184 [3]
  As advertised it seems it was used to break Ubuntu at a security
  contest.

* n_hldc CVE-2017-2636 [4] [5]
  Local privilege escalation.

* L2TPv3 CVE-2016-10200

Currently most of Linux code is in a form of modules, and not all
modules are written or maintained in the same way. In a container or
sandbox world, apps can be moved from one context to another or from
one Linux system to another one, the ability to restrict some of these
apps to load extra kernel modules will prevent exposing some kernel
interfaces that have not been updated withing such systems.

The DCCP vulnerability CVE-2017-6074 that can be triggered by
unprivileged, or CVE-2017-7184 in the XFRM framework are some recent
real examples. CVE-2017-7184 was used to break Ubuntu at a security
contest. Ubuntu is more of desktop distro, using a global switch to
disable automatic module loading will harm users. Actually this design
will always end up being ignored by such kind of systems that need to
offer a competitive and interactive solution for their users.

>From this and from observing how apps are being run, this patch
introduces a per-task "modules_autoload_mode" to restrict automatic
module loading. This offers the following advantages:

1) Allows to abstract in userspace as something like:
DenyNewFeatures=yes

2) Automatic module loading is still available to the rest of the
system.

2) It is easy to use in containers and sandboxes. DCCP example could
have been used to escape containers. The XFRM framework CVE-2017-7184
needs CAP_NET_ADMIN, but attackers may start to target CAP_NET_ADMIN,
a per-task flag will make it harder.

3) Suitable for desktop and more interactive Linux systems.

4) Will allow in future to implement a per user policy.
The user database format is old and not extensible, as discussed maybe
with a modern format we may achieve the following:

User=djalal
DenyNewFeatures=no

Which means that interactive user will be allowed to load extra
Linux features. Others, volatile accounts or guests can be easily
blocked from doing so.

5) CAP_NET_ADMIN is useful, it handles lot of operations, at same time it
started to look more like CAP_SYS_ADMIN which is overloaded. We need
CAP_NET_ADMIN, containers need it, but at same time maybe we do not
want programs running with it to load 'netdev-%s' modules. Having an
extra per-task flag allow to discharge CAP_NET_ADMIN and other
capabilities, it is clearly targeted to automatic module loading
operations and from a higher view to 'load new kernel features schema'.

Usage:
------

To set the per-task "modules_autoload_mode":

        prctl(PR_SET_MODULES_AUTOLOAD_MODE, mode, 0, 0, 0);

When a module auto-load request is triggered by current task, then the
operation has first to satisfy the per-task access mode before attempting
to implicitly load the module. Once set, this setting is inherited across
fork, clone and execve.

Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or run with
CAP_SYS_ADMIN privileges in its namespace.  If these are not true, -EACCES
will be returned.  This requirement ensures that unprivileged programs cannot
affect the behaviour or surprise privileged children.

The per-task "modules_autoload_mode" supports the following values:
0       There are no restrictions, usually the default unless set
        by parent.
1       The task must have CAP_SYS_MODULE to be able to trigger a
        module auto-load operation, or CAP_NET_ADMIN for modules with
        a 'netdev-%s' alias.
2       Automatic modules loading is disabled for the current task.

The mode may only be increased, never decreased, thus ensuring that once
applied, processes can never relax their setting. This make it easy for
developers and users to handle.

Note that even if the per-task "modules_autoload_mode" allows to auto-load
the corresponding modules, automatic module loading may still fail due to
the global sysctl "modules_autoload_mode". For more details please see
Documentation/sysctl/kernel.txt, section "modules_autoload_mode".

When a request to a kernel module is denied, the module name with the
corresponding process name and its pid are logged. Administrators can use
such information to explicitly load the appropriate modules.

Testing per-task or per container setup
---------------------------------------

The following tool can be used to test the feature:
https://gist.githubusercontent.com/tixxdz/cf567e4275714199a32c4a80de4ea63a/raw/13e52ea0ee65772871bcf10fb6c94fedd349f5c1/pr_modules_autoload_mode_test.c

Example 1)

Before patch:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip                   16384  0
tunnel4                16384  1 ipip
ip_tunnel              28672  1 ipip
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0

After patch:
Set task "modules_autoload_mode" to disabled.
$ lsmod | grep ipip -
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0
$ su - root
 # ./pr_modules_autoload_mode_test 2
 # grep Modules /proc/self/status
ModulesAutoloadMode:    2
 # ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
...
[  634.954652] module: automatic module loading of netdev-tunl0 by "ip"[1560] was denied
[  634.955775] module: automatic module loading of tunl0 by "ip"[1560] was denied
...

Example 2)

Sample with XFRM tunnel mode.

Before patch:
$ lsmod | grep xfrm -
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0
$ sudo ip xfrm state add src 10.0.2.100 dst 10.0.1.100 proto esp spi $id1 \
> reqid $id2 mode tunnel auth "hmac(sha256)" $key1 enc "cbc(aes)" $key2
$ lsmod | grep xfrm
xfrm4_mode_tunnel      16384  2

After patch:
Set task "modules_autoload_mode" to disabled.
$ lsmod | grep xfrm -
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0
$ su - root
 # ./pr_modules_autoload_mode_test 2
 # grep Modules /proc/self/status
ModulesAutoloadMode:    2
 # ip xfrm state add src 10.0.2.100 dst 10.0.1.100 proto esp spi $id1 \
> reqid $id2 mode tunnel auth "hmac(sha256)" $key1 enc "cbc(aes)" $key2
RTNETLINK answers: Protocol not supported
...
[ 3458.139490] module: automatic module loading of xfrm-mode-2-1 by "ip"[1506] was denied
...

Example 3)

Here we use DCCP as an example since the public PoC was against it.

DCCP use after free CVE-2017-6074 (unprivileged to local root):
The code path can be triggered by unprivileged, using the trigger.c
program for DCCP use after free [2] and that was fixed by
commit 5edabca9d4cff7f "dccp: fix freeing skb too early for IPV6_RECVPKTINFO".

Before patch:
$ lsmod | grep dccp
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = 3
...
$ lsmod | grep dccp
dccp_ipv6              24576  5
dccp_ipv4              24576  5 dccp_ipv6
dccp                  102400  2 dccp_ipv6,dccp_ipv4
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0

After patch:

Set task "modules_autoload_mode" to 1, privileged mode.
$ lsmod | grep dccp
$ ./pr_set_no_new_privs
$ grep NoNewPrivs /proc/self/status
NoNewPrivs:     1
$ ./pr_modules_autoload_mode_test 1
$ grep Modules /proc/self/status
ModulesAutoloadMode:    1
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
$ lsmod | grep dccp
$ dmesg
...
[ 4662.171994] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.177284] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied
[ 4662.180181] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.181709] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied

Now task "modules_autoload_mode" to 2, disabled mode.
$ lsmod | grep dccp
$ grep Modules /proc/self/status
ModulesAutoloadMode:    0
$ su - root
 # ./pr_modules_autoload_mode_test 2
 # grep Modules /proc/self/status
ModulesAutoloadMode:    2
 # strace ./dccp_trigger

...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
...
[ 5154.218740] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.219828] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied
[ 5154.221814] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.222731] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied

As showed, this blocks automatic module loading per-task. This allows to
provide a usable system, where only some sandboxed apps or containers will be
restricted to trigger automatic module loading, other parts of the
system can continue to use the feature as it is which is the case of the
desktop and userfriendly machines.

[1] http://www.openwall.com/lists/oss-security/2017/02/22/3
[2] https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-6074
[3] http://www.openwall.com/lists/oss-security/2017/03/29/2
[4] http://www.openwall.com/lists/oss-security/2017/03/07/6
[5] https://a13xp0p0v.github.io/2017/03/24/CVE-2017-2636.html

Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 Documentation/filesystems/proc.txt                 |   3 +
 Documentation/userspace-api/index.rst              |   1 +
 .../userspace-api/modules_autoload_mode.rst        | 116 +++++++++++++++++++++
 fs/proc/array.c                                    |   6 ++
 include/linux/init_task.h                          |   8 ++
 include/linux/module.h                             |  20 ++++
 include/linux/sched.h                              |   5 +
 include/uapi/linux/prctl.h                         |   8 ++
 kernel/module.c                                    |  83 ++++++++++++---
 security/commoncap.c                               |  36 +++++++
 10 files changed, 270 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/userspace-api/modules_autoload_mode.rst

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 2a84bb3..1974cb6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -195,6 +195,7 @@ read the file /proc/PID/status:
   CapBnd: ffffffffffffffff
   NoNewPrivs:     0
   Seccomp:        0
+  ModulesAutoloadMode:    0
   voluntary_ctxt_switches:        0
   nonvoluntary_ctxt_switches:     1
 
@@ -269,6 +270,8 @@ Table 1-2: Contents of the status files (as of 4.8)
  CapBnd                      bitmap of capabilities bounding set
  NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
  Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ ModulesAutoloadMode         modules auto-load mode, like
+                             prctl(PR_GET_MODULES_AUTOLOAD_MODE, ...)
  Cpus_allowed                mask of CPUs on which this process may run
  Cpus_allowed_list           Same as previous, but in "list format"
  Mems_allowed                mask of memory nodes allowed to this process
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 7b2eb1b..bfd51b7 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -17,6 +17,7 @@ place where this information is gathered.
    :maxdepth: 2
 
    no_new_privs
+   modules_autoload_mode
    seccomp_filter
    unshare
 
diff --git a/Documentation/userspace-api/modules_autoload_mode.rst b/Documentation/userspace-api/modules_autoload_mode.rst
new file mode 100644
index 0000000..1153c35
--- /dev/null
+++ b/Documentation/userspace-api/modules_autoload_mode.rst
@@ -0,0 +1,116 @@
+======================================
+Per-task module auto-load restrictions
+======================================
+
+
+Introduction
+============
+
+Usually a request to a kernel feature that is implemented by a module
+that is not loaded may trigger automatic module loading feature, allowing
+to transparently satisfy userspace, and provide numerous other features
+as they are needed. In this case an implicit kernel module load
+operation happens.
+
+In most cases to load or unload a kernel module, an explicit operation
+happens where programs are required to have ``CAP_SYS_MODULE`` capability
+to perform so. However, with implicit module loading, no capabilities are
+required, or only ``CAP_NET_ADMIN`` in rare cases where the module has the
+'netdev-%s' alias. Historically this was always the case as automatic
+module loading is one of the most important and transparent operations
+of Linux, users expect that their programs just work, yet, recent cases
+showed that this can be abused by unprivileged users or attackers to load
+modules that were not updated, or modules that contain bugs and
+vulnerabilities.
+
+Currently most of Linux code is in a form of modules, hence, allowing to
+control automatic module loading in some cases is as important as the
+operation itself, especially in the context where Linux is used in
+different appliances.
+
+Restricting automatic module loading allows administratros to have the
+appropriate time to update or deny module autoloading in advance. In a
+container or sandbox world where apps can be moved from one context to
+another, the ability to restrict some containers or apps to load extra
+kernel modules will prevent exposing some kernel interfaces that may not
+receive the same care as some other parts of the core. The DCCP vulnerability
+CVE-2017-6074 that can be triggered by unprivileged, or CVE-2017-7184
+in the XFRM framework are some real examples where users or programs are
+able to expose such kernel interfaces and escape their sandbox.
+
+The per-task ``modules_autoload_mode`` allow to restrict automatic module
+loading per task, preventing the kernel from exposing more of its
+interface. This is particularly useful for containers and sandboxes as
+noted above, they are restricted from affecting the rest of the system
+without affecting its functionality, automatic module loading is still
+available for others.
+
+
+Usage
+=====
+
+When the kernel is compiled with modules support ``CONFIG_MODULES``, then:
+
+``PR_SET_MODULES_AUTOLOAD_MODE``:
+        Set the current task ``modules_autoload_mode``. When a module
+        auto-load request is triggered by current task, then the
+        operation has first to satisfy the per-task access mode before
+        attempting to implicitly load the module. As an example,
+        automatic loading of modules that contain bugs or vulnerabilities
+        can be restricted and unprivileged users can no longer abuse such
+        interfaces. Once set, this setting is inherited across ``fork(2)``,
+        ``clone(2)`` and ``execve(2)``.
+
+        Prior to use, the task must call ``prctl(PR_SET_NO_NEW_PRIVS, 1)``
+        or run with ``CAP_SYS_ADMIN`` privileges in its namespace.  If
+        these are not true, ``-EACCES`` will be returned.  This requirement
+        ensures that unprivileged programs cannot affect the behaviour or
+        surprise privileged children.
+
+        Usage:
+                ``prctl(PR_SET_MODULES_AUTOLOAD_MODE, mode, 0, 0, 0);``
+
+        The 'mode' argument supports the following values:
+        0       There are no restrictions, usually the default unless set
+                by parent.
+        1       The task must have ``CAP_SYS_MODULE`` to be able to trigger a
+                module auto-load operation, or ``CAP_NET_ADMIN`` for modules
+                with a 'netdev-%s' alias.
+        2       Automatic modules loading is disabled for the current task.
+
+        The mode may only be increased, never decreased, thus ensuring
+        that once applied, processes can never relax their setting.
+
+
+        Returned values:
+        0               On success.
+        ``-EINVAL``     If 'mode' is not valid, or the operation is not
+                        supported.
+        ``-EACCES``     If task does not have ``CAP_SYS_ADMIN`` in its namespace
+                        or is not running with ``no_new_privs``.
+        ``-EPERM``      If 'mode' is less strict than current task
+                        ``modules_autoload_mode``.
+
+
+        Note that even if the per-task ``modules_autoload_mode`` allows to
+        auto-load the corresponding modules, automatic module loading
+        may still fail due to the global sysctl ``modules_autoload_mode``.
+        The default mode of ``modules_autoload_mode`` is to always allow
+        automatic module loading. For more details, please see
+        Documentation/sysctl/kernel.txt, section "modules_autoload_mode".
+
+
+        When a request to a kernel module is denied, the module name with the
+        corresponding process name and its pid are logged. Administrators can
+        use such information to explicitly load the appropriate modules.
+
+
+``PR_GET_MODULES_AUTOLOAD_MODE``:
+        Return the current task ``modules_autoload_mode``.
+
+        Usage:
+                ``prctl(PR_GET_MODULES_AUTOLOAD_MODE, 0, 0, 0, 0);``
+
+        Returned values:
+        mode            The task's ``modules_autoload_mode``
+        ``-ENOSYS``     If the kernel was compiled without ``CONFIG_MODULES``.
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 79375fc..57b6cc5 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -90,6 +90,7 @@
 #include <linux/string_helpers.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_struct.h>
+#include <linux/module.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -343,10 +344,15 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
 
 static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
 {
+	int autoload = task_modules_autoload_mode(p);
+
 	seq_put_decimal_ull(m, "NoNewPrivs:\t", task_no_new_privs(p));
 #ifdef CONFIG_SECCOMP
 	seq_put_decimal_ull(m, "\nSeccomp:\t", p->seccomp.mode);
 #endif
+	if (autoload != -ENOSYS)
+		seq_put_decimal_ull(m, "\nModulesAutoloadMode:\t", autoload);
+
 	seq_putc(m, '\n');
 }
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6a53262..f564b41 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -153,6 +153,13 @@ extern struct cred init_cred;
 # define INIT_CGROUP_SCHED(tsk)
 #endif
 
+#ifdef CONFIG_MODULES
+# define INIT_MODULES_AUTOLOAD_MODE(tsk)				\
+	.modules_autoload_mode = 0,
+#else
+# define INIT_MODULES_AUTOLOAD_MODE(tsk)
+#endif
+
 #ifdef CONFIG_PERF_EVENTS
 # define INIT_PERF_EVENTS(tsk)						\
 	.perf_event_mutex = 						\
@@ -250,6 +257,7 @@ extern struct cred init_cred;
 	.tasks		= LIST_HEAD_INIT(tsk.tasks),			\
 	INIT_PUSHABLE_TASKS(tsk)					\
 	INIT_CGROUP_SCHED(tsk)						\
+	INIT_MODULES_AUTOLOAD_MODE(tsk)					\
 	.ptraced	= LIST_HEAD_INIT(tsk.ptraced),			\
 	.ptrace_entry	= LIST_HEAD_INIT(tsk.ptrace_entry),		\
 	.real_parent	= &tsk,						\
diff --git a/include/linux/module.h b/include/linux/module.h
index c36aed8..1d742d3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -13,6 +13,7 @@
 #include <linux/kmod.h>
 #include <linux/init.h>
 #include <linux/elf.h>
+#include <linux/sched.h>
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
@@ -510,6 +511,15 @@ bool is_module_text_address(unsigned long addr);
 int may_autoload_module(char *kmod_name, int required_cap,
 			const char *kmod_prefix);
 
+/* Set 'modules_autoload_mode' of current task */
+int task_set_modules_autoload_mode(unsigned long value);
+
+/* Read task's 'modules_autoload_mode' */
+static inline int task_modules_autoload_mode(struct task_struct *task)
+{
+	return task->modules_autoload_mode;
+}
+
 static inline bool within_module_core(unsigned long addr,
 				      const struct module *mod)
 {
@@ -662,6 +672,16 @@ static inline int may_autoload_module(char *kmod_name, int required_cap,
 	return -ENOSYS;
 }
 
+static inline int task_set_modules_autoload_mode(unsigned long value)
+{
+	return -ENOSYS;
+}
+
+static inline int task_modules_autoload_mode(struct task_struct *task)
+{
+	return -ENOSYS;
+}
+
 static inline struct module *__module_address(unsigned long addr)
 {
 	return NULL;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e5a2fbc..1b8cf78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -658,6 +658,11 @@ struct task_struct {
 
 	struct restart_block		restart_block;
 
+#ifdef CONFIG_MODULES
+	/* per-task modules auto-load mode */
+	unsigned			modules_autoload_mode:2;
+#endif
+
 	pid_t				pid;
 	pid_t				tgid;
 
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3165863..5baf9ae 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -211,4 +211,12 @@ struct prctl_mm_map {
 #define PR_SET_PDEATHSIG_PROC		48
 #define PR_GET_PDEATHSIG_PROC		49
 
+/*
+ * Control the per-task modules auto-load mode
+ *
+ * See Documentation/prctl/modules_autoload_mode.txt for more details.
+ */
+#define PR_SET_MODULES_AUTOLOAD_MODE	50
+#define PR_GET_MODULES_AUTOLOAD_MODE	51
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/module.c b/kernel/module.c
index a7205fb..5c24ac4b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4345,6 +4345,7 @@ EXPORT_SYMBOL_GPL(__module_text_address);
 /**
  * may_autoload_module - Determine whether a module auto-load operation
  * is permitted
+ *
  * @kmod_name: The module name
  * @required_cap: if positive, may allow to auto-load the module if this
  *	capability is set
@@ -4362,47 +4363,51 @@ EXPORT_SYMBOL_GPL(__module_text_address);
  * loading.
  *
  * However even if the caller has the required capability, the operation can
- * still be denied due to the global "modules_autoload_mode" sysctl mode. Unless
- * set by enduser, the operation is always allowed which is the default.
+ * still be denied due to the per-task "modules_autoload_mode" mode and the
+ * global "modules_autoload_mode" sysctl one. Unless set by enduser, the
+ * operation is always allowed which is the default.
  *
  * The permission check is performed in this order:
- * 1) If the global sysctl "modules_autoload_mode" is set to 'disabled', then
- *    operation is denied.
+ * 1) We calculate the strict mode of both:
+ *    per-task 'modules_autoload_mode' and global sysctl 'modules_autoload_mode'
+ *
+ * We follow up with the result mode as "modules_autoload_mode":
  *
- * 2) If the global sysctl "modules_autoload_mode" is set to 'privileged', then:
+ * 2) If "modules_autoload_mode" is set to 'disabled', then operation is denied.
  *
- *   2.1) If "@required_cap" is positive and "@kmod_prefix" is set, then
+ * 3) If "modules_autoload_mode" is set to 'privileged', then:
+ *
+ *   3.1) If "@required_cap" is positive and "@kmod_prefix" is set, then
  *   if the caller has the capability, the operation is allowed.
  *
- *   2.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ *   3.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
  *   fallback to check if caller has CAP_SYS_MODULE, if so, operation is
  *   allowed.
  *
- *   2.3) If caller passes "@required_cap" as a negative then we fallback to
+ *   3.3) If caller passes "@required_cap" as a negative then we fallback to
  *   check if caller has CAP_SYS_MODULE, if so, operation is allowed.
  *
  *   We require capabilities to autoload modules here, and CAP_SYS_MODULE here is
  *   the default.
  *
- *   2.4) Otherwise operation is denied.
+ *   3.4) Otherwise operation is denied.
  *
- * 3) If the global sysctl "modules_autoload_mode" is set to 'allowed' which is
- *    the default, then:
+ * 4) If "modules_autoload_mode" is set to 'allowed' which is the default, then:
  *
- *   3.1) If "@required_cap" is positive and "@kmod_prefix" is set, we check if
+ *   4.1) If "@required_cap" is positive and "@kmod_prefix" is set, we check if
  *   caller has the capability, if so, operation is allowed.
  *   In this case the calling subsystem requires the capability to be set before
  *   allowing modules autoload operations and we have to honor that.
  *
- *   3.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ *   4.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
  *   fallback to check if caller has CAP_SYS_MODULE, if so, operation is
  *   allowed.
  *
- *   3.3) If caller passes "@required_cap" as a negative then operation is
+ *   4.3) If caller passes "@required_cap" as a negative then operation is
  *   allowed. This is the most common case as it is used now by
  *   request_module() function.
  *
- *   3.4) Otherwise operation is denied.
+ *   4.4) Otherwise operation is denied.
  *
  * Returns 0 if the module request is allowed or -EPERM if not.
  */
@@ -4410,7 +4415,8 @@ int may_autoload_module(char *kmod_name, int required_cap,
 			const char *kmod_prefix)
 {
 	int module_require_cap = CAP_SYS_MODULE;
-	unsigned int autoload = modules_autoload_mode;
+	unsigned int autoload = max_t(unsigned int, modules_autoload_mode,
+				      current->modules_autoload_mode);
 
 	/* Short-cut for most use cases where kmod auto-loading is allowed */
 	if (autoload == MODULES_AUTOLOAD_ALLOWED && required_cap < 0)
@@ -4442,6 +4448,51 @@ int may_autoload_module(char *kmod_name, int required_cap,
 	return -EPERM;
 }
 
+/**
+ * task_set_modules_autoload_mode - Set per-task modules auto-load mode
+ * @value: Value to set "modules_autoload_mode" of current task
+ *
+ * Set current task "modules_autoload_mode". The task has to have
+ * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. This
+ * avoids scenarios where unprivileged tasks can affect the behaviour of
+ * privilged children by restricting module or kernel features.
+ *
+ * The task's "modules_autoload_mode" may only be increased, never decreased.
+ *
+ * Returns 0 on success, -EINVAL if @value is not valid, -EACCES if task does
+ * not have CAP_SYS_ADMIN in its namespace or is not running with no_new_privs,
+ * and finally -EPERM if @value is less strict than current task
+ * "modules_autoload_mode".
+ *
+ */
+int task_set_modules_autoload_mode(unsigned long value)
+{
+	if (value > MODULES_AUTOLOAD_DISABLED)
+		return -EINVAL;
+
+	/*
+	 * To set task "modules_autoload_mode" requires that the task has
+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+	 * This avoids scenarios where unprivileged tasks can affect the
+	 * behaviour of privileged children by restricting module features.
+	 */
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
+	/*
+	 * The "modules_autoload_mode" may only be increased, never decreased,
+	 * ensuring that once applied, processes can never relax their settings.
+	 */
+	if (current->modules_autoload_mode > value)
+		return -EPERM;
+	else if (current->modules_autoload_mode < value)
+		current->modules_autoload_mode = value;
+
+	return 0;
+}
+
 /* Don't grab lock, we're oopsing. */
 void print_modules(void)
 {
diff --git a/security/commoncap.c b/security/commoncap.c
index 236e573..67a235c 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1157,6 +1157,36 @@ static int cap_prctl_drop(unsigned long cap)
 	return commit_creds(new);
 }
 
+/*
+ * Implement PR_SET_MODULES_AUTOLOAD_MODE.
+ *
+ * Returns 0 on success, -ve on error.
+ */
+static int pr_set_modules_autoload_mode(unsigned long arg2, unsigned long arg3,
+					unsigned long arg4, unsigned long arg5)
+{
+	if (arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	return task_set_modules_autoload_mode(arg2);
+}
+
+/*
+ * Implement PR_GET_MODULES_AUTOLOAD_MODE.
+ *
+ * Return current task "modules_autoload_mode", -ve on error.
+ */
+static inline int pr_get_modules_autoload_mode(unsigned long arg2,
+					       unsigned long arg3,
+					       unsigned long arg4,
+					       unsigned long arg5)
+{
+	if (arg2 || arg3 || arg4 || arg5)
+		return -EINVAL;
+
+	return task_modules_autoload_mode(current);
+}
+
 /**
  * cap_task_prctl - Implement process control functions for this security module
  * @option: The process control function requested
@@ -1287,6 +1317,12 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			return commit_creds(new);
 		}
 
+	case PR_SET_MODULES_AUTOLOAD_MODE:
+		return pr_set_modules_autoload_mode(arg2, arg3, arg4, arg5);
+
+	case PR_GET_MODULES_AUTOLOAD_MODE:
+		return pr_get_modules_autoload_mode(arg2, arg3, arg4, arg5);
+
 	default:
 		/* No functionality available - continue with default */
 		return -ENOSYS;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
                   ` (3 preceding siblings ...)
  2017-11-27 17:18 ` [PATCH v5 next 4/5] modules:capabilities: add a per-task modules auto-load mode Djalal Harouni
@ 2017-11-27 17:18 ` Djalal Harouni
  2017-11-27 18:44   ` Linus Torvalds
  2017-11-27 18:41 ` [PATCH v5 next 0/5] Improve Module autoloading infrastructure Linus Torvalds
  5 siblings, 1 reply; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 17:18 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds, Djalal Harouni

This uses the new request_module_cap() facility to directly propagate
CAP_NET_ADMIN capability and the 'netdev' module prefix to the
capability subsystem as it was suggested.

We do not remove the explicit capable(CAP_NET_ADMIN) check here, but we
may remove it in future versions since it is also performed by the
capability subsystem. This allows to have a better interface where other
subsystems will just use this call and let the capability subsystem
handles the permission checks, if the modules should be loaded or not.

This is also an infrastructure fix since historically Linux always
allowed to auto-load modules without privileges, and later the net code
started to check capabilities and prefixes, adapted the CAP_NET_ADMIN
check with the 'netdev' prefix to prevent abusing the capability by
loading non-netdev modules. However from a bigger picture we want to
continue to support automatic module loading as non privileged but also
implement easy policy solutions like:

User=djalal
DenyNewFeatures=no

Which will translate to allow the interactive user djalal to load extra
Linux features. Others, volatile accounts or guests can be easily
blocked from doing so. We have introduced in previous patches the
necessary infrastructure and now with this change we start to use the
new request_module_cap() function to explicitly tell the capability
subsystem that we want to auto-load modules with CAP_NET_ADMIN if they
are prefixed.

This is also based on suggestions from Rusty Russel and Kees Cook [1]

[1] https://lkml.org/lkml/2017/4/26/735

Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 net/core/dev_ioctl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 7e690d0..fdd8560 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -382,8 +382,10 @@ void dev_load(struct net *net, const char *name)
 	rcu_read_unlock();
 
 	no_module = !dev;
+	/* "netdev-%s" modules are allowed if CAP_NET_ADMIN is set */
 	if (no_module && capable(CAP_NET_ADMIN))
-		no_module = request_module("netdev-%s", name);
+		no_module = request_module_cap(CAP_NET_ADMIN, "netdev",
+					       "%s", name);
 	if (no_module && capable(CAP_SYS_MODULE))
 		request_module("%s", name);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
                   ` (4 preceding siblings ...)
  2017-11-27 17:18 ` [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules Djalal Harouni
@ 2017-11-27 18:41 ` Linus Torvalds
  2017-11-27 19:02   ` Linus Torvalds
  2017-11-27 19:14   ` David Miller
  5 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 18:41 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
>
> The sysctl flag is available at "/proc/sys/kernel/modules_autoload_mode"
>
> When modules_autoload_mode is set to (0), the default, there are no
> restrictions.

So quick question: do we actually need this?

Yes, it may be the current default, but is it anything that people
actually depend on?

I'd have expected that most module loading comes from system actions
anyway, not normal users.

So I'd like to explore first whether it even makes sense to make a new option.

New options are bad because:

 - opt-in security isn't security at all

 - having to configure things is complex

so we should generally strive to _not_ need new random config options.

What are the real life use-cases for normal users having modules auto-load?

               Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 17:18 ` [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules Djalal Harouni
@ 2017-11-27 18:44   ` Linus Torvalds
  2017-11-27 21:41     ` Djalal Harouni
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 18:44 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
> This uses the new request_module_cap() facility to directly propagate
> CAP_NET_ADMIN capability and the 'netdev' module prefix to the
> capability subsystem as it was suggested.

This is the kind of complexity that I wonder if it's worth it at all.

Nobody sane actually uses those stupid capability bits. Have you ever
actually seen it used in real life?

They were a mistake, and we should never have done them - another case
of security people who think that complexity == security, when in
reality nobody actually wants the complexity or is willing to set it
up and manage it.

                   Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
@ 2017-11-27 18:48   ` Randy Dunlap
  2017-11-27 21:35     ` Djalal Harouni
  2017-11-28 19:14   ` Luis R. Rodriguez
  1 sibling, 1 reply; 49+ messages in thread
From: Randy Dunlap @ 2017-11-27 18:48 UTC (permalink / raw)
  To: Djalal Harouni, Kees Cook, Andy Lutomirski, Andrew Morton,
	Luis R. Rodriguez, James Morris, Ben Hutchings, Solar Designer,
	Serge Hallyn, Jessica Yu, Rusty Russell, linux-kernel,
	linux-security-module, kernel-hardening
  Cc: Jonathan Corbet, Ingo Molnar, David S. Miller, netdev,
	Peter Zijlstra, Linus Torvalds

Hi,

Mostly typos/spellos...


On 11/27/2017 09:18 AM, Djalal Harouni wrote:
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
> Suggested-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
> ---
>  include/linux/kmod.h      | 65 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/lsm_hooks.h |  6 ++++-
>  include/linux/security.h  |  7 +++--
>  kernel/kmod.c             | 29 ++++++++++++++++-----
>  security/security.c       |  6 +++--
>  security/selinux/hooks.c  |  3 ++-
>  6 files changed, 97 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> index 40c89ad..ccd6a1c 100644
> --- a/include/linux/kmod.h
> +++ b/include/linux/kmod.h
> @@ -33,16 +33,67 @@

> +/**
> + * request_module  Try to load a kernel module
> + *
> + * Automatically loads the request module.
> + *
> + * @mod...: The module name
> + */

what are the "..." for?  what do they do here?

> +#define request_module(mod...) __request_module(true, -1, NULL, mod)
> +
> +#define request_module_nowait(mod...) __request_module(false, -1, NULL, mod)
> +
> +/**
> + * request_module_cap  Load kernel module only if the required capability is set
> + *
> + * Automatically load a module if the required capability is set and it
> + * corresponds to the appropriate subsystem that is indicated by prefix.
> + *
> + * This allows to load aliased modules like 'netdev-%s' with CAP_NET_ADMIN.
> + *
> + * ex:
> + *	request_module_cap(CAP_NET_ADMIN, "netdev", "%s", mod);
> + *
> + * @required_cap: Required capability to load the module
> + * @prefix: The module prefix if any, otherwise NULL
> + * @fmt: printf style format string for the name of the module with its
> + *       arguments if any
> + *
> + * If '@required_cap' is positive, the security subsystem will check if
> + * '@prefix' is set and if caller has the required capability then the
> + * operation is allowed.
> + * The security subsystem can not make assumption about the boundaries
> + * of other subsystems, it is their responsability to make a call with

                                       responsibility

> + * the right capability and module alias.
> + *
> + * If '@required_cap' is positive and '@prefix' is NULL then we assume
> + * that the '@required_cap' is CAP_SYS_MODULE.
> + *
> + * If '@required_cap' is negative then there are no permission checks, this
> + * is the equivalent to request_module() function.
> + *
> + * This function trust callers to pass the right capability with the

                    trusts

> + * appropriate prefix.
> + *
> + * Note: the permission checks may still fail, even if the required
> + * capability is negative, this is due to module loading restrictions

                    negative; this

> + * that are controlled by the enduser.
> + */
> +#define request_module_cap(required_cap, prefix, fmt...) \
> +	__request_module(true, required_cap, prefix, fmt)
> +
>  #endif /* __LINUX_KMOD_H__ */
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 7161d8e..d898bd0 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -571,6 +571,9 @@
>   *	Ability to trigger the kernel to automatically upcall to userspace for
>   *	userspace to load a kernel module with the given name.
>   *	@kmod_name name of the module requested by the kernel
> + *	@required_cap If positive, the required capability to automatically load
> + *	the correspondig kernel module.

            corresponding

> + *	@prefix The prefix of the module if any. Can be NULL.
>   *	Return 0 if successful.
>   * @kernel_read_file:
>   *	Read a file specified by userspace.

> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index bc6addd..679d401 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -109,6 +109,8 @@ static int call_modprobe(char *module_name, int wait)
>  /**
>   * __request_module - try to load a kernel module
>   * @wait: wait (or not) for the operation to complete
> + * @required_cap: required capability to load the module
> + * @prefix: module prefix if any otherwise NULL
>   * @fmt: printf style format string for the name of the module
>   * @...: arguments as specified in the format string
>   *
> @@ -119,14 +121,20 @@ static int call_modprobe(char *module_name, int wait)
>   * must check that the service they requested is now available not blindly
>   * invoke it.
>   *
> - * If module auto-loading support is disabled then this function
> - * becomes a no-operation.
> + * If "required_cap" is positive, The security subsystem will trust the caller

                                     the

> + * that if it has "required_cap" then it may allow to load some modules that
> + * have a specific alias.
> + *
> + * If module auto-loading support is disabled by clearing the modprobe path
> + * then this function becomes a no-operation.
>   */
> -int __request_module(bool wait, const char *fmt, ...)
> +int __request_module(bool wait, int required_cap,
> +		     const char *prefix, const char *fmt, ...)
>  {
>  	va_list args;
>  	char module_name[MODULE_NAME_LEN];
>  	int ret;
> +	int len = 0;
>  
>  	/*
>  	 * We don't allow synchronous module loading from async.  Module
> @@ -139,13 +147,22 @@ int __request_module(bool wait, const char *fmt, ...)
>  	if (!modprobe_path[0])
>  		return 0;
>  
> +	/*
> +	 * Lets attach the prefix to the module name

Let's
but better:
	 * Attach the prefix to the module name

> +	 */
> +	if (prefix != NULL && *prefix != '\0') {
> +		len += snprintf(module_name, MODULE_NAME_LEN, "%s-", prefix);
> +		if (len >= MODULE_NAME_LEN)
> +			return -ENAMETOOLONG;
> +	}
> +
>  	va_start(args, fmt);
> -	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
> +	ret = vsnprintf(module_name + len, MODULE_NAME_LEN - len, fmt, args);
>  	va_end(args);
> -	if (ret >= MODULE_NAME_LEN)
> +	if (ret >= MODULE_NAME_LEN - len)
>  		return -ENAMETOOLONG;
>  
> -	ret = security_kernel_module_request(module_name);
> +	ret = security_kernel_module_request(module_name, required_cap, prefix);
>  	if (ret)
>  		return ret;
>  


-- 
~Randy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 18:41 ` [PATCH v5 next 0/5] Improve Module autoloading infrastructure Linus Torvalds
@ 2017-11-27 19:02   ` Linus Torvalds
  2017-11-27 19:12     ` Linus Torvalds
  2017-11-27 19:14   ` David Miller
  1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 19:02 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 10:41 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> What are the real life use-cases for normal users having modules auto-load?

Well, I could do some testing. One case is apparently both bluetoothd
and ip6tables that have been converted to indeed use CAP_NET_ADMIN.

Annoying.

Still, can't we just say "you need to have capabilities" and leave it at that?

Something (UNTESTED and whitespace damaged) like this:

  diff --git a/kernel/kmod.c b/kernel/kmod.c
  index bc6addd9152b..a3f3218f66c6 100644
  --- a/kernel/kmod.c
  +++ b/kernel/kmod.c
  @@ -139,6 +139,11 @@ int __request_module(bool wait, const char *fmt, ...)
           if (!modprobe_path[0])
                   return 0;

  +        if (WARN_ON_ONCE(!capable(CAP_SYS_MODULE) ||
  +                         !capable(CAP_SYS_ADMIN) ||
  +                         !capable(CAP_NET_ADMIN)))
  +                return -EPERM;
  +
           va_start(args, fmt);
           ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
           va_end(args);

instead of adding complex infrastructure for something people might
not want anyway?

Now, the above will not necessarily work with a legacy /dev/ directory
where al the nodes have been pre-populated, and opening the device
node is supposed to load the module. So _historically_ we did indeed
load modules as normal users. But does that really happen any more?

I'd hate to default to historical behavior if there's no actual reason to do so.

                  Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 19:02   ` Linus Torvalds
@ 2017-11-27 19:12     ` Linus Torvalds
  2017-11-27 21:31       ` Djalal Harouni
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 19:12 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 11:02 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, the above will not necessarily work with a legacy /dev/ directory
> where al the nodes have been pre-populated, and opening the device
> node is supposed to load the module. So _historically_ we did indeed
> load modules as normal users. But does that really happen any more?

Sadly, it looks like bluetoothd actually does expect to load the
bt-proto-XYZ modules with no capabilities at all.

So apparently we really do depend on not needing capabilities for
module loading.

Oh well.

                 Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 18:41 ` [PATCH v5 next 0/5] Improve Module autoloading infrastructure Linus Torvalds
  2017-11-27 19:02   ` Linus Torvalds
@ 2017-11-27 19:14   ` David Miller
  2017-11-27 22:31     ` James Morris
  1 sibling, 1 reply; 49+ messages in thread
From: David Miller @ 2017-11-27 19:14 UTC (permalink / raw)
  To: torvalds
  Cc: tixxdz, keescook, luto, akpm, mcgrof, james.l.morris,
	ben.hutchings, solar, serge, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 27 Nov 2017 10:41:30 -0800

> What are the real life use-cases for normal users having modules
> auto-load?

User opens SCTP socket, SCTP protocol module loads.

People build test cases via namespaces, and in that namespaces normal
users can setup virtual tunnel devices themselves, and those configure
operations can bring the tunnel module in.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 19:12     ` Linus Torvalds
@ 2017-11-27 21:31       ` Djalal Harouni
  0 siblings, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 21:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

Hi Linus,

On Mon, Nov 27, 2017 at 8:12 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 11:02 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Now, the above will not necessarily work with a legacy /dev/ directory
>> where al the nodes have been pre-populated, and opening the device
>> node is supposed to load the module. So _historically_ we did indeed
>> load modules as normal users. But does that really happen any more?
>
> Sadly, it looks like bluetoothd actually does expect to load the
> bt-proto-XYZ modules with no capabilities at all.
>
> So apparently we really do depend on not needing capabilities for
> module loading.
>
> Oh well.

Yes DCCP is unprivileged, tun and all tunneling, some md drivers, some
crypto, and device drivers... fs modules can be loaded inside
usernamespaces, and maybe when some request requires external symbols
too...

However tunneling helps to solve real usecases, so that's why the
backward compatibility and opt-in.

I do perfectly understand that opt-in is not the best choice, however
these patchset includes a per process tree, and given that lot of code
is running in containers and sandboxes, it is better than nothing. I
will follow up later with patches to the major ones especially when we
force the flag by default. Ubuntu was said to be owned in a past
security contest due to this kind of things, and now since they have
ubuntu snaps or apps they can set the flag, and others will follow.

Thanks!

>                  Linus



-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-27 18:48   ` Randy Dunlap
@ 2017-11-27 21:35     ` Djalal Harouni
  0 siblings, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 21:35 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, LSM List,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

Hi Randy,

On Mon, Nov 27, 2017 at 7:48 PM, Randy Dunlap <rdunlap@infradead.org> wrote:
> Hi,
>
> Mostly typos/spellos...
>
>
> On 11/27/2017 09:18 AM, Djalal Harouni wrote:
>> Cc: Serge Hallyn <serge@hallyn.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
>> Suggested-by: Kees Cook <keescook@chromium.org>
>> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
>> ---
>>  include/linux/kmod.h      | 65 ++++++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/lsm_hooks.h |  6 ++++-
>>  include/linux/security.h  |  7 +++--
>>  kernel/kmod.c             | 29 ++++++++++++++++-----
>>  security/security.c       |  6 +++--
>>  security/selinux/hooks.c  |  3 ++-
>>  6 files changed, 97 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
>> index 40c89ad..ccd6a1c 100644
>> --- a/include/linux/kmod.h
>> +++ b/include/linux/kmod.h
>> @@ -33,16 +33,67 @@
>
>> +/**
>> + * request_module  Try to load a kernel module
>> + *
>> + * Automatically loads the request module.
>> + *
>> + * @mod...: The module name
>> + */
>
> what are the "..." for?  what do they do here?

Ok, will fix it.

>
>> +#define request_module(mod...) __request_module(true, -1, NULL, mod)
>> +
>> +#define request_module_nowait(mod...) __request_module(false, -1, NULL, mod)
>> +
>> +/**
>> + * request_module_cap  Load kernel module only if the required capability is set
>> + *
[...]
>
>
> --
> ~Randy

Thank you very much for the review, will fix all.


-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 18:44   ` Linus Torvalds
@ 2017-11-27 21:41     ` Djalal Harouni
  2017-11-27 22:04       ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Djalal Harouni @ 2017-11-27 21:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

Hi Linus,

On Mon, Nov 27, 2017 at 7:44 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
>> This uses the new request_module_cap() facility to directly propagate
>> CAP_NET_ADMIN capability and the 'netdev' module prefix to the
>> capability subsystem as it was suggested.
>
> This is the kind of complexity that I wonder if it's worth it at all.
>
> Nobody sane actually uses those stupid capability bits. Have you ever
> actually seen it used in real life?

Yes they are complicated even for developers, and normal users do not
understand them, however yes every sandbox and container is exposing
them to endusers directly, they are documented!  so yes CAP_SYS_MODULE
is exposed but it does not cover autoloading.

However, we are trying hard to abstract some semantics that are easy
to grasp, we are mutating capabilities and seccomp to have an
abstracted "yes/no" options for our endusers.

Now, if you are referring to kernel code, the networking subsystem is
using them and I don't want to break any assumption here. There is
still the request_module(), the request_module_cap() was suggested so
networking code later won't have to do the checks on its own, and
maybe it can be consistent in the long term. The phonet sockets even
needs CAP_SYS_ADMIN...


>
> They were a mistake, and we should never have done them - another case
> of security people who think that complexity == security, when in
> reality nobody actually wants the complexity or is willing to set it
> up and manage it.

Alright, but I guess we are stuck, is there something better on how we
can do this or describe this ?


Please note in these patches, the mode is specifically described as:

* allowed: for backward compatibility  (I would have done without it)
* privileged: which includes capabilities (backward compatibility too)
or we can add what ever in the future
* disabled: even for privileged.

So I would have preferred if it is something like "yes/no" but...
However in userspace we will try hard to hide this complexity and the
capability bits.

Now I can see that the code comments and doc refer to privileged with
capabilities a lot, where we can maybe update that doc and code to
less state that privileged means capabilities ? Suggestions ?

Thanks!

>                    Linus


-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 21:41     ` Djalal Harouni
@ 2017-11-27 22:04       ` Linus Torvalds
  2017-11-27 22:59         ` Kees Cook
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 22:04 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote:
>
> However, we are trying hard to abstract some semantics that are easy
> to grasp, we are mutating capabilities and seccomp to have an
> abstracted "yes/no" options for our endusers.

Yes.

Sadly, it looks like we actually do have users that just expect to
load modules dynamically without any capabilities at all.

So we can't actually disallow it by default at all, which imho makes
this security option essentially useless.

A security option that people can't use without breaking their system
is pointless.

We saw that with SELinux - people ended up just disabling it for
_years_, simply because it ended up breaking so much in practice. And
yes, it got fixed eventually, but at an incredibly high maintenance
cost of all the crazy rules databases.

> Alright, but I guess we are stuck, is there something better on how we
> can do this or describe this ?

So I wonder if we can perhaps look at the places that actually do
"requerst_module()", and start filtering them on that basis.

Some of them will already have checked for capabilities.

Others clearly expect to juist work even _without_ capabilities (ie
the bluetoothd case).

So the whole "let's add a global config option" model is broken. There
is no possible global rule. It will break things, which in turn mean
that people won't turn it on (and we can't turn it on by default),
which in turn makes this pointless.

In other words, I really think that anything that just adds a mode
flag cannot work.

So instead of having one "modules_autoload_mode" thing, maybe the
individual requerst_module() cases need to simply be audited.

Put another way: I think the part of your patch series that does that
"request_module_cap()" and makes the netdev modules use it is a good
addition.

It's the "mode" part I really don't agree with, because apparently we
really need to default it to permissive.

So how about instead:

 - add that "request_module_cap()" and make the networking code that
already uses CAP_ADMIN_NET use it.

 - make "request_module()" itself default to being
"request_module_cap(CAP_SYS_MODULE,..)"

 - make sure that when the capability check fails, we print an error
message, and then for the ones that trigger, we will audit them and
see if it's ok.

Because that "mode" flag defaulting to off will just mean that the
default case will remain the existing unsafe one, and that's bad.

Opt-in really doesn't work. We've done it.

Global flags for varied behavior really doesn't work. We've done that
too. Different cases want different behavior, the global flag is just
fundamentally broken.

              Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 19:14   ` David Miller
@ 2017-11-27 22:31     ` James Morris
  2017-11-27 23:04       ` Kees Cook
  0 siblings, 1 reply; 49+ messages in thread
From: James Morris @ 2017-11-27 22:31 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, tixxdz, keescook, luto, akpm, mcgrof, ben.hutchings,
	solar, serge, jeyu, rusty, linux-kernel, linux-security-module,
	kernel-hardening, corbet, mingo, netdev, peterz

On Tue, 28 Nov 2017, David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 27 Nov 2017 10:41:30 -0800
> 
> > What are the real life use-cases for normal users having modules
> > auto-load?
> 
> User opens SCTP socket, SCTP protocol module loads.
> 
> People build test cases via namespaces, and in that namespaces normal
> users can setup virtual tunnel devices themselves, and those configure
> operations can bring the tunnel module in.

What about implementing a white list of modules which are able to be 
loaded by unprivileged users?

Then, Linus' solution would look something like:

	va_start(args, fmt);
	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
	va_end(args);

	if (WARN_ON_ONCE(!capable(CAP_SYS_MODULE) ||
                         !capable(CAP_SYS_ADMIN) ||
                         !capable(CAP_NET_ADMIN) ||
                         !unprivileged_autoload(module_name)))
		return -EPERM;



-- 
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 22:04       ` Linus Torvalds
@ 2017-11-27 22:59         ` Kees Cook
  2017-11-27 23:14           ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Kees Cook @ 2017-11-27 22:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 2:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote:
>>
>> However, we are trying hard to abstract some semantics that are easy
>> to grasp, we are mutating capabilities and seccomp to have an
>> abstracted "yes/no" options for our endusers.
>
> Yes.
>
> Sadly, it looks like we actually do have users that just expect to
> load modules dynamically without any capabilities at all.
>
> So we can't actually disallow it by default at all, which imho makes
> this security option essentially useless.

The good news here is that this series comes with an expected user:
systemd, for which I suspect this will option will get wider and wider
use.

> A security option that people can't use without breaking their system
> is pointless.

Another bit of good news is that systems _depending_ cap-less module
loading are uncommon enough that many system builders can enable this
protection and nothing will break. (Even you thought it was rare
enough that we could just make this default-enabled.)

Besides, distros understand that they have to keep an eye out for
these things since the kernel has a long history of not allowing
default-enabled protections that change userspace behavior, etc.

> So I wonder if we can perhaps look at the places that actually do
> "requerst_module()", and start filtering them on that basis.
>
> Some of them will already have checked for capabilities.
>
> Others clearly expect to just work even _without_ capabilities (ie
> the bluetoothd case).
>
> So the whole "let's add a global config option" model is broken. There
> is no possible global rule. It will break things, which in turn mean
> that people won't turn it on (and we can't turn it on by default),
> which in turn makes this pointless.
>
> In other words, I really think that anything that just adds a mode
> flag cannot work.
>
> So instead of having one "modules_autoload_mode" thing, maybe the
> individual requerst_module() cases need to simply be audited.
>
> Put another way: I think the part of your patch series that does that
> "request_module_cap()" and makes the netdev modules use it is a good
> addition.
>
> It's the "mode" part I really don't agree with, because apparently we
> really need to default it to permissive.
>
> So how about instead:
>
>  - add that "request_module_cap()" and make the networking code that
> already uses CAP_ADMIN_NET use it.
>
>  - make "request_module()" itself default to being
> "request_module_cap(CAP_SYS_MODULE,..)"
>
>  - make sure that when the capability check fails, we print an error
> message, and then for the ones that trigger, we will audit them and
> see if it's ok.

The issue isn't "is it always okay to cap-less-load this module?" it's
"should the kernel's attack surface be allowed to grown by an
unprivileged user?" Nearly all the request_module() sites have already
been audited and adjusted to avoid _arbitrary_ module loading (usually
by adding module prefixes: netdev-, crypto-, etc), and that greatly
helped reduce the ability for a unprivileged user to load a
known-buggy module, but there will remain many subsystems that
continue to expect to grow the kernel's features on demand within
their subsystem, and sometimes those modules will have bugs that an
unprivileged user can exploit (this history of this happening is quite
real; it's not a theoretical concern). This is especially problematic
for distro kernels where they build tons of modules.

There isn't a way for an admin to say "I only want root to be able to
load modules". They can't delete all the "unused" kernel modules,
because maybe root will want the module, they can't change module file
permissions because the modules are read with init's permissions (and
it would be terrible to repeatedly change all the file perms each time
a new kernel got installed), modules can only be blacklisted (not
whitelisted) in /etc/modprobe.d/, etc.

> Because that "mode" flag defaulting to off will just mean that the
> default case will remain the existing unsafe one, and that's bad.
>
> Opt-in really doesn't work. We've done it.
>
> Global flags for varied behavior really doesn't work. We've done that
> too. Different cases want different behavior, the global flag is just
> fundamentally broken.

I don't disagree that a global should be avoided, but I'm struggling
to see another option here. We can't break userspace by default so we
can't restrict cap-less loading by default. But we can allow userspace
to _choose_ to break itself, especially within a container. This isn't
uncommon, especially for modules, where we even have the global
"modules_disabled" sysctl already. The level of granularity of control
here is the issue, and it's what this series solves.

The options I see for module loading control are:
1) monolithic kernel (no modules)
2) modular kernel that flips on modules_disabled after boot (no
modules after boot)
3) modular kernel that allows per-subsystem unpriv module loading (all
modules loadable)

There is a demand for something between 2 and 3 where only root can
load modules. (And as pointed out in the series, this is _especially_
true for containers where the admin may want to leave module loading
alone in the init namespace, but stop any module loading in the
container.)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 22:31     ` James Morris
@ 2017-11-27 23:04       ` Kees Cook
  2017-11-27 23:44         ` James Morris
  0 siblings, 1 reply; 49+ messages in thread
From: Kees Cook @ 2017-11-27 23:04 UTC (permalink / raw)
  To: James Morris, Linus Torvalds
  Cc: David Miller, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	Luis R. Rodriguez, Ben Hutchings, Solar Designer, Serge E. Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar,
	Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 2:31 PM, James Morris <james.l.morris@oracle.com> wrote:
> On Tue, 28 Nov 2017, David Miller wrote:
>
>> From: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Mon, 27 Nov 2017 10:41:30 -0800
>>
>> > What are the real life use-cases for normal users having modules
>> > auto-load?
>>
>> User opens SCTP socket, SCTP protocol module loads.
>>
>> People build test cases via namespaces, and in that namespaces normal
>> users can setup virtual tunnel devices themselves, and those configure
>> operations can bring the tunnel module in.
>
> What about implementing a white list of modules which are able to be
> loaded by unprivileged users?
>
> Then, Linus' solution would look something like:
>
>         va_start(args, fmt);
>         ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
>         va_end(args);
>
>         if (WARN_ON_ONCE(!capable(CAP_SYS_MODULE) ||
>                          !capable(CAP_SYS_ADMIN) ||
>                          !capable(CAP_NET_ADMIN) ||
>                          !unprivileged_autoload(module_name)))
>                 return -EPERM;

We have some of this already with the module prefixes. Doing this
per-module would need to be exported to userspace, I think. It'd be
way too fragile sitting in the kernel.

To control this via modprobe, we'd need to expand modprobe to include
the user that is trying to load the module (so it can reason about who
is doing it), and then teach modprobe about that so the policy could
be exported to userspace.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 22:59         ` Kees Cook
@ 2017-11-27 23:14           ` Linus Torvalds
  2017-11-27 23:19             ` Kees Cook
  2017-11-28  1:23             ` Kees Cook
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 23:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I don't disagree that a global should be avoided, but I'm struggling
> to see another option here. We can't break userspace by default so we
> can't restrict cap-less loading by default. But we can allow userspace
> to _choose_ to break itself, especially within a container. This isn't
> uncommon, especially for modules, where we even have the global
> "modules_disabled" sysctl already. The level of granularity of control
> here is the issue, and it's what this series solves.

So there's two "global" here

 - if a container were to choose to break itself, it should damn well
be container-specific, not some global option

   This part seems to be ok in the patch series, since the "global" is
really per-task. So it's not global in the "system-wide" sense.

 - if _one_ request_module() caller were to say "I don't want to be
loaded by a normal user", that doesn't mean that _other_
request_module() cases shouldn't.

   This is the part I'm objecting to, because it means that we can't
enable this stricter policy by default.

And the thing is, the patch series seems to already introduce largely
the better model of just making it site-specific. Introducing that
request_module_cap() thing and then using it for networking is a good
step.

But I also suspect that we _could_ just make the stricter rules
actually be default, if we just fixed the thing up to not be "every
request_module() is the same".

For example, several request_module() calls come from device node
opens, and it makes sense that we can just say: "if you have access to
the device node, then you have the right to request the module".

But that would need to be not a global "request_module()" behavior,
but a behavior that is tied to the particular call-site.

IOW, extend on that request_module_cap() model, and introduce
(perhaps) a "request_module_dev()" call that basically means "the user
opened the device node for the requested module".

Because those kinds of permissions aren't necessarily about
capabilities, but about things like "I'm in the dialout group, I get
to open tty devices and by implication request their modules".

And that _really_ isn't global behavior.  The fact that I might be
able to load a serial; module has *nothing* to do with whether I can
load some other kind of module at all.

That global mode is just wrong.

                Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 23:14           ` Linus Torvalds
@ 2017-11-27 23:19             ` Kees Cook
  2017-11-27 23:35               ` Linus Torvalds
  2017-11-28  1:23             ` Kees Cook
  1 sibling, 1 reply; 49+ messages in thread
From: Kees Cook @ 2017-11-27 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I don't disagree that a global should be avoided, but I'm struggling
>> to see another option here. We can't break userspace by default so we
>> can't restrict cap-less loading by default. But we can allow userspace
>> to _choose_ to break itself, especially within a container. This isn't
>> uncommon, especially for modules, where we even have the global
>> "modules_disabled" sysctl already. The level of granularity of control
>> here is the issue, and it's what this series solves.
>
> So there's two "global" here
>
>  - if a container were to choose to break itself, it should damn well
> be container-specific, not some global option
>
>    This part seems to be ok in the patch series, since the "global" is
> really per-task. So it's not global in the "system-wide" sense.
>
>  - if _one_ request_module() caller were to say "I don't want to be
> loaded by a normal user", that doesn't mean that _other_
> request_module() cases shouldn't.
>
>    This is the part I'm objecting to, because it means that we can't
> enable this stricter policy by default.
>
> And the thing is, the patch series seems to already introduce largely
> the better model of just making it site-specific. Introducing that
> request_module_cap() thing and then using it for networking is a good
> step.
>
> But I also suspect that we _could_ just make the stricter rules
> actually be default, if we just fixed the thing up to not be "every
> request_module() is the same".
>
> For example, several request_module() calls come from device node
> opens, and it makes sense that we can just say: "if you have access to
> the device node, then you have the right to request the module".
>
> But that would need to be not a global "request_module()" behavior,
> but a behavior that is tied to the particular call-site.
>
> IOW, extend on that request_module_cap() model, and introduce
> (perhaps) a "request_module_dev()" call that basically means "the user
> opened the device node for the requested module".
>
> Because those kinds of permissions aren't necessarily about
> capabilities, but about things like "I'm in the dialout group, I get
> to open tty devices and by implication request their modules".
>
> And that _really_ isn't global behavior.  The fact that I might be
> able to load a serial; module has *nothing* to do with whether I can
> load some other kind of module at all.
>
> That global mode is just wrong.

What about exporting this entirely to userspace, giving it as much
context as possible? i.e. inform modprobe about the user doing it,
maybe the subsystem, etc?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 23:19             ` Kees Cook
@ 2017-11-27 23:35               ` Linus Torvalds
  0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2017-11-27 23:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 3:19 PM, Kees Cook <keescook@chromium.org> wrote:
>
> What about exporting this entirely to userspace, giving it as much
> context as possible? i.e. inform modprobe about the user doing it,
> maybe the subsystem, etc?

Yeah, except for the fact that we don't trust user-mode?

We used to do that exact thing. It was a nasty disaster, and caused
version skew and other horrible problems.

So no. Th e"let's just let user mode sort it out" doesn't work. User
mode doesn't sort anything out, it just makes it worse.

It's not some made-up example when I say that user-mode has decided
that kernel requests have to be completely serialized, and recusive
invocations will just hang.

So no. We do not go down that particular rat-hole. It's just a bigger
chance of getting things wrong.

                Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 0/5] Improve Module autoloading infrastructure
  2017-11-27 23:04       ` Kees Cook
@ 2017-11-27 23:44         ` James Morris
  0 siblings, 0 replies; 49+ messages in thread
From: James Morris @ 2017-11-27 23:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, David Miller, Djalal Harouni, Andy Lutomirski,
	Andrew Morton, Luis R. Rodriguez, Ben Hutchings, Solar Designer,
	Serge E. Hallyn, Jessica Yu, Rusty Russell, LKML,
	linux-security-module, kernel-hardening, Jonathan Corbet,
	Ingo Molnar, Network Development, Peter Zijlstra

On Mon, 27 Nov 2017, Kees Cook wrote:

> >         if (WARN_ON_ONCE(!capable(CAP_SYS_MODULE) ||
> >                          !capable(CAP_SYS_ADMIN) ||
> >                          !capable(CAP_NET_ADMIN) ||
> >                          !unprivileged_autoload(module_name)))

(Side note: the capable() calls would ideally come after the whitelist 
check).

> We have some of this already with the module prefixes. Doing this
> per-module would need to be exported to userspace, I think. It'd be
> way too fragile sitting in the kernel.

What about writing a whitelist to /proc (per-task) or /sys/fs (global) ?

The per-task whitelist is inherited from the global one by default, or 
from a parent process if it's been modified in the parent.


-- 
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
  2017-11-27 23:14           ` Linus Torvalds
  2017-11-27 23:19             ` Kees Cook
@ 2017-11-28  1:23             ` Kees Cook
  1 sibling, 0 replies; 49+ messages in thread
From: Kees Cook @ 2017-11-28  1:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	David S. Miller, Network Development, Peter Zijlstra

On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I don't disagree that a global should be avoided, but I'm struggling
>> to see another option here. We can't break userspace by default so we
>> can't restrict cap-less loading by default. But we can allow userspace
>> to _choose_ to break itself, especially within a container. This isn't
>> uncommon, especially for modules, where we even have the global
>> "modules_disabled" sysctl already. The level of granularity of control
>> here is the issue, and it's what this series solves.
>
> So there's two "global" here
>
>  - if a container were to choose to break itself, it should damn well
> be container-specific, not some global option
>
>    This part seems to be ok in the patch series, since the "global" is
> really per-task. So it's not global in the "system-wide" sense.

Right, though in the case of init, it could flip that toggle for
itself and it would then effectively be system-wide.

>  - if _one_ request_module() caller were to say "I don't want to be
> loaded by a normal user", that doesn't mean that _other_
> request_module() cases shouldn't.
>
>    This is the part I'm objecting to, because it means that we can't
> enable this stricter policy by default.

Okay, I see what you mean here. You want to clearly distinguish
between unprivileged and privileged-only. I'm unconvinced that's going
to change much, as the bulk of the exposed request_module() users are
already expecting to be unprivileged (and that's why they were all
converted to requiring a named prefix).

> And the thing is, the patch series seems to already introduce largely
> the better model of just making it site-specific. Introducing that
> request_module_cap() thing and then using it for networking is a good
> step.
>
> But I also suspect that we _could_ just make the stricter rules
> actually be default, if we just fixed the thing up to not be "every
> request_module() is the same".

When doing some of the older module name prefix work, I did consider
introducing a new request_module() API that included the prefix name
as an explicit argument (instead of embedding it in the format
string). We could easily start there, and then have "plain"
request_module() require privs. But we'll still need a way to say
"admin doesn't want unpriv module auto-loading".

> For example, several request_module() calls come from device node
> opens, and it makes sense that we can just say: "if you have access to
> the device node, then you have the right to request the module".

Many of these callers are using network interfaces to do this work, so
there isn't as clean a permission model associated with those like
there might be with a filesystem open(). But that doesn't matter (see
below).

> But that would need to be not a global "request_module()" behavior,
> but a behavior that is tied to the particular call-site.
>
> IOW, extend on that request_module_cap() model, and introduce
> (perhaps) a "request_module_dev()" call that basically means "the user
> opened the device node for the requested module".
>
> Because those kinds of permissions aren't necessarily about
> capabilities, but about things like "I'm in the dialout group, I get
> to open tty devices and by implication request their modules".

This really doesn't address the main concern that is the problem:
whitelisting vs blacklisting. In your example, the dialout group gives
access to specific ttys or serial ports, etc, but an admin may want a
way to make sure the users don't load some buggy line discipline. Now,
that admin could blacklist all those modules one at a time, but new
stuff might get introduced, it doesn't handle other subsystems, etc.

We need to provide a way to whitelist autoloaded modules. The
demonstrated need (to whitelist _no_ modules, addressed by this
series) provides that level of control on a task basis (effectively
making it container-specific).

> And that _really_ isn't global behavior.  The fact that I might be
> able to load a serial; module has *nothing* to do with whether I can
> load some other kind of module at all.
>
> That global mode is just wrong.

If the per-task thing stays and the global sysctl goes, that would be
fine by me. That still gives admins a way to control the autoload
behavior, assuming their init knows how to set the flag. The global
sysctl, in my mind, is really more of a way for an admin to do this
after the fact without rebooting, etc. But I don't have a strong
opinion about the global sysctl.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
  2017-11-27 18:48   ` Randy Dunlap
@ 2017-11-28 19:14   ` Luis R. Rodriguez
  2017-11-28 20:11     ` Kees Cook
  2017-11-28 20:18     ` Djalal Harouni
  1 sibling, 2 replies; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-28 19:14 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	netdev, Peter Zijlstra, Linus Torvalds

On Mon, Nov 27, 2017 at 06:18:34PM +0100, Djalal Harouni wrote:
...

> After a discussion with Rusty Russell [1], the suggestion was to pass
> the capability from request_module() to security_kernel_module_request()
> for 'netdev-%s' modules that need CAP_NET_ADMIN, and after review from
> Kees Cook [2] and experimenting with the code, the patch now does the
> following:
> 
> * Adds the request_module_cap() function.
> * Updates the __request_module() to take the "required_cap" argument
>         with the "prefix".

...

> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
> ---
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index bc6addd..679d401 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -139,13 +147,22 @@ int __request_module(bool wait, const char *fmt, ...)
>  	if (!modprobe_path[0])
>  		return 0;
>  
> +	/*
> +	 * Lets attach the prefix to the module name
> +	 */
> +	if (prefix != NULL && *prefix != '\0') {
> +		len += snprintf(module_name, MODULE_NAME_LEN, "%s-", prefix);
> +		if (len >= MODULE_NAME_LEN)
> +			return -ENAMETOOLONG;
> +	}
> +
>  	va_start(args, fmt);
> -	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
> +	ret = vsnprintf(module_name + len, MODULE_NAME_LEN - len, fmt, args);
>  	va_end(args);
> -	if (ret >= MODULE_NAME_LEN)
> +	if (ret >= MODULE_NAME_LEN - len)
>  		return -ENAMETOOLONG;
>  
> -	ret = security_kernel_module_request(module_name);
> +	ret = security_kernel_module_request(module_name, required_cap, prefix);
>  	if (ret)
>  		return ret;
>  

kmod is just a helper to poke userpsace to load a module, that's it.

The old init_module() and newer finit_module() do the real handy work or
module loading, and both currently only use may_init_module():

static int may_init_module(void)
{
	if (!capable(CAP_SYS_MODULE) || modules_disabled)
		return -EPERM;                                                  

	return 0;
}

This begs the question:

  o If userspace just tries to just use raw finit_module() do we want similar
    checks?

Otherwise, correct me if I'm wrong this all seems pointless.

If we want something similar I think we might need to be processing aliases and
check for the aliases for their desired restrictions on finit_module(),
otherwise userspace can skip through the checks if the module name does not
match the alias prefix.

To be clear, aliases are completely ignored today on load_module(), so loading
'xfs' with finit_module() will just have the kernel know about 'xfs' not
'fs-xfs'.

So we currently do not process aliases in kernel.

I have debugging patches to enable us to process them, but they are just for
debugging and I've been meaning to send them in for review. I designed them
only for debugging given last time someone suggested for aliases processing to
be added, the only use case we found was a pre-optimizations we decided to avoid
pursuing. Debugging is a good reason to have alias processing in-kernel though.

The pre-optimization we decided to stay away from was to check if the requested
module via request_module() was already loaded *and* also check if the name passed
matches any of the existing module aliases for currently loaded modules. Today
request_module() does not even check if a requested module is already loaded,
its a stupid loader, it just goes to userspace, and lets userspace figure it
out. Userspace in turn could check for aliases, but it could lie, or not be up
to date to do that.

The pre-optmization is a theoretical gain only then, and if userspace had
proper alias checking it is arguable that it may perform just as equal.
To help valuate these sorts of things we now have:

tools/testing/selftests/kmod/kmod.sh

So further patches can use and test impact with it.

Anyway -- so aliasing is currently only a debugging consideration, but without
processing aliases, all this work seems pointless to me as the real loader is
in finit_module().
 
kmod is just a stupid loader and uses the kernel usermode helper to do work.

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 19:14   ` Luis R. Rodriguez
@ 2017-11-28 20:11     ` Kees Cook
  2017-11-28 21:16       ` Luis R. Rodriguez
  2017-11-28 20:18     ` Djalal Harouni
  1 sibling, 1 reply; 49+ messages in thread
From: Kees Cook @ 2017-11-28 20:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, LKML, linux-security-module, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 11:14 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> kmod is just a helper to poke userpsace to load a module, that's it.
>
> The old init_module() and newer finit_module() do the real handy work or
> module loading, and both currently only use may_init_module():
>
> static int may_init_module(void)
> {
>         if (!capable(CAP_SYS_MODULE) || modules_disabled)
>                 return -EPERM;
>
>         return 0;
> }
>
> This begs the question:
>
>   o If userspace just tries to just use raw finit_module() do we want similar
>     checks?
>
> Otherwise, correct me if I'm wrong this all seems pointless.

Hm? That's direct-loading, not auto-loading. This series is only about
auto-loading.

We already have a global sysctl for blocking direct-loading (modules_disabled).

> If we want something similar I think we might need to be processing aliases and
> check for the aliases for their desired restrictions on finit_module(),

We don't need to handle aliases.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 19:14   ` Luis R. Rodriguez
  2017-11-28 20:11     ` Kees Cook
@ 2017-11-28 20:18     ` Djalal Harouni
  1 sibling, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-28 20:18 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, linux-kernel, LSM List, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

Hi Luis,

On Tue, Nov 28, 2017 at 8:14 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Nov 27, 2017 at 06:18:34PM +0100, Djalal Harouni wrote:
> ...
>
>> After a discussion with Rusty Russell [1], the suggestion was to pass
>> the capability from request_module() to security_kernel_module_request()
>> for 'netdev-%s' modules that need CAP_NET_ADMIN, and after review from
>> Kees Cook [2] and experimenting with the code, the patch now does the
>> following:
>>
>> * Adds the request_module_cap() function.
>> * Updates the __request_module() to take the "required_cap" argument
>>         with the "prefix".
>
> ...
>
>> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
>> ---
>> diff --git a/kernel/kmod.c b/kernel/kmod.c
>> index bc6addd..679d401 100644
>> --- a/kernel/kmod.c
>> +++ b/kernel/kmod.c
>> @@ -139,13 +147,22 @@ int __request_module(bool wait, const char *fmt, ...)
>>       if (!modprobe_path[0])
>>               return 0;
>>
>> +     /*
>> +      * Lets attach the prefix to the module name
>> +      */
>> +     if (prefix != NULL && *prefix != '\0') {
>> +             len += snprintf(module_name, MODULE_NAME_LEN, "%s-", prefix);
>> +             if (len >= MODULE_NAME_LEN)
>> +                     return -ENAMETOOLONG;
>> +     }
>> +
>>       va_start(args, fmt);
>> -     ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
>> +     ret = vsnprintf(module_name + len, MODULE_NAME_LEN - len, fmt, args);
>>       va_end(args);
>> -     if (ret >= MODULE_NAME_LEN)
>> +     if (ret >= MODULE_NAME_LEN - len)
>>               return -ENAMETOOLONG;
>>
>> -     ret = security_kernel_module_request(module_name);
>> +     ret = security_kernel_module_request(module_name, required_cap, prefix);
>>       if (ret)
>>               return ret;
>>
>
> kmod is just a helper to poke userpsace to load a module, that's it.
>
> The old init_module() and newer finit_module() do the real handy work or
> module loading, and both currently only use may_init_module():
>
> static int may_init_module(void)
> {
>         if (!capable(CAP_SYS_MODULE) || modules_disabled)
>                 return -EPERM;
>
>         return 0;
> }
>
> This begs the question:
>
>   o If userspace just tries to just use raw finit_module() do we want similar
>     checks?
>
> Otherwise, correct me if I'm wrong this all seems pointless.
>
> If we want something similar I think we might need to be processing aliases and
> check for the aliases for their desired restrictions on finit_module(),
> otherwise userspace can skip through the checks if the module name does not
> match the alias prefix.
>
> To be clear, aliases are completely ignored today on load_module(), so loading
> 'xfs' with finit_module() will just have the kernel know about 'xfs' not
> 'fs-xfs'.
>
> So we currently do not process aliases in kernel.
>
> I have debugging patches to enable us to process them, but they are just for
> debugging and I've been meaning to send them in for review. I designed them
> only for debugging given last time someone suggested for aliases processing to
> be added, the only use case we found was a pre-optimizations we decided to avoid
> pursuing. Debugging is a good reason to have alias processing in-kernel though.
>
> The pre-optimization we decided to stay away from was to check if the requested
> module via request_module() was already loaded *and* also check if the name passed
> matches any of the existing module aliases for currently loaded modules. Today
> request_module() does not even check if a requested module is already loaded,
> its a stupid loader, it just goes to userspace, and lets userspace figure it
> out. Userspace in turn could check for aliases, but it could lie, or not be up
> to date to do that.
>
> The pre-optmization is a theoretical gain only then, and if userspace had
> proper alias checking it is arguable that it may perform just as equal.
> To help valuate these sorts of things we now have:
>
> tools/testing/selftests/kmod/kmod.sh
>
> So further patches can use and test impact with it.
>
> Anyway -- so aliasing is currently only a debugging consideration, but without
> processing aliases, all this work seems pointless to me as the real loader is
> in finit_module().

These patchset are about module auto-loading which is triggered from
multiple paths in the kernel, the cover letter notes all the
differences between the two operations and why the explicit one and
"modules_disabled=1" is already a pain.

The finit_module() is covered directly by CAP_SYS_MODULE, and for
aliasing I am not sure how it will be related or how userspace will
maintain it, we do not have a use case for it, we want a simple flag.

Thank you!


-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 20:11     ` Kees Cook
@ 2017-11-28 21:16       ` Luis R. Rodriguez
  2017-11-28 21:33         ` Djalal Harouni
  2017-11-28 21:39         ` Kees Cook
  0 siblings, 2 replies; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-28 21:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 12:11:34PM -0800, Kees Cook wrote:
> On Tue, Nov 28, 2017 at 11:14 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > kmod is just a helper to poke userpsace to load a module, that's it.
> >
> > The old init_module() and newer finit_module() do the real handy work or
> > module loading, and both currently only use may_init_module():
> >
> > static int may_init_module(void)
> > {
> >         if (!capable(CAP_SYS_MODULE) || modules_disabled)
> >                 return -EPERM;
> >
> >         return 0;
> > }
> >
> > This begs the question:
> >
> >   o If userspace just tries to just use raw finit_module() do we want similar
> >     checks?
> >
> > Otherwise, correct me if I'm wrong this all seems pointless.
> 
> Hm? That's direct-loading, not auto-loading. This series is only about
> auto-loading.

And *all* auto-loading uses aliases? What's the difference between auto-loading
and direct-loading?

> We already have a global sysctl for blocking direct-loading (modules_disabled).

My point was that even if you have a CAP_NET_ADMIN check on request_module(),
finit_module() will not check for it, so a crafty userspace could still try
to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
check.

So unless I'm missing something, I see no point in adding extra checks for
request_module() but nothing for the respective load_module().

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 21:16       ` Luis R. Rodriguez
@ 2017-11-28 21:33         ` Djalal Harouni
  2017-11-28 22:18           ` Luis R. Rodriguez
  2017-11-28 21:39         ` Kees Cook
  1 sibling, 1 reply; 49+ messages in thread
From: Djalal Harouni @ 2017-11-28 21:33 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, LKML, linux-security-module, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 10:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Nov 28, 2017 at 12:11:34PM -0800, Kees Cook wrote:
>> On Tue, Nov 28, 2017 at 11:14 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > kmod is just a helper to poke userpsace to load a module, that's it.
>> >
>> > The old init_module() and newer finit_module() do the real handy work or
>> > module loading, and both currently only use may_init_module():
>> >
>> > static int may_init_module(void)
>> > {
>> >         if (!capable(CAP_SYS_MODULE) || modules_disabled)
>> >                 return -EPERM;
>> >
>> >         return 0;
>> > }
>> >
>> > This begs the question:
>> >
>> >   o If userspace just tries to just use raw finit_module() do we want similar
>> >     checks?
>> >
>> > Otherwise, correct me if I'm wrong this all seems pointless.
>>
>> Hm? That's direct-loading, not auto-loading. This series is only about
>> auto-loading.
>
> And *all* auto-loading uses aliases? What's the difference between auto-loading
> and direct-loading?

Not all auto-loading uses aliases, auto-loading is when kernel code
calls request_module() to loads the feature that was not present, and
direct-loading in this thread is the direct syscalls like
finit_module().

>> We already have a global sysctl for blocking direct-loading (modules_disabled).
>
> My point was that even if you have a CAP_NET_ADMIN check on request_module(),
> finit_module() will not check for it, so a crafty userspace could still try
> to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
> check.

The finit_module() uses CAP_SYS_MODULE which should allow all modules
and in this context it should be more privileged than CAP_NET_ADMIN
which is only for "netdev-%s" (to not load arbitrary modules with it).

finit_module() coming from request_module() always has the
CAP_NET_ADMIN, hence the check is done before.

> So unless I'm missing something, I see no point in adding extra checks for
> request_module() but nothing for the respective load_module().

I see, request_module() is called from kernel context which runs in
init namespace will full capabilities, the spawned userspace modprobe
will get CAP_SYS_MODULE and all other caps, then after comes modprobe
and load_module().

Btw as suggested by Linus I will update with request_module_cap() and
I can offer my help maintaining these bits too.


>
>   Luis



-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 21:16       ` Luis R. Rodriguez
  2017-11-28 21:33         ` Djalal Harouni
@ 2017-11-28 21:39         ` Kees Cook
  2017-11-28 22:12           ` Luis R. Rodriguez
  2017-11-29 13:46           ` Alan Cox
  1 sibling, 2 replies; 49+ messages in thread
From: Kees Cook @ 2017-11-28 21:39 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, LKML, linux-security-module, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> And *all* auto-loading uses aliases? What's the difference between auto-loading
> and direct-loading?

The difference is the process privileges. Unprivilged autoloading
(e.g. int n_hdlc = N_HDLC; ioctl(fd,
TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
under CAP_SYS_MODULE.

>> We already have a global sysctl for blocking direct-loading (modules_disabled).
>
> My point was that even if you have a CAP_NET_ADMIN check on request_module(),
> finit_module() will not check for it, so a crafty userspace could still try
> to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
> check.

You need CAP_SYS_MODULE to run finit_module().

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 21:39         ` Kees Cook
@ 2017-11-28 22:12           ` Luis R. Rodriguez
  2017-11-28 22:18             ` Kees Cook
  2017-11-29 13:46           ` Alan Cox
  1 sibling, 1 reply; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-28 22:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 01:39:58PM -0800, Kees Cook wrote:
> On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > And *all* auto-loading uses aliases? What's the difference between auto-loading
> > and direct-loading?
> 
> The difference is the process privileges. Unprivilged autoloading
> (e.g. int n_hdlc = N_HDLC; ioctl(fd,
> TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
> under CAP_SYS_MODULE.

Ah, so system call implicated request_module() calls.

> >> We already have a global sysctl for blocking direct-loading (modules_disabled).
> >
> > My point was that even if you have a CAP_NET_ADMIN check on request_module(),
> > finit_module() will not check for it, so a crafty userspace could still try
> > to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
> > check.
> 
> You need CAP_SYS_MODULE to run finit_module().

OK and since CAP_SYS_MODULE is much more restrictive one could argue, what's the
point here?

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 22:12           ` Luis R. Rodriguez
@ 2017-11-28 22:18             ` Kees Cook
  2017-11-28 22:48               ` Luis R. Rodriguez
  0 siblings, 1 reply; 49+ messages in thread
From: Kees Cook @ 2017-11-28 22:18 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Djalal Harouni, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, LKML, linux-security-module, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 2:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Nov 28, 2017 at 01:39:58PM -0800, Kees Cook wrote:
>> On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > And *all* auto-loading uses aliases? What's the difference between auto-loading
>> > and direct-loading?
>>
>> The difference is the process privileges. Unprivilged autoloading
>> (e.g. int n_hdlc = N_HDLC; ioctl(fd,
>> TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
>> under CAP_SYS_MODULE.
>
> Ah, so system call implicated request_module() calls.

Yup. Unprivileged user does something that ultimately hits a
request_module() in the kernel. Then the kernel calls out with the
usermode helper (which has CAP_SYS_MODULE) and calls finit_module().

> OK and since CAP_SYS_MODULE is much more restrictive one could argue, what's the
> point here?

The goal is to block an unprivileged user from being able to trigger a
module load without blocking root from loading modules directly.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 21:33         ` Djalal Harouni
@ 2017-11-28 22:18           ` Luis R. Rodriguez
  2017-11-28 22:52             ` Djalal Harouni
  0 siblings, 1 reply; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-28 22:18 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Luis R. Rodriguez, Kees Cook, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 10:33:27PM +0100, Djalal Harouni wrote:
> On Tue, Nov 28, 2017 at 10:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Nov 28, 2017 at 12:11:34PM -0800, Kees Cook wrote:
> >> On Tue, Nov 28, 2017 at 11:14 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> > kmod is just a helper to poke userpsace to load a module, that's it.
> >> >
> >> > The old init_module() and newer finit_module() do the real handy work or
> >> > module loading, and both currently only use may_init_module():
> >> >
> >> > static int may_init_module(void)
> >> > {
> >> >         if (!capable(CAP_SYS_MODULE) || modules_disabled)
> >> >                 return -EPERM;
> >> >
> >> >         return 0;
> >> > }
> >> >
> >> > This begs the question:
> >> >
> >> >   o If userspace just tries to just use raw finit_module() do we want similar
> >> >     checks?
> >> >
> >> > Otherwise, correct me if I'm wrong this all seems pointless.
> >>
> >> Hm? That's direct-loading, not auto-loading. This series is only about
> >> auto-loading.
> >
> > And *all* auto-loading uses aliases? What's the difference between auto-loading
> > and direct-loading?
> 
> Not all auto-loading uses aliases, auto-loading is when kernel code
> calls request_module() to loads the feature that was not present, 

It seems the actual interest here is system call implicated request_module()
calls? Because there are uses of request_module() which may be module hacks,
and not implicated via system calls.

> and direct-loading in this thread is the direct syscalls like
> finit_module().

OK.

> >> We already have a global sysctl for blocking direct-loading (modules_disabled).
> >
> > My point was that even if you have a CAP_NET_ADMIN check on request_module(),
> > finit_module() will not check for it, so a crafty userspace could still try
> > to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
> > check.
> 
> The finit_module() uses CAP_SYS_MODULE which should allow all modules
> and in this context it should be more privileged than CAP_NET_ADMIN
> which is only for "netdev-%s" (to not load arbitrary modules with it).
> 
> finit_module() coming from request_module() always has the
> CAP_NET_ADMIN, hence the check is done before.

But since CAP_SYS_MODULE is more restrictive, what's the point in checking
for CAP_NET_ADMIN?

> > So unless I'm missing something, I see no point in adding extra checks for
> > request_module() but nothing for the respective load_module().
> 
> I see, request_module() is called from kernel context which runs in
> init namespace will full capabilities, the spawned userspace modprobe
> will get CAP_SYS_MODULE and all other caps, then after comes modprobe
> and load_module().

Right, so defining the gains of adding this extra check is not very clear
yet. It would seem a benefit exists, what is it?

> Btw as suggested by Linus I will update with request_module_cap() and > I can
> offer my help maintaining these bits too.

Can you start by extending lib/test_module.c and
tools/testing/selftests/kmod/kmod.sh with a proof of concept of the gains here,
as well as ensuring things work as expected ?

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 22:18             ` Kees Cook
@ 2017-11-28 22:48               ` Luis R. Rodriguez
  2017-11-29  7:49                 ` Michal Kubecek
  0 siblings, 1 reply; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-28 22:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 02:18:18PM -0800, Kees Cook wrote:
> On Tue, Nov 28, 2017 at 2:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Nov 28, 2017 at 01:39:58PM -0800, Kees Cook wrote:
> >> On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> > And *all* auto-loading uses aliases? What's the difference between auto-loading
> >> > and direct-loading?
> >>
> >> The difference is the process privileges. Unprivilged autoloading
> >> (e.g. int n_hdlc = N_HDLC; ioctl(fd,
> >> TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
> >> under CAP_SYS_MODULE.
> >
> > Ah, so system call implicated request_module() calls.
> 
> Yup. Unprivileged user does something that ultimately hits a
> request_module() in the kernel. Then the kernel calls out with the
> usermode helper (which has CAP_SYS_MODULE) and calls finit_module().

Thanks, using this terminology is much better to understand than auto-loading,
given it does make it clear an unprivileged call was one that initiated the
request_module() call, there are many uses of request_module() which *are*
privileged.

> > OK and since CAP_SYS_MODULE is much more restrictive one could argue, what's the
> > point here?
> 
> The goal is to block an unprivileged user from being able to trigger a
> module load without blocking root from loading modules directly.

I see now. Do we have an audit of all system calls which implicate a
request_module() call? Networking is a good example for sure to start
off with but I was curious if we have a grasp of how wide spread this
could be.

I'll go review the patches again now with all this in mind.

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 22:18           ` Luis R. Rodriguez
@ 2017-11-28 22:52             ` Djalal Harouni
  0 siblings, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-28 22:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, LKML, linux-security-module, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 11:18 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Nov 28, 2017 at 10:33:27PM +0100, Djalal Harouni wrote:
>> On Tue, Nov 28, 2017 at 10:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Tue, Nov 28, 2017 at 12:11:34PM -0800, Kees Cook wrote:
>> >> On Tue, Nov 28, 2017 at 11:14 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> >> > kmod is just a helper to poke userpsace to load a module, that's it.
>> >> >
>> >> > The old init_module() and newer finit_module() do the real handy work or
>> >> > module loading, and both currently only use may_init_module():
>> >> >
>> >> > static int may_init_module(void)
>> >> > {
>> >> >         if (!capable(CAP_SYS_MODULE) || modules_disabled)
>> >> >                 return -EPERM;
>> >> >
>> >> >         return 0;
>> >> > }
>> >> >
>> >> > This begs the question:
>> >> >
>> >> >   o If userspace just tries to just use raw finit_module() do we want similar
>> >> >     checks?
>> >> >
>> >> > Otherwise, correct me if I'm wrong this all seems pointless.
>> >>
>> >> Hm? That's direct-loading, not auto-loading. This series is only about
>> >> auto-loading.
>> >
>> > And *all* auto-loading uses aliases? What's the difference between auto-loading
>> > and direct-loading?
>>
>> Not all auto-loading uses aliases, auto-loading is when kernel code
>> calls request_module() to loads the feature that was not present,
>
> It seems the actual interest here is system call implicated request_module()
> calls? Because there are uses of request_module() which may be module hacks,
> and not implicated via system calls.

Indeed.


>> and direct-loading in this thread is the direct syscalls like
>> finit_module().
>
> OK.
>
>> >> We already have a global sysctl for blocking direct-loading (modules_disabled).
>> >
>> > My point was that even if you have a CAP_NET_ADMIN check on request_module(),
>> > finit_module() will not check for it, so a crafty userspace could still try
>> > to just finit_module() directly, and completely then bypass the CAP_NET_ADMIN
>> > check.
>>
>> The finit_module() uses CAP_SYS_MODULE which should allow all modules
>> and in this context it should be more privileged than CAP_NET_ADMIN
>> which is only for "netdev-%s" (to not load arbitrary modules with it).
>>
>> finit_module() coming from request_module() always has the
>> CAP_NET_ADMIN, hence the check is done before.
>
> But since CAP_SYS_MODULE is more restrictive, what's the point in checking
> for CAP_NET_ADMIN?

For backward compatibility with 'netdev' modules since it is for those.


>> > So unless I'm missing something, I see no point in adding extra checks for
>> > request_module() but nothing for the respective load_module().
>>
>> I see, request_module() is called from kernel context which runs in
>> init namespace will full capabilities, the spawned userspace modprobe
>> will get CAP_SYS_MODULE and all other caps, then after comes modprobe
>> and load_module().
>
> Right, so defining the gains of adding this extra check is not very clear
> yet. It would seem a benefit exists, what is it?

it will able to filter if the request_module() should continue loading
the module or deny it which prevents spawning the *privileged*
usermode helper. This is all based on are we allowed to load new
features or not, or IOW I don't want to allow new features or modules
autoloading from now and on, as stated in the cover letter for various
benefit including security, reduce the amount of kernel code running,
but also do not allow new features for anyone like tunneling, etc.


>> Btw as suggested by Linus I will update with request_module_cap() and > I can
>> offer my help maintaining these bits too.
>
> Can you start by extending lib/test_module.c and
> tools/testing/selftests/kmod/kmod.sh with a proof of concept of the gains here,
> as well as ensuring things work as expected ?

Alright Luis, thanks for the hint, yes I will make sure to cover these.

For gains, kees already answered in the other email, and please check
the DCCP exploit and others linked in the cover letter.


Thank you!

>   Luis



-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 22:48               ` Luis R. Rodriguez
@ 2017-11-29  7:49                 ` Michal Kubecek
  0 siblings, 0 replies; 49+ messages in thread
From: Michal Kubecek @ 2017-11-29  7:49 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, Nov 28, 2017 at 11:48:49PM +0100, Luis R. Rodriguez wrote:
> On Tue, Nov 28, 2017 at 02:18:18PM -0800, Kees Cook wrote:
> > On Tue, Nov 28, 2017 at 2:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > On Tue, Nov 28, 2017 at 01:39:58PM -0800, Kees Cook wrote:
> > >> On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > >> > And *all* auto-loading uses aliases? What's the difference
> > >> > between auto-loading and direct-loading?
> > >>
> > >> The difference is the process privileges. Unprivilged autoloading
> > >> (e.g. int n_hdlc = N_HDLC; ioctl(fd,
> > >> TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
> > >> under CAP_SYS_MODULE.
> > >
> > > Ah, so system call implicated request_module() calls.
> > 
> > Yup. Unprivileged user does something that ultimately hits a
> > request_module() in the kernel. Then the kernel calls out with the
> > usermode helper (which has CAP_SYS_MODULE) and calls finit_module().
> 
> Thanks, using this terminology is much better to understand than
> auto-loading, given it does make it clear an unprivileged call was one
> that initiated the request_module() call, there are many uses of
> request_module() which *are* privileged.
> 
> > > OK and since CAP_SYS_MODULE is much more restrictive one could
> > > argue, what's the point here?
> > 
> > The goal is to block an unprivileged user from being able to trigger a
> > module load without blocking root from loading modules directly.
> 
> I see now. Do we have an audit of all system calls which implicate a
> request_module() call? Networking is a good example for sure to start
> off with but I was curious if we have a grasp of how wide spread this
> could be.

I'm not sure it makes sense to classify this by syscalls. In networking,
request_module() can be triggered e.g. by a netlink message (genetlink
family lookup is an example not needing any privileges) so that one of
the syscalls would be sendmsg().

Michal Kubecek

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-28 21:39         ` Kees Cook
  2017-11-28 22:12           ` Luis R. Rodriguez
@ 2017-11-29 13:46           ` Alan Cox
  2017-11-29 14:50             ` David Miller
  1 sibling, 1 reply; 49+ messages in thread
From: Alan Cox @ 2017-11-29 13:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Djalal Harouni, Andy Lutomirski, Andrew Morton,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, LKML, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Tue, 28 Nov 2017 13:39:58 -0800
Kees Cook <keescook@chromium.org> wrote:

> On Tue, Nov 28, 2017 at 1:16 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > And *all* auto-loading uses aliases? What's the difference between auto-loading
> > and direct-loading?  
> 
> The difference is the process privileges. Unprivilged autoloading
> (e.g. int n_hdlc = N_HDLC; ioctl(fd,
> TIOCSETD, &n_hdlc)), triggers a privileged call to finit_module()
> under CAP_SYS_MODULE.

If you have CAP_SYS_DAC you can rename any module to ppp.ko and ask the
network manager (which has the right permissions) to init a ppp
connection. Capabilities alone are simply not enough to do any kind of
useful protection on a current system and the Linux capability model is
broken architecturally and not fixable because fixing it would break lots
of real systems.

I really don't care what the module loading rules end up with and whether
we add CAP_SYS_YET_ANOTHER_MEANINGLESS_FLAG but what is actually needed
is to properly incorporate it into securiy ruiles for whatever LSM you
are using.

Alan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 13:46           ` Alan Cox
@ 2017-11-29 14:50             ` David Miller
  2017-11-29 15:54               ` Theodore Ts'o
  0 siblings, 1 reply; 49+ messages in thread
From: David Miller @ 2017-11-29 14:50 UTC (permalink / raw)
  To: gnomes
  Cc: keescook, mcgrof, tixxdz, luto, akpm, james.l.morris,
	ben.hutchings, solar, serge, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

From: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Date: Wed, 29 Nov 2017 13:46:12 +0000

> I really don't care what the module loading rules end up with and
> whether we add CAP_SYS_YET_ANOTHER_MEANINGLESS_FLAG but what is
> actually needed is to properly incorporate it into securiy ruiles
> for whatever LSM you are using.

I'm surprised we're not using the SHA1 hashes or whatever we compute
for the modules to make sure we are loading the foo.ko that we expect
to be.

Then even if someone can rename every file on the system they cannot
force a rogue module to load unless they build one with a proper hash
collision.

Ie. transform module load strings at build time:

	ppp.ko	--> ppp.ko:SHA1

Or something like that.  And the kernel refuses to load a ppp.ko
with a mismatching SHA1.

All of this capability stuff seems to dance a circle around the
problem rather than fix it.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 14:50             ` David Miller
@ 2017-11-29 15:54               ` Theodore Ts'o
  2017-11-29 15:58                 ` David Miller
  2017-11-29 17:28                 ` Serge E. Hallyn
  0 siblings, 2 replies; 49+ messages in thread
From: Theodore Ts'o @ 2017-11-29 15:54 UTC (permalink / raw)
  To: David Miller
  Cc: gnomes, keescook, mcgrof, tixxdz, luto, akpm, james.l.morris,
	ben.hutchings, solar, serge, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

On Wed, Nov 29, 2017 at 09:50:14AM -0500, David Miller wrote:
> From: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
> Date: Wed, 29 Nov 2017 13:46:12 +0000
> 
> > I really don't care what the module loading rules end up with and
> > whether we add CAP_SYS_YET_ANOTHER_MEANINGLESS_FLAG but what is
> > actually needed is to properly incorporate it into securiy ruiles
> > for whatever LSM you are using.
> 
> I'm surprised we're not using the SHA1 hashes or whatever we compute
> for the modules to make sure we are loading the foo.ko that we expect
> to be.

We do have signed modules.  But this won't help us if the user is
using a distro kernel which has compiled some module which is known to
be unmaintained which everyone in the know *expects* to have 0-day
bugs, such as DCCP.  That's because the DCCP module is signed.

We could fix this by adding to the signature used for module signing
to include the module name, so that the bad guy can't rename dccp.ko
to be ppp.ko, I suppose....

> All of this capability stuff seems to dance a circle around the
> problem rather than fix it.

Half the problem here is that with containers, people are changing the
security model, because they want to let untrusted users have "root",
without really having "root".  Part of the fundamental problem is that
there are some well-meaning, but fundamentally misguided people, who
have been asserting: "Containers are just as secure as VM's".

Well, they are not.  And the sooner people get past this, the better
off they'll be....

						- Ted

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 15:54               ` Theodore Ts'o
@ 2017-11-29 15:58                 ` David Miller
  2017-11-29 16:29                   ` Theodore Ts'o
  2017-11-29 22:45                   ` Linus Torvalds
  2017-11-29 17:28                 ` Serge E. Hallyn
  1 sibling, 2 replies; 49+ messages in thread
From: David Miller @ 2017-11-29 15:58 UTC (permalink / raw)
  To: tytso
  Cc: gnomes, keescook, mcgrof, tixxdz, luto, akpm, james.l.morris,
	ben.hutchings, solar, serge, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

From: Theodore Ts'o <tytso@mit.edu>
Date: Wed, 29 Nov 2017 10:54:06 -0500

> On Wed, Nov 29, 2017 at 09:50:14AM -0500, David Miller wrote:
>> From: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
>> Date: Wed, 29 Nov 2017 13:46:12 +0000
>> 
>> > I really don't care what the module loading rules end up with and
>> > whether we add CAP_SYS_YET_ANOTHER_MEANINGLESS_FLAG but what is
>> > actually needed is to properly incorporate it into securiy ruiles
>> > for whatever LSM you are using.
>> 
>> I'm surprised we're not using the SHA1 hashes or whatever we compute
>> for the modules to make sure we are loading the foo.ko that we expect
>> to be.
> 
> We do have signed modules.  But this won't help us if the user is
> using a distro kernel which has compiled some module which is known to
> be unmaintained which everyone in the know *expects* to have 0-day
> bugs, such as DCCP.  That's because the DCCP module is signed.

That's not what we're talking about.

We're talking about making sure that loading "ppp.ko" really gets
ppp.ko rather than some_other_module.ko renamed to ppp.ko via some
other mechanism.

Both modules have legitimate signatures so the kernel will happily
load both.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 15:58                 ` David Miller
@ 2017-11-29 16:29                   ` Theodore Ts'o
  2017-11-29 22:45                   ` Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Theodore Ts'o @ 2017-11-29 16:29 UTC (permalink / raw)
  To: David Miller
  Cc: gnomes, keescook, mcgrof, tixxdz, luto, akpm, james.l.morris,
	ben.hutchings, solar, serge, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

On Wed, Nov 29, 2017 at 10:58:16AM -0500, David Miller wrote:
> That's not what we're talking about.
> 
> We're talking about making sure that loading "ppp.ko" really gets
> ppp.ko rather than some_other_module.ko renamed to ppp.ko via some
> other mechanism.

Right, and the best solution to this problem is to include in the
signature the name of the module.  (One way of doing this is adding
the module name into the cryptographic checksum which is then
digitally signed by the kernel module key; we would have to bump the
version number of the signature format, obviously.)  This binds the
name of the module into the digital signature so that it can't be
renamed.

						- Ted

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 15:54               ` Theodore Ts'o
  2017-11-29 15:58                 ` David Miller
@ 2017-11-29 17:28                 ` Serge E. Hallyn
  2017-11-30  0:35                   ` Theodore Ts'o
  1 sibling, 1 reply; 49+ messages in thread
From: Serge E. Hallyn @ 2017-11-29 17:28 UTC (permalink / raw)
  To: Theodore Ts'o, David Miller, gnomes, keescook, mcgrof, tixxdz,
	luto, akpm, james.l.morris, ben.hutchings, solar, serge, jeyu,
	rusty, linux-kernel, linux-security-module, kernel-hardening,
	corbet, mingo, netdev, peterz, torvalds

Quoting Theodore Ts'o (tytso@mit.edu):
> Half the problem here is that with containers, people are changing the
> security model, because they want to let untrusted users have "root",
> without really having "root".  Part of the fundamental problem is that
> there are some well-meaning, but fundamentally misguided people, who
> have been asserting: "Containers are just as secure as VM's".
> 
> Well, they are not.  And the sooner people get past this, the better
> off they'll be....

Just to be clear, module loading requires - and must always continue to
require - CAP_SYS_MODULE against the initial user namespace.  Containers
in user namespaces do not have that.

I don't believe anyone has ever claimed that containers which are not in
a user namespace are in any way secure.

(And as for the other claim, I'd prefer to stick to "VMs are in most
cases as insecure as properly configured containers" :)

-serge

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 15:58                 ` David Miller
  2017-11-29 16:29                   ` Theodore Ts'o
@ 2017-11-29 22:45                   ` Linus Torvalds
  2017-11-30  0:06                     ` Kees Cook
  1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2017-11-29 22:45 UTC (permalink / raw)
  To: David Miller
  Cc: Theodore Ts'o, One Thousand Gnomes, Kees Cook,
	Luis R. Rodriguez, Djalal Harouni, Andrew Lutomirski,
	Andrew Morton, James Morris, Ben Hutchings, Solar Designer,
	Serge E. Hallyn, Jessica Yu, Rusty Russell,
	Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	Network Development, Peter Zijlstra

On Wed, Nov 29, 2017 at 7:58 AM, David Miller <davem@davemloft.net> wrote:
>
> We're talking about making sure that loading "ppp.ko" really gets
> ppp.ko rather than some_other_module.ko renamed to ppp.ko via some
> other mechanism.
>
> Both modules have legitimate signatures so the kernel will happily
> load both.

Yes. We could make the module name be part of the signing process, but
one problem with that is that at module loading time we don't actually
have the filename any more.

User space opens the file and then just feeds the data to the kernel.
So if you fooled modprobe into feeding the wrong module, that's it.

And yes, we can obviously embed the module name into the ELF headers
(that is all part of the signed payload), but the module name doesn't
actually necessarily match what we originally asked for.

Why? Module aliases and module dependencies - which is why we have
that user mode side at all. When we do "request_module(XYZ)" we don't
necessarily know what the dependencies are, so we expect modprobe to
just load the right modules.

So if modprobe then loads some other module (dccp or whatever), the
kernel has no real way to know "oh, that wasn't part of the dependency
chain for the module we aked for".

Now, if modprobe is taught to check that the filename of the module
that it opens actually matches the metadata in the ELF sections, that
would solve it, but it's out of the kernels hands..

             Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 22:45                   ` Linus Torvalds
@ 2017-11-30  0:06                     ` Kees Cook
  0 siblings, 0 replies; 49+ messages in thread
From: Kees Cook @ 2017-11-30  0:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, Theodore Ts'o, One Thousand Gnomes,
	Luis R. Rodriguez, Djalal Harouni, Andrew Lutomirski,
	Andrew Morton, James Morris, Ben Hutchings, Solar Designer,
	Serge E. Hallyn, Jessica Yu, Rusty Russell,
	Linux Kernel Mailing List, LSM List,
	kernel-hardening@lists.openwall.com, Jonathan Corbet, Ingo Molnar,
	Network Development, Peter Zijlstra

On Wed, Nov 29, 2017 at 2:45 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 29, 2017 at 7:58 AM, David Miller <davem@davemloft.net> wrote:
>>
>> We're talking about making sure that loading "ppp.ko" really gets
>> ppp.ko rather than some_other_module.ko renamed to ppp.ko via some
>> other mechanism.
>>
>> Both modules have legitimate signatures so the kernel will happily
>> load both.
>
> Yes. We could make the module name be part of the signing process, but
> one problem with that is that at module loading time we don't actually
> have the filename any more.

FWIW, I added this (well, KBUILD_MODNAME) to the module info just recently:

3e2e857f9c3a ("module: Add module name to modinfo")

> User space opens the file and then just feeds the data to the kernel.
> So if you fooled modprobe into feeding the wrong module, that's it.
>
> And yes, we can obviously embed the module name into the ELF headers
> (that is all part of the signed payload), but the module name doesn't
> actually necessarily match what we originally asked for.
>
> Why? Module aliases and module dependencies - which is why we have
> that user mode side at all. When we do "request_module(XYZ)" we don't
> necessarily know what the dependencies are, so we expect modprobe to
> just load the right modules.
>
> So if modprobe then loads some other module (dccp or whatever), the
> kernel has no real way to know "oh, that wasn't part of the dependency
> chain for the module we aked for".
>
> Now, if modprobe is taught to check that the filename of the module
> that it opens actually matches the metadata in the ELF sections, that
> would solve it, but it's out of the kernels hands..

Right, the aliases are why these kinds of renaming shenanigans don't
mean anything: it's not distinguishable from whatever modprobe.conf
ultimately tells modprobe to do.

If you can't trust your filesystem to hold your kernel modules
correctly, you have much bigger problems. (And yes, capabilities are a
problem here, since there are many paths to full root from individual
capabilities, but that's a known issue that is much larger than
tricking modprobe.)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-29 17:28                 ` Serge E. Hallyn
@ 2017-11-30  0:35                   ` Theodore Ts'o
  2017-11-30 17:17                     ` Serge E. Hallyn
  0 siblings, 1 reply; 49+ messages in thread
From: Theodore Ts'o @ 2017-11-30  0:35 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: David Miller, gnomes, keescook, mcgrof, tixxdz, luto, akpm,
	james.l.morris, ben.hutchings, solar, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

On Wed, Nov 29, 2017 at 11:28:52AM -0600, Serge E. Hallyn wrote:
> 
> Just to be clear, module loading requires - and must always continue to
> require - CAP_SYS_MODULE against the initial user namespace.  Containers
> in user namespaces do not have that.
> 
> I don't believe anyone has ever claimed that containers which are not in
> a user namespace are in any way secure.

Unless the container performs some action which causes the kernel to
call request_module(), which then loads some kernel module,
potentially containing cr*p unmaintained code which was included when
the distro compiled the world, into the host kernel.

This is an attack vector that doesn't exist if you are using VM's.
And in general, the attack surface of the entire Linux
kernel<->userspace API is far larger than that which is exposed by the
guest<->host interface.

For that reason, containers are *far* more insecure than VM's, since
once the attacker gets root on the guest VM, they then have to attack
the hypervisor interface.  And if you compare the attack surface of
the two, it's pretty clear which is larger, and it's not the
hypervisor interface.

						- Ted

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction
  2017-11-27 17:18 ` [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction Djalal Harouni
@ 2017-11-30  1:23   ` Luis R. Rodriguez
  2017-11-30 12:22     ` Djalal Harouni
  0 siblings, 1 reply; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-30  1:23 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	netdev, Peter Zijlstra, Linus Torvalds

On Mon, Nov 27, 2017 at 06:18:36PM +0100, Djalal Harouni wrote:
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 5cbb239..c36aed8 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -261,7 +261,16 @@ struct notifier_block;
>  
>  #ifdef CONFIG_MODULES
>  
> -extern int modules_disabled; /* for sysctl */
> +enum {
> +	MODULES_AUTOLOAD_ALLOWED	= 0,
> +	MODULES_AUTOLOAD_PRIVILEGED	= 1,
> +	MODULES_AUTOLOAD_DISABLED	= 2,
> +};
> +

Can you kdocify these and add a respective rst doc file?  Maybe stuff your
extensive docs which you are currently adding to
Documentation/sysctl/kernel.txt to this new file and in kernel.txt just refer
to it. This way this can be also nicely visibly documented on the web with the
new sphinx.

This way you can take advantage of the kdocs you are already adding and refer
to them.

> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2fb4e27..0b6f0c8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -683,6 +688,15 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= &one,
>  		.extra2		= &one,
>  	},
> +	{
> +		.procname	= "modules_autoload_mode",
> +		.data		= &modules_autoload_mode,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= modules_autoload_dointvec_minmax,

It would seem this is a unint ... so why not reflect that?

> @@ -2499,6 +2513,20 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
>  }
>  #endif
>  
> +#ifdef CONFIG_MODULES
> +static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
> +				void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	/*
> +	 * Only CAP_SYS_MODULE in init user namespace are allowed to change this
> +	 */
> +	if (write && !capable(CAP_SYS_MODULE))
> +		return -EPERM;
> +
> +	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +}
> +#endif

We now have proc_douintvec_minmax().

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check
  2017-11-27 17:18 ` [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check Djalal Harouni
@ 2017-11-30  2:05   ` Luis R. Rodriguez
  0 siblings, 0 replies; 49+ messages in thread
From: Luis R. Rodriguez @ 2017-11-30  2:05 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, Luis R. Rodriguez,
	James Morris, Ben Hutchings, Solar Designer, Serge Hallyn,
	Jessica Yu, Rusty Russell, linux-kernel, linux-security-module,
	kernel-hardening, Jonathan Corbet, Ingo Molnar, David S. Miller,
	netdev, Peter Zijlstra, Linus Torvalds

On Mon, Nov 27, 2017 at 06:18:35PM +0100, Djalal Harouni wrote:
> +/* Determine whether a module auto-load operation is permitted. */
> +int may_autoload_module(char *kmod_name, int required_cap,
> +			const char *kmod_prefix);
> +

While we are reviewing a general LSM for this, it has me wondering if an LSM or
userspace feed info may every want to use other possible context we could add for
free to make a determination.

For instance since all request_module() calls are in header files, we could 
for add for free THIS_MODULE as context to may_autoload_module() as well, so
struct module. The LSM could in theory then also help ensure only specific
modules are allowed to request a module load. Perhaps userspace could say
only built-in code could request certain modules.

Just a thought.

  Luis

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction
  2017-11-30  1:23   ` Luis R. Rodriguez
@ 2017-11-30 12:22     ` Djalal Harouni
  0 siblings, 0 replies; 49+ messages in thread
From: Djalal Harouni @ 2017-11-30 12:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andy Lutomirski, Andrew Morton, James Morris,
	Ben Hutchings, Solar Designer, Serge Hallyn, Jessica Yu,
	Rusty Russell, linux-kernel, LSM List, kernel-hardening,
	Jonathan Corbet, Ingo Molnar, David S. Miller,
	Network Development, Peter Zijlstra, Linus Torvalds

On Thu, Nov 30, 2017 at 2:23 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Nov 27, 2017 at 06:18:36PM +0100, Djalal Harouni wrote:
>> diff --git a/include/linux/module.h b/include/linux/module.h
>> index 5cbb239..c36aed8 100644
>> --- a/include/linux/module.h
>> +++ b/include/linux/module.h
>> @@ -261,7 +261,16 @@ struct notifier_block;
>>
>>  #ifdef CONFIG_MODULES
>>
>> -extern int modules_disabled; /* for sysctl */
>> +enum {
>> +     MODULES_AUTOLOAD_ALLOWED        = 0,
>> +     MODULES_AUTOLOAD_PRIVILEGED     = 1,
>> +     MODULES_AUTOLOAD_DISABLED       = 2,
>> +};
>> +
>
> Can you kdocify these and add a respective rst doc file?  Maybe stuff your
> extensive docs which you are currently adding to
> Documentation/sysctl/kernel.txt to this new file and in kernel.txt just refer
> to it. This way this can be also nicely visibly documented on the web with the
> new sphinx.
>
> This way you can take advantage of the kdocs you are already adding and refer
> to them.

Alright I'll do it in the next series next week, we'll change the
semantics as requested by Linus and Kees here:
http://www.openwall.com/lists/kernel-hardening/2017/11/29/38

To block the privilege escalation through the usermod helper.


>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 2fb4e27..0b6f0c8 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -683,6 +688,15 @@ static struct ctl_table kern_table[] = {
>>               .extra1         = &one,
>>               .extra2         = &one,
>>       },
>> +     {
>> +             .procname       = "modules_autoload_mode",
>> +             .data           = &modules_autoload_mode,
>> +             .maxlen         = sizeof(int),
>> +             .mode           = 0644,
>> +             .proc_handler   = modules_autoload_dointvec_minmax,
>
> It would seem this is a unint ... so why not reflect that?
>
>> @@ -2499,6 +2513,20 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_MODULES
>> +static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
>> +                             void __user *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> +     /*
>> +      * Only CAP_SYS_MODULE in init user namespace are allowed to change this
>> +      */
>> +     if (write && !capable(CAP_SYS_MODULE))
>> +             return -EPERM;
>> +
>> +     return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> +}
>> +#endif
>
> We now have proc_douintvec_minmax().
>

Yes, however in that same response by Linus it was suggested to drop
the sysctl completely, so next iterations will not have this code.

Thank you for the review!

-- 
tixxdz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 next 1/5] modules:capabilities: add request_module_cap()
  2017-11-30  0:35                   ` Theodore Ts'o
@ 2017-11-30 17:17                     ` Serge E. Hallyn
  0 siblings, 0 replies; 49+ messages in thread
From: Serge E. Hallyn @ 2017-11-30 17:17 UTC (permalink / raw)
  To: Theodore Ts'o, Serge E. Hallyn, David Miller, gnomes,
	keescook, mcgrof, tixxdz, luto, akpm, james.l.morris,
	ben.hutchings, solar, jeyu, rusty, linux-kernel,
	linux-security-module, kernel-hardening, corbet, mingo, netdev,
	peterz, torvalds

On Wed, Nov 29, 2017 at 07:35:31PM -0500, Theodore Ts'o wrote:
> On Wed, Nov 29, 2017 at 11:28:52AM -0600, Serge E. Hallyn wrote:
> > 
> > Just to be clear, module loading requires - and must always continue to
> > require - CAP_SYS_MODULE against the initial user namespace.  Containers
> > in user namespaces do not have that.
> > 
> > I don't believe anyone has ever claimed that containers which are not in
> > a user namespace are in any way secure.
> 
> Unless the container performs some action which causes the kernel to
> call request_module(), which then loads some kernel module,

A local unprivileged user can do the same thing.  I reject the popular
notion that linux is a single user operating system.  More interesting
are the (very real) cases where root in a container can do something
which a local unprivileged user could not do.  Since a local unprivileged
user can always create a new namespace, *those* constitute a real and
interesting problem.

> potentially containing cr*p unmaintained code which was included when
> the distro compiled the world, into the host kernel.

> This is an attack vector that doesn't exist if you are using VM's.

Until the vm tenant uses a trivial vm escape.

> And in general, the attack surface of the entire Linux
> kernel<->userspace API is far larger than that which is exposed by the
> guest<->host interface.
> 
> For that reason, containers are *far* more insecure than VM's, since
> once the attacker gets root on the guest VM, they then have to attack
> the hypervisor interface.  And if you compare the attack surface of
> the two, it's pretty clear which is larger, and it's not the
> hypervisor interface.

Any time anyone spends a day looking at either the black hole that is
the hardware emulators or the xen and kvm code itself they walk away
with a set of cve's.  It *should* be more secure, it's not.  You're
telling me your house is safe because you put up a no tresspassing
sign.

-serge

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2017-11-30 17:17 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-27 17:18 [PATCH v5 next 0/5] Improve Module autoloading infrastructure Djalal Harouni
2017-11-27 17:18 ` [PATCH v5 next 1/5] modules:capabilities: add request_module_cap() Djalal Harouni
2017-11-27 18:48   ` Randy Dunlap
2017-11-27 21:35     ` Djalal Harouni
2017-11-28 19:14   ` Luis R. Rodriguez
2017-11-28 20:11     ` Kees Cook
2017-11-28 21:16       ` Luis R. Rodriguez
2017-11-28 21:33         ` Djalal Harouni
2017-11-28 22:18           ` Luis R. Rodriguez
2017-11-28 22:52             ` Djalal Harouni
2017-11-28 21:39         ` Kees Cook
2017-11-28 22:12           ` Luis R. Rodriguez
2017-11-28 22:18             ` Kees Cook
2017-11-28 22:48               ` Luis R. Rodriguez
2017-11-29  7:49                 ` Michal Kubecek
2017-11-29 13:46           ` Alan Cox
2017-11-29 14:50             ` David Miller
2017-11-29 15:54               ` Theodore Ts'o
2017-11-29 15:58                 ` David Miller
2017-11-29 16:29                   ` Theodore Ts'o
2017-11-29 22:45                   ` Linus Torvalds
2017-11-30  0:06                     ` Kees Cook
2017-11-29 17:28                 ` Serge E. Hallyn
2017-11-30  0:35                   ` Theodore Ts'o
2017-11-30 17:17                     ` Serge E. Hallyn
2017-11-28 20:18     ` Djalal Harouni
2017-11-27 17:18 ` [PATCH v5 next 2/5] modules:capabilities: add cap_kernel_module_request() permission check Djalal Harouni
2017-11-30  2:05   ` Luis R. Rodriguez
2017-11-27 17:18 ` [PATCH v5 next 3/5] modules:capabilities: automatic module loading restriction Djalal Harouni
2017-11-30  1:23   ` Luis R. Rodriguez
2017-11-30 12:22     ` Djalal Harouni
2017-11-27 17:18 ` [PATCH v5 next 4/5] modules:capabilities: add a per-task modules auto-load mode Djalal Harouni
2017-11-27 17:18 ` [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules Djalal Harouni
2017-11-27 18:44   ` Linus Torvalds
2017-11-27 21:41     ` Djalal Harouni
2017-11-27 22:04       ` Linus Torvalds
2017-11-27 22:59         ` Kees Cook
2017-11-27 23:14           ` Linus Torvalds
2017-11-27 23:19             ` Kees Cook
2017-11-27 23:35               ` Linus Torvalds
2017-11-28  1:23             ` Kees Cook
2017-11-27 18:41 ` [PATCH v5 next 0/5] Improve Module autoloading infrastructure Linus Torvalds
2017-11-27 19:02   ` Linus Torvalds
2017-11-27 19:12     ` Linus Torvalds
2017-11-27 21:31       ` Djalal Harouni
2017-11-27 19:14   ` David Miller
2017-11-27 22:31     ` James Morris
2017-11-27 23:04       ` Kees Cook
2017-11-27 23:44         ` James Morris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).