Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [RFC PATCH 4/6] security/fbfam: Add a new sysctl to control the crashing rate threshold
From: Kees Cook @ 2020-09-10 23:14 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-5-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:05PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> This is a previous step to add the detection feature.
> 
> A fork brute force attack will be detected when an application crashes
> quickly. Since, a rate can be defined as a time per fault, add a new
> sysctl to control the crashing rate threshold.
> 
> This way, each system can tune the detection's sensibility adjusting the
> milliseconds per fault. So, if the application's crashing rate falls
> under this threshold an attack will be detected. So, the higher this
> value, the faster an attack will be detected.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  include/fbfam/fbfam.h   |  4 ++++
>  kernel/sysctl.c         |  9 +++++++++
>  security/fbfam/Makefile |  1 +
>  security/fbfam/fbfam.c  | 11 +++++++++++
>  security/fbfam/sysctl.c | 20 ++++++++++++++++++++
>  5 files changed, 45 insertions(+)
>  create mode 100644 security/fbfam/sysctl.c
> 
> diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
> index b5b7d1127a52..2cfe51d2b0d5 100644
> --- a/include/fbfam/fbfam.h
> +++ b/include/fbfam/fbfam.h
> @@ -3,8 +3,12 @@
>  #define _FBFAM_H_
>  
>  #include <linux/sched.h>
> +#include <linux/sysctl.h>
>  
>  #ifdef CONFIG_FBFAM
> +#ifdef CONFIG_SYSCTL
> +extern struct ctl_table fbfam_sysctls[];
> +#endif

Instead of doing the extern and adding to sysctl.c, this can all be done
directly (dynamically) from the fbfam.c file instead.

>  int fbfam_fork(struct task_struct *child);
>  int fbfam_execve(void);
>  int fbfam_exit(void);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 09e70ee2332e..c3b4d737bef3 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -77,6 +77,8 @@
>  #include <linux/uaccess.h>
>  #include <asm/processor.h>
>  
> +#include <fbfam/fbfam.h>
> +
>  #ifdef CONFIG_X86
>  #include <asm/nmi.h>
>  #include <asm/stacktrace.h>
> @@ -2660,6 +2662,13 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> +#endif
> +#ifdef CONFIG_FBFAM
> +	{
> +		.procname	= "fbfam",
> +		.mode		= 0555,
> +		.child		= fbfam_sysctls,
> +	},
>  #endif
>  	{ }
>  };
> diff --git a/security/fbfam/Makefile b/security/fbfam/Makefile
> index f4b9f0b19c44..b8d5751ecea4 100644
> --- a/security/fbfam/Makefile
> +++ b/security/fbfam/Makefile
> @@ -1,2 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_FBFAM) += fbfam.o
> +obj-$(CONFIG_SYSCTL) += sysctl.o
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> index 0387f95f6408..9be4639b72eb 100644
> --- a/security/fbfam/fbfam.c
> +++ b/security/fbfam/fbfam.c
> @@ -7,6 +7,17 @@
>  #include <linux/refcount.h>
>  #include <linux/slab.h>
>  
> +/**
> + * sysctl_crashing_rate_threshold - Crashing rate threshold.
> + *
> + * The rate's units are in milliseconds per fault.
> + *
> + * A fork brute force attack will be detected if the application's crashing rate
> + * falls under this threshold. So, the higher this value, the faster an attack
> + * will be detected.
> + */
> +unsigned long sysctl_crashing_rate_threshold = 30000;

I would move the sysctls here, instead. (Also, the above should be
const.)

> +
>  /**
>   * struct fbfam_stats - Fork brute force attack mitigation statistics.
>   * @refc: Reference counter.
> diff --git a/security/fbfam/sysctl.c b/security/fbfam/sysctl.c
> new file mode 100644
> index 000000000000..430323ad8e9f
> --- /dev/null
> +++ b/security/fbfam/sysctl.c
> @@ -0,0 +1,20 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/sysctl.h>
> +
> +extern unsigned long sysctl_crashing_rate_threshold;
> +static unsigned long ulong_one = 1;
> +static unsigned long ulong_max = ULONG_MAX;
> +
> +struct ctl_table fbfam_sysctls[] = {
> +	{
> +		.procname	= "crashing_rate_threshold",
> +		.data		= &sysctl_crashing_rate_threshold,
> +		.maxlen		= sizeof(sysctl_crashing_rate_threshold),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_minmax,
> +		.extra1		= &ulong_one,
> +		.extra2		= &ulong_max,
> +	},
> +	{ }
> +};

I wouldn't bother splitting this into a separate file. (Just leave it in
fbfam.c)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v20 05/23] net: Prepare UDS for security module stacking
From: Paul Moore @ 2020-09-10 14:11 UTC (permalink / raw)
  To: John Johansen
  Cc: Stephen Smalley, Casey Schaufler, Casey Schaufler, James Morris,
	LSM List, SElinux list, linux-audit, Kees Cook, Tetsuo Handa,
	Stephen Smalley
In-Reply-To: <b67799e2-fa22-2890-698d-f410913b0c8a@canonical.com>

On Wed, Sep 9, 2020 at 2:47 PM John Johansen
<john.johansen@canonical.com> wrote:
> ... For now Casey can drop it from this series.

As long as that whenever it reappears there is at the very least some
note of the limits in the commit description and the code (via
comments in the struct).  Of course that assumes we can't find an
alternate solution that we can all agree on which doesn't have these
stacking limits.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH 1/6] security/fbfam: Add a Kconfig to enable the fbfam feature
From: Jann Horn @ 2020-09-10 21:21 UTC (permalink / raw)
  To: Kees Cook, John Wood
  Cc: Kernel Hardening, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, kernel list, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-2-keescook@chromium.org>

On Thu, Sep 10, 2020 at 10:21 PM Kees Cook <keescook@chromium.org> wrote:
> From: John Wood <john.wood@gmx.com>
>
> Add a menu entry under "Security options" to enable the "Fork brute
> force attack mitigation" feature.
[...]
> +config FBFAM

Please give this a more descriptive name than FBFAM. Some name where,
if a random kernel developer sees an "#ifdef" with that name in some
random piece of kernel code, they immediately have a rough idea for
what kind of feature this is.

Perhaps something like THROTTLE_FORK_CRASHES. Or something else that
is equally descriptive.

> +       bool "Fork brute force attack mitigation"
> +       default n

"default n" is superfluous and should AFAIK be omitted.

> +       help
> +         This is a user defense that detects any fork brute force attack
> +         based on the application's crashing rate. When this measure is
> +         triggered the fork system call is blocked.

This help text claims that the mitigation will block fork(), but patch
6/6 actually kills the process hierarchy.

^ permalink raw reply

* Re: [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: Jann Horn @ 2020-09-10 21:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kernel Hardening, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, kernel list, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-6-keescook@chromium.org>

On Thu, Sep 10, 2020 at 10:22 PM Kees Cook <keescook@chromium.org> wrote:
> To detect a fork brute force attack it is necessary to compute the
> crashing rate of the application. This calculation is performed in each
> fatal fail of a task, or in other words, when a core dump is triggered.
> If this rate shows that the application is crashing quickly, there is a
> clear signal that an attack is happening.
>
> Since the crashing rate is computed in milliseconds per fault, if this
> rate goes under a certain threshold a warning is triggered.
[...]
> +/**
> + * fbfam_handle_attack() - Fork brute force attack detection.
> + * @signal: Signal number that causes the core dump.
> + *
> + * The crashing rate of an application is computed in milliseconds per fault in
> + * each crash. So, if this rate goes under a certain threshold there is a clear
> + * signal that the application is crashing quickly. At this moment, a fork brute
> + * force attack is happening.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +int fbfam_handle_attack(int signal)
> +{
> +       struct fbfam_stats *stats = current->fbfam_stats;
> +       u64 delta_jiffies, delta_time;
> +       u64 crashing_rate;
> +
> +       if (!stats)
> +               return -EFAULT;
> +
> +       if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
> +             signal == SIGSEGV || signal == SIGSYS))
> +               return 0;

As far as I can tell, you can never get here with SIGKILL, since
SIGKILL doesn't trigger core dumping and also isn't used by seccomp?

> +
> +       stats->faults += 1;

This is a data race. If you want to be able to increment a variable
that may be concurrently incremented by other tasks, use either
locking or the atomic_t helpers.

> +       delta_jiffies = get_jiffies_64() - stats->jiffies;
> +       delta_time = jiffies64_to_msecs(delta_jiffies);
> +       crashing_rate = delta_time / (u64)stats->faults;

Do I see this correctly, is this computing the total runtime of this
process hierarchy divided by the total number of faults seen in this
process hierarchy? If so, you may want to reconsider whether that's
really the behavior you want. For example, if I configure the minimum
period between crashes to be 30s (as is the default in the sysctl
patch), and I try to attack a server that has been running without any
crashes for a month, I'd instantly be able to crash around
30*24*60*60/30 = 86400 times before the detection kicks in. That seems
suboptimal.

(By the way, it kind of annoys me that you call it the "rate" when
it's actually the inverse of the rate. "Period" might be more
appropriate?)



> +       if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
> +               pr_warn("fbfam: Fork brute force attack detected\n");
> +
> +       return 0;
> +}
> +
> --
> 2.25.1
>

^ permalink raw reply

* [PATCH] socket.7,unix.7: add initial description for SO_PEERSEC
From: Stephen Smalley @ 2020-09-10 21:00 UTC (permalink / raw)
  To: mtk.manpages; +Cc: linux-man, linux-security-module, selinux, Stephen Smalley

SO_PEERSEC was introduced for AF_UNIX stream sockets connected via
connect(2) in Linux 2.6.2 and later augmented to support AF_UNIX stream
and datagram sockets created via socketpair(2) in Linux 4.18.  Document
SO_PEERSEC in the socket.7 and unix.7 man pages following the example
of the existing SO_PEERCRED descriptions.  SO_PEERSEC is also supported
on AF_INET sockets when using labeled IPSEC or NetLabel but defer
adding a description of that support to a separate patch.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
---
 man7/socket.7 |  5 +++++
 man7/unix.7   | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/man7/socket.7 b/man7/socket.7
index 21e891791..c3635f95b 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -690,6 +690,11 @@ Return the credentials of the peer process connected to this socket.
 For further details, see
 .BR unix (7).
 .TP
+.BR SO_PEERSEC " (since Linux 2.6.2)"
+Return the security context of the peer socket connected to this socket.
+For further details, see
+.BR unix (7).
+.TP
 .B SO_PRIORITY
 Set the protocol-defined priority for all packets to be sent on
 this socket.
diff --git a/man7/unix.7 b/man7/unix.7
index f61b51424..1032c0aa1 100644
--- a/man7/unix.7
+++ b/man7/unix.7
@@ -349,6 +349,46 @@ stream sockets and for
 .B AF_UNIX
 stream and datagram socket pairs created using
 .BR socketpair (2).
+.TP
+.B SO_PEERSEC
+This read-only socket option returns the
+security context of the peer socket connected to this socket.
+By default, this will be the same as the security context of
+the process that created the peer socket unless overridden
+by the policy or by a process with the required permissions.
+.IP
+The argument to
+.BR getsockopt (2)
+is a pointer to a
+buffer of the specified length in bytes
+into which the security context string will be copied.
+If the buffer length is less than the length of the security
+context string, then
+.BR getsockopt (2)
+will return the required length
+via
+.I optlen
+and return \-1 and sets
+.I errno
+to
+.BR ERANGE .
+The caller should allocate at least
+.BR NAME_MAX
+bytes for the buffer initially although this is not guaranteed
+to be sufficient.  Resizing the buffer to the returned length
+and retrying may be necessary.
+.IP
+For SELinux, the security context string is a null-terminated
+string and the returned length includes the terminating null.
+Other security modules may differ.
+.IP
+The use of this option for sockets in the
+.B AF_UNIX
+address family
+is supported since Linux 2.6.2 for connected stream sockets and
+since Linux 4.18, also for stream and datagram socket pairs created
+using
+.BR socketpair (2).
 .\"
 .SS Autobind feature
 If a
-- 
2.25.1


^ permalink raw reply related

* Re: [RFC PATCH 6/6] security/fbfam: Mitigate a fork brute force attack
From: Jann Horn @ 2020-09-10 20:55 UTC (permalink / raw)
  To: Kees Cook, John Wood
  Cc: Kernel Hardening, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, kernel list, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-7-keescook@chromium.org>

On Thu, Sep 10, 2020 at 10:22 PM Kees Cook <keescook@chromium.org> wrote:
> In order to mitigate a fork brute force attack it is necessary to kill
> all the offending tasks. This tasks are all the ones that share the
> statistical data with the current task (the task that has crashed).
>
> Since the attack detection is done in the function fbfam_handle_attack()
> that is called every time a core dump is triggered, only is needed to
> kill the others tasks that share the same statistical data, not the
> current one as this is in the path to be killed.
>
> When the SIGKILL signal is sent to the offending tasks from the function
> fbfam_kill_tasks(), this one will be called again during the core dump
> due to the shared statistical data shows a quickly crashing rate. So, to
> avoid kill again the same tasks due to a recursive call of this
> function, it is necessary to disable the attack detection.
>
> To disable this attack detection, add a condition in the function
> fbfam_handle_attack() to not compute the crashing rate when the jiffies
> stored in the statistical data are set to zero.
[...]
>  /**
> - * fbfam_handle_attack() - Fork brute force attack detection.
> + * fbfam_kill_tasks() - Kill the offending tasks
> + *
> + * When a fork brute force attack is detected it is necessary to kill all the
> + * offending tasks. Since this function is called from fbfam_handle_attack(),
> + * and so, every time a core dump is triggered, only is needed to kill the
> + * others tasks that share the same statistical data, not the current one as
> + * this is in the path to be killed.
> + *
> + * When the SIGKILL signal is sent to the offending tasks, this function will be
> + * called again during the core dump due to the shared statistical data shows a
> + * quickly crashing rate. So, to avoid kill again the same tasks due to a
> + * recursive call of this function, it is necessary to disable the attack
> + * detection setting the jiffies to zero.
> + *
> + * To improve the for_each_process loop it is possible to end it when all the
> + * tasks that shared the same statistics are found.

This is not a fastpath, there's no need to be clever and optimize
things here, please get rid of that optimization. Especially since
that fastpath looks racy against concurrent execve().

> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +static int fbfam_kill_tasks(void)
> +{
> +       struct fbfam_stats *stats = current->fbfam_stats;
> +       struct task_struct *p;
> +       unsigned int to_kill, killed = 0;
> +
> +       if (!stats)
> +               return -EFAULT;
> +
> +       to_kill = refcount_read(&stats->refc) - 1;
> +       if (!to_kill)
> +               return 0;
> +
> +       /* Disable the attack detection */
> +       stats->jiffies = 0;
> +       rcu_read_lock();
> +
> +       for_each_process(p) {
> +               if (p == current || p->fbfam_stats != stats)

p->fbfam_stats could change concurrently, you should at least use
READ_ONCE() here.

Also, if this codepath is hit by a non-leader thread, "p == current"
will always be false, and you'll end up killing the caller, too. You
may want to compare with current->group_leader instead.


> +                       continue;
> +
> +               do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> +               pr_warn("fbfam: Offending process with PID %d killed\n",
> +                       p->pid);

Normally pr_*() messages about tasks mention not just the pid, but
also the ->comm name of the task.

> +               killed += 1;
> +               if (killed >= to_kill)
> +                       break;
> +       }
> +
> +       rcu_read_unlock();
> +       return 0;
> +}

^ permalink raw reply

* [RFC PATCH 1/6] security/fbfam: Add a Kconfig to enable the fbfam feature
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

Add a menu entry under "Security options" to enable the "Fork brute
force attack mitigation" feature.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 security/Kconfig       |  1 +
 security/fbfam/Kconfig | 10 ++++++++++
 2 files changed, 11 insertions(+)
 create mode 100644 security/fbfam/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..00a90e25b8d5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -290,6 +290,7 @@ config LSM
 	  If unsure, leave this as the default.
 
 source "security/Kconfig.hardening"
+source "security/fbfam/Kconfig"
 
 endmenu
 
diff --git a/security/fbfam/Kconfig b/security/fbfam/Kconfig
new file mode 100644
index 000000000000..bbe7f6aad369
--- /dev/null
+++ b/security/fbfam/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+config FBFAM
+	bool "Fork brute force attack mitigation"
+	default n
+	help
+	  This is a user defense that detects any fork brute force attack
+	  based on the application's crashing rate. When this measure is
+	  triggered the fork system call is blocked.
+
+	  If you are unsure how to answer this question, answer N.
-- 
2.25.1


^ permalink raw reply related

* [RFC PATCH 3/6] security/fbfam: Use the api to manage statistics
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

Use the previous defined api to manage statistics calling it accordingly
when a task forks, calls execve or exits.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 fs/exec.c     | 2 ++
 kernel/exit.c | 2 ++
 kernel/fork.c | 4 ++++
 3 files changed, 8 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..b30118674d32 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -71,6 +71,7 @@
 #include "internal.h"
 
 #include <trace/events/sched.h>
+#include <fbfam/fbfam.h>
 
 static int bprm_creds_from_file(struct linux_binprm *bprm);
 
@@ -1940,6 +1941,7 @@ static int bprm_execve(struct linux_binprm *bprm,
 	task_numa_free(current, false);
 	if (displaced)
 		put_files_struct(displaced);
+	fbfam_execve();
 	return retval;
 
 out:
diff --git a/kernel/exit.c b/kernel/exit.c
index 733e80f334e7..39a6139dcf31 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -67,6 +67,7 @@
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/mmu_context.h>
+#include <fbfam/fbfam.h>
 
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
@@ -852,6 +853,7 @@ void __noreturn do_exit(long code)
 		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
 	exit_rcu();
 	exit_tasks_rcu_finish();
+	fbfam_exit();
 
 	lockdep_free_task(tsk);
 	do_task_dead();
diff --git a/kernel/fork.c b/kernel/fork.c
index 49677d668de4..c933838450a8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -107,6 +107,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/task.h>
 
+#include <fbfam/fbfam.h>
+
 /*
  * Minimum number of threads to boot the kernel
  */
@@ -941,6 +943,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
 #endif
+
+	fbfam_fork(tsk);
 	return tsk;
 
 free_stack:
-- 
2.25.1


^ permalink raw reply related

* Re: [RFC PATCH 3/6] security/fbfam: Use the api to manage statistics
From: Jann Horn @ 2020-09-10 20:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kernel Hardening, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, kernel list, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-4-keescook@chromium.org>

On Thu, Sep 10, 2020 at 10:21 PM Kees Cook <keescook@chromium.org> wrote:
> Use the previous defined api to manage statistics calling it accordingly
> when a task forks, calls execve or exits.

You defined functions that return error codes in the previous patch,
but here you ignore the return values. That's a bad idea.

You should probably check the return value in execve() (and fail the
execution in the case where memory allocation fails), and make it so
that the other functions always succeed.

^ permalink raw reply

* Re: [RESEND][RFC PATCH 0/6] Fork brute force attack mitigation (fbfam)
From: Jann Horn @ 2020-09-10 20:39 UTC (permalink / raw)
  To: Kees Cook, John Wood
  Cc: Kernel Hardening, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, kernel list, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

On Thu, Sep 10, 2020 at 10:21 PM Kees Cook <keescook@chromium.org> wrote:
> [kees: re-sending this series on behalf of John Wood <john.wood@gmx.com>
> also visible at https://github.com/johwood/linux fbfam]
[...]
> The goal of this patch serie is to detect and mitigate a fork brute force
> attack.
>
> Attacks with the purpose to break ASLR or bypass canaries traditionaly use
> some level of brute force with the help of the fork system call. This is
> possible since when creating a new process using fork its memory contents
> are the same as those of the parent process (the process that called the
> fork system call). So, the attacker can test the memory infinite times to
> find the correct memory values or the correct memory addresses without
> worrying about crashing the application.

For the next version of this patchset, you may want to clarify that
this is intended to stop brute force attacks *against vulnerable
userspace processes* that fork off worker processes. I was halfway
through the patch series before I realized that.

^ permalink raw reply

* [RFC PATCH 6/6] security/fbfam: Mitigate a fork brute force attack
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

In order to mitigate a fork brute force attack it is necessary to kill
all the offending tasks. This tasks are all the ones that share the
statistical data with the current task (the task that has crashed).

Since the attack detection is done in the function fbfam_handle_attack()
that is called every time a core dump is triggered, only is needed to
kill the others tasks that share the same statistical data, not the
current one as this is in the path to be killed.

When the SIGKILL signal is sent to the offending tasks from the function
fbfam_kill_tasks(), this one will be called again during the core dump
due to the shared statistical data shows a quickly crashing rate. So, to
avoid kill again the same tasks due to a recursive call of this
function, it is necessary to disable the attack detection.

To disable this attack detection, add a condition in the function
fbfam_handle_attack() to not compute the crashing rate when the jiffies
stored in the statistical data are set to zero.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 security/fbfam/fbfam.c | 76 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
index 3aa669e4ea51..173a6122390f 100644
--- a/security/fbfam/fbfam.c
+++ b/security/fbfam/fbfam.c
@@ -4,8 +4,11 @@
 #include <linux/errno.h>
 #include <linux/gfp.h>
 #include <linux/jiffies.h>
+#include <linux/pid.h>
 #include <linux/printk.h>
+#include <linux/rcupdate.h>
 #include <linux/refcount.h>
+#include <linux/sched/signal.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
 
@@ -24,7 +27,8 @@ unsigned long sysctl_crashing_rate_threshold = 30000;
  * struct fbfam_stats - Fork brute force attack mitigation statistics.
  * @refc: Reference counter.
  * @faults: Number of crashes since jiffies.
- * @jiffies: First fork or execve timestamp.
+ * @jiffies: First fork or execve timestamp. If zero, the attack detection is
+ *           disabled.
  *
  * The purpose of this structure is to manage all the necessary information to
  * compute the crashing rate of an application. So, it holds a first fork or
@@ -175,13 +179,69 @@ int fbfam_exit(void)
 }
 
 /**
- * fbfam_handle_attack() - Fork brute force attack detection.
+ * fbfam_kill_tasks() - Kill the offending tasks
+ *
+ * When a fork brute force attack is detected it is necessary to kill all the
+ * offending tasks. Since this function is called from fbfam_handle_attack(),
+ * and so, every time a core dump is triggered, only is needed to kill the
+ * others tasks that share the same statistical data, not the current one as
+ * this is in the path to be killed.
+ *
+ * When the SIGKILL signal is sent to the offending tasks, this function will be
+ * called again during the core dump due to the shared statistical data shows a
+ * quickly crashing rate. So, to avoid kill again the same tasks due to a
+ * recursive call of this function, it is necessary to disable the attack
+ * detection setting the jiffies to zero.
+ *
+ * To improve the for_each_process loop it is possible to end it when all the
+ * tasks that shared the same statistics are found.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ *         otherwise.
+ */
+static int fbfam_kill_tasks(void)
+{
+	struct fbfam_stats *stats = current->fbfam_stats;
+	struct task_struct *p;
+	unsigned int to_kill, killed = 0;
+
+	if (!stats)
+		return -EFAULT;
+
+	to_kill = refcount_read(&stats->refc) - 1;
+	if (!to_kill)
+		return 0;
+
+	/* Disable the attack detection */
+	stats->jiffies = 0;
+	rcu_read_lock();
+
+	for_each_process(p) {
+		if (p == current || p->fbfam_stats != stats)
+			continue;
+
+		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
+		pr_warn("fbfam: Offending process with PID %d killed\n",
+			p->pid);
+
+		killed += 1;
+		if (killed >= to_kill)
+			break;
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+/**
+ * fbfam_handle_attack() - Fork brute force attack detection and mitigation.
  * @signal: Signal number that causes the core dump.
  *
  * The crashing rate of an application is computed in milliseconds per fault in
  * each crash. So, if this rate goes under a certain threshold there is a clear
  * signal that the application is crashing quickly. At this moment, a fork brute
- * force attack is happening.
+ * force attack is happening. Under this scenario it is necessary to kill all
+ * the offending tasks in order to mitigate the attack.
  *
  * Return: -EFAULT if the current task doesn't have statistical data. Zero
  *         otherwise.
@@ -195,6 +255,10 @@ int fbfam_handle_attack(int signal)
 	if (!stats)
 		return -EFAULT;
 
+	/* The attack detection is disabled */
+	if (!stats->jiffies)
+		return 0;
+
 	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
 	      signal == SIGSEGV || signal == SIGSYS))
 		return 0;
@@ -205,9 +269,11 @@ int fbfam_handle_attack(int signal)
 	delta_time = jiffies64_to_msecs(delta_jiffies);
 	crashing_rate = delta_time / (u64)stats->faults;
 
-	if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
-		pr_warn("fbfam: Fork brute force attack detected\n");
+	if (crashing_rate >= (u64)sysctl_crashing_rate_threshold)
+		return 0;
 
+	pr_warn("fbfam: Fork brute force attack detected\n");
+	fbfam_kill_tasks();
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related

* [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

To detect a fork brute force attack it is necessary to compute the
crashing rate of the application. This calculation is performed in each
fatal fail of a task, or in other words, when a core dump is triggered.
If this rate shows that the application is crashing quickly, there is a
clear signal that an attack is happening.

Since the crashing rate is computed in milliseconds per fault, if this
rate goes under a certain threshold a warning is triggered.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 fs/coredump.c          |  2 ++
 include/fbfam/fbfam.h  |  2 ++
 security/fbfam/fbfam.c | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index 76e7c10edfc0..d4ba4e1828d5 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -51,6 +51,7 @@
 #include "internal.h"
 
 #include <trace/events/sched.h>
+#include <fbfam/fbfam.h>
 
 int core_uses_pid;
 unsigned int core_pipe_limit;
@@ -825,6 +826,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 fail_creds:
 	put_cred(cred);
 fail:
+	fbfam_handle_attack(siginfo->si_signo);
 	return;
 }
 
diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
index 2cfe51d2b0d5..9ac8e33d8291 100644
--- a/include/fbfam/fbfam.h
+++ b/include/fbfam/fbfam.h
@@ -12,10 +12,12 @@ extern struct ctl_table fbfam_sysctls[];
 int fbfam_fork(struct task_struct *child);
 int fbfam_execve(void);
 int fbfam_exit(void);
+int fbfam_handle_attack(int signal);
 #else
 static inline int fbfam_fork(struct task_struct *child) { return 0; }
 static inline int fbfam_execve(void) { return 0; }
 static inline int fbfam_exit(void) { return 0; }
+static inline int fbfam_handle_attack(int signal) { return 0; }
 #endif
 
 #endif /* _FBFAM_H_ */
diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
index 9be4639b72eb..3aa669e4ea51 100644
--- a/security/fbfam/fbfam.c
+++ b/security/fbfam/fbfam.c
@@ -4,7 +4,9 @@
 #include <linux/errno.h>
 #include <linux/gfp.h>
 #include <linux/jiffies.h>
+#include <linux/printk.h>
 #include <linux/refcount.h>
+#include <linux/signal.h>
 #include <linux/slab.h>
 
 /**
@@ -172,3 +174,40 @@ int fbfam_exit(void)
 	return 0;
 }
 
+/**
+ * fbfam_handle_attack() - Fork brute force attack detection.
+ * @signal: Signal number that causes the core dump.
+ *
+ * The crashing rate of an application is computed in milliseconds per fault in
+ * each crash. So, if this rate goes under a certain threshold there is a clear
+ * signal that the application is crashing quickly. At this moment, a fork brute
+ * force attack is happening.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ *         otherwise.
+ */
+int fbfam_handle_attack(int signal)
+{
+	struct fbfam_stats *stats = current->fbfam_stats;
+	u64 delta_jiffies, delta_time;
+	u64 crashing_rate;
+
+	if (!stats)
+		return -EFAULT;
+
+	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
+	      signal == SIGSEGV || signal == SIGSYS))
+		return 0;
+
+	stats->faults += 1;
+
+	delta_jiffies = get_jiffies_64() - stats->jiffies;
+	delta_time = jiffies64_to_msecs(delta_jiffies);
+	crashing_rate = delta_time / (u64)stats->faults;
+
+	if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
+		pr_warn("fbfam: Fork brute force attack detected\n");
+
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply related

* [RESEND][RFC PATCH 0/6] Fork brute force attack mitigation (fbfam)
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module

[kees: re-sending this series on behalf of John Wood <john.wood@gmx.com>
 also visible at https://github.com/johwood/linux fbfam]

From: John Wood <john.wood@gmx.com>

The goal of this patch serie is to detect and mitigate a fork brute force
attack.

Attacks with the purpose to break ASLR or bypass canaries traditionaly use
some level of brute force with the help of the fork system call. This is
possible since when creating a new process using fork its memory contents
are the same as those of the parent process (the process that called the
fork system call). So, the attacker can test the memory infinite times to
find the correct memory values or the correct memory addresses without
worrying about crashing the application.

Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this implementation.

Other implementations
---------------------

The public version of grsecurity, as a summary, is based on the idea of
delay the fork system call if a child died due to a fatal error. This has
some issues:

1.- Bad practices: Add delays to the kernel is, in general, a bad idea.

2.- Weak points: This protection can be bypassed using two different
    methods since it acts only when the fork is called after a child has
    crashed.

    2.1.- Bypass 1: So, it would still be possible for an attacker to fork
	  a big amount of children (in the order of thousands), then probe
	  all of them, and finally wait the protection time before repeat
	  the steps.

    2.2.- Bypass 2: This method is based on the idea that the protection
	  doesn't act if the parent crashes. So, it would still be possible
	  for an attacker to fork a process and probe itself. Then, fork
	  the child process and probe itself again. This way, these steps
	  can be repeated infinite times without any mitigation.

This implementation
-------------------

The main idea behind this implementation is to improve the existing ones
focusing on the weak points annotated before. So, the solution for the
first bypass method is to detect a fast crash rate instead of only one
simple crash. For the second bypass method the solution is to detect both
the crash of parent and child processes. Moreover, as a mitigation method
it is better to kill all the offending tasks involve in the attack instead
of use delays.

So, the solution to the two bypass methods previously commented is to use
some statistical data shared across all the processes that can have the
same memory contents. Or in other words, a statistical data shared between
all the processes that fork the task 0, and all the processes that fork
after an execve system call.

These statistics hold the timestamp for the first fork (case of a fork of
task zero) or the timestamp for the execve system call (the other case).
Also, hold the number of faults of all the tasks that share the same
statistical data since the commented timestamp.

With this information it is possible to detect a brute force attack when a
task die in a fatal way computing the crashing rate. This rate shows the
milliseconds per fault and when it goes under a certain threshold there is
a clear signal that something malicious is happening.

Once detected, the mitigation only kills the processes that share the same
statistical data and so, all the tasks that can have the same memory
contents. This way, an attack is rejected.

The fbfam feature can be enabled, disabled and tuned as follows:

1.- Per system enabling: This feature can be enabled in build time using
    the config application under:

    Security options  --->  Fork brute force attack mitigation

2.- Per process enabling/disabling: To allow that specific applications can
    turn off or turn on the detection and mitigation of a fork brute force
    attack when required, there are two new prctls.

    prctl(PR_FBFAM_ENABLE, 0, 0, 0, 0)  -> To enable the feature
    prctl(PR_FBFAM_DISABLE, 0, 0, 0, 0) -> To disable the feature

    Both functions return zero on success and -EFAULT if the current task
    doesn't have statistical data.

3.- Fine tuning: To customize the detection's sensibility there is a new
    sysctl that allows to set the crashing rate threshold. It is accessible
    through the file:

    /proc/sys/kernel/fbfam/crashing_rate_threshold

    The units are in milliseconds per fault and the attack's mitigation is
    triggered if the crashing rate of an application goes under this
    threshold. So, the higher this value, the faster an attack will be
    detected.

So, knowing all this information I will explain now the different patches:

The 1/9 patch adds a new config for the fbfam feature.

The 2/9 and 3/9 patches add and use the api to manage the statistical data
necessary to compute the crashing rate of an application.

The 4/9 patch adds a new sysctl to fine tuning the detection's sensibility.

The 5/9 patch detects a fork brute force attack calculating the crashing
rate.

The 6/9 patch mitigates the attack killing all the offending tasks.

The 7/9 patch adds two new prctls to allow per task enabling/disabling.

The 8/9 patch adds general documentation.

The 9/9 patch adds an entry to the maintainers list.

This patch series is a task of the KSPP [1] and it is worth to mention
that there is a previous attempt without any continuation [2].

[1] https://github.com/KSPP/linux/issues/39
[2] https://lore.kernel.org/linux-fsdevel/1419457167-15042-1-git-send-email-richard@nod.at/

Any constructive comments are welcome.

Note: During the compilation these warnings were shown:

kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x18: unreachable instruction
arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_panic()+0x123: unreachable instruction
arch/x86/kernel/smpboot.o: warning: objtool: native_play_dead()+0x122: unreachable instruction
net/core/skbuff.o: warning: objtool: skb_push.cold()+0x14: unreachable instruction

John Wood (6):
  security/fbfam: Add a Kconfig to enable the fbfam feature
  security/fbfam: Add the api to manage statistics
  security/fbfam: Use the api to manage statistics
  security/fbfam: Add a new sysctl to control the crashing rate
    threshold
  security/fbfam: Detect a fork brute force attack
  security/fbfam: Mitigate a fork brute force attack

 fs/coredump.c           |   2 +
 fs/exec.c               |   2 +
 include/fbfam/fbfam.h   |  24 ++++
 include/linux/sched.h   |   4 +
 kernel/exit.c           |   2 +
 kernel/fork.c           |   4 +
 kernel/sysctl.c         |   9 ++
 security/Kconfig        |   1 +
 security/Makefile       |   4 +
 security/fbfam/Kconfig  |  10 ++
 security/fbfam/Makefile |   3 +
 security/fbfam/fbfam.c  | 279 ++++++++++++++++++++++++++++++++++++++++
 security/fbfam/sysctl.c |  20 +++
 13 files changed, 364 insertions(+)
 create mode 100644 include/fbfam/fbfam.h
 create mode 100644 security/fbfam/Kconfig
 create mode 100644 security/fbfam/Makefile
 create mode 100644 security/fbfam/fbfam.c
 create mode 100644 security/fbfam/sysctl.c

-- 
2.25.1

^ permalink raw reply

* [RFC PATCH 2/6] security/fbfam: Add the api to manage statistics
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

Create a statistical data structure to hold all the necessary
information involve in a fork brute force attack. This info is a
timestamp for the first fork or execve and the number of crashes since
then. Moreover, due to this statitistical data will be shared between
different tasks, a reference counter it is necessary.

For a correct management of an attack it is also necessary that all the
tasks hold statistical data. The same statistical data needs to be
shared between all the tasks that hold the same memory contents or in
other words, between all the tasks that have been forked without any
execve call.

When a forked task calls the execve system call, the memory contents are
set with new values. So, in this scenario the parent's statistical data
no need to be share. Instead, a new statistical data structure must be
allocated to start a new cycle.

The statistical data that every task holds needs to be clear when a task
exits. Due to this data is shared across multiples tasks, the reference
counter is useful to free the previous allocated data only when there
are not other pointers to the same data. Or in other words, when the
reference counter reaches zero.

So, based in all the previous information add the api to manage all the
commented cases.

Also, add to the struct task_struct a new field to point to the
statitistical data related to an attack. This way, all the tasks will
have access to this information.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 include/fbfam/fbfam.h   |  18 +++++
 include/linux/sched.h   |   4 +
 security/Makefile       |   4 +
 security/fbfam/Makefile |   2 +
 security/fbfam/fbfam.c  | 163 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 191 insertions(+)
 create mode 100644 include/fbfam/fbfam.h
 create mode 100644 security/fbfam/Makefile
 create mode 100644 security/fbfam/fbfam.c

diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
new file mode 100644
index 000000000000..b5b7d1127a52
--- /dev/null
+++ b/include/fbfam/fbfam.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _FBFAM_H_
+#define _FBFAM_H_
+
+#include <linux/sched.h>
+
+#ifdef CONFIG_FBFAM
+int fbfam_fork(struct task_struct *child);
+int fbfam_execve(void);
+int fbfam_exit(void);
+#else
+static inline int fbfam_fork(struct task_struct *child) { return 0; }
+static inline int fbfam_execve(void) { return 0; }
+static inline int fbfam_exit(void) { return 0; }
+#endif
+
+#endif /* _FBFAM_H_ */
+
diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..00e1aa5e00cd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,10 @@ struct task_struct {
 	struct callback_head		mce_kill_me;
 #endif
 
+#ifdef CONFIG_FBFAM
+	struct fbfam_stats		*fbfam_stats;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/security/Makefile b/security/Makefile
index 3baf435de541..36dc4b536349 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM)			+= bpf/
 # Object integrity file lists
 subdir-$(CONFIG_INTEGRITY)		+= integrity
 obj-$(CONFIG_INTEGRITY)			+= integrity/
+
+# Object fbfam file lists
+subdir-$(CONFIG_FBFAM)			+= fbfam
+obj-$(CONFIG_FBFAM)			+= fbfam/
diff --git a/security/fbfam/Makefile b/security/fbfam/Makefile
new file mode 100644
index 000000000000..f4b9f0b19c44
--- /dev/null
+++ b/security/fbfam/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_FBFAM) += fbfam.o
diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
new file mode 100644
index 000000000000..0387f95f6408
--- /dev/null
+++ b/security/fbfam/fbfam.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <asm/current.h>
+#include <fbfam/fbfam.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/jiffies.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+
+/**
+ * struct fbfam_stats - Fork brute force attack mitigation statistics.
+ * @refc: Reference counter.
+ * @faults: Number of crashes since jiffies.
+ * @jiffies: First fork or execve timestamp.
+ *
+ * The purpose of this structure is to manage all the necessary information to
+ * compute the crashing rate of an application. So, it holds a first fork or
+ * execve timestamp and a number of crashes since then. This way the crashing
+ * rate in milliseconds per fault can be compute when necessary with the
+ * following formula:
+ *
+ * u64 delta_jiffies = get_jiffies_64() - fbfam_stats::jiffies;
+ * u64 delta_time = jiffies64_to_msecs(delta_jiffies);
+ * u64 crashing_rate = delta_time / (u64)fbfam_stats::faults;
+ *
+ * If the fbfam_stats::faults is zero, the above formula can't be used. In this
+ * case, the crashing rate is zero.
+ *
+ * Moreover, since the same allocated structure will be used in every fork
+ * since the first one or execve, it's also necessary a reference counter.
+ */
+struct fbfam_stats {
+	refcount_t refc;
+	unsigned int faults;
+	u64 jiffies;
+};
+
+/**
+ * fbfam_new_stats() - Allocation of new statistics structure.
+ *
+ * If the allocation is successful the reference counter is set to one to
+ * indicate that there will be one task that points to this structure. The
+ * faults field is initialize to zero and the timestamp for this moment is set.
+ *
+ * Return: NULL if the allocation fails. A pointer to the new allocated
+ *         statistics structure if it success.
+ */
+static struct fbfam_stats *fbfam_new_stats(void)
+{
+	struct fbfam_stats *stats = kmalloc(sizeof(struct fbfam_stats),
+					    GFP_KERNEL);
+
+	if (stats) {
+		refcount_set(&stats->refc, 1);
+		stats->faults = 0;
+		stats->jiffies = get_jiffies_64();
+	}
+
+	return stats;
+}
+
+/*
+ * fbfam_fork() - Fork management.
+ * @child: Pointer to the child task that will be created with the fork system
+ *         call.
+ *
+ * For a correct management of a fork brute force attack it is necessary that
+ * all the tasks hold statistical data. The same statistical data needs to be
+ * shared between all the tasks that hold the same memory contents or in other
+ * words, between all the tasks that have been forked without any execve call.
+ *
+ * To ensure this, if the current task doesn't have statistical data when forks
+ * (only possible in the first fork of the zero task), it is mandatory to
+ * allocate a new one. This way, the child task always will share the statistics
+ * with its parent.
+ *
+ * Return: -ENOMEN if the allocation of the new statistics structure fails.
+ *         Zero otherwise.
+ */
+int fbfam_fork(struct task_struct *child)
+{
+	struct fbfam_stats **stats = &current->fbfam_stats;
+
+	if (!*stats) {
+		*stats = fbfam_new_stats();
+		if (!*stats)
+			return -ENOMEM;
+	}
+
+	refcount_inc(&(*stats)->refc);
+	child->fbfam_stats = *stats;
+	return 0;
+}
+
+/**
+ * fbfam_execve() - Execve management.
+ *
+ * When a forked task calls the execve system call, the memory contents are set
+ * with new values. So, in this scenario the parent's statistical data no need
+ * to be share. Instead, a new statistical data structure must be allocated to
+ * start a new cycle. This condition is detected when the statistics reference
+ * counter holds a value greater than or equal to two (a fork always sets the
+ * statistics reference counter to two since the parent and the child task are
+ * sharing the same data).
+ *
+ * However, if the execve function is called immediately after another execve
+ * call, althought the memory contents are reset, there is no need to allocate
+ * a new statistical data structure. This is possible because at this moment
+ * only one task (the task that calls the execve function) points to the data.
+ * In this case, the previous allocation is used and only the faults and time
+ * fields are reset.
+ *
+ * Return: -ENOMEN if the allocation of the new statistics structure fails.
+ *         -EFAULT if the current task doesn't have statistical data. Zero
+ *         otherwise.
+ */
+int fbfam_execve(void)
+{
+	struct fbfam_stats **stats = &current->fbfam_stats;
+
+	if (!*stats)
+		return -EFAULT;
+
+	if (!refcount_dec_not_one(&(*stats)->refc)) {
+		/* execve call after an execve call */
+		(*stats)->faults = 0;
+		(*stats)->jiffies = get_jiffies_64();
+		return 0;
+	}
+
+	/* execve call after a fork call */
+	*stats = fbfam_new_stats();
+	if (!*stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/**
+ * fbfam_exit() - Exit management.
+ *
+ * The statistical data that every task holds needs to be clear when a task
+ * exits. Due to this data is shared across multiples tasks, the reference
+ * counter is useful to free the previous allocated data only when there are
+ * not other pointers to the same data. Or in other words, when the reference
+ * counter reaches zero.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ *         otherwise.
+ */
+int fbfam_exit(void)
+{
+	struct fbfam_stats *stats = current->fbfam_stats;
+
+	if (!stats)
+		return -EFAULT;
+
+	if (refcount_dec_and_test(&stats->refc))
+		kfree(stats);
+
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply related

* [RFC PATCH 4/6] security/fbfam: Add a new sysctl to control the crashing rate threshold
From: Kees Cook @ 2020-09-10 20:21 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, linux-kernel, linux-fsdevel,
	linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

From: John Wood <john.wood@gmx.com>

This is a previous step to add the detection feature.

A fork brute force attack will be detected when an application crashes
quickly. Since, a rate can be defined as a time per fault, add a new
sysctl to control the crashing rate threshold.

This way, each system can tune the detection's sensibility adjusting the
milliseconds per fault. So, if the application's crashing rate falls
under this threshold an attack will be detected. So, the higher this
value, the faster an attack will be detected.

Signed-off-by: John Wood <john.wood@gmx.com>
---
 include/fbfam/fbfam.h   |  4 ++++
 kernel/sysctl.c         |  9 +++++++++
 security/fbfam/Makefile |  1 +
 security/fbfam/fbfam.c  | 11 +++++++++++
 security/fbfam/sysctl.c | 20 ++++++++++++++++++++
 5 files changed, 45 insertions(+)
 create mode 100644 security/fbfam/sysctl.c

diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
index b5b7d1127a52..2cfe51d2b0d5 100644
--- a/include/fbfam/fbfam.h
+++ b/include/fbfam/fbfam.h
@@ -3,8 +3,12 @@
 #define _FBFAM_H_
 
 #include <linux/sched.h>
+#include <linux/sysctl.h>
 
 #ifdef CONFIG_FBFAM
+#ifdef CONFIG_SYSCTL
+extern struct ctl_table fbfam_sysctls[];
+#endif
 int fbfam_fork(struct task_struct *child);
 int fbfam_execve(void);
 int fbfam_exit(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 09e70ee2332e..c3b4d737bef3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -77,6 +77,8 @@
 #include <linux/uaccess.h>
 #include <asm/processor.h>
 
+#include <fbfam/fbfam.h>
+
 #ifdef CONFIG_X86
 #include <asm/nmi.h>
 #include <asm/stacktrace.h>
@@ -2660,6 +2662,13 @@ static struct ctl_table kern_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+#endif
+#ifdef CONFIG_FBFAM
+	{
+		.procname	= "fbfam",
+		.mode		= 0555,
+		.child		= fbfam_sysctls,
+	},
 #endif
 	{ }
 };
diff --git a/security/fbfam/Makefile b/security/fbfam/Makefile
index f4b9f0b19c44..b8d5751ecea4 100644
--- a/security/fbfam/Makefile
+++ b/security/fbfam/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_FBFAM) += fbfam.o
+obj-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
index 0387f95f6408..9be4639b72eb 100644
--- a/security/fbfam/fbfam.c
+++ b/security/fbfam/fbfam.c
@@ -7,6 +7,17 @@
 #include <linux/refcount.h>
 #include <linux/slab.h>
 
+/**
+ * sysctl_crashing_rate_threshold - Crashing rate threshold.
+ *
+ * The rate's units are in milliseconds per fault.
+ *
+ * A fork brute force attack will be detected if the application's crashing rate
+ * falls under this threshold. So, the higher this value, the faster an attack
+ * will be detected.
+ */
+unsigned long sysctl_crashing_rate_threshold = 30000;
+
 /**
  * struct fbfam_stats - Fork brute force attack mitigation statistics.
  * @refc: Reference counter.
diff --git a/security/fbfam/sysctl.c b/security/fbfam/sysctl.c
new file mode 100644
index 000000000000..430323ad8e9f
--- /dev/null
+++ b/security/fbfam/sysctl.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/sysctl.h>
+
+extern unsigned long sysctl_crashing_rate_threshold;
+static unsigned long ulong_one = 1;
+static unsigned long ulong_max = ULONG_MAX;
+
+struct ctl_table fbfam_sysctls[] = {
+	{
+		.procname	= "crashing_rate_threshold",
+		.data		= &sysctl_crashing_rate_threshold,
+		.maxlen		= sizeof(sysctl_crashing_rate_threshold),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= &ulong_one,
+		.extra2		= &ulong_max,
+	},
+	{ }
+};
+
-- 
2.25.1


^ permalink raw reply related

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Matthew Wilcox @ 2020-09-10 20:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Mickaël Salaün, Mimi Zohar, linux-kernel, Aleksa Sarai,
	Alexei Starovoitov, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Christian Heimes,
	Daniel Borkmann, Deven Bowers, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200910200010.GF1236603@ZenIV.linux.org.uk>

On Thu, Sep 10, 2020 at 09:00:10PM +0100, Al Viro wrote:
> On Thu, Sep 10, 2020 at 07:40:33PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 10, 2020 at 08:38:21PM +0200, Mickaël Salaün wrote:
> > > There is also the use case of noexec mounts and file permissions. From
> > > user space point of view, it doesn't matter which kernel component is in
> > > charge of defining the policy. The syscall should then not be tied with
> > > a verification/integrity/signature/appraisal vocabulary, but simply an
> > > access control one.
> > 
> > permission()?
> 
> int lsm(int fd, const char *how, char *error, int size);
> 
> Seriously, this is "ask LSM to apply special policy to file"; let's
> _not_ mess with flags, etc. for that; give it decent bandwidth
> and since it's completely opaque for the rest of the kernel,
> just a pass a string to be parsed by LSM as it sees fit.

Hang on, it does have some things which aren't BD^W^WLSM.  It lets
the interpreter honour the mount -o noexec option.  I presume it's
not easily defeated by
	cat /home/salaun/bin/bad.pl | perl -


^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Al Viro @ 2020-09-10 20:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mickaël Salaün, Mimi Zohar, linux-kernel, Aleksa Sarai,
	Alexei Starovoitov, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Christian Heimes,
	Daniel Borkmann, Deven Bowers, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200910184033.GX6583@casper.infradead.org>

On Thu, Sep 10, 2020 at 07:40:33PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 10, 2020 at 08:38:21PM +0200, Mickaël Salaün wrote:
> > There is also the use case of noexec mounts and file permissions. From
> > user space point of view, it doesn't matter which kernel component is in
> > charge of defining the policy. The syscall should then not be tied with
> > a verification/integrity/signature/appraisal vocabulary, but simply an
> > access control one.
> 
> permission()?

int lsm(int fd, const char *how, char *error, int size);

Seriously, this is "ask LSM to apply special policy to file"; let's
_not_ mess with flags, etc. for that; give it decent bandwidth
and since it's completely opaque for the rest of the kernel,
just a pass a string to be parsed by LSM as it sees fit.

^ permalink raw reply

* [[PATCH V4]] audit: trigger accompanying records when no rules present
From: Richard Guy Briggs @ 2020-09-10 15:01 UTC (permalink / raw)
  To: Linux-Audit Mailing List, LKML, Linux Security Module list
  Cc: Paul Moore, Eric Paris, Richard Guy Briggs

When there are no audit rules registered, mandatory records (config,
etc.) are missing their accompanying records (syscall, proctitle, etc.).

This is due to audit context dummy set on syscall entry based on absence
of rules that signals that no other records are to be printed.

Clear the dummy bit if any record is generated.

The proctitle context and dummy checks are pointless since the
proctitle record will not be printed if no syscall records are printed.

The fds array is reset to -1 after the first syscall to indicate it
isn't valid any more, but was never set to -1 when the context was
allocated to indicate it wasn't yet valid.

The audit_inode* functions can be called without going through
getname_flags() or getname_kernel() that sets audit_names and cwd, so
set the cwd if it has not already been done so due to audit_names being
valid.

The LSM dump_common_audit_data() LSM_AUDIT_DATA_NET:AF_UNIX case was
missed with the ghak96 patch, so add that case here.

Thanks to bauen1 <j2468h@googlemail.com> for reporting LSM situations in
which context->cwd is not valid, inadvertantly fixed by the ghak96 patch.

Please see upstream github issue
https://github.com/linux-audit/audit-kernel/issues/120
This is also related to upstream github issue
https://github.com/linux-audit/audit-kernel/issues/96

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
Passes audit-testsuite.

Chagelog:
v4:
- rebase on audit/next v5.9-rc1
- squash v2+v3fix
- add pwd NULL check in audit_log_name()
- resubmit after revert

v3:
- initialize fds[0] to -1
- init cwd for ghak96 LSM_AUDIT_DATA_NET:AF_UNIX case
- init cwd for audit_inode{,_child}

v2:
- unconditionally clear dummy
- create audit_clear_dummy accessor function
- remove proctitle context and dummy checks

 kernel/audit.c       |  1 +
 kernel/audit.h       |  8 ++++++++
 kernel/auditsc.c     | 11 +++++++----
 security/lsm_audit.c |  1 +
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 68cee3bc8cfe..8604eccb348f 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1865,6 +1865,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 	}
 
 	audit_get_stamp(ab->ctx, &t, &serial);
+	audit_clear_dummy(ab->ctx);
 	audit_log_format(ab, "audit(%llu.%03lu:%u): ",
 			 (unsigned long long)t.tv_sec, t.tv_nsec/1000000, serial);
 
diff --git a/kernel/audit.h b/kernel/audit.h
index 3b9c0945225a..abcfef58435b 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -290,6 +290,13 @@ extern int audit_signal_info_syscall(struct task_struct *t);
 extern void audit_filter_inodes(struct task_struct *tsk,
 				struct audit_context *ctx);
 extern struct list_head *audit_killed_trees(void);
+
+static inline void audit_clear_dummy(struct audit_context *ctx)
+{
+	if (ctx)
+		ctx->dummy = 0;
+}
+
 #else /* CONFIG_AUDITSYSCALL */
 #define auditsc_get_stamp(c, t, s) 0
 #define audit_put_watch(w) {}
@@ -323,6 +330,7 @@ static inline int audit_signal_info_syscall(struct task_struct *t)
 }
 
 #define audit_filter_inodes(t, c) AUDIT_DISABLED
+#define audit_clear_dummy(c) {}
 #endif /* CONFIG_AUDITSYSCALL */
 
 extern char *audit_unpack_string(void **bufp, size_t *remain, size_t len);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8dba8f0983b5..9d2de93f40b3 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -929,6 +929,7 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
 	context->prio = state == AUDIT_RECORD_CONTEXT ? ~0ULL : 0;
 	INIT_LIST_HEAD(&context->killed_trees);
 	INIT_LIST_HEAD(&context->names_list);
+	context->fds[0] = -1;
 	return context;
 }
 
@@ -1367,7 +1368,10 @@ static void audit_log_name(struct audit_context *context, struct audit_names *n,
 			/* name was specified as a relative path and the
 			 * directory component is the cwd
 			 */
-			audit_log_d_path(ab, " name=", &context->pwd);
+			if (&context->pwd)
+				audit_log_d_path(ab, " name=", &context->pwd);
+			else
+				audit_log_format(ab, " name=(null)");
 			break;
 		default:
 			/* log the name's directory component */
@@ -1435,9 +1439,6 @@ static void audit_log_proctitle(void)
 	struct audit_context *context = audit_context();
 	struct audit_buffer *ab;
 
-	if (!context || context->dummy)
-		return;
-
 	ab = audit_log_start(context, GFP_KERNEL, AUDIT_PROCTITLE);
 	if (!ab)
 		return;	/* audit_panic or being filtered */
@@ -2079,6 +2080,7 @@ void __audit_inode(struct filename *name, const struct dentry *dentry,
 	}
 	handle_path(dentry);
 	audit_copy_inode(n, dentry, inode, flags & AUDIT_INODE_NOEVAL);
+	_audit_getcwd(context);
 }
 
 void __audit_file(const struct file *file)
@@ -2197,6 +2199,7 @@ void __audit_inode_child(struct inode *parent,
 		audit_copy_inode(found_child, dentry, inode, 0);
 	else
 		found_child->ino = AUDIT_INO_UNSET;
+	_audit_getcwd(context);
 }
 EXPORT_SYMBOL_GPL(__audit_inode_child);
 
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 53d0d183db8f..e93077612246 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -369,6 +369,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 					audit_log_untrustedstring(ab, p);
 				else
 					audit_log_n_hex(ab, p, len);
+				audit_getcwd();
 				break;
 			}
 		}
-- 
2.18.4


^ permalink raw reply related

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Matthew Wilcox @ 2020-09-10 18:40 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Mimi Zohar, linux-kernel, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Christian Heimes,
	Daniel Borkmann, Deven Bowers, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <880bb4ee-89a2-b9b0-747b-0f779ceda995@digikod.net>

On Thu, Sep 10, 2020 at 08:38:21PM +0200, Mickaël Salaün wrote:
> There is also the use case of noexec mounts and file permissions. From
> user space point of view, it doesn't matter which kernel component is in
> charge of defining the policy. The syscall should then not be tied with
> a verification/integrity/signature/appraisal vocabulary, but simply an
> access control one.

permission()?

^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mickaël Salaün @ 2020-09-10 18:38 UTC (permalink / raw)
  To: Mimi Zohar, Matthew Wilcox
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <a48145770780d36e90f28f1526805a7292eb74f6.camel@linux.ibm.com>


On 10/09/2020 20:08, Mimi Zohar wrote:
> On Thu, 2020-09-10 at 19:21 +0200, Mickaël Salaün wrote:
>> On 10/09/2020 19:04, Matthew Wilcox wrote:
>>> On Thu, Sep 10, 2020 at 06:46:09PM +0200, Mickaël Salaün wrote:
>>>> This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
>>>> series with a new syscall: introspect_access(2) .  Access check are now
>>>> only possible on a file descriptor, which enable to avoid possible race
>>>> conditions in user space.
>>>
>>> But introspection is about examining _yourself_.  This isn't about
>>> doing that.  It's about doing ... something ... to a script that you're
>>> going to execute.  If the script were going to call the syscall, then
>>> it might be introspection.  Or if the interpreter were measuring itself,
>>> that would be introspection.  But neither of those would be useful things
>>> to do, because an attacker could simply avoid doing them.
>>
> 
> Michael, is the confusion here that IMA isn't measuring anything, but
> verifying the integrity of the file?   The usecase, from an IMA
> perspective, is enforcing a system wide policy requiring everything
> executed to be signed.  In this particular use case, the interpreter is
> asking the kernel if the script is signed with a permitted key.  The
> signature may be an IMA signature or an EVM portable and immutable
> signature, based on policy.

There is also the use case of noexec mounts and file permissions. From
user space point of view, it doesn't matter which kernel component is in
charge of defining the policy. The syscall should then not be tied with
a verification/integrity/signature/appraisal vocabulary, but simply an
access control one.

> 
>> Picking a good name other than "access" (or faccessat2) is not easy. The
>> idea with introspect_access() is for the calling task to ask the kernel
>> if this task should allows to do give access to a kernel resource which
>> is already available to this task. In this sense, we think that
>> introspection makes sense because it is the choice of the task to allow
>> or deny an access.
>>
>>>
>>> So, bad name.  What might be better?  sys_security_check()?
>>> sys_measure()?  sys_verify_fd()?  I don't know.
>>>
>>
>> "security_check" looks quite broad, "measure" doesn't make sense here,
>> "verify_fd" doesn't reflect that it is an access check. Yes, not easy,
>> but if this is the only concern we are on the good track. :)
> 
> Maybe replacing the term "measure" with "integrity", but rather than
> "integrity_check", something along the lines of fgetintegrity,
> freadintegrity, fcheckintegrity.

What about entrusted_access(2)? It reflects the fact that the kernel
delegate to (trusted) user space tasks some access enforcements.

> 
> Mimi
> 
>>
>>
>> Other ideas:
>> - interpret_access (mainly, but not only, for interpreters)
>> - indirect_access
>> - may_access
>> - faccessat3
> 
> 

^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mimi Zohar @ 2020-09-10 18:08 UTC (permalink / raw)
  To: Mickaël Salaün, Matthew Wilcox
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <f6e2358c-8e5e-e688-3e66-2cdd943e360e@digikod.net>

On Thu, 2020-09-10 at 19:21 +0200, Mickaël Salaün wrote:
> On 10/09/2020 19:04, Matthew Wilcox wrote:
> > On Thu, Sep 10, 2020 at 06:46:09PM +0200, Mickaël Salaün wrote:
> >> This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
> >> series with a new syscall: introspect_access(2) .  Access check are now
> >> only possible on a file descriptor, which enable to avoid possible race
> >> conditions in user space.
> > 
> > But introspection is about examining _yourself_.  This isn't about
> > doing that.  It's about doing ... something ... to a script that you're
> > going to execute.  If the script were going to call the syscall, then
> > it might be introspection.  Or if the interpreter were measuring itself,
> > that would be introspection.  But neither of those would be useful things
> > to do, because an attacker could simply avoid doing them.
> 

Michael, is the confusion here that IMA isn't measuring anything, but
verifying the integrity of the file?   The usecase, from an IMA
perspective, is enforcing a system wide policy requiring everything
executed to be signed.  In this particular use case, the interpreter is
asking the kernel if the script is signed with a permitted key.  The
signature may be an IMA signature or an EVM portable and immutable
signature, based on policy.

> Picking a good name other than "access" (or faccessat2) is not easy. The
> idea with introspect_access() is for the calling task to ask the kernel
> if this task should allows to do give access to a kernel resource which
> is already available to this task. In this sense, we think that
> introspection makes sense because it is the choice of the task to allow
> or deny an access.
> 
> > 
> > So, bad name.  What might be better?  sys_security_check()?
> > sys_measure()?  sys_verify_fd()?  I don't know.
> > 
> 
> "security_check" looks quite broad, "measure" doesn't make sense here,
> "verify_fd" doesn't reflect that it is an access check. Yes, not easy,
> but if this is the only concern we are on the good track. :)

Maybe replacing the term "measure" with "integrity", but rather than
"integrity_check", something along the lines of fgetintegrity,
freadintegrity, fcheckintegrity.

Mimi

> 
> 
> Other ideas:
> - interpret_access (mainly, but not only, for interpreters)
> - indirect_access
> - may_access
> - faccessat3



^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mickaël Salaün @ 2020-09-10 17:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <f6e2358c-8e5e-e688-3e66-2cdd943e360e@digikod.net>


On 10/09/2020 19:21, Mickaël Salaün wrote:
> 
> On 10/09/2020 19:04, Matthew Wilcox wrote:
>> On Thu, Sep 10, 2020 at 06:46:09PM +0200, Mickaël Salaün wrote:
>>> This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
>>> series with a new syscall: introspect_access(2) .  Access check are now
>>> only possible on a file descriptor, which enable to avoid possible race
>>> conditions in user space.
>>
>> But introspection is about examining _yourself_.  This isn't about
>> doing that.  It's about doing ... something ... to a script that you're
>> going to execute.  If the script were going to call the syscall, then
>> it might be introspection.  Or if the interpreter were measuring itself,
>> that would be introspection.  But neither of those would be useful things
>> to do, because an attacker could simply avoid doing them.
> 
> Picking a good name other than "access" (or faccessat2) is not easy. The
> idea with introspect_access() is for the calling task to ask the kernel
> if this task should allows to do give access to a kernel resource which
> is already available to this task. In this sense, we think that
> introspection makes sense because it is the choice of the task to allow
> or deny an access.
> 
>>
>> So, bad name.  What might be better?  sys_security_check()?
>> sys_measure()?  sys_verify_fd()?  I don't know.
>>
> 
> "security_check" looks quite broad, "measure" doesn't make sense here,
> "verify_fd" doesn't reflect that it is an access check. Yes, not easy,
> but if this is the only concern we are on the good track. :)
> 
> 
> Other ideas:
> - interpret_access (mainly, but not only, for interpreters)
> - indirect_access
> - may_access
> - faccessat3
> 

I think that entrusted_access(2) looks good. What do you think?

^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mickaël Salaün @ 2020-09-10 17:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200910170424.GU6583@casper.infradead.org>


On 10/09/2020 19:04, Matthew Wilcox wrote:
> On Thu, Sep 10, 2020 at 06:46:09PM +0200, Mickaël Salaün wrote:
>> This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
>> series with a new syscall: introspect_access(2) .  Access check are now
>> only possible on a file descriptor, which enable to avoid possible race
>> conditions in user space.
> 
> But introspection is about examining _yourself_.  This isn't about
> doing that.  It's about doing ... something ... to a script that you're
> going to execute.  If the script were going to call the syscall, then
> it might be introspection.  Or if the interpreter were measuring itself,
> that would be introspection.  But neither of those would be useful things
> to do, because an attacker could simply avoid doing them.

Picking a good name other than "access" (or faccessat2) is not easy. The
idea with introspect_access() is for the calling task to ask the kernel
if this task should allows to do give access to a kernel resource which
is already available to this task. In this sense, we think that
introspection makes sense because it is the choice of the task to allow
or deny an access.

> 
> So, bad name.  What might be better?  sys_security_check()?
> sys_measure()?  sys_verify_fd()?  I don't know.
> 

"security_check" looks quite broad, "measure" doesn't make sense here,
"verify_fd" doesn't reflect that it is an access check. Yes, not easy,
but if this is the only concern we are on the good track. :)


Other ideas:
- interpret_access (mainly, but not only, for interpreters)
- indirect_access
- may_access
- faccessat3

^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Matthew Wilcox @ 2020-09-10 17:04 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200910164612.114215-1-mic@digikod.net>

On Thu, Sep 10, 2020 at 06:46:09PM +0200, Mickaël Salaün wrote:
> This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
> series with a new syscall: introspect_access(2) .  Access check are now
> only possible on a file descriptor, which enable to avoid possible race
> conditions in user space.

But introspection is about examining _yourself_.  This isn't about
doing that.  It's about doing ... something ... to a script that you're
going to execute.  If the script were going to call the syscall, then
it might be introspection.  Or if the interpreter were measuring itself,
that would be introspection.  But neither of those would be useful things
to do, because an attacker could simply avoid doing them.

So, bad name.  What might be better?  sys_security_check()?
sys_measure()?  sys_verify_fd()?  I don't know.

^ permalink raw reply

* [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mickaël Salaün @ 2020-09-10 16:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Christian Brauner, Christian Heimes,
	Daniel Borkmann, Deven Bowers, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk, Miklos Szeredi,
	Mimi Zohar, Philippe Trébuchet, Scott Shell,
	Sean Christopherson, Shuah Khan, Steve Dower, Steve Grubb,
	Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel

Hi,

This ninth patch series rework the previous AT_INTERPRETED and O_MAYEXEC
series with a new syscall: introspect_access(2) .  Access check are now
only possible on a file descriptor, which enable to avoid possible race
conditions in user space.

For now, the only LSM hook triggered by introspect_access(2) is
inode_permission() which takes a struct inode as argument.  However,
struct path is still available in this syscall, which enables to add a
new hook to fit the needs of IMA and other path-based LSMs.

Goal of introspect_access(2)
============================

The goal of this patch series is to enable to control script execution
with interpreters help.  A new introspect_access() system call is added
to enable user space script interpreters to delegate to the kernel (and
thus the system security policy) the permission to interpret/execute
scripts or other files containing what can be seen as commands.

A simple system-wide security policy can be enforced by the system
administrator through a sysctl configuration consistent with the mount
points or the file access rights.  The documentation patch explains the
prerequisites.

Furthermore, the security policy can also be delegated to an LSM, either
a MAC system or an integrity system.  For instance, the new kernel
MAY_INTROSPECTION_EXEC flag is required to close a major IMA
measurement/appraisal interpreter integrity gap by bringing the ability
to check the use of scripts [1].  Other uses are expected, such as for
magic-links [2], SGX integration [3], bpffs [4] or IPE [5].

Possible extended usage
=======================

For now, only the X_OK mode is compatible with introspect_access(2).
This enables to restrict the addition of new control flows in a process.
Using R_OK or W_OK with introspect_access(2) returns -EINVAL.

Possible future use-cases for R_OK with introspect_access(2) may be to
check configuration files that may impact the behavior of applications
(i.e.  influence critical part of the current control flow).  Those
should then be trusted as well.  The W_OK with introspect_access(2)
could be used to check that a file descriptor is allowed to receive
sensitive data such as debug logs.

Prerequisite of its use
=======================

User space needs to adapt to take advantage of this new feature.  For
example, the PEP 578 [6] (Runtime Audit Hooks) enables Python 3.8 to be
extended with policy enforcement points related to code interpretation,
which can be used to align with the PowerShell audit features.
Additional Python security improvements (e.g. a limited interpreter
without -c, stdin piping of code) are on their way [7].

Examples
========

The initial idea comes from CLIP OS 4 and the original implementation
has been used for more than 12 years:
https://github.com/clipos-archive/clipos4_doc
Chrome OS has a similar approach:
https://chromium.googlesource.com/chromiumos/docs/+/master/security/noexec_shell_scripts.md

Userland patches can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
Actually, there is more than the O_MAYEXEC changes (which matches this search)
e.g., to prevent Python interactive execution. There are patches for
Bash, Wine, Java (Icedtea), Busybox's ash, Perl and Python. There are
also some related patches which do not directly rely on O_MAYEXEC but
which restrict the use of browser plugins and extensions, which may be
seen as scripts too:
https://github.com/clipos-archive/clipos4_portage-overlay/tree/master/www-client

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s
See also an overview article: https://lwn.net/Articles/820000/

This patch series can be applied on top of v5.9-rc4 .  This can be tested
with CONFIG_SYSCTL.  I would really appreciate constructive comments on
this patch series.

Previous version:
https://lore.kernel.org/lkml/20200908075956.1069018-1-mic@digikod.net/

[1] https://lore.kernel.org/lkml/1544647356.4028.105.camel@linux.ibm.com/
[2] https://lore.kernel.org/lkml/20190904201933.10736-6-cyphar@cyphar.com/
[3] https://lore.kernel.org/lkml/CALCETrVovr8XNZSroey7pHF46O=kj_c5D9K8h=z2T_cNrpvMig@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CALCETrVeZ0eufFXwfhtaG_j+AdvbzEWE0M3wjXMWVEO7pj+xkw@mail.gmail.com/
[5] https://lore.kernel.org/lkml/20200406221439.1469862-12-deven.desai@linux.microsoft.com/
[6] https://www.python.org/dev/peps/pep-0578/
[7] https://lore.kernel.org/lkml/0c70debd-e79e-d514-06c6-4cd1e021fa8b@python.org/

Regards,

Mickaël Salaün (3):
  fs: Add introspect_access(2) syscall implementation and related sysctl
  arch: Wire up introspect_access(2)
  selftest/interpreter: Add tests for introspect_access(2) policies

 Documentation/admin-guide/sysctl/fs.rst       |  50 +++
 arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
 arch/arm/tools/syscall.tbl                    |   1 +
 arch/arm64/include/asm/unistd.h               |   2 +-
 arch/arm64/include/asm/unistd32.h             |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl         |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
 arch/s390/kernel/syscalls/syscall.tbl         |   1 +
 arch/sh/kernel/syscalls/syscall.tbl           |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
 fs/open.c                                     |  79 ++++
 include/linux/fs.h                            |   3 +
 include/linux/syscalls.h                      |   1 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 kernel/sysctl.c                               |  12 +-
 .../testing/selftests/interpreter/.gitignore  |   2 +
 tools/testing/selftests/interpreter/Makefile  |  18 +
 tools/testing/selftests/interpreter/config    |   1 +
 .../interpreter/introspection_policy_test.c   | 361 ++++++++++++++++++
 28 files changed, 547 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/interpreter/.gitignore
 create mode 100644 tools/testing/selftests/interpreter/Makefile
 create mode 100644 tools/testing/selftests/interpreter/config
 create mode 100644 tools/testing/selftests/interpreter/introspection_policy_test.c

-- 
2.28.0

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox