From: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>, "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>, Julien Tinnes <jln-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, linux-mips-6z/3iImG2C8G8FEW9MqTrA@public.gmane.org, linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-security-module-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Subject: [PATCH v11 09/11] seccomp: introduce writer locking Date: Wed, 16 Jul 2014 14:50:40 -0700 [thread overview] Message-ID: <1405547442-26641-10-git-send-email-keescook@chromium.org> (raw) In-Reply-To: <1405547442-26641-1-git-send-email-keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> Normally, task_struct.seccomp.filter is only ever read or modified by the task that owns it (current). This property aids in fast access during system call filtering as read access is lockless. Updating the pointer from another task, however, opens up race conditions. To allow cross-thread filter pointer updates, writes to the seccomp fields are now protected by the sighand spinlock (which is shared by all threads in the thread group). Read access remains lockless because pointer updates themselves are atomic. However, writes (or cloning) often entail additional checking (like maximum instruction counts) which require locking to perform safely. In the case of cloning threads, the child is invisible to the system until it enters the task list. To make sure a child can't be cloned from a thread and left in a prior state, seccomp duplication is additionally moved under the sighand lock. Then parent and child are certain have the same seccomp state when they exit the lock. Based on patches by Will Drewry and David Drysdale. Signed-off-by: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> Reviewed-by: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Reviewed-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> --- include/linux/seccomp.h | 6 +++--- kernel/fork.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++- kernel/seccomp.c | 16 +++++++++++++++- 3 files changed, 66 insertions(+), 5 deletions(-) diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 4054b0994071..9ff98b4bfe2e 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -14,11 +14,11 @@ struct seccomp_filter; * * @mode: indicates one of the valid values above for controlled * system calls available to a process. - * @filter: The metadata and ruleset for determining what system calls - * are allowed for a task. + * @filter: must always point to a valid seccomp-filter or NULL as it is + * accessed without locking during system call entry. * * @filter must only be accessed from the context of current as there - * is no locking. + * is no read locking. */ struct seccomp { int mode; diff --git a/kernel/fork.c b/kernel/fork.c index 6a13c46cd87d..ed4bc339c9dc 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -315,6 +315,15 @@ static struct task_struct *dup_task_struct(struct task_struct *orig) goto free_ti; tsk->stack = ti; +#ifdef CONFIG_SECCOMP + /* + * We must handle setting up seccomp filters once we're under + * the sighand lock in case orig has changed between now and + * then. Until then, filter must be NULL to avoid messing up + * the usage counts on the error path calling free_task. + */ + tsk->seccomp.filter = NULL; +#endif setup_thread_stack(tsk, orig); clear_user_return_notifier(tsk); @@ -1081,6 +1090,39 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) return 0; } +static void copy_seccomp(struct task_struct *p) +{ +#ifdef CONFIG_SECCOMP + /* + * Must be called with sighand->lock held, which is common to + * all threads in the group. Holding cred_guard_mutex is not + * needed because this new task is not yet running and cannot + * be racing exec. + */ + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + + /* Ref-count the new filter user, and assign it. */ + get_seccomp_filter(current); + p->seccomp = current->seccomp; + + /* + * Explicitly enable no_new_privs here in case it got set + * between the task_struct being duplicated and holding the + * sighand lock. The seccomp state and nnp must be in sync. + */ + if (task_no_new_privs(current)) + task_set_no_new_privs(p); + + /* + * If the parent gained a seccomp mode after copying thread + * flags and between before we held the sighand lock, we have + * to manually enable the seccomp thread flag here. + */ + if (p->seccomp.mode != SECCOMP_MODE_DISABLED) + set_tsk_thread_flag(p, TIF_SECCOMP); +#endif +} + SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr) { current->clear_child_tid = tidptr; @@ -1196,7 +1238,6 @@ static struct task_struct *copy_process(unsigned long clone_flags, goto fork_out; ftrace_graph_init_task(p); - get_seccomp_filter(p); rt_mutex_init_task(p); @@ -1437,6 +1478,12 @@ static struct task_struct *copy_process(unsigned long clone_flags, spin_lock(¤t->sighand->siglock); /* + * Copy seccomp details explicitly here, in case they were changed + * before holding sighand lock. + */ + copy_seccomp(p); + + /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the * fork. Restart if a signal comes in before we add the new process to diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 58125160417c..d5543e787e4e 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -199,6 +199,8 @@ static u32 seccomp_run_filters(int syscall) static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) { + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + if (current->seccomp.mode && current->seccomp.mode != seccomp_mode) return false; @@ -207,6 +209,8 @@ static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) static inline void seccomp_assign_mode(unsigned long seccomp_mode) { + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + current->seccomp.mode = seccomp_mode; set_tsk_thread_flag(current, TIF_SECCOMP); } @@ -332,6 +336,8 @@ out: * @flags: flags to change filter behavior * @filter: seccomp filter to add to the current process * + * Caller must be holding current->sighand->siglock lock. + * * Returns 0 on success, -ve on error. */ static long seccomp_attach_filter(unsigned int flags, @@ -340,6 +346,8 @@ static long seccomp_attach_filter(unsigned int flags, unsigned long total_insns; struct seccomp_filter *walker; + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + /* Validate resulting filter length. */ total_insns = filter->prog->len; for (walker = current->seccomp.filter; walker; walker = walker->prev) @@ -529,6 +537,8 @@ static long seccomp_set_mode_strict(void) const unsigned long seccomp_mode = SECCOMP_MODE_STRICT; long ret = -EINVAL; + spin_lock_irq(¤t->sighand->siglock); + if (!seccomp_may_assign_mode(seccomp_mode)) goto out; @@ -539,6 +549,7 @@ static long seccomp_set_mode_strict(void) ret = 0; out: + spin_unlock_irq(¤t->sighand->siglock); return ret; } @@ -566,13 +577,15 @@ static long seccomp_set_mode_filter(unsigned int flags, /* Validate flags. */ if (flags != 0) - goto out; + return -EINVAL; /* Prepare the new filter before holding any locks. */ prepared = seccomp_prepare_user_filter(filter); if (IS_ERR(prepared)) return PTR_ERR(prepared); + spin_lock_irq(¤t->sighand->siglock); + if (!seccomp_may_assign_mode(seccomp_mode)) goto out; @@ -584,6 +597,7 @@ static long seccomp_set_mode_filter(unsigned int flags, seccomp_assign_mode(seccomp_mode); out: + spin_unlock_irq(¤t->sighand->siglock); seccomp_filter_free(prepared); return ret; } -- 1.7.9.5
WARNING: multiple messages have this Message-ID (diff)
From: Kees Cook <keescook@chromium.org> To: linux-kernel@vger.kernel.org Cc: Kees Cook <keescook@chromium.org>, Andy Lutomirski <luto@amacapital.net>, Oleg Nesterov <oleg@redhat.com>, James Morris <jmorris@namei.org>, "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>, Alexei Starovoitov <ast@plumgrid.com>, Andrew Morton <akpm@linux-foundation.org>, Daniel Borkmann <dborkman@redhat.com>, Will Drewry <wad@chromium.org>, Julien Tinnes <jln@chromium.org>, David Drysdale <drysdale@google.com>, linux-api@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-mips@linux-mips.org, linux-arch@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [PATCH v11 09/11] seccomp: introduce writer locking Date: Wed, 16 Jul 2014 14:50:40 -0700 [thread overview] Message-ID: <1405547442-26641-10-git-send-email-keescook@chromium.org> (raw) Message-ID: <20140716215040.QTIQl7vG6FXhXilSSGkqkXj7xt6fkGzidkB79Ks6PdM@z> (raw) In-Reply-To: <1405547442-26641-1-git-send-email-keescook@chromium.org> Normally, task_struct.seccomp.filter is only ever read or modified by the task that owns it (current). This property aids in fast access during system call filtering as read access is lockless. Updating the pointer from another task, however, opens up race conditions. To allow cross-thread filter pointer updates, writes to the seccomp fields are now protected by the sighand spinlock (which is shared by all threads in the thread group). Read access remains lockless because pointer updates themselves are atomic. However, writes (or cloning) often entail additional checking (like maximum instruction counts) which require locking to perform safely. In the case of cloning threads, the child is invisible to the system until it enters the task list. To make sure a child can't be cloned from a thread and left in a prior state, seccomp duplication is additionally moved under the sighand lock. Then parent and child are certain have the same seccomp state when they exit the lock. Based on patches by Will Drewry and David Drysdale. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> --- include/linux/seccomp.h | 6 +++--- kernel/fork.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++- kernel/seccomp.c | 16 +++++++++++++++- 3 files changed, 66 insertions(+), 5 deletions(-) diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 4054b0994071..9ff98b4bfe2e 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -14,11 +14,11 @@ struct seccomp_filter; * * @mode: indicates one of the valid values above for controlled * system calls available to a process. - * @filter: The metadata and ruleset for determining what system calls - * are allowed for a task. + * @filter: must always point to a valid seccomp-filter or NULL as it is + * accessed without locking during system call entry. * * @filter must only be accessed from the context of current as there - * is no locking. + * is no read locking. */ struct seccomp { int mode; diff --git a/kernel/fork.c b/kernel/fork.c index 6a13c46cd87d..ed4bc339c9dc 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -315,6 +315,15 @@ static struct task_struct *dup_task_struct(struct task_struct *orig) goto free_ti; tsk->stack = ti; +#ifdef CONFIG_SECCOMP + /* + * We must handle setting up seccomp filters once we're under + * the sighand lock in case orig has changed between now and + * then. Until then, filter must be NULL to avoid messing up + * the usage counts on the error path calling free_task. + */ + tsk->seccomp.filter = NULL; +#endif setup_thread_stack(tsk, orig); clear_user_return_notifier(tsk); @@ -1081,6 +1090,39 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) return 0; } +static void copy_seccomp(struct task_struct *p) +{ +#ifdef CONFIG_SECCOMP + /* + * Must be called with sighand->lock held, which is common to + * all threads in the group. Holding cred_guard_mutex is not + * needed because this new task is not yet running and cannot + * be racing exec. + */ + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + + /* Ref-count the new filter user, and assign it. */ + get_seccomp_filter(current); + p->seccomp = current->seccomp; + + /* + * Explicitly enable no_new_privs here in case it got set + * between the task_struct being duplicated and holding the + * sighand lock. The seccomp state and nnp must be in sync. + */ + if (task_no_new_privs(current)) + task_set_no_new_privs(p); + + /* + * If the parent gained a seccomp mode after copying thread + * flags and between before we held the sighand lock, we have + * to manually enable the seccomp thread flag here. + */ + if (p->seccomp.mode != SECCOMP_MODE_DISABLED) + set_tsk_thread_flag(p, TIF_SECCOMP); +#endif +} + SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr) { current->clear_child_tid = tidptr; @@ -1196,7 +1238,6 @@ static struct task_struct *copy_process(unsigned long clone_flags, goto fork_out; ftrace_graph_init_task(p); - get_seccomp_filter(p); rt_mutex_init_task(p); @@ -1437,6 +1478,12 @@ static struct task_struct *copy_process(unsigned long clone_flags, spin_lock(¤t->sighand->siglock); /* + * Copy seccomp details explicitly here, in case they were changed + * before holding sighand lock. + */ + copy_seccomp(p); + + /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the * fork. Restart if a signal comes in before we add the new process to diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 58125160417c..d5543e787e4e 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -199,6 +199,8 @@ static u32 seccomp_run_filters(int syscall) static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) { + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + if (current->seccomp.mode && current->seccomp.mode != seccomp_mode) return false; @@ -207,6 +209,8 @@ static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) static inline void seccomp_assign_mode(unsigned long seccomp_mode) { + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + current->seccomp.mode = seccomp_mode; set_tsk_thread_flag(current, TIF_SECCOMP); } @@ -332,6 +336,8 @@ out: * @flags: flags to change filter behavior * @filter: seccomp filter to add to the current process * + * Caller must be holding current->sighand->siglock lock. + * * Returns 0 on success, -ve on error. */ static long seccomp_attach_filter(unsigned int flags, @@ -340,6 +346,8 @@ static long seccomp_attach_filter(unsigned int flags, unsigned long total_insns; struct seccomp_filter *walker; + BUG_ON(!spin_is_locked(¤t->sighand->siglock)); + /* Validate resulting filter length. */ total_insns = filter->prog->len; for (walker = current->seccomp.filter; walker; walker = walker->prev) @@ -529,6 +537,8 @@ static long seccomp_set_mode_strict(void) const unsigned long seccomp_mode = SECCOMP_MODE_STRICT; long ret = -EINVAL; + spin_lock_irq(¤t->sighand->siglock); + if (!seccomp_may_assign_mode(seccomp_mode)) goto out; @@ -539,6 +549,7 @@ static long seccomp_set_mode_strict(void) ret = 0; out: + spin_unlock_irq(¤t->sighand->siglock); return ret; } @@ -566,13 +577,15 @@ static long seccomp_set_mode_filter(unsigned int flags, /* Validate flags. */ if (flags != 0) - goto out; + return -EINVAL; /* Prepare the new filter before holding any locks. */ prepared = seccomp_prepare_user_filter(filter); if (IS_ERR(prepared)) return PTR_ERR(prepared); + spin_lock_irq(¤t->sighand->siglock); + if (!seccomp_may_assign_mode(seccomp_mode)) goto out; @@ -584,6 +597,7 @@ static long seccomp_set_mode_filter(unsigned int flags, seccomp_assign_mode(seccomp_mode); out: + spin_unlock_irq(¤t->sighand->siglock); seccomp_filter_free(prepared); return ret; } -- 1.7.9.5
next prev parent reply other threads:[~2014-07-16 21:50 UTC|newest] Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top 2014-07-16 21:50 [PATCH v11 0/11] seccomp: add thread sync ability Kees Cook 2014-07-16 21:50 ` [PATCH v11 01/11] seccomp: create internal mode-setting function Kees Cook 2014-07-16 21:50 ` Kees Cook [not found] ` <1405547442-26641-1-git-send-email-keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> 2014-07-16 21:50 ` [PATCH v11 02/11] seccomp: extract check/assign mode helpers Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 08/11] seccomp: split filter prep from check and apply Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` Kees Cook [this message] 2014-07-16 21:50 ` [PATCH v11 09/11] seccomp: introduce writer locking Kees Cook 2014-07-16 21:50 ` [PATCH v11 03/11] seccomp: split mode setting routines Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 04/11] seccomp: add "seccomp" syscall Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 05/11] ARM: add seccomp syscall Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 06/11] MIPS: " Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 07/11] sched: move no_new_privs into new atomic flags Kees Cook 2014-07-16 21:50 ` [PATCH v11 10/11] seccomp: allow mode setting across threads Kees Cook 2014-07-16 21:50 ` Kees Cook 2014-07-16 21:50 ` [PATCH v11 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC Kees Cook 2014-07-17 15:04 ` David Drysdale 2014-07-17 15:04 ` David Drysdale [not found] ` <CAHse=S_32tmusk4ceY4U6GbNpX4PkX12iPPDZFVZ7qgv-RAooA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-07-17 15:45 ` Kees Cook 2014-07-17 15:45 ` Kees Cook [not found] ` <CAGXu5j+dFZdnnK8f-HRrUs2vLeyhWyHh_AY-OynDcp-Ye+dy7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-07-17 17:52 ` Kees Cook 2014-07-17 17:52 ` Kees Cook
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1405547442-26641-10-git-send-email-keescook@chromium.org \ --to=keescook-f7+t8e8rja9g9huczpvpmw@public.gmane.org \ --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \ --cc=ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org \ --cc=dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \ --cc=drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \ --cc=jln-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org \ --cc=jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org \ --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \ --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=linux-mips-6z/3iImG2C8G8FEW9MqTrA@public.gmane.org \ --cc=linux-security-module-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \ --cc=luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org \ --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \ --cc=oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \ --cc=wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org \ --cc=x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).