From: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Jann Horn <jann-XZ1E9jl8jIdeoWH0uzbU5w@public.gmane.org>,
Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-man <linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Subject: Re: [PATCH v2 1/2] seccomp.2: Explain blacklisting problems, expand example
Date: Sun, 29 Mar 2015 18:01:35 +0200
Message-ID: <551821DF.8090204@gmail.com>
In-Reply-To: <20150324183833.GB5677-J1fxOzX/cBvk1uMJSBkQmQ@public.gmane.org>

Hi Jann,

Thanks for the patch. I've applied it.

Cheers,

Michael

On 03/24/2015 07:38 PM, Jann Horn wrote:
> ---
> man2/seccomp.2 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 67 insertions(+), 6 deletions(-)
>
> diff --git a/man2/seccomp.2 b/man2/seccomp.2
> index e2a5060..b596fb8 100644
> --- a/man2/seccomp.2
> +++ b/man2/seccomp.2
> @@ -250,6 +250,55 @@ struct seccomp_data {
> .fi
> .in
>
> +Because the numbers of system calls vary between architectures and
> +some architectures (e.g. X86-64) allow user-space code to use
> +the calling conventions of multiple architectures, it is usually
> +necessary to verify the value of the
> +.IR arch
> +field.
> +
> +It is strongly recommended to use a whitelisting approach whenever
> +possible because such an approach is more robust and simple.
> +A blacklist will have to be updated whenever a potentially
> +dangerous syscall is added (or a dangerous flag or option if those
> +are blacklisted), and it is often possible to alter the
> +representation of a value without altering its meaning, leading to
> +a blacklist bypass.
> +
> +The
> +.IR arch
> +field is not unique for all calling conventions. The X86-64 ABI and
> +the X32 ABI both use
> +.BR AUDIT_ARCH_X86_64
> +as
> +.IR arch ,
> +and they run on the same processors. Instead, the mask
> +.BR __X32_SYSCALL_BIT
> +is used on the system call number to tell the two ABIs apart.
> +This means that in order to create a seccomp-based
> +blacklist for system calls performed through the X86-64 ABI,
> +it is necessary to not only check that
> +.IR arch
> +equals
> +.BR AUDIT_ARCH_X86_64 ,
> +but also to explicitly reject all syscalls that contain
> +.BR __X32_SYSCALL_BIT
> +in
> +.IR nr .
> +
> +When checking values from
> +.IR args
> +against a blacklist, keep in mind that arguments are often
> +silently truncated before being processed, but after the seccomp
> +check. For example, this happens if the i386 ABI is used on an
> +X86-64 kernel: Although the kernel will normally not look beyond
> +the 32 lowest bits of the arguments, the values of the full
> +64-bit registers will be present in the seccomp data. A less
> +surprising example is that if the X86-64 ABI is used to perform
> +a syscall that takes an argument of type int, the
> +more-significant half of the argument register is ignored by
> +the syscall, but visible in the seccomp data.
> +
> A seccomp filter returns a 32-bit value consisting of two parts:
> the most significant 16 bits
> (corresponding to the mask defined by the constant
> @@ -616,38 +665,50 @@ cecilia
> #include <linux/seccomp.h>
> #include <sys/prctl.h>
>
> +#define X32_SYSCALL_BIT 0x40000000
> +
> static int
> install_filter(int syscall_nr, int t_arch, int f_errno)
> {
> + unsigned int upper_nr_limit = 0xffffffff;
> + /* assume that AUDIT_ARCH_X86_64 means the normal X86-64 ABI */
> + if (t_arch == AUDIT_ARCH_X86_64)
> + upper_nr_limit = X32_SYSCALL_BIT - 1;
> +
> struct sock_filter filter[] = {
> /* [0] Load architecture from 'seccomp_data' buffer into
> accumulator */
> BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
> (offsetof(struct seccomp_data, arch))),
>
> - /* [1] Jump forward 4 instructions if architecture does not
> + /* [1] Jump forward 5 instructions if architecture does not
> match 't_arch' */
> - BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, t_arch, 0, 4),
> + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, t_arch, 0, 5),
>
> /* [2] Load system call number from 'seccomp_data' buffer into
> accumulator */
> BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
> (offsetof(struct seccomp_data, nr))),
>
> - /* [3] Jump forward 1 instruction if system call number
> + /* [3] Check ABI - only needed for X86-64 in blacklist usecases.
> + Use JGT instead of checking against the bitmask to avoid
> + having to reload the syscall number. */
> + BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, upper_nr_limit, 3, 0),
> +
> + /* [4] Jump forward 1 instruction if system call number
> does not match 'syscall_nr' */
> BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, syscall_nr, 0, 1),
>
> - /* [4] Matching architecture and system call: don't execute
> + /* [5] Matching architecture and system call: don't execute
> the system call, and return 'f_errno' in 'errno' */
> BPF_STMT(BPF_RET | BPF_K,
> SECCOMP_RET_ERRNO | (f_errno & SECCOMP_RET_DATA)),
>
> - /* [5] Destination of system call number mismatch: allow other
> + /* [6] Destination of system call number mismatch: allow other
> system calls */
> BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
>
> - /* [6] Destination of architecture mismatch: kill process */
> + /* [7] Destination of architecture mismatch: kill process */
> BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
> };
>
>
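
A note for readers of the patch above: the two pitfalls it documents (the x32 bit in the syscall number and the unused upper half of argument registers) can be tried out in plain C without installing a filter. The following is only a minimal sketch. The helper names are hypothetical and are not part of the patch or of any kernel header, and X32_SYSCALL_BIT is defined locally, as the patch's example does, on the assumption that it matches __X32_SYSCALL_BIT on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to equal __X32_SYSCALL_BIT on x86-64; defined locally, as in
   the patch's example, so the sketch builds without kernel headers. */
#define X32_SYSCALL_BIT 0x40000000u

/* Hypothetical helper: true if 'nr' (as it would appear in
   seccomp_data.nr when arch == AUDIT_ARCH_X86_64) carries the x32 bit. */
static bool
nr_is_x32(uint32_t nr)
{
    return (nr & X32_SYSCALL_BIT) != 0;
}

/* Hypothetical helper: the comparison made by the BPF_JGT instruction in
   the example filter when t_arch is AUDIT_ARCH_X86_64. Everything above
   X32_SYSCALL_BIT - 1 is rejected, which covers all x32 numbers and also
   any number with a higher bit set. */
static bool
nr_above_limit(uint32_t nr)
{
    return nr > X32_SYSCALL_BIT - 1;
}

/* Hypothetical helper: the value a syscall taking an 'int' argument
   effectively uses, given the full 64-bit register that the seccomp
   data exposes. Only the low 32 bits survive. */
static int32_t
arg_as_int(uint64_t reg)
{
    return (int32_t)(uint32_t)reg;
}

/* Small self-check illustrating the points above. */
static void
self_check(void)
{
    /* Number 1 with the x32 bit set is caught by both tests. */
    assert(nr_is_x32(X32_SYSCALL_BIT | 1u));
    assert(nr_above_limit(X32_SYSCALL_BIT | 1u));

    /* The plain number 1 passes both tests. */
    assert(!nr_is_x32(1u));
    assert(!nr_above_limit(1u));

    /* Two different register values, same 'int' seen by the syscall:
       a blacklist comparing the full 64-bit value against 2 would
       miss the first one. */
    assert(arg_as_int(0x0000000100000002ULL) == 2);
    assert(arg_as_int(0x0000000000000002ULL) == 2);
}
```

Calling self_check() from a main() is enough to exercise the helpers. The greater-than test is deliberately broader than the bit test: it rejects every number at or above X32_SYSCALL_BIT, which is acceptable in a filter that only needs to keep such numbers away from the blacklist comparison.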
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/