* [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF
@ 2014-07-11 3:38 Richard Guy Briggs
2014-07-11 3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Richard Guy Briggs @ 2014-07-11 3:38 UTC (permalink / raw)
To: linux-audit, linux-kernel
Cc: Will Drewry, Richard Guy Briggs, Paul Moore, H. Peter Anvin
This set reverts commit 8b4b9f2 which broke audit and potentially other users
of syscall_get_nr() which depend on that call as named without being overloaded
by architecture bits. It will satisfy other regular users of syscall_get_nr()
and syscall_get_arch() without changing the seccomp interface to BPF.
A new ARCH definition, AUDIT_ARCH_X86_X32, was added for syscall_get_arch().
Cc: Paul Moore <pmoore@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <aviro@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com
Richard Guy Briggs (3):
audit: add AUDIT_ARCH_X86_X32 arch definition
seccomp: give BPF x32 bit when restoring x32 filter
Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()"
arch/x86/include/asm/syscall.h | 8 ++++++--
include/uapi/linux/audit.h | 1 +
kernel/seccomp.c | 6 ++++++
3 files changed, 13 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition 2014-07-11 3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs @ 2014-07-11 3:38 ` Richard Guy Briggs 2014-07-11 16:15 ` Paul Moore 2014-07-11 3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs 2014-07-11 3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs 2 siblings, 1 reply; 22+ messages in thread From: Richard Guy Briggs @ 2014-07-11 3:38 UTC (permalink / raw) To: linux-audit, linux-kernel Cc: Richard Guy Briggs, Paul Moore, Eric Paris, Al Viro, Will Drewry, H. Peter Anvin Add a definition for 32-bit native system calls under 64-bit x86 architectures. This is distict from 32-bit emulation under 64-bit x86 architectures. Cc: Paul Moore <pmoore@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <aviro@redhat.com> Cc: Will Drewry <wad@chromium.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> --- include/uapi/linux/audit.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index e15d6fc..4f5607f 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -374,6 +374,7 @@ enum { #define AUDIT_ARCH_SPARC (EM_SPARC) #define AUDIT_ARCH_SPARC64 (EM_SPARCV9|__AUDIT_ARCH_64BIT) #define AUDIT_ARCH_X86_64 (EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE) +#define AUDIT_ARCH_X86_X32 (EM_X86_64|__AUDIT_ARCH_LE) #define AUDIT_PERM_EXEC 1 #define AUDIT_PERM_WRITE 2 -- 1.7.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition 2014-07-11 3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs @ 2014-07-11 16:15 ` Paul Moore 0 siblings, 0 replies; 22+ messages in thread From: Paul Moore @ 2014-07-11 16:15 UTC (permalink / raw) To: Richard Guy Briggs Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry, H. Peter Anvin On Thursday, July 10, 2014 11:38:12 PM Richard Guy Briggs wrote: > Add a definition for 32-bit native system calls under 64-bit x86 > architectures. This is distict from 32-bit emulation under 64-bit x86 > architectures. > > Cc: Paul Moore <pmoore@redhat.com> > Cc: Eric Paris <eparis@redhat.com> > Cc: Al Viro <aviro@redhat.com> > Cc: Will Drewry <wad@chromium.org> > Cc: H. Peter Anvin <hpa@zytor.com> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com> > --- > include/uapi/linux/audit.h | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h > index e15d6fc..4f5607f 100644 > --- a/include/uapi/linux/audit.h > +++ b/include/uapi/linux/audit.h > @@ -374,6 +374,7 @@ enum { > #define AUDIT_ARCH_SPARC (EM_SPARC) > #define AUDIT_ARCH_SPARC64 (EM_SPARCV9|__AUDIT_ARCH_64BIT) > #define AUDIT_ARCH_X86_64 (EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE) > +#define AUDIT_ARCH_X86_X32 (EM_X86_64|__AUDIT_ARCH_LE) > > #define AUDIT_PERM_EXEC 1 > #define AUDIT_PERM_WRITE 2 While I'm opposed to the other patches in this series (comments to follow), I think this is a worthwhile addition and arguably should have been done when x32 was merged. That said, this change should probably be included in whatever patch first makes use of this new value as this patch does nothing by itself. -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs 2014-07-11 3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs @ 2014-07-11 3:38 ` Richard Guy Briggs 2014-07-11 4:06 ` H. Peter Anvin 2014-07-11 16:36 ` Paul Moore 2014-07-11 3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs 2 siblings, 2 replies; 22+ messages in thread From: Richard Guy Briggs @ 2014-07-11 3:38 UTC (permalink / raw) To: linux-audit, linux-kernel Cc: Will Drewry, Richard Guy Briggs, Paul Moore, H. Peter Anvin Commit fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800 x32: Handle the x32 system call flag provided a method to multiplex architecture with the syscall number for X32 calls. Commit 8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500 x86: remove the x32 syscall bitmask from syscall_get_nr() broke audit and potentially other users of syscall_get_nr() which depend on that call as named. Commit audit: add AUDIT_ARCH_X86_X32 arch definition is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for syscall_get_arch(). This patch along with reverting 8b4b9f2 should satisfy other regular users of syscall_get_nr() without changing the seccomp interface to BPF. Cc: Paul Moore <pmoore@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <aviro@redhat.com> Cc: Will Drewry <wad@chromium.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com --- arch/x86/include/asm/syscall.h | 4 ++++ kernel/seccomp.c | 6 ++++++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h index d6a756a..d58b6be 100644 --- a/arch/x86/include/asm/syscall.h +++ b/arch/x86/include/asm/syscall.h @@ -236,6 +236,10 @@ static inline int syscall_get_arch(void) return AUDIT_ARCH_I386; #endif /* Both x32 and x86_64 are considered "64-bit". */ +#ifdef CONFIG_X86_X32_ABI + if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT) + return AUDIT_ARCH_X86_X32; +#endif return AUDIT_ARCH_X86_64; } #endif /* CONFIG_X86_32 */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index b35c215..bc18214 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data *sd) sd->nr = syscall_get_nr(task, regs); sd->arch = syscall_get_arch(); +#ifdef CONFIG_X86_X32_ABI + if (sd->arch == AUDIT_ARCH_X86_X32) { + sd->arch = AUDIT_ARCH_X86_64; + sd->nr |= __X32_SYSCALL_BIT; + } +#endif syscall_get_arguments(task, regs, 0, 6, args); sd->args[0] = args[0]; sd->args[1] = args[1]; -- 1.7.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs @ 2014-07-11 4:06 ` H. Peter Anvin 2014-07-11 16:11 ` Paul Moore 2014-07-11 16:36 ` Paul Moore 1 sibling, 1 reply; 22+ messages in thread From: H. Peter Anvin @ 2014-07-11 4:06 UTC (permalink / raw) To: Richard Guy Briggs, linux-audit, linux-kernel Cc: Paul Moore, Eric Paris, Al Viro, Will Drewry On 07/10/2014 08:38 PM, Richard Guy Briggs wrote: > Commit > fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800 > x32: Handle the x32 system call flag > > provided a method to multiplex architecture with the syscall number for X32 > calls. > > Commit > 8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500 > x86: remove the x32 syscall bitmask from syscall_get_nr() > > broke audit and potentially other users of syscall_get_nr() which depend on > that call as named. > > Commit > audit: add AUDIT_ARCH_X86_X32 arch definition > > is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for > syscall_get_arch(). > > This patch along with reverting 8b4b9f2 should satisfy other regular users of > syscall_get_nr() without changing the seccomp interface to BPF. > Incidentally: do seccomp users know that on an x86-64 system you can recevie system calls from any of the x86 architectures, regardless of how the program is invoked? (This is unusual, so normally denying those "alien" calls is the right thing to do.) -hpa ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 4:06 ` H. Peter Anvin @ 2014-07-11 16:11 ` Paul Moore 2014-07-11 16:13 ` H. Peter Anvin 2014-07-11 16:16 ` Eric Paris 0 siblings, 2 replies; 22+ messages in thread From: Paul Moore @ 2014-07-11 16:11 UTC (permalink / raw) To: H. Peter Anvin Cc: Richard Guy Briggs, linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > Incidentally: do seccomp users know that on an x86-64 system you can > recevie system calls from any of the x86 architectures, regardless of > how the program is invoked? (This is unusual, so normally denying those > "alien" calls is the right thing to do.) I obviously can't speak for all seccomp users, but libseccomp handles this by checking the seccomp_data->arch value at the start of the filter and killing (by default) any non-native architectures. If you want, you can change this default behavior or add support for other architectures (e.g. create a filter that allows both x86-64 and x32 but disallows x86, or any combination of the three for that matter). -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:11 ` Paul Moore @ 2014-07-11 16:13 ` H. Peter Anvin 2014-07-11 16:16 ` Eric Paris 1 sibling, 0 replies; 22+ messages in thread From: H. Peter Anvin @ 2014-07-11 16:13 UTC (permalink / raw) To: Paul Moore Cc: Richard Guy Briggs, linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry On 07/11/2014 09:11 AM, Paul Moore wrote: > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: >> Incidentally: do seccomp users know that on an x86-64 system you can >> recevie system calls from any of the x86 architectures, regardless of >> how the program is invoked? (This is unusual, so normally denying those >> "alien" calls is the right thing to do.) > > I obviously can't speak for all seccomp users, but libseccomp handles this by > checking the seccomp_data->arch value at the start of the filter and killing > (by default) any non-native architectures. If you want, you can change this > default behavior or add support for other architectures (e.g. create a filter > that allows both x86-64 and x32 but disallows x86, or any combination of the > three for that matter). > OK, that seems reasonable. -hpa ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:11 ` Paul Moore 2014-07-11 16:13 ` H. Peter Anvin @ 2014-07-11 16:16 ` Eric Paris 2014-07-11 16:21 ` Paul Moore 1 sibling, 1 reply; 22+ messages in thread From: Eric Paris @ 2014-07-11 16:16 UTC (permalink / raw) To: Paul Moore Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote: > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > > Incidentally: do seccomp users know that on an x86-64 system you can > > recevie system calls from any of the x86 architectures, regardless of > > how the program is invoked? (This is unusual, so normally denying those > > "alien" calls is the right thing to do.) > > I obviously can't speak for all seccomp users, but libseccomp handles this by > checking the seccomp_data->arch value at the start of the filter and killing > (by default) any non-native architectures. If you want, you can change this > default behavior or add support for other architectures (e.g. create a filter > that allows both x86-64 and x32 but disallows x86, or any combination of the > three for that matter). Maybe libseccomp does some HORRIFIC contortions under the hood, but the interface is crap... Since seccomp_data->arch can't distinguish between X32 and X86_64. If I write a seccomp filter which says KILL arch != x86_64 KILL init_module ALLOW everything else I can still call init_module, I just have to use the X32 variant. If libseccomp is translating: KILL arch != x86_64 into: KILL arch != x86_64 KILL syscall_nr >= 2000 That's just showing how dumb the kernel interface is... Good for you guys, but the kernel is just being dumb :) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:16 ` Eric Paris @ 2014-07-11 16:21 ` Paul Moore 2014-07-11 16:23 ` Eric Paris 0 siblings, 1 reply; 22+ messages in thread From: Paul Moore @ 2014-07-11 16:21 UTC (permalink / raw) To: Eric Paris Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote: > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote: > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > > > Incidentally: do seccomp users know that on an x86-64 system you can > > > recevie system calls from any of the x86 architectures, regardless of > > > how the program is invoked? (This is unusual, so normally denying those > > > "alien" calls is the right thing to do.) > > > > I obviously can't speak for all seccomp users, but libseccomp handles this > > by checking the seccomp_data->arch value at the start of the filter and > > killing (by default) any non-native architectures. If you want, you can > > change this default behavior or add support for other architectures (e.g. > > create a filter that allows both x86-64 and x32 but disallows x86, or any > > combination of the three for that matter). > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the > interface is crap... Since seccomp_data->arch can't distinguish between > X32 and X86_64. If I write a seccomp filter which says > > KILL arch != x86_64 > KILL init_module > ALLOW everything else > > I can still call init_module, I just have to use the X32 variant. > > If libseccomp is translating: > > KILL arch != x86_64 into: > > KILL arch != x86_64 > KILL syscall_nr >= 2000 > > That's just showing how dumb the kernel interface is... Good for you > guys, but the kernel is just being dumb :) You're not going to hear me ever say that I like how the x32 ABI was done, it is a real mess from a seccomp filter point of view and we have to do some nasty stuff in libseccomp to make it all work correctly (see my comments on the libseccomp-devel list regarding my severe displeasure over x32), but what's done is done. I think it's too late to change the x32 seccomp filter ABI. -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:21 ` Paul Moore @ 2014-07-11 16:23 ` Eric Paris 2014-07-11 16:30 ` H. Peter Anvin 2014-07-11 16:32 ` Paul Moore 0 siblings, 2 replies; 22+ messages in thread From: Eric Paris @ 2014-07-11 16:23 UTC (permalink / raw) To: Paul Moore Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote: > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote: > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote: > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > > > > Incidentally: do seccomp users know that on an x86-64 system you can > > > > recevie system calls from any of the x86 architectures, regardless of > > > > how the program is invoked? (This is unusual, so normally denying those > > > > "alien" calls is the right thing to do.) > > > > > > I obviously can't speak for all seccomp users, but libseccomp handles this > > > by checking the seccomp_data->arch value at the start of the filter and > > > killing (by default) any non-native architectures. If you want, you can > > > change this default behavior or add support for other architectures (e.g. > > > create a filter that allows both x86-64 and x32 but disallows x86, or any > > > combination of the three for that matter). > > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the > > interface is crap... Since seccomp_data->arch can't distinguish between > > X32 and X86_64. If I write a seccomp filter which says > > > > KILL arch != x86_64 > > KILL init_module > > ALLOW everything else > > > > I can still call init_module, I just have to use the X32 variant. > > > > If libseccomp is translating: > > > > KILL arch != x86_64 into: > > > > KILL arch != x86_64 > > KILL syscall_nr >= 2000 > > > > That's just showing how dumb the kernel interface is... Good for you > > guys, but the kernel is just being dumb :) > > You're not going to hear me ever say that I like how the x32 ABI was done, it > is a real mess from a seccomp filter point of view and we have to do some > nasty stuff in libseccomp to make it all work correctly (see my comments on > the libseccomp-devel list regarding my severe displeasure over x32), but > what's done is done. > > I think it's too late to change the x32 seccomp filter ABI. So we have a security interface that is damn near impossible to get right. Perfect. I think this explains exactly why I support this idea. Make X32 look like everyone else and put these custom horrific hacks in seccomp if we are unwilling to 'do it right' Honestly, how many people are using seccomp on X32 and would be horribly pissed if we just fixed it? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:23 ` Eric Paris @ 2014-07-11 16:30 ` H. Peter Anvin 2014-07-11 16:32 ` Paul Moore 1 sibling, 0 replies; 22+ messages in thread From: H. Peter Anvin @ 2014-07-11 16:30 UTC (permalink / raw) To: Eric Paris, Paul Moore Cc: Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On 07/11/2014 09:23 AM, Eric Paris wrote: >> >> You're not going to hear me ever say that I like how the x32 ABI was done, it >> is a real mess from a seccomp filter point of view and we have to do some >> nasty stuff in libseccomp to make it all work correctly (see my comments on >> the libseccomp-devel list regarding my severe displeasure over x32), but >> what's done is done. >> >> I think it's too late to change the x32 seccomp filter ABI. > > So we have a security interface that is damn near impossible to get > right. Perfect. > > I think this explains exactly why I support this idea. Make X32 look > like everyone else and put these custom horrific hacks in seccomp if we > are unwilling to 'do it right' > > Honestly, how many people are using seccomp on X32 and would be horribly > pissed if we just fixed it? > The bigger issue is probably if we will open a problem with the older kernels. -hpa ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:23 ` Eric Paris 2014-07-11 16:30 ` H. Peter Anvin @ 2014-07-11 16:32 ` Paul Moore 2014-07-11 18:31 ` Eric Paris 1 sibling, 1 reply; 22+ messages in thread From: Paul Moore @ 2014-07-11 16:32 UTC (permalink / raw) To: Eric Paris Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote: > On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote: > > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote: > > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote: > > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > > > > > Incidentally: do seccomp users know that on an x86-64 system you can > > > > > recevie system calls from any of the x86 architectures, regardless > > > > > of > > > > > how the program is invoked? (This is unusual, so normally denying > > > > > those > > > > > "alien" calls is the right thing to do.) > > > > > > > > I obviously can't speak for all seccomp users, but libseccomp handles > > > > this > > > > by checking the seccomp_data->arch value at the start of the filter > > > > and > > > > killing (by default) any non-native architectures. If you want, you > > > > can > > > > change this default behavior or add support for other architectures > > > > (e.g. > > > > create a filter that allows both x86-64 and x32 but disallows x86, or > > > > any > > > > combination of the three for that matter). > > > > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the > > > interface is crap... Since seccomp_data->arch can't distinguish between > > > X32 and X86_64. If I write a seccomp filter which says > > > > > > KILL arch != x86_64 > > > KILL init_module > > > ALLOW everything else > > > > > > I can still call init_module, I just have to use the X32 variant. > > > > > > If libseccomp is translating: > > > > > > KILL arch != x86_64 into: > > > > > > KILL arch != x86_64 > > > KILL syscall_nr >= 2000 > > > > > > That's just showing how dumb the kernel interface is... Good for you > > > guys, but the kernel is just being dumb :) > > > > You're not going to hear me ever say that I like how the x32 ABI was done, > > it is a real mess from a seccomp filter point of view and we have to do > > some nasty stuff in libseccomp to make it all work correctly (see my > > comments on the libseccomp-devel list regarding my severe displeasure > > over x32), but what's done is done. > > > > I think it's too late to change the x32 seccomp filter ABI. > > So we have a security interface that is damn near impossible to get > right. Perfect. What? Having to do two comparisons instead of one is "damn near impossible"? I think that might be a bit of an overreaction don't you think? > I think this explains exactly why I support this idea. Make X32 look > like everyone else ... You do realize that this patch set makes x32 the odd man out by having syscall_get_nr() return a different syscall number than what was used to make the syscall? I don't understand how that makes "x32 look like everyone else". > ... and put these custom horrific hacks in seccomp if we are unwilling to > 'do it right' If you want to add the new x32 audit arch #define, go for it, like I said that was something that I feel should have been in there from the beginning. As far as I'm concerned you can even put a hack in kernel/seccomp.c to rewrite the arch token value if it makes your life easier. > Honestly, how many people are using seccomp on X32 and would be horribly > pissed if we just fixed it? Okay, please stop suggesting we break the x32 kernel/user interface to workaround a flaw in audit. I get that it sucks for audit, I really do, but this is audit's problem. -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:32 ` Paul Moore @ 2014-07-11 18:31 ` Eric Paris 2014-07-11 19:36 ` Paul Moore 0 siblings, 1 reply; 22+ messages in thread From: Eric Paris @ 2014-07-11 18:31 UTC (permalink / raw) To: Paul Moore Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote: > On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote: > > On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote: > > > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote: > > > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote: > > > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote: > > > > > > Incidentally: do seccomp users know that on an x86-64 system you can > > > > > > recevie system calls from any of the x86 architectures, regardless > > > > > > of > > > > > > how the program is invoked? (This is unusual, so normally denying > > > > > > those > > > > > > "alien" calls is the right thing to do.) > > > > > > > > > > I obviously can't speak for all seccomp users, but libseccomp handles > > > > > this > > > > > by checking the seccomp_data->arch value at the start of the filter > > > > > and > > > > > killing (by default) any non-native architectures. If you want, you > > > > > can > > > > > change this default behavior or add support for other architectures > > > > > (e.g. > > > > > create a filter that allows both x86-64 and x32 but disallows x86, or > > > > > any > > > > > combination of the three for that matter). > > > > > > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the > > > > interface is crap... Since seccomp_data->arch can't distinguish between > > > > X32 and X86_64. If I write a seccomp filter which says > > > > > > > > KILL arch != x86_64 > > > > KILL init_module > > > > ALLOW everything else > > > > > > > > I can still call init_module, I just have to use the X32 variant. > > > > > > > > If libseccomp is translating: > > > > > > > > KILL arch != x86_64 into: > > > > > > > > KILL arch != x86_64 > > > > KILL syscall_nr >= 2000 > > > > > > > > That's just showing how dumb the kernel interface is... Good for you > > > > guys, but the kernel is just being dumb :) > > > > > > You're not going to hear me ever say that I like how the x32 ABI was done, > > > it is a real mess from a seccomp filter point of view and we have to do > > > some nasty stuff in libseccomp to make it all work correctly (see my > > > comments on the libseccomp-devel list regarding my severe displeasure > > > over x32), but what's done is done. > > > > > > I think it's too late to change the x32 seccomp filter ABI. > > > > So we have a security interface that is damn near impossible to get > > right. Perfect. > > What? Having to do two comparisons instead of one is "damn near impossible"? > I think that might be a bit of an overreaction don't you think? Actually no. How can a normal userspace application coder POSSIBLY know this? Find this thread on an e-mail list, by accident? > > > I think this explains exactly why I support this idea. Make X32 look > > like everyone else ... > > You do realize that this patch set makes x32 the odd man out by having > syscall_get_nr() return a different syscall number than what was used to make > the syscall? I don't understand how that makes "x32 look like everyone else". Ok, I buy the __X32_SYSCALL_BIT argument. It can be dealt with in audit. No problem. We don't need to strip it in syscall_get_nr(). I'll gladly concede that part of the patch series. But given an x86_64 kernel a seccomp filter writer has to know about X32 and how to write rules to block the X32 ABI. And I stick with my assessment that x32 + seccomp is darn near impossible for a normal developer to handle. Heck, even chromium took months to realize that x32 was a weird beast. And they got it wrong on their first try. Their original implementation didn't handle __X32_SYSCALL_BIT quite right. Looking at their code I'm still not sure it does the right thing. And they are the EXPERTS. They wrote seccomp! > > Honestly, how many people are using seccomp on X32 and would be horribly > > pissed if we just fixed it? > > Okay, please stop suggesting we break the x32 kernel/user interface to > workaround a flaw in audit. I get that it sucks for audit, I really do, but > this is audit's problem. No one is asking to break X32 to fix audit. Audit can handle itself. I don't want anything in the kernel to pretend that X32 is X86_64. It isn't. It has its own syscall table. Its own syscalls. Its own ABI. I'm suggesting to fix how seccomp exposes X32 information because it is a HORRIBLE interface that even the experts have gotten wrong, over and over and over. I suggest we accept it as breakage and just return AUDIT_ARCH_X32. (Leaving the _X32_SYSCALL_BIT exposed as it is today) But I'd love to hear some thoughts on how that is a bad thing. If no one is using the x32 seccomp abi, lets fix it. If someone is, lets see what the fallout from fixing it will be. -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 18:31 ` Eric Paris @ 2014-07-11 19:36 ` Paul Moore 2014-07-11 22:48 ` Kees Cook 0 siblings, 1 reply; 22+ messages in thread From: Paul Moore @ 2014-07-11 19:36 UTC (permalink / raw) To: Eric Paris Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel, Al Viro, Will Drewry, Kees Cook On Friday, July 11, 2014 02:31:06 PM Eric Paris wrote: > On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote: > > On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote: ... > > > So we have a security interface that is damn near impossible to get > > > right. Perfect. > > > > What? Having to do two comparisons instead of one is "damn near > > impossible"? I think that might be a bit of an overreaction don't you > > think? > > Actually no. How can a normal userspace application coder POSSIBLY know > this? Find this thread on an e-mail list, by accident? I suppose some clarification of perspective is in order. Personally, I don't think a "normal" (which brings up the age old question: what is normal anyway?) userspace developer is going to write their own seccomp BPF filter, there is just too much of a barrier to entry and it is way too easy to get it wrong. This is why I started the libseccomp project. However, I think there *may* be a solution to satisfy us both, see below. > > > I think this explains exactly why I support this idea. Make X32 look > > > like everyone else ... > > > > You do realize that this patch set makes x32 the odd man out by having > > syscall_get_nr() return a different syscall number than what was used to > > make the syscall? I don't understand how that makes "x32 look like > > everyone else". > > Ok, I buy the __X32_SYSCALL_BIT argument. It can be dealt with in > audit. No problem. We don't need to strip it in syscall_get_nr(). > I'll gladly concede that part of the patch series. > > But given an x86_64 kernel a seccomp filter writer has to know about X32 > and how to write rules to block the X32 ABI. And I stick with my > assessment that x32 + seccomp is darn near impossible for a normal > developer to handle. No argument here; like I said, I think seccomp BPF filters are too much for most folks regardless of the arch. > Heck, even chromium took months to realize that x32 was a weird beast. > And they got it wrong on their first try. Their original implementation > didn't handle __X32_SYSCALL_BIT quite right. Looking at their code I'm > still not sure it does the right thing. And they are the EXPERTS. They > wrote seccomp! I think you're helping prove my point. Or I'm providing yours. Or something. However, my point about breaking userspace still stands. Assuming the kernel interface is functional, regardless of warts, I stand fast in that anything that breaks the kernel/userspace interface is bad. > > > Honestly, how many people are using seccomp on X32 and would be horribly > > > pissed if we just fixed it? > > > > Okay, please stop suggesting we break the x32 kernel/user interface to > > workaround a flaw in audit. I get that it sucks for audit, I really do, > > but this is audit's problem. > > No one is asking to break X32 to fix audit. Audit can handle itself. I > don't want anything in the kernel to pretend that X32 is X86_64. It > isn't. It has its own syscall table. Its own syscalls. Its own ABI. > I'm suggesting to fix how seccomp exposes X32 information because it is > a HORRIBLE interface that even the experts have gotten wrong, over and > over and over. See my comments above. > I suggest we accept it as breakage and just return AUDIT_ARCH_X32. > (Leaving the _X32_SYSCALL_BIT exposed as it is today) > > But I'd love to hear some thoughts on how that is a bad thing. If no > one is using the x32 seccomp abi, lets fix it. If someone is, lets see > what the fallout from fixing it will be. This would break the seccomp filter API and any application that uses it properly. I'm not excited about the idea of "let's just break the interface and see who notices philosophy"; that's a very bad practice IMHO. Anyway, getting back to the idea I mentioned earlier ... as many of you may know, Kees (added to the CC line) is working on some seccomp filter improvements which will result in a new seccomp syscall. Perhaps one way forward is to preserve everything as it is currently with the prctl() interface, but with the new seccomp() based interface we fixup x32 and use the new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in some of the kernel, but I don't think it would be too bad, and I think it would address both our concerns. Thoughts? -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 19:36 ` Paul Moore @ 2014-07-11 22:48 ` Kees Cook 2014-07-11 22:52 ` Kees Cook 2014-07-11 23:12 ` Andy Lutomirski 0 siblings, 2 replies; 22+ messages in thread From: Kees Cook @ 2014-07-11 22:48 UTC (permalink / raw) To: Paul Moore Cc: Eric Paris, H. Peter Anvin, Richard Guy Briggs, linux-audit, LKML, Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn, Tyler Hicks, stgraber On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote: > Anyway, getting back to the idea I mentioned earlier ... as many of you may > know, Kees (added to the CC line) is working on some seccomp filter > improvements which will result in a new seccomp syscall. Perhaps one way > forward is to preserve everything as it is currently with the prctl() > interface, but with the new seccomp() based interface we fixup x32 and use the > new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in some of the > kernel, but I don't think it would be too bad, and I think it would address > both our concerns. Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both x32 and x86_64 are considered "64-bit". */" should be changed...) Just so I understand: currently x86_64 and x32 both present as AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range (due to the set high bit). The seccomp used in Chrome, Chrome OS, and vsftpd should all only do whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32 without setting __X32_SYSCALL_BIT would be totally fine (it would catch the arch instead of the syscall). This sounds similar to how libseccomp is doing things, so these should be fine. The only project I know of doing blacklisting is lxc, and Eric's example looks a lot like a discussion I saw with lxc and init_module. :) So it sounds like we can get this right there. I'd like to avoid carrying a delta on filter logic based on the prctl vs syscall entry. Can we find any userspace filters being used that a "correct" fix would break? (If so, then yes, we'll need to do this proposed "via prctl or via syscall?" change.) -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 22:48 ` Kees Cook @ 2014-07-11 22:52 ` Kees Cook 2014-07-11 22:55 ` H. Peter Anvin 2014-07-11 23:12 ` Andy Lutomirski 1 sibling, 1 reply; 22+ messages in thread From: Kees Cook @ 2014-07-11 22:52 UTC (permalink / raw) To: Paul Moore Cc: Eric Paris, H. Peter Anvin, Richard Guy Briggs, linux-audit, LKML, Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn, Tyler Hicks, stgraber On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote: >> Anyway, getting back to the idea I mentioned earlier ... as many of you may >> know, Kees (added to the CC line) is working on some seccomp filter >> improvements which will result in a new seccomp syscall. Perhaps one way >> forward is to preserve everything as it is currently with the prctl() >> interface, but with the new seccomp() based interface we fixup x32 and use the >> new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in some of the >> kernel, but I don't think it would be too bad, and I think it would address >> both our concerns. > > Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both > x32 and x86_64 are considered "64-bit". */" should be changed...) > > Just so I understand: currently x86_64 and x32 both present as > AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range > (due to the set high bit). > > The seccomp used in Chrome, Chrome OS, and vsftpd should all only do > whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32 > without setting __X32_SYSCALL_BIT would be totally fine (it would > catch the arch instead of the syscall). This sounds similar to how > libseccomp is doing things, so these should be fine. I should clarify: seccomp expects to find whatever is sent as the syscall nr... as in the __NR_read used like this: BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)), BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1), BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), Are there native x32 users yet? What does __NR_read resolve to via the uapi on a native x32 userspace? -Kees > The only project I know of doing blacklisting is lxc, and Eric's > example looks a lot like a discussion I saw with lxc and init_module. > :) So it sounds like we can get this right there. > > I'd like to avoid carrying a delta on filter logic based on the prctl > vs syscall entry. Can we find any userspace filters being used that a > "correct" fix would break? (If so, then yes, we'll need to do this > proposed "via prctl or via syscall?" change.) > > -Kees > > -- > Kees Cook > Chrome OS Security -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 22:52 ` Kees Cook @ 2014-07-11 22:55 ` H. Peter Anvin 2014-07-11 23:02 ` Kees Cook 0 siblings, 1 reply; 22+ messages in thread From: H. Peter Anvin @ 2014-07-11 22:55 UTC (permalink / raw) To: Kees Cook, Paul Moore Cc: Eric Paris, Richard Guy Briggs, linux-audit, LKML, Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn, Tyler Hicks, stgraber It includes the X32 bit. On July 11, 2014 3:52:42 PM PDT, Kees Cook <keescook@chromium.org> wrote: >On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org> >wrote: >> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> >wrote: >>> Anyway, getting back to the idea I mentioned earlier ... as many of >you may >>> know, Kees (added to the CC line) is working on some seccomp filter >>> improvements which will result in a new seccomp syscall. Perhaps >one way >>> forward is to preserve everything as it is currently with the >prctl() >>> interface, but with the new seccomp() based interface we fixup x32 >and use the >>> new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in >some of the >>> kernel, but I don't think it would be too bad, and I think it would >address >>> both our concerns. >> >> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* >Both >> x32 and x86_64 are considered "64-bit". */" should be changed...) >> >> Just so I understand: currently x86_64 and x32 both present as >> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range >> (due to the set high bit). >> >> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do >> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32 >> without setting __X32_SYSCALL_BIT would be totally fine (it would >> catch the arch instead of the syscall). This sounds similar to how >> libseccomp is doing things, so these should be fine. > >I should clarify: seccomp expects to find whatever is sent as the >syscall nr... as in the __NR_read used like this: > > BPF_STMT(BPF_LD+BPF_W+BPF_ABS, > offsetof(struct seccomp_data, nr)), > BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1), > BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), > BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), > >Are there native x32 users yet? What does __NR_read resolve to via the >uapi on a native x32 userspace? > >-Kees > >> The only project I know of doing blacklisting is lxc, and Eric's >> example looks a lot like a discussion I saw with lxc and init_module. >> :) So it sounds like we can get this right there. >> >> I'd like to avoid carrying a delta on filter logic based on the prctl >> vs syscall entry. Can we find any userspace filters being used that a >> "correct" fix would break? (If so, then yes, we'll need to do this >> proposed "via prctl or via syscall?" change.) >> >> -Kees >> >> -- >> Kees Cook >> Chrome OS Security -- Sent from my mobile phone. Please pardon brevity and lack of formatting. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 22:55 ` H. Peter Anvin @ 2014-07-11 23:02 ` Kees Cook 0 siblings, 0 replies; 22+ messages in thread From: Kees Cook @ 2014-07-11 23:02 UTC (permalink / raw) To: H. Peter Anvin Cc: Paul Moore, Eric Paris, Richard Guy Briggs, linux-audit, LKML, Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn, Tyler Hicks, stgraber On Fri, Jul 11, 2014 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> wrote: > It includes the X32 bit. If the uapi for __NR_* includes the x32 bit, then that's what seccomp filters must be seeing. Building seccomp filters is documented to use the __NR_* values. -Kees > > On July 11, 2014 3:52:42 PM PDT, Kees Cook <keescook@chromium.org> wrote: >>On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org> >>wrote: >>> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> >>wrote: >>>> Anyway, getting back to the idea I mentioned earlier ... as many of >>you may >>>> know, Kees (added to the CC line) is working on some seccomp filter >>>> improvements which will result in a new seccomp syscall. Perhaps >>one way >>>> forward is to preserve everything as it is currently with the >>prctl() >>>> interface, but with the new seccomp() based interface we fixup x32 >>and use the >>>> new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in >>some of the >>>> kernel, but I don't think it would be too bad, and I think it would >>address >>>> both our concerns. >>> >>> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* >>Both >>> x32 and x86_64 are considered "64-bit". */" should be changed...) >>> >>> Just so I understand: currently x86_64 and x32 both present as >>> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range >>> (due to the set high bit). >>> >>> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do >>> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32 >>> without setting __X32_SYSCALL_BIT would be totally fine (it would >>> catch the arch instead of the syscall). This sounds similar to how >>> libseccomp is doing things, so these should be fine. >> >>I should clarify: seccomp expects to find whatever is sent as the >>syscall nr... as in the __NR_read used like this: >> >> BPF_STMT(BPF_LD+BPF_W+BPF_ABS, >> offsetof(struct seccomp_data, nr)), >> BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1), >> BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), >> BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), >> >>Are there native x32 users yet? What does __NR_read resolve to via the >>uapi on a native x32 userspace? >> >>-Kees >> >>> The only project I know of doing blacklisting is lxc, and Eric's >>> example looks a lot like a discussion I saw with lxc and init_module. >>> :) So it sounds like we can get this right there. >>> >>> I'd like to avoid carrying a delta on filter logic based on the prctl >>> vs syscall entry. Can we find any userspace filters being used that a >>> "correct" fix would break? (If so, then yes, we'll need to do this >>> proposed "via prctl or via syscall?" change.) >>> >>> -Kees >>> >>> -- >>> Kees Cook >>> Chrome OS Security > > -- > Sent from my mobile phone. Please pardon brevity and lack of formatting. -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 22:48 ` Kees Cook 2014-07-11 22:52 ` Kees Cook @ 2014-07-11 23:12 ` Andy Lutomirski 1 sibling, 0 replies; 22+ messages in thread From: Andy Lutomirski @ 2014-07-11 23:12 UTC (permalink / raw) To: Kees Cook Cc: Stephane Graber, Paul Moore, Chris Evans, Tyler Hicks, Serge Hallyn, LKML, H. Peter Anvin, linux-audit, Will Drewry, Richard Guy Briggs, Eric Paris, Al Viro On Jul 11, 2014 3:48 PM, "Kees Cook" <keescook@chromium.org> wrote: > > On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote: > > Anyway, getting back to the idea I mentioned earlier ... as many of you may > > know, Kees (added to the CC line) is working on some seccomp filter > > improvements which will result in a new seccomp syscall. Perhaps one way > > forward is to preserve everything as it is currently with the prctl() > > interface, but with the new seccomp() based interface we fixup x32 and use the > > new AUDIT_ARCH_X32 token? It might result in a bit of ugliness in some of the > > kernel, but I don't think it would be too bad, and I think it would address > > both our concerns. > > Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both > x32 and x86_64 are considered "64-bit". */" should be changed...) > I admit I'm not convinced that the current situation is really wrong. The "audit arch" uniquely defines a mapping between the actual syscall and nr and the regs. The natural way to convert between seccomp_data and the syscall entry and regs is completely correct. For things like ARM OABI, the situation is different: OABI and EABI really do work differently wrt the interpretation of the syscall regs. This isn't the case for x32. Of course, the audit code screws this up completely, but audit is disabled for x32. My upcoming seccomp patchset will clean it up a little. IMO there's no actual audit ABI to preserve for x32, because that code has never been anything other than terminally fvcked. --Andy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs 2014-07-11 4:06 ` H. Peter Anvin @ 2014-07-11 16:36 ` Paul Moore 2014-07-11 16:44 ` H. Peter Anvin 1 sibling, 1 reply; 22+ messages in thread From: Paul Moore @ 2014-07-11 16:36 UTC (permalink / raw) To: Richard Guy Briggs Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry, H. Peter Anvin On Thursday, July 10, 2014 11:38:13 PM Richard Guy Briggs wrote: > Commit > fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800 > x32: Handle the x32 system call flag > > provided a method to multiplex architecture with the syscall number for X32 > calls. > > Commit > 8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500 > x86: remove the x32 syscall bitmask from syscall_get_nr() > > broke audit and potentially other users of syscall_get_nr() which depend on > that call as named. Arguably audit is broken anyway by not correctly treating syscall numbers as 32 bit integers like everyone else. The commit above, 8b4b9f2, changed syscall_get_nr() so that it returned the same syscall number that is used by the architecture's ABI; just like every* other architecture in the kernel. * Admittedly I didn't check every architecture's implementation, but after a half dozen I stopped checking as there was a definite trend. {snip} > diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h > index d6a756a..d58b6be 100644 > --- a/arch/x86/include/asm/syscall.h > +++ b/arch/x86/include/asm/syscall.h > @@ -236,6 +236,10 @@ static inline int syscall_get_arch(void) > return AUDIT_ARCH_I386; > #endif > /* Both x32 and x86_64 are considered "64-bit". */ > +#ifdef CONFIG_X86_X32_ABI > + if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT) > + return AUDIT_ARCH_X86_X32; > +#endif No. See my comments above and in other parts of this thread. > return AUDIT_ARCH_X86_64; > } > #endif /* CONFIG_X86_32 */ > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index b35c215..bc18214 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data > *sd) > > sd->nr = syscall_get_nr(task, regs); > sd->arch = syscall_get_arch(); > +#ifdef CONFIG_X86_X32_ABI > + if (sd->arch == AUDIT_ARCH_X86_X32) { > + sd->arch = AUDIT_ARCH_X86_64; > + sd->nr |= __X32_SYSCALL_BIT; > + } > +#endif Once again, I'm not really sure I need to comment further here, but don't change syscall_get_nr(), it should return the same syscall number as was used by userspace to initiate the syscall. If you really want to use the new AUDIT_ARCH_X86_X32 macro/define, go ahead, but make sure you rewrite it to the x86-64 value here so as to not break compatibility with existing seccomp filter users. -- paul moore security and virtualization @ redhat ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter 2014-07-11 16:36 ` Paul Moore @ 2014-07-11 16:44 ` H. Peter Anvin 0 siblings, 0 replies; 22+ messages in thread From: H. Peter Anvin @ 2014-07-11 16:44 UTC (permalink / raw) To: Paul Moore, Richard Guy Briggs Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry On 07/11/2014 09:36 AM, Paul Moore wrote: > > Arguably audit is broken anyway by not correctly treating syscall numbers as > 32 bit integers like everyone else. > That is really the root cause of the problem. x86 is not the only architecture with a sparse syscall numbering scheme (in fact the x32 method was based on the MIPS syscall numbering scheme.) What syscall_get_nr() returns becomes a matter of definition at that point. -hpa ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" 2014-07-11 3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs 2014-07-11 3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs 2014-07-11 3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs @ 2014-07-11 3:38 ` Richard Guy Briggs 2 siblings, 0 replies; 22+ messages in thread From: Richard Guy Briggs @ 2014-07-11 3:38 UTC (permalink / raw) To: linux-audit, linux-kernel Cc: Richard Guy Briggs, Paul Moore, Eric Paris, Al Viro, Will Drewry, H. Peter Anvin This reverts commit 8b4b9f27e57584f3d90e0bb84cf800ad81cfe3a1. which broke audit and potentially other users of syscall_get_nr() which depend on that call as named without being overloaded by architecture bits. This patch along with seccomp: give BPF x32 bit when restoring x32 filter will satisfy other regular users of syscall_get_nr() and syscall_get_arch() without changing the seccomp interface to BPF. Cc: Paul Moore <pmoore@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <aviro@redhat.com> Cc: Will Drewry <wad@chromium.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com --- arch/x86/include/asm/syscall.h | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h index d58b6be..8c1bb2b 100644 --- a/arch/x86/include/asm/syscall.h +++ b/arch/x86/include/asm/syscall.h @@ -30,13 +30,13 @@ extern const sys_call_ptr_t sys_call_table[]; */ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs) { - return regs->orig_ax; + return regs->orig_ax & __SYSCALL_MASK; } static inline void syscall_rollback(struct task_struct *task, struct pt_regs *regs) { - regs->ax = regs->orig_ax; + regs->ax = regs->orig_ax & __SYSCALL_MASK; } static inline long syscall_get_error(struct task_struct *task, -- 1.7.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2014-07-11 23:12 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-11 3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs 2014-07-11 3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs 2014-07-11 16:15 ` Paul Moore 2014-07-11 3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs 2014-07-11 4:06 ` H. Peter Anvin 2014-07-11 16:11 ` Paul Moore 2014-07-11 16:13 ` H. Peter Anvin 2014-07-11 16:16 ` Eric Paris 2014-07-11 16:21 ` Paul Moore 2014-07-11 16:23 ` Eric Paris 2014-07-11 16:30 ` H. Peter Anvin 2014-07-11 16:32 ` Paul Moore 2014-07-11 18:31 ` Eric Paris 2014-07-11 19:36 ` Paul Moore 2014-07-11 22:48 ` Kees Cook 2014-07-11 22:52 ` Kees Cook 2014-07-11 22:55 ` H. Peter Anvin 2014-07-11 23:02 ` Kees Cook 2014-07-11 23:12 ` Andy Lutomirski 2014-07-11 16:36 ` Paul Moore 2014-07-11 16:44 ` H. Peter Anvin 2014-07-11 3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).