[PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF

linux-audit.redhat.com archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF
@ 2014-07-11  3:38 Richard Guy Briggs
  2014-07-11  3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Richard Guy Briggs @ 2014-07-11  3:38 UTC (permalink / raw)
  To: linux-audit, linux-kernel
  Cc: Will Drewry, Richard Guy Briggs, Paul Moore, H. Peter Anvin

This set reverts commit 8b4b9f2 which broke audit and potentially other users
of syscall_get_nr() which depend on that call as named without being overloaded
by architecture bits.  It will satisfy other regular users of syscall_get_nr()
and syscall_get_arch() without changing the seccomp interface to BPF.

A new ARCH definition, AUDIT_ARCH_X86_X32, was added for syscall_get_arch().

Cc: Paul Moore <pmoore@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <aviro@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com

Richard Guy Briggs (3):
  audit: add AUDIT_ARCH_X86_X32 arch definition
  seccomp: give BPF x32 bit when restoring x32 filter
  Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()"

 arch/x86/include/asm/syscall.h |    8 ++++++--
 include/uapi/linux/audit.h     |    1 +
 kernel/seccomp.c               |    6 ++++++
 3 files changed, 13 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition
  2014-07-11  3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs
@ 2014-07-11  3:38 ` Richard Guy Briggs
  2014-07-11 16:15   ` Paul Moore
  2014-07-11  3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs
  2014-07-11  3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs
  2 siblings, 1 reply; 22+ messages in thread
From: Richard Guy Briggs @ 2014-07-11  3:38 UTC (permalink / raw)
  To: linux-audit, linux-kernel
  Cc: Richard Guy Briggs, Paul Moore, Eric Paris, Al Viro, Will Drewry,
	H. Peter Anvin

Add a definition for 32-bit native system calls under 64-bit x86 architectures.
This is distict from 32-bit emulation under 64-bit x86 architectures.

Cc: Paul Moore <pmoore@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <aviro@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/uapi/linux/audit.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index e15d6fc..4f5607f 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -374,6 +374,7 @@ enum {
 #define AUDIT_ARCH_SPARC	(EM_SPARC)
 #define AUDIT_ARCH_SPARC64	(EM_SPARCV9|__AUDIT_ARCH_64BIT)
 #define AUDIT_ARCH_X86_64	(EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
+#define AUDIT_ARCH_X86_X32	(EM_X86_64|__AUDIT_ARCH_LE)
 
 #define AUDIT_PERM_EXEC		1
 #define AUDIT_PERM_WRITE	2
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition
  2014-07-11  3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
@ 2014-07-11 16:15   ` Paul Moore
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Moore @ 2014-07-11 16:15 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry,
	H. Peter Anvin

On Thursday, July 10, 2014 11:38:12 PM Richard Guy Briggs wrote:
> Add a definition for 32-bit native system calls under 64-bit x86
> architectures. This is distict from 32-bit emulation under 64-bit x86
> architectures.
> 
> Cc: Paul Moore <pmoore@redhat.com>
> Cc: Eric Paris <eparis@redhat.com>
> Cc: Al Viro <aviro@redhat.com>
> Cc: Will Drewry <wad@chromium.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/uapi/linux/audit.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index e15d6fc..4f5607f 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -374,6 +374,7 @@ enum {
>  #define AUDIT_ARCH_SPARC	(EM_SPARC)
>  #define AUDIT_ARCH_SPARC64	(EM_SPARCV9|__AUDIT_ARCH_64BIT)
>  #define AUDIT_ARCH_X86_64	(EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
> +#define AUDIT_ARCH_X86_X32	(EM_X86_64|__AUDIT_ARCH_LE)
> 
>  #define AUDIT_PERM_EXEC		1
>  #define AUDIT_PERM_WRITE	2

While I'm opposed to the other patches in this series (comments to follow), I 
think this is a worthwhile addition and arguably should have been done when 
x32 was merged.

That said, this change should probably be included in whatever patch first 
makes use of this new value as this patch does nothing by itself.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11  3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs
  2014-07-11  3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
@ 2014-07-11  3:38 ` Richard Guy Briggs
  2014-07-11  4:06   ` H. Peter Anvin
  2014-07-11 16:36   ` Paul Moore
  2014-07-11  3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs
  2 siblings, 2 replies; 22+ messages in thread
From: Richard Guy Briggs @ 2014-07-11  3:38 UTC (permalink / raw)
  To: linux-audit, linux-kernel
  Cc: Will Drewry, Richard Guy Briggs, Paul Moore, H. Peter Anvin

Commit
	fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800
	x32: Handle the x32 system call flag

provided a method to multiplex architecture with the syscall number for X32
calls.

Commit
	8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500
	x86: remove the x32 syscall bitmask from syscall_get_nr()

broke audit and potentially other users of syscall_get_nr() which depend on
that call as named.

Commit
	audit: add AUDIT_ARCH_X86_X32 arch definition

is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for
syscall_get_arch().

This patch along with reverting 8b4b9f2 should satisfy other regular users of
syscall_get_nr() without changing the seccomp interface to BPF.

Cc: Paul Moore <pmoore@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <aviro@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com
---
 arch/x86/include/asm/syscall.h |    4 ++++
 kernel/seccomp.c               |    6 ++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index d6a756a..d58b6be 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -236,6 +236,10 @@ static inline int syscall_get_arch(void)
 		return AUDIT_ARCH_I386;
 #endif
 	/* Both x32 and x86_64 are considered "64-bit". */
+#ifdef CONFIG_X86_X32_ABI
+	if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)
+		return AUDIT_ARCH_X86_X32;
+#endif
 	return AUDIT_ARCH_X86_64;
 }
 #endif	/* CONFIG_X86_32 */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b35c215..bc18214 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data *sd)
 
 	sd->nr = syscall_get_nr(task, regs);
 	sd->arch = syscall_get_arch();
+#ifdef CONFIG_X86_X32_ABI
+	if (sd->arch == AUDIT_ARCH_X86_X32) {
+		sd->arch = AUDIT_ARCH_X86_64;
+		sd->nr |= __X32_SYSCALL_BIT;
+	}
+#endif
 	syscall_get_arguments(task, regs, 0, 6, args);
 	sd->args[0] = args[0];
 	sd->args[1] = args[1];
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11  3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs
@ 2014-07-11  4:06   ` H. Peter Anvin
  2014-07-11 16:11     ` Paul Moore
  2014-07-11 16:36   ` Paul Moore
  1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2014-07-11  4:06 UTC (permalink / raw)
  To: Richard Guy Briggs, linux-audit, linux-kernel
  Cc: Paul Moore, Eric Paris, Al Viro, Will Drewry

On 07/10/2014 08:38 PM, Richard Guy Briggs wrote:
> Commit
> 	fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800
> 	x32: Handle the x32 system call flag
> 
> provided a method to multiplex architecture with the syscall number for X32
> calls.
> 
> Commit
> 	8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500
> 	x86: remove the x32 syscall bitmask from syscall_get_nr()
> 
> broke audit and potentially other users of syscall_get_nr() which depend on
> that call as named.
> 
> Commit
> 	audit: add AUDIT_ARCH_X86_X32 arch definition
> 
> is required to provide the new ARCH definition AUDIT_ARCH_X86_X32 for
> syscall_get_arch().
> 
> This patch along with reverting 8b4b9f2 should satisfy other regular users of
> syscall_get_nr() without changing the seccomp interface to BPF.
> 

Incidentally: do seccomp users know that on an x86-64 system you can
recevie system calls from any of the x86 architectures, regardless of
how the program is invoked?  (This is unusual, so normally denying those
"alien" calls is the right thing to do.)

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11  4:06   ` H. Peter Anvin
@ 2014-07-11 16:11     ` Paul Moore
  2014-07-11 16:13       ` H. Peter Anvin
  2014-07-11 16:16       ` Eric Paris
  0 siblings, 2 replies; 22+ messages in thread
From: Paul Moore @ 2014-07-11 16:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Guy Briggs, linux-audit, linux-kernel, Eric Paris,
	Al Viro, Will Drewry

On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> Incidentally: do seccomp users know that on an x86-64 system you can
> recevie system calls from any of the x86 architectures, regardless of
> how the program is invoked?  (This is unusual, so normally denying those
> "alien" calls is the right thing to do.)

I obviously can't speak for all seccomp users, but libseccomp handles this by 
checking the seccomp_data->arch value at the start of the filter and killing 
(by default) any non-native architectures.  If you want, you can change this 
default behavior or add support for other architectures (e.g. create a filter 
that allows both x86-64 and x32 but disallows x86, or any combination of the 
three for that matter).

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:11     ` Paul Moore
@ 2014-07-11 16:13       ` H. Peter Anvin
  2014-07-11 16:16       ` Eric Paris
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2014-07-11 16:13 UTC (permalink / raw)
  To: Paul Moore
  Cc: Richard Guy Briggs, linux-audit, linux-kernel, Eric Paris,
	Al Viro, Will Drewry

On 07/11/2014 09:11 AM, Paul Moore wrote:
> On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
>> Incidentally: do seccomp users know that on an x86-64 system you can
>> recevie system calls from any of the x86 architectures, regardless of
>> how the program is invoked?  (This is unusual, so normally denying those
>> "alien" calls is the right thing to do.)
> 
> I obviously can't speak for all seccomp users, but libseccomp handles this by 
> checking the seccomp_data->arch value at the start of the filter and killing 
> (by default) any non-native architectures.  If you want, you can change this 
> default behavior or add support for other architectures (e.g. create a filter 
> that allows both x86-64 and x32 but disallows x86, or any combination of the 
> three for that matter).
> 

OK, that seems reasonable.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:11     ` Paul Moore
  2014-07-11 16:13       ` H. Peter Anvin
@ 2014-07-11 16:16       ` Eric Paris
  2014-07-11 16:21         ` Paul Moore
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Paris @ 2014-07-11 16:16 UTC (permalink / raw)
  To: Paul Moore
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry

On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > Incidentally: do seccomp users know that on an x86-64 system you can
> > recevie system calls from any of the x86 architectures, regardless of
> > how the program is invoked?  (This is unusual, so normally denying those
> > "alien" calls is the right thing to do.)
> 
> I obviously can't speak for all seccomp users, but libseccomp handles this by 
> checking the seccomp_data->arch value at the start of the filter and killing 
> (by default) any non-native architectures.  If you want, you can change this 
> default behavior or add support for other architectures (e.g. create a filter 
> that allows both x86-64 and x32 but disallows x86, or any combination of the 
> three for that matter).

Maybe libseccomp does some HORRIFIC contortions under the hood, but the
interface is crap...  Since seccomp_data->arch can't distinguish between
X32 and X86_64.  If I write a seccomp filter which says

KILL arch != x86_64
KILL init_module
ALLOW everything else

I can still call init_module, I just have to use the X32 variant.

If libseccomp is translating:

KILL arch != x86_64 into:

KILL arch != x86_64
KILL syscall_nr >= 2000

That's just showing how dumb the kernel interface is...   Good for you
guys, but the kernel is just being dumb   :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:16       ` Eric Paris
@ 2014-07-11 16:21         ` Paul Moore
  2014-07-11 16:23           ` Eric Paris
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Moore @ 2014-07-11 16:21 UTC (permalink / raw)
  To: Eric Paris
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry

On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > recevie system calls from any of the x86 architectures, regardless of
> > > how the program is invoked?  (This is unusual, so normally denying those
> > > "alien" calls is the right thing to do.)
> > 
> > I obviously can't speak for all seccomp users, but libseccomp handles this
> > by checking the seccomp_data->arch value at the start of the filter and
> > killing (by default) any non-native architectures.  If you want, you can
> > change this default behavior or add support for other architectures (e.g.
> > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > combination of the three for that matter).
> 
> Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> interface is crap...  Since seccomp_data->arch can't distinguish between
> X32 and X86_64.  If I write a seccomp filter which says
> 
> KILL arch != x86_64
> KILL init_module
> ALLOW everything else
> 
> I can still call init_module, I just have to use the X32 variant.
> 
> If libseccomp is translating:
> 
> KILL arch != x86_64 into:
> 
> KILL arch != x86_64
> KILL syscall_nr >= 2000
> 
> That's just showing how dumb the kernel interface is...   Good for you
> guys, but the kernel is just being dumb   :)

You're not going to hear me ever say that I like how the x32 ABI was done, it 
is a real mess from a seccomp filter point of view and we have to do some 
nasty stuff in libseccomp to make it all work correctly (see my comments on 
the libseccomp-devel list regarding my severe displeasure over x32), but 
what's done is done.

I think it's too late to change the x32 seccomp filter ABI.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:21         ` Paul Moore
@ 2014-07-11 16:23           ` Eric Paris
  2014-07-11 16:30             ` H. Peter Anvin
  2014-07-11 16:32             ` Paul Moore
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Paris @ 2014-07-11 16:23 UTC (permalink / raw)
  To: Paul Moore
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry

On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > recevie system calls from any of the x86 architectures, regardless of
> > > > how the program is invoked?  (This is unusual, so normally denying those
> > > > "alien" calls is the right thing to do.)
> > > 
> > > I obviously can't speak for all seccomp users, but libseccomp handles this
> > > by checking the seccomp_data->arch value at the start of the filter and
> > > killing (by default) any non-native architectures.  If you want, you can
> > > change this default behavior or add support for other architectures (e.g.
> > > create a filter that allows both x86-64 and x32 but disallows x86, or any
> > > combination of the three for that matter).
> > 
> > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > interface is crap...  Since seccomp_data->arch can't distinguish between
> > X32 and X86_64.  If I write a seccomp filter which says
> > 
> > KILL arch != x86_64
> > KILL init_module
> > ALLOW everything else
> > 
> > I can still call init_module, I just have to use the X32 variant.
> > 
> > If libseccomp is translating:
> > 
> > KILL arch != x86_64 into:
> > 
> > KILL arch != x86_64
> > KILL syscall_nr >= 2000
> > 
> > That's just showing how dumb the kernel interface is...   Good for you
> > guys, but the kernel is just being dumb   :)
> 
> You're not going to hear me ever say that I like how the x32 ABI was done, it 
> is a real mess from a seccomp filter point of view and we have to do some 
> nasty stuff in libseccomp to make it all work correctly (see my comments on 
> the libseccomp-devel list regarding my severe displeasure over x32), but 
> what's done is done.
> 
> I think it's too late to change the x32 seccomp filter ABI.

So we have a security interface that is damn near impossible to get
right.  Perfect.

I think this explains exactly why I support this idea.  Make X32 look
like everyone else and put these custom horrific hacks in seccomp if we
are unwilling to 'do it right'

Honestly, how many people are using seccomp on X32 and would be horribly
pissed if we just fixed it?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:23           ` Eric Paris
@ 2014-07-11 16:30             ` H. Peter Anvin
  2014-07-11 16:32             ` Paul Moore
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2014-07-11 16:30 UTC (permalink / raw)
  To: Eric Paris, Paul Moore
  Cc: Richard Guy Briggs, linux-audit, linux-kernel, Al Viro,
	Will Drewry

On 07/11/2014 09:23 AM, Eric Paris wrote:
>>
>> You're not going to hear me ever say that I like how the x32 ABI was done, it 
>> is a real mess from a seccomp filter point of view and we have to do some 
>> nasty stuff in libseccomp to make it all work correctly (see my comments on 
>> the libseccomp-devel list regarding my severe displeasure over x32), but 
>> what's done is done.
>>
>> I think it's too late to change the x32 seccomp filter ABI.
> 
> So we have a security interface that is damn near impossible to get
> right.  Perfect.
> 
> I think this explains exactly why I support this idea.  Make X32 look
> like everyone else and put these custom horrific hacks in seccomp if we
> are unwilling to 'do it right'
> 
> Honestly, how many people are using seccomp on X32 and would be horribly
> pissed if we just fixed it?
> 

The bigger issue is probably if we will open a problem with the older
kernels.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:23           ` Eric Paris
  2014-07-11 16:30             ` H. Peter Anvin
@ 2014-07-11 16:32             ` Paul Moore
  2014-07-11 18:31               ` Eric Paris
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Moore @ 2014-07-11 16:32 UTC (permalink / raw)
  To: Eric Paris
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry

On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > > recevie system calls from any of the x86 architectures, regardless
> > > > > of
> > > > > how the program is invoked?  (This is unusual, so normally denying
> > > > > those
> > > > > "alien" calls is the right thing to do.)
> > > > 
> > > > I obviously can't speak for all seccomp users, but libseccomp handles
> > > > this
> > > > by checking the seccomp_data->arch value at the start of the filter
> > > > and
> > > > killing (by default) any non-native architectures.  If you want, you
> > > > can
> > > > change this default behavior or add support for other architectures
> > > > (e.g.
> > > > create a filter that allows both x86-64 and x32 but disallows x86, or
> > > > any
> > > > combination of the three for that matter).
> > > 
> > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > > interface is crap...  Since seccomp_data->arch can't distinguish between
> > > X32 and X86_64.  If I write a seccomp filter which says
> > > 
> > > KILL arch != x86_64
> > > KILL init_module
> > > ALLOW everything else
> > > 
> > > I can still call init_module, I just have to use the X32 variant.
> > > 
> > > If libseccomp is translating:
> > > 
> > > KILL arch != x86_64 into:
> > > 
> > > KILL arch != x86_64
> > > KILL syscall_nr >= 2000
> > > 
> > > That's just showing how dumb the kernel interface is...   Good for you
> > > guys, but the kernel is just being dumb   :)
> > 
> > You're not going to hear me ever say that I like how the x32 ABI was done,
> > it is a real mess from a seccomp filter point of view and we have to do
> > some nasty stuff in libseccomp to make it all work correctly (see my
> > comments on the libseccomp-devel list regarding my severe displeasure
> > over x32), but what's done is done.
> > 
> > I think it's too late to change the x32 seccomp filter ABI.
> 
> So we have a security interface that is damn near impossible to get
> right.  Perfect.

What?  Having to do two comparisons instead of one is "damn near impossible"?  
I think that might be a bit of an overreaction don't you think?

> I think this explains exactly why I support this idea.  Make X32 look
> like everyone else ...

You do realize that this patch set makes x32 the odd man out by having 
syscall_get_nr() return a different syscall number than what was used to make 
the syscall?  I don't understand how that makes "x32 look like everyone else".

> ... and put these custom horrific hacks in seccomp if we are unwilling to
> 'do it right'

If you want to add the new x32 audit arch #define, go for it, like I said that 
was something that I feel should have been in there from the beginning.  As 
far as I'm concerned you can even put a hack in kernel/seccomp.c to rewrite 
the arch token value if it makes your life easier.

> Honestly, how many people are using seccomp on X32 and would be horribly
> pissed if we just fixed it?

Okay, please stop suggesting we break the x32 kernel/user interface to 
workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
this is audit's problem.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:32             ` Paul Moore
@ 2014-07-11 18:31               ` Eric Paris
  2014-07-11 19:36                 ` Paul Moore
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Paris @ 2014-07-11 18:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry

On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
> On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:
> > On Fri, 2014-07-11 at 12:21 -0400, Paul Moore wrote:
> > > On Friday, July 11, 2014 12:16:47 PM Eric Paris wrote:
> > > > On Fri, 2014-07-11 at 12:11 -0400, Paul Moore wrote:
> > > > > On Thursday, July 10, 2014 09:06:02 PM H. Peter Anvin wrote:
> > > > > > Incidentally: do seccomp users know that on an x86-64 system you can
> > > > > > recevie system calls from any of the x86 architectures, regardless
> > > > > > of
> > > > > > how the program is invoked?  (This is unusual, so normally denying
> > > > > > those
> > > > > > "alien" calls is the right thing to do.)
> > > > > 
> > > > > I obviously can't speak for all seccomp users, but libseccomp handles
> > > > > this
> > > > > by checking the seccomp_data->arch value at the start of the filter
> > > > > and
> > > > > killing (by default) any non-native architectures.  If you want, you
> > > > > can
> > > > > change this default behavior or add support for other architectures
> > > > > (e.g.
> > > > > create a filter that allows both x86-64 and x32 but disallows x86, or
> > > > > any
> > > > > combination of the three for that matter).
> > > > 
> > > > Maybe libseccomp does some HORRIFIC contortions under the hood, but the
> > > > interface is crap...  Since seccomp_data->arch can't distinguish between
> > > > X32 and X86_64.  If I write a seccomp filter which says
> > > > 
> > > > KILL arch != x86_64
> > > > KILL init_module
> > > > ALLOW everything else
> > > > 
> > > > I can still call init_module, I just have to use the X32 variant.
> > > > 
> > > > If libseccomp is translating:
> > > > 
> > > > KILL arch != x86_64 into:
> > > > 
> > > > KILL arch != x86_64
> > > > KILL syscall_nr >= 2000
> > > > 
> > > > That's just showing how dumb the kernel interface is...   Good for you
> > > > guys, but the kernel is just being dumb   :)
> > > 
> > > You're not going to hear me ever say that I like how the x32 ABI was done,
> > > it is a real mess from a seccomp filter point of view and we have to do
> > > some nasty stuff in libseccomp to make it all work correctly (see my
> > > comments on the libseccomp-devel list regarding my severe displeasure
> > > over x32), but what's done is done.
> > > 
> > > I think it's too late to change the x32 seccomp filter ABI.
> > 
> > So we have a security interface that is damn near impossible to get
> > right.  Perfect.
> 
> What?  Having to do two comparisons instead of one is "damn near impossible"?  
> I think that might be a bit of an overreaction don't you think?

Actually no.  How can a normal userspace application coder POSSIBLY know
this?  Find this thread on an e-mail list, by accident?  
> 
> > I think this explains exactly why I support this idea.  Make X32 look
> > like everyone else ...
> 
> You do realize that this patch set makes x32 the odd man out by having 
> syscall_get_nr() return a different syscall number than what was used to make 
> the syscall?  I don't understand how that makes "x32 look like everyone else".

Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
audit.  No problem.  We don't need to strip it in syscall_get_nr().
I'll gladly concede that part of the patch series.

But given an x86_64 kernel a seccomp filter writer has to know about X32
and how to write rules to block the X32 ABI.  And I stick with my
assessment that x32 + seccomp is darn near impossible for a normal
developer to handle.  

Heck, even chromium took months to realize that x32 was a weird beast.
And they got it wrong on their first try.  Their original implementation
didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
still not sure it does the right thing.  And they are the EXPERTS.  They
wrote seccomp!

> > Honestly, how many people are using seccomp on X32 and would be horribly
> > pissed if we just fixed it?
> 
> Okay, please stop suggesting we break the x32 kernel/user interface to 
> workaround a flaw in audit.  I get that it sucks for audit, I really do, but 
> this is audit's problem.

No one is asking to break X32 to fix audit.  Audit can handle itself.  I
don't want anything in the kernel to pretend that X32 is X86_64.  It
isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
I'm suggesting to fix how seccomp exposes X32 information because it is
a HORRIBLE interface that even the experts have gotten wrong, over and
over and over.

I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
(Leaving the _X32_SYSCALL_BIT exposed as it is today)

But I'd love to hear some thoughts on how that is a bad thing.  If no
one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
what the fallout from fixing it will be.

-Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 18:31               ` Eric Paris
@ 2014-07-11 19:36                 ` Paul Moore
  2014-07-11 22:48                   ` Kees Cook
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Moore @ 2014-07-11 19:36 UTC (permalink / raw)
  To: Eric Paris
  Cc: H. Peter Anvin, Richard Guy Briggs, linux-audit, linux-kernel,
	Al Viro, Will Drewry, Kees Cook

On Friday, July 11, 2014 02:31:06 PM Eric Paris wrote:
> On Fri, 2014-07-11 at 12:32 -0400, Paul Moore wrote:
> > On Friday, July 11, 2014 12:23:33 PM Eric Paris wrote:

...

> > > So we have a security interface that is damn near impossible to get
> > > right.  Perfect.
> > 
> > What?  Having to do two comparisons instead of one is "damn near
> > impossible"? I think that might be a bit of an overreaction don't you
> > think?
> 
> Actually no.  How can a normal userspace application coder POSSIBLY know
> this?  Find this thread on an e-mail list, by accident?

I suppose some clarification of perspective is in order.  Personally, I don't 
think a "normal" (which brings up the age old question: what is normal 
anyway?) userspace developer is going to write their own seccomp BPF filter, 
there is just too much of a barrier to entry and it is way too easy to get it 
wrong.  This is why I started the libseccomp project.

However, I think there *may* be a solution to satisfy us both, see below.

> > > I think this explains exactly why I support this idea.  Make X32 look
> > > like everyone else ...
> > 
> > You do realize that this patch set makes x32 the odd man out by having
> > syscall_get_nr() return a different syscall number than what was used to
> > make the syscall?  I don't understand how that makes "x32 look like
> > everyone else".
>
> Ok, I buy the __X32_SYSCALL_BIT argument.  It can be dealt with in
> audit.  No problem.  We don't need to strip it in syscall_get_nr().
> I'll gladly concede that part of the patch series.
> 
> But given an x86_64 kernel a seccomp filter writer has to know about X32
> and how to write rules to block the X32 ABI.  And I stick with my
> assessment that x32 + seccomp is darn near impossible for a normal
> developer to handle.

No argument here; like I said, I think seccomp BPF filters are too much for 
most folks regardless of the arch.

> Heck, even chromium took months to realize that x32 was a weird beast.
> And they got it wrong on their first try.  Their original implementation
> didn't handle __X32_SYSCALL_BIT quite right.  Looking at their code I'm
> still not sure it does the right thing.  And they are the EXPERTS.  They
> wrote seccomp!

I think you're helping prove my point.  Or I'm providing yours.  Or something.

However, my point about breaking userspace still stands.  Assuming the kernel 
interface is functional, regardless of warts, I stand fast in that anything 
that breaks the kernel/userspace interface is bad.  

> > > Honestly, how many people are using seccomp on X32 and would be horribly
> > > pissed if we just fixed it?
> > 
> > Okay, please stop suggesting we break the x32 kernel/user interface to
> > workaround a flaw in audit.  I get that it sucks for audit, I really do,
> > but this is audit's problem.
> 
> No one is asking to break X32 to fix audit.  Audit can handle itself.  I
> don't want anything in the kernel to pretend that X32 is X86_64.  It
> isn't.  It has its own syscall table.  Its own syscalls.  Its own ABI.
> I'm suggesting to fix how seccomp exposes X32 information because it is
> a HORRIBLE interface that even the experts have gotten wrong, over and
> over and over.

See my comments above.

> I suggest we accept it as breakage and just return AUDIT_ARCH_X32.
> (Leaving the _X32_SYSCALL_BIT exposed as it is today)
> 
> But I'd love to hear some thoughts on how that is a bad thing.  If no
> one is using the x32 seccomp abi, lets fix it.  If someone is, lets see
> what the fallout from fixing it will be.

This would break the seccomp filter API and any application that uses it 
properly.  I'm not excited about the idea of "let's just break the interface 
and see who notices philosophy"; that's a very bad practice IMHO.

Anyway, getting back to the idea I mentioned earlier ... as many of you may 
know, Kees (added to the CC line) is working on some seccomp filter 
improvements which will result in a new seccomp syscall.  Perhaps one way 
forward is to preserve everything as it is currently with the prctl() 
interface, but with the new seccomp() based interface we fixup x32 and use the 
new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the 
kernel, but I don't think it would be too bad, and I think it would address 
both our concerns.

Thoughts?

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 19:36                 ` Paul Moore
@ 2014-07-11 22:48                   ` Kees Cook
  2014-07-11 22:52                     ` Kees Cook
  2014-07-11 23:12                     ` Andy Lutomirski
  0 siblings, 2 replies; 22+ messages in thread
From: Kees Cook @ 2014-07-11 22:48 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric Paris, H. Peter Anvin, Richard Guy Briggs, linux-audit, LKML,
	Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn,
	Tyler Hicks, stgraber

On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote:
> Anyway, getting back to the idea I mentioned earlier ... as many of you may
> know, Kees (added to the CC line) is working on some seccomp filter
> improvements which will result in a new seccomp syscall.  Perhaps one way
> forward is to preserve everything as it is currently with the prctl()
> interface, but with the new seccomp() based interface we fixup x32 and use the
> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the
> kernel, but I don't think it would be too bad, and I think it would address
> both our concerns.

Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
x32 and x86_64 are considered "64-bit". */" should be changed...)

Just so I understand: currently x86_64 and x32 both present as
AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
(due to the set high bit).

The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
without setting __X32_SYSCALL_BIT would be totally fine (it would
catch the arch instead of the syscall). This sounds similar to how
libseccomp is doing things, so these should be fine.

The only project I know of doing blacklisting is lxc, and Eric's
example looks a lot like a discussion I saw with lxc and init_module.
:) So it sounds like we can get this right there.

I'd like to avoid carrying a delta on filter logic based on the prctl
vs syscall entry. Can we find any userspace filters being used that a
"correct" fix would break? (If so, then yes, we'll need to do this
proposed "via prctl or via syscall?" change.)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 22:48                   ` Kees Cook
@ 2014-07-11 22:52                     ` Kees Cook
  2014-07-11 22:55                       ` H. Peter Anvin
  2014-07-11 23:12                     ` Andy Lutomirski
  1 sibling, 1 reply; 22+ messages in thread
From: Kees Cook @ 2014-07-11 22:52 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric Paris, H. Peter Anvin, Richard Guy Briggs, linux-audit, LKML,
	Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn,
	Tyler Hicks, stgraber

On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote:
>> Anyway, getting back to the idea I mentioned earlier ... as many of you may
>> know, Kees (added to the CC line) is working on some seccomp filter
>> improvements which will result in a new seccomp syscall.  Perhaps one way
>> forward is to preserve everything as it is currently with the prctl()
>> interface, but with the new seccomp() based interface we fixup x32 and use the
>> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the
>> kernel, but I don't think it would be too bad, and I think it would address
>> both our concerns.
>
> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
> x32 and x86_64 are considered "64-bit". */" should be changed...)
>
> Just so I understand: currently x86_64 and x32 both present as
> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
> (due to the set high bit).
>
> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
> without setting __X32_SYSCALL_BIT would be totally fine (it would
> catch the arch instead of the syscall). This sounds similar to how
> libseccomp is doing things, so these should be fine.

I should clarify: seccomp expects to find whatever is sent as the
syscall nr... as in the __NR_read used like this:

                BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
                        offsetof(struct seccomp_data, nr)),
                BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),

Are there native x32 users yet? What does __NR_read resolve to via the
uapi on a native x32 userspace?

-Kees

> The only project I know of doing blacklisting is lxc, and Eric's
> example looks a lot like a discussion I saw with lxc and init_module.
> :) So it sounds like we can get this right there.
>
> I'd like to avoid carrying a delta on filter logic based on the prctl
> vs syscall entry. Can we find any userspace filters being used that a
> "correct" fix would break? (If so, then yes, we'll need to do this
> proposed "via prctl or via syscall?" change.)
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 22:52                     ` Kees Cook
@ 2014-07-11 22:55                       ` H. Peter Anvin
  2014-07-11 23:02                         ` Kees Cook
  0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2014-07-11 22:55 UTC (permalink / raw)
  To: Kees Cook, Paul Moore
  Cc: Eric Paris, Richard Guy Briggs, linux-audit, LKML, Al Viro,
	Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn,
	Tyler Hicks, stgraber

It includes the X32 bit.

On July 11, 2014 3:52:42 PM PDT, Kees Cook <keescook@chromium.org> wrote:
>On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org>
>wrote:
>> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com>
>wrote:
>>> Anyway, getting back to the idea I mentioned earlier ... as many of
>you may
>>> know, Kees (added to the CC line) is working on some seccomp filter
>>> improvements which will result in a new seccomp syscall.  Perhaps
>one way
>>> forward is to preserve everything as it is currently with the
>prctl()
>>> interface, but with the new seccomp() based interface we fixup x32
>and use the
>>> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
>some of the
>>> kernel, but I don't think it would be too bad, and I think it would
>address
>>> both our concerns.
>>
>> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/*
>Both
>> x32 and x86_64 are considered "64-bit". */" should be changed...)
>>
>> Just so I understand: currently x86_64 and x32 both present as
>> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
>> (due to the set high bit).
>>
>> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
>> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
>> without setting __X32_SYSCALL_BIT would be totally fine (it would
>> catch the arch instead of the syscall). This sounds similar to how
>> libseccomp is doing things, so these should be fine.
>
>I should clarify: seccomp expects to find whatever is sent as the
>syscall nr... as in the __NR_read used like this:
>
>                BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
>                        offsetof(struct seccomp_data, nr)),
>                BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
>                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
>                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
>
>Are there native x32 users yet? What does __NR_read resolve to via the
>uapi on a native x32 userspace?
>
>-Kees
>
>> The only project I know of doing blacklisting is lxc, and Eric's
>> example looks a lot like a discussion I saw with lxc and init_module.
>> :) So it sounds like we can get this right there.
>>
>> I'd like to avoid carrying a delta on filter logic based on the prctl
>> vs syscall entry. Can we find any userspace filters being used that a
>> "correct" fix would break? (If so, then yes, we'll need to do this
>> proposed "via prctl or via syscall?" change.)
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Chrome OS Security

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 22:55                       ` H. Peter Anvin
@ 2014-07-11 23:02                         ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2014-07-11 23:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Paul Moore, Eric Paris, Richard Guy Briggs, linux-audit, LKML,
	Al Viro, Will Drewry, Andy Lutomirski, Chris Evans, Serge Hallyn,
	Tyler Hicks, stgraber

On Fri, Jul 11, 2014 at 3:55 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> It includes the X32 bit.

If the uapi for __NR_* includes the x32 bit, then that's what seccomp
filters must be seeing. Building seccomp filters is documented to use
the __NR_* values.

-Kees

>
> On July 11, 2014 3:52:42 PM PDT, Kees Cook <keescook@chromium.org> wrote:
>>On Fri, Jul 11, 2014 at 3:48 PM, Kees Cook <keescook@chromium.org>
>>wrote:
>>> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com>
>>wrote:
>>>> Anyway, getting back to the idea I mentioned earlier ... as many of
>>you may
>>>> know, Kees (added to the CC line) is working on some seccomp filter
>>>> improvements which will result in a new seccomp syscall.  Perhaps
>>one way
>>>> forward is to preserve everything as it is currently with the
>>prctl()
>>>> interface, but with the new seccomp() based interface we fixup x32
>>and use the
>>>> new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in
>>some of the
>>>> kernel, but I don't think it would be too bad, and I think it would
>>address
>>>> both our concerns.
>>>
>>> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/*
>>Both
>>> x32 and x86_64 are considered "64-bit". */" should be changed...)
>>>
>>> Just so I understand: currently x86_64 and x32 both present as
>>> AUDIT_ARCH_X86_64. The x32 syscalls are seen as in a different range
>>> (due to the set high bit).
>>>
>>> The seccomp used in Chrome, Chrome OS, and vsftpd should all only do
>>> whitelisting by both arch and syscall, so adding AUDIT_ARCH_X32
>>> without setting __X32_SYSCALL_BIT would be totally fine (it would
>>> catch the arch instead of the syscall). This sounds similar to how
>>> libseccomp is doing things, so these should be fine.
>>
>>I should clarify: seccomp expects to find whatever is sent as the
>>syscall nr... as in the __NR_read used like this:
>>
>>                BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
>>                        offsetof(struct seccomp_data, nr)),
>>                BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 0, 1),
>>                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
>>                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
>>
>>Are there native x32 users yet? What does __NR_read resolve to via the
>>uapi on a native x32 userspace?
>>
>>-Kees
>>
>>> The only project I know of doing blacklisting is lxc, and Eric's
>>> example looks a lot like a discussion I saw with lxc and init_module.
>>> :) So it sounds like we can get this right there.
>>>
>>> I'd like to avoid carrying a delta on filter logic based on the prctl
>>> vs syscall entry. Can we find any userspace filters being used that a
>>> "correct" fix would break? (If so, then yes, we'll need to do this
>>> proposed "via prctl or via syscall?" change.)
>>>
>>> -Kees
>>>
>>> --
>>> Kees Cook
>>> Chrome OS Security
>
> --
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 22:48                   ` Kees Cook
  2014-07-11 22:52                     ` Kees Cook
@ 2014-07-11 23:12                     ` Andy Lutomirski
  1 sibling, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2014-07-11 23:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Stephane Graber, Paul Moore, Chris Evans, Tyler Hicks,
	Serge Hallyn, LKML, H. Peter Anvin, linux-audit, Will Drewry,
	Richard Guy Briggs, Eric Paris, Al Viro

On Jul 11, 2014 3:48 PM, "Kees Cook" <keescook@chromium.org> wrote:
>
> On Fri, Jul 11, 2014 at 12:36 PM, Paul Moore <pmoore@redhat.com> wrote:
> > Anyway, getting back to the idea I mentioned earlier ... as many of you may
> > know, Kees (added to the CC line) is working on some seccomp filter
> > improvements which will result in a new seccomp syscall.  Perhaps one way
> > forward is to preserve everything as it is currently with the prctl()
> > interface, but with the new seccomp() based interface we fixup x32 and use the
> > new AUDIT_ARCH_X32 token?  It might result in a bit of ugliness in some of the
> > kernel, but I don't think it would be too bad, and I think it would address
> > both our concerns.
>
> Adding AUDIT_ARCH_X32: yes please. (On that note, the comment "/* Both
> x32 and x86_64 are considered "64-bit". */" should be changed...)
>

I admit I'm not convinced that the current situation is really wrong.
The "audit arch" uniquely defines a mapping between the actual syscall
and nr and the regs.  The natural way to convert between seccomp_data
and the syscall entry and regs is completely correct.

For things like ARM OABI, the situation is different: OABI and EABI
really do work differently wrt the interpretation of the syscall regs.
This isn't the case for x32.

Of course, the audit code screws this up completely, but audit is
disabled for x32.  My upcoming seccomp patchset will clean it up a
little.  IMO there's no actual audit ABI to preserve for x32, because
that code has never been anything other than terminally fvcked.

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11  3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs
  2014-07-11  4:06   ` H. Peter Anvin
@ 2014-07-11 16:36   ` Paul Moore
  2014-07-11 16:44     ` H. Peter Anvin
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Moore @ 2014-07-11 16:36 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry,
	H. Peter Anvin

On Thursday, July 10, 2014 11:38:13 PM Richard Guy Briggs wrote:
> Commit
> 	fca460f hpa@zytor.com 2012-02-19 07:56:26 -0800
> 	x32: Handle the x32 system call flag
> 
> provided a method to multiplex architecture with the syscall number for X32
> calls.
> 
> Commit
> 	8b4b9f2 pmoore@redhat.com 2013-02-15 12:21:43 -0500
> 	x86: remove the x32 syscall bitmask from syscall_get_nr()
> 
> broke audit and potentially other users of syscall_get_nr() which depend on
> that call as named.

Arguably audit is broken anyway by not correctly treating syscall numbers as 
32 bit integers like everyone else.

The commit above, 8b4b9f2, changed syscall_get_nr() so that it returned the 
same syscall number that is used by the architecture's ABI; just like every* 
other architecture in the kernel.

* Admittedly I didn't check every architecture's implementation, but after a 
half dozen I stopped checking as there was a definite trend.

{snip}

> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index d6a756a..d58b6be 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -236,6 +236,10 @@ static inline int syscall_get_arch(void)
>  		return AUDIT_ARCH_I386;
>  #endif
>  	/* Both x32 and x86_64 are considered "64-bit". */
> +#ifdef CONFIG_X86_X32_ABI
> +	if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)
> +		return AUDIT_ARCH_X86_X32;
> +#endif

No.  See my comments above and in other parts of this thread.

>  	return AUDIT_ARCH_X86_64;
>  }
>  #endif	/* CONFIG_X86_32 */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b35c215..bc18214 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -73,6 +73,12 @@ static void populate_seccomp_data(struct seccomp_data
> *sd)
> 
>  	sd->nr = syscall_get_nr(task, regs);
>  	sd->arch = syscall_get_arch();
> +#ifdef CONFIG_X86_X32_ABI
> +	if (sd->arch == AUDIT_ARCH_X86_X32) {
> +		sd->arch = AUDIT_ARCH_X86_64;
> +		sd->nr |= __X32_SYSCALL_BIT;
> +	}
> +#endif

Once again, I'm not really sure I need to comment further here, but don't 
change syscall_get_nr(), it should return the same syscall number as was used 
by userspace to initiate the syscall.  If you really want to use the new 
AUDIT_ARCH_X86_X32 macro/define, go ahead, but make sure you rewrite it to the 
x86-64 value here so as to not break compatibility with existing seccomp 
filter users.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter
  2014-07-11 16:36   ` Paul Moore
@ 2014-07-11 16:44     ` H. Peter Anvin
  0 siblings, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2014-07-11 16:44 UTC (permalink / raw)
  To: Paul Moore, Richard Guy Briggs
  Cc: linux-audit, linux-kernel, Eric Paris, Al Viro, Will Drewry

On 07/11/2014 09:36 AM, Paul Moore wrote:
> 
> Arguably audit is broken anyway by not correctly treating syscall numbers as 
> 32 bit integers like everyone else.
> 

That is really the root cause of the problem.  x86 is not the only
architecture with a sparse syscall numbering scheme (in fact the x32
method was based on the MIPS syscall numbering scheme.)

What syscall_get_nr() returns becomes a matter of definition at that point.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()"
  2014-07-11  3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs
  2014-07-11  3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
  2014-07-11  3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs
@ 2014-07-11  3:38 ` Richard Guy Briggs
  2 siblings, 0 replies; 22+ messages in thread
From: Richard Guy Briggs @ 2014-07-11  3:38 UTC (permalink / raw)
  To: linux-audit, linux-kernel
  Cc: Richard Guy Briggs, Paul Moore, Eric Paris, Al Viro, Will Drewry,
	H. Peter Anvin

This reverts commit 8b4b9f27e57584f3d90e0bb84cf800ad81cfe3a1.
which broke audit and potentially other users of syscall_get_nr() which depend
on that call as named without being overloaded by architecture bits.

This patch along with
	seccomp: give BPF x32 bit when restoring x32 filter
will satisfy other regular users of syscall_get_nr() and syscall_get_arch()
without changing the seccomp interface to BPF.

Cc: Paul Moore <pmoore@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <aviro@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Link: http://lkml.kernel.org/r/cover.1405023592.git.rgb@redhat.com
---
 arch/x86/include/asm/syscall.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index d58b6be..8c1bb2b 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -30,13 +30,13 @@ extern const sys_call_ptr_t sys_call_table[];
  */
 static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
 {
-	return regs->orig_ax;
+	return regs->orig_ax & __SYSCALL_MASK;
 }
 
 static inline void syscall_rollback(struct task_struct *task,
 				    struct pt_regs *regs)
 {
-	regs->ax = regs->orig_ax;
+	regs->ax = regs->orig_ax & __SYSCALL_MASK;
 }
 
 static inline long syscall_get_error(struct task_struct *task,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-07-11 23:12 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-11  3:38 [PATCH 0/3] [RFC] X32: fix syscall_get_nr while not breaking seccomp BPF Richard Guy Briggs
2014-07-11  3:38 ` [PATCH 1/3] [RFC] audit: add AUDIT_ARCH_X86_X32 arch definition Richard Guy Briggs
2014-07-11 16:15   ` Paul Moore
2014-07-11  3:38 ` [PATCH 2/3] [RFC] seccomp: give BPF x32 bit when restoring x32 filter Richard Guy Briggs
2014-07-11  4:06   ` H. Peter Anvin
2014-07-11 16:11     ` Paul Moore
2014-07-11 16:13       ` H. Peter Anvin
2014-07-11 16:16       ` Eric Paris
2014-07-11 16:21         ` Paul Moore
2014-07-11 16:23           ` Eric Paris
2014-07-11 16:30             ` H. Peter Anvin
2014-07-11 16:32             ` Paul Moore
2014-07-11 18:31               ` Eric Paris
2014-07-11 19:36                 ` Paul Moore
2014-07-11 22:48                   ` Kees Cook
2014-07-11 22:52                     ` Kees Cook
2014-07-11 22:55                       ` H. Peter Anvin
2014-07-11 23:02                         ` Kees Cook
2014-07-11 23:12                     ` Andy Lutomirski
2014-07-11 16:36   ` Paul Moore
2014-07-11 16:44     ` H. Peter Anvin
2014-07-11  3:38 ` [PATCH 3/3] [RFC] Revert "x86: remove the x32 syscall bitmask from syscall_get_nr()" Richard Guy Briggs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).