[kernel-hardening] System call interface changes

* [kernel-hardening] System call interface changes
@ 2015-11-20 10:30 Florian Weimer
  2015-11-20 19:16 ` Rich Felker
  2015-11-24 20:10 ` Kees Cook
  0 siblings, 2 replies; 10+ messages in thread
From: Florian Weimer @ 2015-11-20 10:30 UTC (permalink / raw)
  To: kernel-hardening

Not sure if this is in scope for this list.  If not, please say so.

Currently, the system call interface to user space expects the system
call number in a register (on i386 and x86_64, and probably most other
architectures).  This means that once you have a system call instruction
in the process image, it can be theoretically used to run *any* system
call, including ones that are not actually referenced in the binary.  As
a result, you need seccomp or a Linux security module to interdict
certain system calls.
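
To make the register-based convention concrete, here is a minimal
sketch, assuming x86_64 Linux and GCC/Clang inline assembly.  The helper
name raw_syscall0 is hypothetical and not part of the original message;
it only illustrates that the number is an ordinary run-time value in
rax, so a single SYSCALL instruction can reach any system call.

```c
/* Sketch only: assumes x86_64 Linux and GCC/Clang extended asm. */
#include <sys/syscall.h>   /* SYS_getpid and friends */

/* Hypothetical helper: issue a system call that takes no arguments.
 * The number is passed in rax, so the instruction itself carries no
 * information about which call is being made. */
static long raw_syscall0(long nr)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result comes back in rax */
                      : "a"(nr)                   /* syscall number goes in rax */
                      : "rcx", "r11", "memory");  /* clobbered by SYSCALL */
    return ret;
}
```

Calling raw_syscall0(SYS_getpid) and raw_syscall0(SYS_getppid) executes
the same instruction bytes; only the register contents differ.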

This would have to be an opt-in feature, obviously, and applications
would have to opt in explicitly via some ELF flag (similar to what we
did for non-executable stacks).

Do you think it would be feasible to encode the system call number in
the instruction stream instead, next to the instruction?  I think this
would have to set the system call MSR (LSTAR) on some context switches
at least (to avoid a conditional branch in the system call handler,
depending on whether the process has opted in to the new interface), and
add a few instructions (two loads and an add, so that the system call
number can be encoded in multiple ways, to avoid creating otherwise
useful bit patterns).  On the other hand, it would avoid the need to
load the sixth argument from the stack on i386.

Florian

* Re: [kernel-hardening] System call interface changes
  2015-11-20 10:30 [kernel-hardening] System call interface changes Florian Weimer
@ 2015-11-20 19:16 ` Rich Felker
  2015-11-24 19:54   ` Florian Weimer
  2015-11-24 20:10 ` Kees Cook
  1 sibling, 1 reply; 10+ messages in thread
From: Rich Felker @ 2015-11-20 19:16 UTC (permalink / raw)
  To: kernel-hardening

On Fri, Nov 20, 2015 at 11:30:39AM +0100, Florian Weimer wrote:
> Not sure if this is in scope for this list.  If not, please say so.
> 
> Currently, the system call interface to user space expects the system
> call number in a register (on i386 and x86_64, and probably most other
> architectures).  This means that once you have a system call instruction
> in the process image, it can be theoretically used to run *any* system
> call, including ones that are not actually referenced in the binary.  As
> a result, you need seccomp or a Linux security module to interdict
> certain system calls.
> 
> This would have to be an opt-in feature, obviously, and applications
> would have to opt in explicitly via some ELF flag (similar to what we
> did for non-executable stacks).

I don't think that's necessary. The application (or for typical
dynamic linking, just the build of libc.so) would just need to refrain
from using the parameterized syscall so that the old opcode would not
appear in its executable mappings.

> Do you think it would be feasible to encode the system call number in
> the instruction stream instead, next to the instruction?  I think this

This was done on ARM in the old pre-EABI ABI, and it turned out to be
a bad design, at least from standpoints other than security. Reading
the syscall number out of the instruction stream was more expensive,
incompatible with syscall() (which ended up requiring a special
SYS_syscall that needed messy argument conventions), and incompatible
with reasonable userspace coding of syscalls using inline functions
rather than macros, where you would have to rely on constant
propagation optimizations to be able to satisfy asm constraints. See
how we're currently doing syscall asm for mips in musl:

http://git.musl-libc.org/cgit/musl/tree/arch/mips/syscall_arch.h

The "ir" constraint allows the compiler to use an immediate if
constant propagation succeeds, but also allows the syscall number in a
register if it doesn't.
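
Returning to the pre-EABI ARM convention mentioned earlier in this
message, the extra decoding step can be sketched in portable C.  This
assumes the old-ABI encoding, in which the system call number sits in
the 24-bit immediate of the SWI instruction, offset by 0x900000.  The
function name is hypothetical.  A real kernel would also have to fetch
the instruction word from user memory at the trapping address, which is
where the extra cost comes from.

```c
#include <stdint.h>

/* Assumed old-ABI base added to every syscall number in the immediate. */
#define OABI_SYSCALL_BASE 0x900000u

/* Hypothetical decoder: extract the syscall number from a 32-bit ARM
 * SWI instruction word, or return -1 if the word is not an old-ABI
 * system call. */
static long oabi_syscall_nr(uint32_t insn)
{
    /* Bits 27..24 are 1111 for SWI; bits 31..28 hold the condition. */
    if ((insn & 0x0f000000u) != 0x0f000000u)
        return -1;

    uint32_t imm = insn & 0x00ffffffu;   /* 24-bit immediate field */
    if (imm < OABI_SYSCALL_BASE)
        return -1;                       /* e.g. EABI-style "swi 0" */

    return (long)(imm - OABI_SYSCALL_BASE);
}
```

For example, the word 0xef900014 decodes to number 20, while the
EABI-style 0xef000000 is rejected because the number is expected in a
register instead.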

I think your idea is also problematic for syscall restart when the
kernel needs to arrange for a special restartblock to be used rather
than the original syscall, since the kernel would have no way of
storing that state (or if there were a way to store that, userspace
could exploit it to make its own restartblocks for the sake of
executing arbitrary syscalls).

Rich

* Re: [kernel-hardening] System call interface changes
  2015-11-20 19:16 ` Rich Felker
@ 2015-11-24 19:54   ` Florian Weimer
  2015-11-24 20:29     ` Rich Felker
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2015-11-24 19:54 UTC (permalink / raw)
  To: kernel-hardening

On 11/20/2015 08:16 PM, Rich Felker wrote:

>> This would have to be an opt-in feature, obviously, and applications
>> would have to opt in explicitly via some ELF flag (similar to what we
>> did for non-executable stacks).
> 
> I don't think that's necessary. The application (or for typical
> dynamic linking, just the build of libc.so) would just need to refrain
> from using the parameterized syscall so that the old opcode would not
> appear in its executable mappings.

The SYSCALL instruction is fairly short (0x0f 0x05), so it ends up in
process images by accident.  I think this calls for explicit blocking.
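
A small sketch of how such accidental occurrences could be counted.  The
helper name and the sample buffer are hypothetical; the sample is the
encoding of "mov $0x50f, %eax", whose little-endian immediate happens to
contain the two SYSCALL bytes.

```c
#include <stddef.h>

/* Hypothetical helper: count occurrences of the two-byte SYSCALL opcode
 * (0x0f 0x05) at any byte offset, ignoring instruction boundaries. */
static size_t count_syscall_bytes(const unsigned char *buf, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0f && buf[i + 1] == 0x05)
            hits++;
    }
    return hits;
}

/* mov $0x50f, %eax: opcode 0xb8 followed by the 32-bit immediate. */
static const unsigned char mov_imm[] = { 0xb8, 0x0f, 0x05, 0x00, 0x00 };
```

Here count_syscall_bytes(mov_imm, sizeof mov_imm) finds one match, even
though the code contains no intended system call instruction.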

>> Do you think it would be feasible to encode the system call number in
>> the instruction stream instead, next to the instruction?  I think this
> 
> This was done on ARM in the old pre-EABI ABI, and it turned out to be
> a bad design, at least from standpoints other than security. Reading
> the syscall number out of the instruction stream was more expensive,
> incompatible with syscall() (which ended up requiring a special
> SYS_syscall that needed messy argument conventions), and incompatible
> with reasonable userspace coding of syscalls using inline functions
> rather than macros, where you would have to rely on constant
> propagation optimizations to be able to satisfy asm constraints.

Wouldn't it be possible to embed the constant in the assembly text,
using the C preprocessor?

But I appreciate your comments, they have been helpful.

Florian

* Re: [kernel-hardening] System call interface changes
  2015-11-24 19:54   ` Florian Weimer
@ 2015-11-24 20:29     ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2015-11-24 20:29 UTC (permalink / raw)
  To: kernel-hardening

On Tue, Nov 24, 2015 at 08:54:28PM +0100, Florian Weimer wrote:
> On 11/20/2015 08:16 PM, Rich Felker wrote:
> 
> >> This would have to be an opt-in feature, obviously, and applications
> >> would have to opt in explicitly via some ELF flag (similar to what we
> >> did for non-executable stacks).
> > 
> > I don't think that's necessary. The application (or for typical
> > dynamic linking, just the build of libc.so) would just need to refrain
> > from using the parameterized syscall so that the old opcode would not
> > appear in its executable mappings.
> 
> The SYSCALL instruction is fairly short (0x0f 0x05), so it ends up in
> process images by accident.  I think this calls for explicit blocking.

Indeed, that makes sense at least for x86 where you have misaligned
instructions and immediates in the instruction stream.

> >> Do you think it would be feasible to encode the system call number in
> >> the instruction stream instead, next to the instruction?  I think this
> > 
> > This was done on ARM in the old pre-EABI ABI, and it turned out to be
> > a bad design, at least from standpoints other than security. Reading
> > the syscall number out of the instruction stream was more expensive,
> > incompatible with syscall() (which ended up requiring a special
> > SYS_syscall that needed messy argument conventions), and incompatible
> > with reasonable userspace coding of syscalls using inline functions
> > rather than macros, where you would have to rely on constant
> > propagation optimizations to be able to satisfy asm constraints.
> 
> Wouldn't it be possible to embed the constant in the assembly text,
> using the C preprocessor?

Not unless you're using macros all the way down, with no inline
functions. I suppose that's a choice glibc could make, but I'd
consider it a bad choice in terms of style and flexibility. Inline
functions are preferable IMO, but they don't allow you to propagate
the constant expression semantic (or string literals) from the caller
into the callee.
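
The difference can be sketched as follows, assuming x86_64 Linux and
GCC/Clang extended asm.  Both names are hypothetical.  x86_64 has no
system call instruction with an immediate operand, so the immediate of
the mov merely stands in for a number embedded in the instruction
stream.

```c
/* Sketch only: assumes x86_64 Linux and GCC/Clang extended asm. */
#include <sys/syscall.h>   /* SYS_getpid */

/* Macro form: the "i" constraint demands a compile-time constant.  The
 * literal argument is substituted textually, so this works at any
 * optimization level. */
#define SYSCALL0_IMM(ret, nr)                              \
    __asm__ volatile ("mov %1, %%eax\n\tsyscall"           \
                      : "=a"(ret)                          \
                      : "i"(nr)                            \
                      : "rcx", "r11", "memory")

/* Inline-function form: nr is an ordinary parameter, so a bare "i"
 * constraint would only be satisfiable if the compiler inlines the call
 * and propagates the constant.  "ir" lets it fall back to a register. */
static inline long syscall0_ir(int nr)
{
    long ret;
    __asm__ volatile ("mov %1, %%eax\n\tsyscall"
                      : "=a"(ret)
                      : "ir"(nr)
                      : "rcx", "r11", "memory");
    return ret;
}
```

With the macro, SYSCALL0_IMM(r, SYS_getpid) always assembles to a mov
with an immediate.  With the function, the same call may or may not,
depending on optimization.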

BTW, as long as libc.so has the syscall() function, there's going to
be at least one way to make arbitrary syscalls by jumping to a
particular address with the right values in the right registers or
stack slots.
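
As a minimal illustration, assuming Linux with a libc that exports
syscall(), the wrapper below (a hypothetical name) forwards a number
chosen at run time.  Any code that can reach it with a chosen argument
can request any system call.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical wrapper: the number is plain data, not part of the
 * instruction stream, so the caller decides which system call runs. */
static long call_by_number(long nr)
{
    return syscall(nr);
}
```

For example, call_by_number(SYS_getpid) returns the same value as
getpid().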

Rich

* Re: [kernel-hardening] System call interface changes
  2015-11-20 10:30 [kernel-hardening] System call interface changes Florian Weimer
  2015-11-20 19:16 ` Rich Felker
@ 2015-11-24 20:10 ` Kees Cook
  2015-11-24 20:54   ` Florian Weimer
  1 sibling, 1 reply; 10+ messages in thread
From: Kees Cook @ 2015-11-24 20:10 UTC (permalink / raw)
  To: kernel-hardening@lists.openwall.com

On Fri, Nov 20, 2015 at 2:30 AM, Florian Weimer <fweimer@redhat.com> wrote:
> Not sure if this is in scope for this list.  If not, please say so.

Sure! I think it's certainly in scope for the list. I'm glad to see new ideas.

> Currently, the system call interface to user space expects the system
> call number in a register (on i386 and x86_64, and probably most other
> architectures).  This means that once you have a system call instruction
> in the process image, it can be theoretically used to run *any* system
> call, including ones that are not actually referenced in the binary.  As
> a result, you need seccomp or a Linux security module to interdict
> certain system calls.

Ah, are you looking at this as an anti-ROP idea? What's the bug class
or exploitation method you're looking at addressing?

> This would have to be an opt-in feature, obviously, and applications
> would have to opt in explicitly via some ELF flag (similar to what we
> did for non-executable stacks).

There have been a growing number of things that seem like they'd be
nice to control with ELF flags. Andy Lutomirski recently wanted to
make the x86-64 vsyscall presence be selectable on a per-process
basis. (I think it's easier to just turn it off entirely.)

> Do you think it would be feasible to encode the system call number in
> the instruction stream instead, next to the instruction?  I think this
> would have to set the system call MSR (LSTAR) on some context switches
> at least (to avoid a conditional branch in the system call handler,
> depending on whether the process has opted in to the new interface), and
> add a few instructions (two loads and an add, so that the system call
> number can be encoded in multiple ways, to avoid creating otherwise
> useful bit patterns).  On the other hand, it would avoid the need to
> load the sixth argument from the stack on i386.

I always come back to "how can we measure this?" If you've got an
exploit method you're trying to kill, etc, then it should be easy to
evaluate its utility.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

* Re: [kernel-hardening] System call interface changes
  2015-11-24 20:10 ` Kees Cook
@ 2015-11-24 20:54   ` Florian Weimer
  2015-11-24 21:43     ` Kees Cook
  2015-11-24 22:33     ` Rich Felker
  0 siblings, 2 replies; 10+ messages in thread
From: Florian Weimer @ 2015-11-24 20:54 UTC (permalink / raw)
  To: kernel-hardening

On 11/24/2015 09:10 PM, Kees Cook wrote:

> Ah, are you looking at this as an anti-ROP idea? What's the bug class
> or exploitation method you're looking at addressing?

The idea, as it has been presented to me, is to remove the ability to
invoke execve (and a few other system calls) completely, to push
attackers *towards* ROP as the only means for injecting code into a
process (not all processes, obviously, but as large a class of processes
as feasible).

As far as I know, this is intended as a post-exploitation mitigation
(before policy-based mechanisms or containers kick in).  I don't know
enough about real-world attack scenarios to tell if these changes would
make a difference.

>> This would have to be an opt-in feature, obviously, and applications
>> would have to opt in explicitly via some ELF flag (similar to what we
>> did for non-executable stacks).
> 
> There have been a growing number of things that seem like they'd be
> nice to control with ELF flags. Andy Lutomirski recently wanted to
> make the x86-64 vsyscall presence be selectable on a per-process
> basis. (I think it's easier to just turn it off entirely.)

Yes, we have a backlog in this area.  For usability reasons, we also
need ways to mark separated debuginfo files, and PIE binaries (which
currently look like DSOs).

> I always come back to "how can we measure this?" If you've got an
> exploit method you're trying to kill, etc, then it should be easy to
> evaluate its utility.

The final paragraph in

  <http://openwall.com/lists/oss-security/2015/11/18/12>

has some more context.  I think you are definitely asking the right
questions, and I desire a more empirical approach to security
improvements.  But this is the position I'm in.

Florian

* Re: [kernel-hardening] System call interface changes
  2015-11-24 20:54   ` Florian Weimer
@ 2015-11-24 21:43     ` Kees Cook
  2015-12-07 15:44       ` Florian Weimer
  2015-11-24 22:33     ` Rich Felker
  1 sibling, 1 reply; 10+ messages in thread
From: Kees Cook @ 2015-11-24 21:43 UTC (permalink / raw)
  To: kernel-hardening@lists.openwall.com

On Tue, Nov 24, 2015 at 12:54 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/24/2015 09:10 PM, Kees Cook wrote:
>
>> Ah, are you looking at this as an anti-ROP idea? What's the bug class
>> or exploitation method you're looking at addressing?
>
> The idea, as it has been presented to me, is to remove the ability to
> invoke execve (and a few other system calls) completely, to push
> attackers *towards* ROP as the only means for injecting code into a
> process (not all processes, obviously, but as large a class of processes
> as feasible).
>
> As far as I know, this is intended as a post-exploitation mitigation
> (before policy-based mechanisms or containers kick in).  I don't know
> enough about real-world attack scenarios to tell if these changes would
> make a difference.
>
>>> This would have to be an opt-in feature, obviously, and applications
>>> would have to opt in explicitly via some ELF flag (similar to what we
>>> did for non-executable stacks).
>>
>> There have been a growing number of things that seem like they'd be
>> nice to control with ELF flags. Andy Lutomirski recently wanted to
>> make the x86-64 vsyscall presence be selectable on a per-process
>> basis. (I think it's easier to just turn it off entirely.)
>
> Yes, we have a backlog in this area.  For usability reasons, we also
> need ways to mark separated debuginfo files, and PIE binaries (which
> currently look like DSOs).
>
>> I always come back to "how can we measure this?" If you've got an
>> exploit method you're trying to kill, etc, then it should be easy to
>> evaluate its utility.
>
> The final paragraph in
>
>   <http://openwall.com/lists/oss-security/2015/11/18/12>
>
> has some more context.  I think you are definitely asking the right
> questions, and I desire a more empirical approach to security
> improvements.  But this is the position I'm in.

Cool. Well, we can certainly look at existing public exploits and
PoCs. That's what I've been trying to collect on the Kernel
Self-Protection Project wiki pages. There are plenty of things we
could add to that list from the ROP world. Maybe it'd be good to look
through various exploit lists to find stuff that use techniques that
are either missing from the wiki page or are better examples? Do you
(or someone else) have time to go on a research/collection exercise?
Even pulling from academic papers can be useful here.

I think having concrete examples really helps both demonstrate
specifically what we're fixing and convince people that these are real
issues worth solving.

-Kees


-- 
Kees Cook
Chrome OS & Brillo Security

* Re: [kernel-hardening] System call interface changes
  2015-11-24 21:43     ` Kees Cook
@ 2015-12-07 15:44       ` Florian Weimer
  2015-12-07 16:26         ` lazytyped
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2015-12-07 15:44 UTC (permalink / raw)
  To: kernel-hardening

On 11/24/2015 10:43 PM, Kees Cook wrote:

> Cool. Well, we can certainly look at existing public exploits and
> PoCs. That's what I've been trying to collect on the Kernel
> Self-Protection Project wiki pages. There are plenty of things we
> could add to that list from the ROP world. Maybe it'd be good to look
> through various exploit lists to find stuff that use techniques that
> are either missing from the wiki page or are better examples? Do you
> (or someone else) have time to go on a research/collection exercise?

The proposal I have (and which prompted the question about the system
call interface) is to block execve because many proof-of-concept exploits
use execve or system() to spawn a shell or run arbitrary commands.

But I'm concerned that it's basically the same thing Yves-Alexis mentioned
in the discussion about commit_creds(prepare_kernel_cred(0)):

  <http://www.openwall.com/lists/kernel-hardening/2015/11/26/8>

So execve is perhaps just a demo, just like running CALC.EXE on Windows.
I'm worried that execve blocking is a bit like renaming CALC.EXE in
terms of its impact on actual successful compromises; that is, it
won't make a difference.

In terms of actual impact on compromised Linux servers, SAML support in
FTP, OpenSSH and sudo, combined with popular Windows clients that can
interoperate (I don't know if people still use WinSCP), would probably
reduce compromises due to leaked credentials substantially.  My hunch is
that such compromises (even delayed reuse of credentials) account for by
far the largest number of server compromises.  But this is an extremely
high-level issue: we are basically providing mitigation for a client-side
issue, where we don't control the client at all.  And the large-scale web
hosters primarily affected by this do not approach us with such
requirements, as far as I know.

Florian

* Re: [kernel-hardening] System call interface changes
  2015-12-07 15:44       ` Florian Weimer
@ 2015-12-07 16:26         ` lazytyped
  0 siblings, 0 replies; 10+ messages in thread
From: lazytyped @ 2015-12-07 16:26 UTC (permalink / raw)
  To: kernel-hardening



On 12/7/15 4:44 PM, Florian Weimer wrote:
> On 11/24/2015 10:43 PM, Kees Cook wrote:
>
>> Cool. Well, we can certainly look at existing public exploits and
>> PoCs. That's what I've been trying to collect on the Kernel
>> Self-Protection Project wiki pages. There are plenty of things we
>> could add to that list from the ROP world. Maybe it'd be good to look
>> through various exploit lists to find stuff that use techniques that
>> are either missing from the wiki page or are better examples? Do you
>> (or someone else) have time to go on a research/collection exercise?
> The proposal I have (and which prompted the question about the system
> call interface) is to block execve because many proof-of-concept exploits
> use execve or system() to spawn a shell or run arbitrary commands.

So just drop that privilege? Pretty much any operating system implements
a form of sandboxing/privilege/capability filtering. It's, of course,
application-specific.

Keep in mind that blocking execve leaves open chown(), chmod(), mount()
and a large number of other system calls that can be leveraged to
achieve a similar result.


     -  twiz

* Re: [kernel-hardening] System call interface changes
  2015-11-24 20:54   ` Florian Weimer
  2015-11-24 21:43     ` Kees Cook
@ 2015-11-24 22:33     ` Rich Felker
  1 sibling, 0 replies; 10+ messages in thread
From: Rich Felker @ 2015-11-24 22:33 UTC (permalink / raw)
  To: kernel-hardening

On Tue, Nov 24, 2015 at 09:54:27PM +0100, Florian Weimer wrote:
> On 11/24/2015 09:10 PM, Kees Cook wrote:
> 
> > Ah, are you looking at this as an anti-ROP idea? What's the bug class
> > or exploitation method you're looking at addressing?
> 
> The idea, as it has been presented to me, is to remove the ability to
> invoke execve (and a few other system calls) completely, to push
> attackers *towards* ROP as the only means for injecting code into a
> process (not all processes, obviously, but as large a class of processes
> as feasible).
> 
> As far as I know, this is intended as a post-exploitation mitigation
> (before policy-based mechanisms or containers kick in).  I don't know
> enough about real-world attack scenarios to tell if these changes would
> make a difference.

If the intent is only to block a single syscall, why would you break
the existing user-kernel interface boundary and require a new
alternate one rather than just doing something like
seccomp/pledge/prctl to disable exec* syscalls and whatever else needs
to be blocked?

I'm also a bit confused about how you would take advantage of this
without static linking, since libc.so will contain all the syscalls and
syscall().
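
For comparison, a rough sketch of the seccomp route, assuming x86_64
Linux with seccomp filter support.  The names exec_filter and deny_exec
are hypothetical, and the policy (fail execve and execveat with EPERM,
allow everything else) is chosen only for illustration.

```c
/* Sketch only: assumes x86_64 Linux with seccomp filter support. */
#include <errno.h>           /* EPERM */
#include <stddef.h>          /* offsetof */
#include <linux/audit.h>     /* AUDIT_ARCH_X86_64 */
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_* */
#include <sys/prctl.h>       /* prctl(), PR_SET_* */
#include <sys/syscall.h>     /* SYS_execve, SYS_execveat */

/* Classic BPF program run by the kernel on every system call. */
static struct sock_filter exec_filter[] = {
    /* [0] Load the architecture; kill on anything but x86_64. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [3] Load the system call number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] Deny x32 calls (number has bit 0x40000000 set) outright. */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 3, 0),
    /* [5],[6] Deny execve and execveat. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execve, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execveat, 1, 0),
    /* [7] Everything else is allowed. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* [8] Denied calls fail with EPERM instead of killing the process. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
};

/* Hypothetical helper: install the filter for the calling thread.
 * Returns 0 on success, -1 on failure.  The filter cannot be removed. */
static int deny_exec(void)
{
    struct sock_fprog prog = {
        .len = sizeof(exec_filter) / sizeof(exec_filter[0]),
        .filter = exec_filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After a successful deny_exec(), execve() returns -1 with errno set to
EPERM regardless of how the call is reached, including through
syscall() in libc.so.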

Rich
