* eBPF / seccomp globals?
@ 2015-09-04 1:01 Michael Tirado
2015-09-04 3:17 ` Alexei Starovoitov
2015-09-04 4:01 ` Kees Cook
0 siblings, 2 replies; 6+ messages in thread
From: Michael Tirado @ 2015-09-04 1:01 UTC
To: netdev; +Cc: linux-kernel
Hiyall,
I have created a seccomp whitelist filter for a program that launches
other less trustworthy programs. It's working great so far, but I
have run into a little roadblock. The launcher program needs to call
execve as its final step, but that may not be present in the
whitelist. I am wondering if there is any way to use some sort of
global variable that will be preserved between syscall filter calls,
so that if execve is not in the whitelist I can still allow exactly
one execve by incrementing a counter variable.
I see that in Documentation/networking/filter.txt one of the registers
is documented as being a pointer to struct sk_buff; in the seccomp
context this is a pointer to struct seccomp_data instead, right? And
the line about callee-saved registers R6-R9 probably refers to them
being saved across calls within that filter, and not between separate
filter calls?
My apologies if this is not the appropriate place to ask for help, but
it is difficult to find useful information on how eBPF works, and it is
a bit confusing to figure out the differences between seccomp filters,
net filters, and the old BPF code kicking around, short of spending
countless hours reading through all of it. If anybody has some
links to share I would be very grateful. The only way I can think to
make this work otherwise is to mount everything as MS_NOEXEC in the
new namespace, but that just feels wrong.
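For readers new to this setup, here is a minimal sketch of a classic-BPF seccomp whitelist of the kind such a launcher might install. The helper names `build_whitelist` and `install_filter` are hypothetical, and the caller is assumed to pass the correct AUDIT_ARCH_* value and syscall numbers for its architecture. Each time the filter runs it sees only a fresh struct seccomp_data (syscall number, arch, instruction pointer, arguments), so there is nowhere to keep an "execve already used" counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: write a classic-BPF whitelist into out[], which
 * must have room for 2 * n + 5 instructions. Returns the count written.
 * Jump offsets are 8-bit, which is fine for this simple layout. */
size_t build_whitelist(uint32_t audit_arch, const int *nrs, size_t n,
                       struct sock_filter *out)
{
    size_t pc = 0;

    /* Kill syscalls made under an unexpected architecture/ABI. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, arch));
    out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        audit_arch, 1, 0);
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);

    /* Load the syscall number and test it against each allowed entry. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, nr));
    for (size_t i = 0; i < n; i++) {
        /* Match: fall through to ALLOW. No match: skip over it. */
        out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                            (uint32_t)nrs[i], 0, 1);
        out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                            SECCOMP_RET_ALLOW);
    }

    /* Anything not listed is killed. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);
    return pc;
}

/* Set no_new_privs (needed to install a filter without CAP_SYS_ADMIN)
 * and attach the program to the calling thread. Returns 0 on success. */
int install_filter(struct sock_filter *insns, size_t len)
{
    struct sock_fprog prog = {
        .len = (unsigned short)len,
        .filter = insns,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

In this sketch the launcher would call `build_whitelist`, then `install_filter` as its last step before execve; if execve's number is not in the list, that final execve is killed, which is exactly the roadblock described in the message.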
* Re: eBPF / seccomp globals?
From: Alexei Starovoitov @ 2015-09-04 3:17 UTC
To: Michael Tirado
Cc: netdev, linux-kernel, Tycho Andersen, Serge E. Hallyn, Kees Cook
On Fri, Sep 04, 2015 at 01:01:20AM +0000, Michael Tirado wrote:
> Hiyall,
>
> I have created a seccomp whitelist filter for a program that launches
> other less trustworthy programs. It's working great so far, but I
> have run into a little roadblock. The launcher program needs to call
> execve as its final step, but that may not be present in the
> whitelist. I am wondering if there is any way to use some sort of
> global variable that will be preserved between syscall filter calls,
> so that if execve is not in the whitelist I can still allow exactly
> one execve by incrementing a counter variable.
>
> I see that in Documentation/networking/filter.txt one of the registers
> is documented as being a pointer to struct sk_buff; in the seccomp
> context this is a pointer to struct seccomp_data instead, right? And
> the line about callee-saved registers R6-R9 probably refers to them
> being saved across calls within that filter, and not between separate
> filter calls?
R6-R9 are the registers preserved across calls to helper functions
within a single program. They are not preserved across invocations
of the same program. At the start of the program only R1 (pointer
to context) is valid.
The eBPF programs used for kprobes, sockets and TC can simulate
global state via maps. For example, a map with one element can hold
a 'struct globals { ... }' as its value, and programs can then keep
global state in there. If the key into such a map is the cpu_id,
the state becomes per-cpu global. Other tricks are possible too.
Unfortunately seccomp doesn't have access to eBPF yet
(only classic BPF is supported), but I believe Tycho is
working on adding eBPF to seccomp and on CRIU support for eBPF programs...
* Re: eBPF / seccomp globals?
From: Kees Cook @ 2015-09-04 4:01 UTC
To: Michael Tirado; +Cc: Network Development, LKML
On Thu, Sep 3, 2015 at 6:01 PM, Michael Tirado <mtirado418@gmail.com> wrote:
> Hiyall,
>
> I have created a seccomp whitelist filter for a program that launches
> other less trustworthy programs. It's working great so far, but I
> have run into a little roadblock. The launcher program needs to call
> execve as its final step, but that may not be present in the
> whitelist. I am wondering if there is any way to use some sort of
> global variable that will be preserved between syscall filter calls,
> so that if execve is not in the whitelist I can still allow exactly
> one execve by incrementing a counter variable.
>
> I see that in Documentation/networking/filter.txt one of the registers
> is documented as being a pointer to struct sk_buff; in the seccomp
> context this is a pointer to struct seccomp_data instead, right? And
> the line about callee-saved registers R6-R9 probably refers to them
> being saved across calls within that filter, and not between separate
> filter calls?
>
> My apologies if this is not the appropriate place to ask for help, but
> it is difficult to find useful information on how eBPF works, and it is
> a bit confusing to figure out the differences between seccomp filters,
> net filters, and the old BPF code kicking around, short of spending
> countless hours reading through all of it. If anybody has some
> links to share I would be very grateful. The only way I can think to
> make this work otherwise is to mount everything as MS_NOEXEC in the
> new namespace, but that just feels wrong.
For documentation, there are some great slides on seccomp from this
year's Linux Plumbers Conference[1].
At present, there is no variable state beyond the syscall context (PC,
args) available to seccomp filters. The no_new_privs prctl was added
to reduce the risk of including execve in a filter's whitelist, but
that isn't as strong as the "exec once" feature you want.
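To make the "no variable state" point concrete, here is a toy evaluator for the three classic-BPF instructions a simple whitelist uses. It is a hypothetical illustration, not the kernel's interpreter (which also supports ALU operations and scratch slots M[], and validates programs when they are loaded; those scratch slots also start fresh on every run). The verdict is computed from the program and one struct seccomp_data alone, so the same syscall context always receives the same answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical toy evaluator covering only LD|W|ABS, JMP|JEQ|K and
 * RET|K. Its only inputs are the program and one struct seccomp_data;
 * the accumulator starts at zero on every call and nothing survives
 * between calls. */
uint32_t toy_eval(const struct sock_filter *prog, size_t len,
                  const struct seccomp_data *sd)
{
    uint32_t acc = 0;
    size_t pc = 0;

    while (pc < len) {
        const struct sock_filter *f = &prog[pc];

        switch (f->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            /* Load a 32-bit word from the syscall context. */
            if ((size_t)f->k + sizeof acc > sizeof *sd)
                return SECCOMP_RET_KILL;
            memcpy(&acc, (const unsigned char *)sd + f->k, sizeof acc);
            pc++;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            /* Jump offsets are relative to the next instruction. */
            pc += 1 + (acc == f->k ? f->jt : f->jf);
            break;
        case BPF_RET | BPF_K:
            return f->k;
        default:
            return SECCOMP_RET_KILL; /* unsupported in this toy */
        }
    }
    return SECCOMP_RET_KILL; /* fell off the end of the program */
}
```

Evaluating a filter twice with the same execve context (for example nr 59, execve's number on x86_64) returns the same verdict both times; there is no slot where a "used once" flag could live between calls.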
What we did in Chrome OS was to use the "minijail" tool[2] to
LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
a bit of a hack, but works in well-defined environments. You are
talking about namespaces, though, so maybe minijail is worth a look?
It does that too and a whole lot more.
As for using maps via eBPF in seccomp, it's on the horizon, but it
comes with a lot of exposure that I haven't finished pondering, so I
don't think those features will be added soon.
-Kees
[1] http://man7.org/conf/lpc2015/limiting_kernel_attack_surface_with_seccomp-LPC_2015-Kerrisk.pdf
[2] see subdirectory "minijail" after "git clone
https://chromium.googlesource.com/chromiumos/platform2/"
--
Kees Cook
Chrome OS Security
* Re: eBPF / seccomp globals?
From: Tycho Andersen @ 2015-09-04 14:03 UTC
To: Alexei Starovoitov
Cc: Michael Tirado, netdev, linux-kernel, Serge E. Hallyn, Kees Cook
Hi all,
On Thu, Sep 03, 2015 at 08:17:05PM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 04, 2015 at 01:01:20AM +0000, Michael Tirado wrote:
> > Hiyall,
> >
> > I have created a seccomp whitelist filter for a program that launches
> > other less trustworthy programs. It's working great so far, but I
> > have run into a little roadblock. The launcher program needs to call
> > execve as its final step, but that may not be present in the
> > whitelist. I am wondering if there is any way to use some sort of
> > global variable that will be preserved between syscall filter calls,
> > so that if execve is not in the whitelist I can still allow exactly
> > one execve by incrementing a counter variable.
> >
> > I see that in Documentation/networking/filter.txt one of the registers
> > is documented as being a pointer to struct sk_buff; in the seccomp
> > context this is a pointer to struct seccomp_data instead, right? And
> > the line about callee-saved registers R6-R9 probably refers to them
> > being saved across calls within that filter, and not between separate
> > filter calls?
>
> R6-R9 are the registers preserved across calls to helper functions
> within a single program. They are not preserved across invocations
> of the same program. At the start of the program only R1 (pointer
> to context) is valid.
> The eBPF programs used for kprobes, sockets and TC can simulate
> global state via maps. For example, a map with one element can hold
> a 'struct globals { ... }' as its value, and programs can then keep
> global state in there. If the key into such a map is the cpu_id,
> the state becomes per-cpu global. Other tricks are possible too.
> Unfortunately seccomp doesn't have access to eBPF yet
> (only classic BPF is supported), but I believe Tycho is
> working on adding eBPF to seccomp and on CRIU support for eBPF programs...
Indeed I am; however, my patches don't include support for seccomp
programs using eBPF maps. I'm intending to post them later today, so
stay tuned.
Tycho
* Re: eBPF / seccomp globals?
From: Michael Tirado @ 2015-09-04 20:29 UTC
To: Kees Cook; +Cc: Network Development, LKML, alexei.starovoitov
> What we did in Chrome OS was to use the "minijail" tool[2] to
> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
> a bit of a hack, but works in well-defined environments. You are
> talking about namespaces, though, so maybe minijail is worth a look?
> It does that too and a whole lot more.
Minijail is pretty similar to what I have been working on for the past few
months; unfortunately, I have already written it, doh! Those slides
are a good resource, definitely helpful as an introduction to seccomp.
So it seems there are no easy solutions to this problem. Using
LD_PRELOAD to defer seccomp filter application scares me a little bit,
and won't work with file capabilities IIRC, though it is a damn clever
solution. I think for now I will explore the possibility of
validating argument 1 of exec to allow only the program I am launching
to be exec'd, so if somehow by Thor's hammer that program escapes its
sandbox, it will only be able to exec itself. I suppose it will now
have to be restricted to absolute paths only.
Thanks everyone for the clarification!
* Re: eBPF / seccomp globals?
From: Kees Cook @ 2015-09-04 20:37 UTC
To: Michael Tirado; +Cc: Network Development, LKML, Alexei Starovoitov
On Fri, Sep 4, 2015 at 1:29 PM, Michael Tirado <mtirado418@gmail.com> wrote:
>> What we did in Chrome OS was to use the "minijail" tool[2] to
>> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
>> a bit of a hack, but works in well-defined environments. You are
>> talking about namespaces, though, so maybe minijail is worth a look?
>> It does that too and a whole lot more.
>
> Minijail is pretty similar to what I have been working on for the past few
> months; unfortunately, I have already written it, doh! Those slides
> are a good resource, definitely helpful as an introduction to seccomp.
>
> So it seems there are no easy solutions to this problem. Using
> LD_PRELOAD to defer seccomp filter application scares me a little bit,
> and won't work with file capabilities IIRC, though it is a damn clever
Do you still need file capabilities with the availability of the new
ambient capabilities?
https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
http://thread.gmane.org/gmane.linux.kernel.lsm/24034
> solution. I think for now I will explore the possibility of
> validating argument 1 of exec to allow only the program I am launching
> to be exec'd, so if somehow by Thor's hammer that program escapes its
> sandbox, it will only be able to exec itself. I suppose it will now
> have to be restricted to absolute paths only.
Well, you can only examine the memory address and not what's pointed
to, so you may be out of luck there too. Sorry! On the TODO list is
doing deep argument inspection, but it is not an easy thing to get
right. :)
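A sketch of what such an argument check looks like, assuming a 64-bit target and that the launcher knows in advance the exact pointer value it will pass as execve's first argument (both are assumptions; the helper name `build_exec_ptr_check` is hypothetical). Classic BPF loads 32-bit words, so the 64-bit pointer is compared as two halves. Only the address is compared: the path string stored at that address is never examined.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Offsets of the low/high 32-bit halves within a 64-bit args[] slot. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
enum { LO_OFF = 4, HI_OFF = 0 };
#else
enum { LO_OFF = 0, HI_OFF = 4 };
#endif

/* Hypothetical helper: writes 9 instructions to out[] and returns 9.
 * execve is allowed only if args[0] equals path_ptr; a mismatch is
 * killed. Every other syscall returns ALLOW here purely as a
 * placeholder for "continue with the rest of the whitelist". */
size_t build_exec_ptr_check(uint32_t execve_nr, uint64_t path_ptr,
                            struct sock_filter *out)
{
    const uint32_t arg0 = offsetof(struct seccomp_data, args[0]);
    size_t pc = 0;

    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, nr));
    /* Not execve: jump to the placeholder ALLOW at index 8. */
    out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        execve_nr, 0, 6);
    /* Compare the low 32 bits of the pointer VALUE (not the string). */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        arg0 + LO_OFF);
    out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        (uint32_t)path_ptr, 0, 3);
    /* Then the high 32 bits. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        arg0 + HI_OFF);
    out[pc++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        (uint32_t)(path_ptr >> 32), 0, 1);
    /* 6: execve with the expected pointer value. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_ALLOW);
    /* 7: execve with any other pointer value. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);
    /* 8: placeholder for the rest of the whitelist. */
    out[pc++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_ALLOW);
    return pc;
}
```

Because the filter sees only the number in args[0], a process that knows (or guesses) the allowed address can place any path there, so in this sketch a matching pointer is weak evidence at best about which file will be executed.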
-Kees
>
> Thanks everyone for the clarification!
--
Kees Cook
Chrome OS Security