Re: bind mount that lies to apps about st

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: bind mount that lies to apps about st_uid?
       [not found] <CAOP=4wipj4u_hpXRMh4qWn=ofb-br_=qS8vY8-WjqUrphQNw4w@mail.gmail.com>
@ 2015-07-15  0:41 ` Andrew Lutomirski
  2015-07-15  0:53   ` Dave Chinner
  2015-07-15  1:12   ` Eric W. Biederman
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Lutomirski @ 2015-07-15  0:41 UTC (permalink / raw)
  To: Kenton Varda; +Cc: Eric W. Biederman, Drew Fisher, David Renshaw, linux-fsdevel

On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote:
> Hi Eric,
>
> I'm curious to know if you think this the following is something that could
> realistically be implemented, or if it's a totally outrageous idea. :)
>
> For Sandstorm, we've found a case where it would be useful if we could
> create a bind-mount in which stat() calls lie and claim all files have UID
> 65534 (as if their real owner wasn't mapped into the user namespace). All
> actual security checks would behave normally; only the result of stat()
> would be affected.
>
> Specifically, the problem is that some apps have checks of the form: "If
> file is owned by my UID, then modify it, otherwise do something else
> reasonable." These checks are wrong in the case of read-only bind mounts,
> but legacy code rarely considers such a possibility.
>
> Making maters worse, it is the case currently that some configurations of
> Sandstorm will have these read-only files owned by the UID running the app
> while others will not. Thus this is an observable difference between servers
> that can affect app behavior, causing them to work fine on some servers but
> break on others, which is something we want to avoid at all costs. We can't
> see any reasonable way to resolve the difference without kernel help,
> unfortunately.
>
> So, it would be convenient if I could trick these apps into behaving
> correctly by asking the kernel to lie to them about the UID. For our use
> case, a boolean flag would work. It could simply say: "For stat() purposes
> on this mount, act as if the uid_map and gid_map are empty, therefore return
> st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid="
> setting that overrides all reported UIDs, which seems possibly more broadly
> useful, but also possibly riskier(?). I'm also happy with restricting this
> to only mounts with MS_RDONLY and MS_NOSUID, if that helps.
>
> It's a hack, but it's simple and would save a bunch of confusion for our
> developers in the long run, I think.

There was some talk at LSF/MM about more generally allowing remapping
of the view of uids.  This would probably be a per-superblock thing
instead of a per-mount-point thing, but some kind of remapping per
mount point seems like a sensible thing to me.

Eric, thoughts?

--Andy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski
@ 2015-07-15  0:53   ` Dave Chinner
  2015-07-15  1:14     ` Kenton Varda
  2015-07-15  1:12   ` Eric W. Biederman
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2015-07-15  0:53 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Kenton Varda, Eric W. Biederman, Drew Fisher, David Renshaw,
	linux-fsdevel

On Tue, Jul 14, 2015 at 05:41:01PM -0700, Andrew Lutomirski wrote:
> On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote:
> > Hi Eric,
> >
> > I'm curious to know if you think this the following is something that could
> > realistically be implemented, or if it's a totally outrageous idea. :)
> >
> > For Sandstorm, we've found a case where it would be useful if we could
> > create a bind-mount in which stat() calls lie and claim all files have UID
> > 65534 (as if their real owner wasn't mapped into the user namespace). All
> > actual security checks would behave normally; only the result of stat()
> > would be affected.
> >
> > Specifically, the problem is that some apps have checks of the form: "If
> > file is owned by my UID, then modify it, otherwise do something else
> > reasonable." These checks are wrong in the case of read-only bind mounts,
> > but legacy code rarely considers such a possibility.
> >
> > Making maters worse, it is the case currently that some configurations of
> > Sandstorm will have these read-only files owned by the UID running the app
> > while others will not. Thus this is an observable difference between servers
> > that can affect app behavior, causing them to work fine on some servers but
> > break on others, which is something we want to avoid at all costs. We can't
> > see any reasonable way to resolve the difference without kernel help,
> > unfortunately.
> >
> > So, it would be convenient if I could trick these apps into behaving
> > correctly by asking the kernel to lie to them about the UID. For our use
> > case, a boolean flag would work. It could simply say: "For stat() purposes
> > on this mount, act as if the uid_map and gid_map are empty, therefore return
> > st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid="
> > setting that overrides all reported UIDs, which seems possibly more broadly
> > useful, but also possibly riskier(?). I'm also happy with restricting this
> > to only mounts with MS_RDONLY and MS_NOSUID, if that helps.
> >
> > It's a hack, but it's simple and would save a bunch of confusion for our
> > developers in the long run, I think.

Why wouldn't you just use an ld preload library to intercept stat()
calls in userspace and manipulate them there? No kernel changes
required, no app changes required....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  0:53   ` Dave Chinner
@ 2015-07-15  1:14     ` Kenton Varda
  2015-07-15  1:24       ` Andrew Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Kenton Varda @ 2015-07-15  1:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Lutomirski, Eric W. Biederman, Drew Fisher, David Renshaw,
	Linux FS Devel

On Tue, Jul 14, 2015 at 5:53 PM, Dave Chinner <david@fromorbit.com> wrote:
> Why wouldn't you just use an ld preload library to intercept stat()
> calls in userspace and manipulate them there? No kernel changes
> required, no app changes required....

In theory I love this approach but, sadly, using LD_PRELOADed
libraries in practice has a lot of problems:
- You can't intercept intra-libc calls, so other libc functions that
might call stat() potentially need to be overridden.
- You can't intercept calls from static binaries, which are common on Sandstorm.
- You can't intercept syscalls made directly via syscall() or inline
assembly, which happen from time to time.
- The library file needs to be mapped into the container somewhere.
Normally the application controls the entire container layout, so this
is awkward.
- Lots of other issues.

Some of the above can theoretically be solved by using seccomp to
raise SIGSYS on every stat() call and perhaps using a userspace ELF
loader, but this brings its own set of complications and performance
issues and would take a long time to build.

Note that we would *love* the ability to more cleanly and robustly
filter / rewrite syscalls in userspace, for exactly this sort of
thing. But it seems the available options (ptrace and LD_PRELOAD) are
far too difficult and quirky to use effectively as-is.

-Kenton

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  1:14     ` Kenton Varda
@ 2015-07-15  1:24       ` Andrew Lutomirski
  2015-07-15  1:40         ` Kenton Varda
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Lutomirski @ 2015-07-15  1:24 UTC (permalink / raw)
  To: Kenton Varda
  Cc: Dave Chinner, Eric W. Biederman, Drew Fisher, David Renshaw,
	Linux FS Devel, Kees Cook

On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote:
> On Tue, Jul 14, 2015 at 5:53 PM, Dave Chinner <david@fromorbit.com> wrote:
>> Why wouldn't you just use an ld preload library to intercept stat()
>> calls in userspace and manipulate them there? No kernel changes
>> required, no app changes required....
>
> In theory I love this approach but, sadly, using LD_PRELOADed
> libraries in practice has a lot of problems:
> - You can't intercept intra-libc calls, so other libc functions that
> might call stat() potentially need to be overridden.
> - You can't intercept calls from static binaries, which are common on Sandstorm.
> - You can't intercept syscalls made directly via syscall() or inline
> assembly, which happen from time to time.
> - The library file needs to be mapped into the container somewhere.
> Normally the application controls the entire container layout, so this
> is awkward.
> - Lots of other issues.
>
> Some of the above can theoretically be solved by using seccomp to
> raise SIGSYS on every stat() call and perhaps using a userspace ELF
> loader, but this brings its own set of complications and performance
> issues and would take a long time to build.
>
> Note that we would *love* the ability to more cleanly and robustly
> filter / rewrite syscalls in userspace, for exactly this sort of
> thing. But it seems the available options (ptrace and LD_PRELOAD) are
> far too difficult and quirky to use effectively as-is.

The latter is very much on my todo list.  The performance will suck
less than ptrace, but it still won't be fantastic.  It'll be extra
tricky here because you'll be accessing user memory, too.

Kees, have you or any of the Chromium people played with better
trapping mechanisms?  There's my awful patch set to help enable
something better, but I've tabled it until the x86 entry cleanups are
done, because it's just too messy right now.

--Andy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  1:24       ` Andrew Lutomirski
@ 2015-07-15  1:40         ` Kenton Varda
  2015-07-16 17:59           ` Kees Cook
  0 siblings, 1 reply; 8+ messages in thread
From: Kenton Varda @ 2015-07-15  1:40 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Dave Chinner, Eric W. Biederman, Drew Fisher, David Renshaw,
	Linux FS Devel, Kees Cook

On Tue, Jul 14, 2015 at 6:24 PM, Andrew Lutomirski <andy@luto.us> wrote:
> On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote:
>> Note that we would *love* the ability to more cleanly and robustly
>> filter / rewrite syscalls in userspace, for exactly this sort of
>> thing. But it seems the available options (ptrace and LD_PRELOAD) are
>> far too difficult and quirky to use effectively as-is.
>
> The latter is very much on my todo list.  The performance will suck
> less than ptrace, but it still won't be fantastic.  It'll be extra
> tricky here because you'll be accessing user memory, too.
>
> Kees, have you or any of the Chromium people played with better
> trapping mechanisms?  There's my awful patch set to help enable
> something better, but I've tabled it until the x86 entry cleanups are
> done, because it's just too messy right now.

For our purposes, I think I really just want to convert `syscall`
instructions into regular old calls to some well-defined address where
I can put my filtering code. I'm fine with the filter being in
userspace where the process can attack it because the purpose of the
filter is only compatibility fix-ups, not security enforcement.

seccomp-bpf can basically do this by raising SIGSYS, but it seems like
things break down when you get into details. E.g. you need the filter
itself to be able to make syscalls, so you need seccomp to *not* block
the syscall instructions made by the filter, which seems to require
that the filter code be loaded in a reliable location in memory, which
is kind of impossible. Also, performance.

-Kenton

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  1:40         ` Kenton Varda
@ 2015-07-16 17:59           ` Kees Cook
  0 siblings, 0 replies; 8+ messages in thread
From: Kees Cook @ 2015-07-16 17:59 UTC (permalink / raw)
  To: Kenton Varda
  Cc: Andrew Lutomirski, Dave Chinner, Eric W. Biederman, Drew Fisher,
	David Renshaw, Linux FS Devel

On Tue, Jul 14, 2015 at 6:40 PM, Kenton Varda <kenton@sandstorm.io> wrote:
> On Tue, Jul 14, 2015 at 6:24 PM, Andrew Lutomirski <andy@luto.us> wrote:
>> On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote:
>>> Note that we would *love* the ability to more cleanly and robustly
>>> filter / rewrite syscalls in userspace, for exactly this sort of
>>> thing. But it seems the available options (ptrace and LD_PRELOAD) are
>>> far too difficult and quirky to use effectively as-is.
>>
>> The latter is very much on my todo list.  The performance will suck
>> less than ptrace, but it still won't be fantastic.  It'll be extra
>> tricky here because you'll be accessing user memory, too.
>>
>> Kees, have you or any of the Chromium people played with better
>> trapping mechanisms?  There's my awful patch set to help enable
>> something better, but I've tabled it until the x86 entry cleanups are
>> done, because it's just too messy right now.

Not yet, no. So far, using the seccomp ptrace hook
(PTRACE_O_TRACESECCOMP) is how we trap weird stuff. It's actually
pretty decent as long as you're not doing it A LOT. :) The downside is
that changing syscall return values requires per-arch register
manipulation, but once that's written, it's not going to change. (The
seccomp regression test suite already has all this, FWIW.)

> For our purposes, I think I really just want to convert `syscall`
> instructions into regular old calls to some well-defined address where
> I can put my filtering code. I'm fine with the filter being in
> userspace where the process can attack it because the purpose of the
> filter is only compatibility fix-ups, not security enforcement.
>
> seccomp-bpf can basically do this by raising SIGSYS, but it seems like
> things break down when you get into details. E.g. you need the filter
> itself to be able to make syscalls, so you need seccomp to *not* block
> the syscall instructions made by the filter, which seems to require
> that the filter code be loaded in a reliable location in memory, which
> is kind of impossible. Also, performance.

Performance will likely be the limiting factor, but I'm curious about
using a ptrace manager and only catching SECCOMP_RET_TRACE seccomp
filter actions. Is it only stat you need to manipulate? You can catch
the syscall entry with seccomp (no need for SIGSYS), then catch the
syscall exit with PTRACE_SYSCALL, check the results, rewrite them, and
PTRACE_CONT? Are you needing to catch stat really often (or more than
just stat)?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski
  2015-07-15  0:53   ` Dave Chinner
@ 2015-07-15  1:12   ` Eric W. Biederman
  2015-07-15  1:27     ` Kenton Varda
  1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2015-07-15  1:12 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: Kenton Varda, Drew Fisher, David Renshaw, linux-fsdevel

Andrew Lutomirski <andy@luto.us> writes:

> On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote:
>> Hi Eric,
>>
>> I'm curious to know if you think this the following is something that could
>> realistically be implemented, or if it's a totally outrageous idea. :)
>>
>> For Sandstorm, we've found a case where it would be useful if we could
>> create a bind-mount in which stat() calls lie and claim all files have UID
>> 65534 (as if their real owner wasn't mapped into the user namespace). All
>> actual security checks would behave normally; only the result of stat()
>> would be affected.
>>
>> Specifically, the problem is that some apps have checks of the form: "If
>> file is owned by my UID, then modify it, otherwise do something else
>> reasonable." These checks are wrong in the case of read-only bind mounts,
>> but legacy code rarely considers such a possibility.
>>
>> Making maters worse, it is the case currently that some configurations of
>> Sandstorm will have these read-only files owned by the UID running the app
>> while others will not. Thus this is an observable difference between servers
>> that can affect app behavior, causing them to work fine on some servers but
>> break on others, which is something we want to avoid at all costs. We can't
>> see any reasonable way to resolve the difference without kernel help,
>> unfortunately.
>>
>> So, it would be convenient if I could trick these apps into behaving
>> correctly by asking the kernel to lie to them about the UID. For our use
>> case, a boolean flag would work. It could simply say: "For stat() purposes
>> on this mount, act as if the uid_map and gid_map are empty, therefore return
>> st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid="
>> setting that overrides all reported UIDs, which seems possibly more broadly
>> useful, but also possibly riskier(?). I'm also happy with restricting this
>> to only mounts with MS_RDONLY and MS_NOSUID, if that helps.
>>
>> It's a hack, but it's simple and would save a bunch of confusion for our
>> developers in the long run, I think.
>
> There was some talk at LSF/MM about more generally allowing remapping
> of the view of uids.  This would probably be a per-superblock thing
> instead of a per-mount-point thing, but some kind of remapping per
> mount point seems like a sensible thing to me.

Remapping the ids on a filesystem to the ids in the system on
a superblock basis is comparatively straight forward and
something that is necessary to allow unprivileged users to mount
filesystems that have any kind of storage.

Basically all that would require is having a user namespace parameter
on the superblock.

Once overlayfs is handled that will probably give your the per mount
flexibility you dream of.

As to the original question of remapping ids only in stat.  I agree with
Dave Chinner use LD_PRELOAD.  Maintaining kernel code is costly and that
is a hack I don't want to maintain for the rest of eternity it appears
to have entirely too little value.

Eric






^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: bind mount that lies to apps about st_uid?
  2015-07-15  1:12   ` Eric W. Biederman
@ 2015-07-15  1:27     ` Kenton Varda
  0 siblings, 0 replies; 8+ messages in thread
From: Kenton Varda @ 2015-07-15  1:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Lutomirski, Drew Fisher, David Renshaw, Linux FS Devel

On Tue, Jul 14, 2015 at 6:12 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> As to the original question of remapping ids only in stat.  I agree with
> Dave Chinner use LD_PRELOAD.  Maintaining kernel code is costly and that
> is a hack I don't want to maintain for the rest of eternity it appears
> to have entirely too little value.

Fair enough, I figured as much but thought I'd register interest in
case it comes up again elsewhere.

(But LD_PRELOAD does not work for us, so we'll just have to deal with
this inconsistency for now.)

-Kenton

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-16 17:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAOP=4wipj4u_hpXRMh4qWn=ofb-br_=qS8vY8-WjqUrphQNw4w@mail.gmail.com>
2015-07-15  0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski
2015-07-15  0:53   ` Dave Chinner
2015-07-15  1:14     ` Kenton Varda
2015-07-15  1:24       ` Andrew Lutomirski
2015-07-15  1:40         ` Kenton Varda
2015-07-16 17:59           ` Kees Cook
2015-07-15  1:12   ` Eric W. Biederman
2015-07-15  1:27     ` Kenton Varda

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.