* Re: bind mount that lies to apps about st_uid? [not found] <CAOP=4wipj4u_hpXRMh4qWn=ofb-br_=qS8vY8-WjqUrphQNw4w@mail.gmail.com> @ 2015-07-15 0:41 ` Andrew Lutomirski 2015-07-15 0:53 ` Dave Chinner 2015-07-15 1:12 ` Eric W. Biederman 0 siblings, 2 replies; 8+ messages in thread From: Andrew Lutomirski @ 2015-07-15 0:41 UTC (permalink / raw) To: Kenton Varda; +Cc: Eric W. Biederman, Drew Fisher, David Renshaw, linux-fsdevel On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote: > Hi Eric, > > I'm curious to know if you think this the following is something that could > realistically be implemented, or if it's a totally outrageous idea. :) > > For Sandstorm, we've found a case where it would be useful if we could > create a bind-mount in which stat() calls lie and claim all files have UID > 65534 (as if their real owner wasn't mapped into the user namespace). All > actual security checks would behave normally; only the result of stat() > would be affected. > > Specifically, the problem is that some apps have checks of the form: "If > file is owned by my UID, then modify it, otherwise do something else > reasonable." These checks are wrong in the case of read-only bind mounts, > but legacy code rarely considers such a possibility. > > Making maters worse, it is the case currently that some configurations of > Sandstorm will have these read-only files owned by the UID running the app > while others will not. Thus this is an observable difference between servers > that can affect app behavior, causing them to work fine on some servers but > break on others, which is something we want to avoid at all costs. We can't > see any reasonable way to resolve the difference without kernel help, > unfortunately. > > So, it would be convenient if I could trick these apps into behaving > correctly by asking the kernel to lie to them about the UID. For our use > case, a boolean flag would work. It could simply say: "For stat() purposes > on this mount, act as if the uid_map and gid_map are empty, therefore return > st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid=" > setting that overrides all reported UIDs, which seems possibly more broadly > useful, but also possibly riskier(?). I'm also happy with restricting this > to only mounts with MS_RDONLY and MS_NOSUID, if that helps. > > It's a hack, but it's simple and would save a bunch of confusion for our > developers in the long run, I think. There was some talk at LSF/MM about more generally allowing remapping of the view of uids. This would probably be a per-superblock thing instead of a per-mount-point thing, but some kind of remapping per mount point seems like a sensible thing to me. Eric, thoughts? --Andy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski @ 2015-07-15 0:53 ` Dave Chinner 2015-07-15 1:14 ` Kenton Varda 2015-07-15 1:12 ` Eric W. Biederman 1 sibling, 1 reply; 8+ messages in thread From: Dave Chinner @ 2015-07-15 0:53 UTC (permalink / raw) To: Andrew Lutomirski Cc: Kenton Varda, Eric W. Biederman, Drew Fisher, David Renshaw, linux-fsdevel On Tue, Jul 14, 2015 at 05:41:01PM -0700, Andrew Lutomirski wrote: > On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote: > > Hi Eric, > > > > I'm curious to know if you think this the following is something that could > > realistically be implemented, or if it's a totally outrageous idea. :) > > > > For Sandstorm, we've found a case where it would be useful if we could > > create a bind-mount in which stat() calls lie and claim all files have UID > > 65534 (as if their real owner wasn't mapped into the user namespace). All > > actual security checks would behave normally; only the result of stat() > > would be affected. > > > > Specifically, the problem is that some apps have checks of the form: "If > > file is owned by my UID, then modify it, otherwise do something else > > reasonable." These checks are wrong in the case of read-only bind mounts, > > but legacy code rarely considers such a possibility. > > > > Making maters worse, it is the case currently that some configurations of > > Sandstorm will have these read-only files owned by the UID running the app > > while others will not. Thus this is an observable difference between servers > > that can affect app behavior, causing them to work fine on some servers but > > break on others, which is something we want to avoid at all costs. We can't > > see any reasonable way to resolve the difference without kernel help, > > unfortunately. > > > > So, it would be convenient if I could trick these apps into behaving > > correctly by asking the kernel to lie to them about the UID. For our use > > case, a boolean flag would work. It could simply say: "For stat() purposes > > on this mount, act as if the uid_map and gid_map are empty, therefore return > > st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid=" > > setting that overrides all reported UIDs, which seems possibly more broadly > > useful, but also possibly riskier(?). I'm also happy with restricting this > > to only mounts with MS_RDONLY and MS_NOSUID, if that helps. > > > > It's a hack, but it's simple and would save a bunch of confusion for our > > developers in the long run, I think. Why wouldn't you just use an ld preload library to intercept stat() calls in userspace and manipulate them there? No kernel changes required, no app changes required.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 0:53 ` Dave Chinner @ 2015-07-15 1:14 ` Kenton Varda 2015-07-15 1:24 ` Andrew Lutomirski 0 siblings, 1 reply; 8+ messages in thread From: Kenton Varda @ 2015-07-15 1:14 UTC (permalink / raw) To: Dave Chinner Cc: Andrew Lutomirski, Eric W. Biederman, Drew Fisher, David Renshaw, Linux FS Devel On Tue, Jul 14, 2015 at 5:53 PM, Dave Chinner <david@fromorbit.com> wrote: > Why wouldn't you just use an ld preload library to intercept stat() > calls in userspace and manipulate them there? No kernel changes > required, no app changes required.... In theory I love this approach but, sadly, using LD_PRELOADed libraries in practice has a lot of problems: - You can't intercept intra-libc calls, so other libc functions that might call stat() potentially need to be overridden. - You can't intercept calls from static binaries, which are common on Sandstorm. - You can't intercept syscalls made directly via syscall() or inline assembly, which happen from time to time. - The library file needs to be mapped into the container somewhere. Normally the application controls the entire container layout, so this is awkward. - Lots of other issues. Some of the above can theoretically be solved by using seccomp to raise SIGSYS on every stat() call and perhaps using a userspace ELF loader, but this brings its own set of complications and performance issues and would take a long time to build. Note that we would *love* the ability to more cleanly and robustly filter / rewrite syscalls in userspace, for exactly this sort of thing. But it seems the available options (ptrace and LD_PRELOAD) are far too difficult and quirky to use effectively as-is. -Kenton ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 1:14 ` Kenton Varda @ 2015-07-15 1:24 ` Andrew Lutomirski 2015-07-15 1:40 ` Kenton Varda 0 siblings, 1 reply; 8+ messages in thread From: Andrew Lutomirski @ 2015-07-15 1:24 UTC (permalink / raw) To: Kenton Varda Cc: Dave Chinner, Eric W. Biederman, Drew Fisher, David Renshaw, Linux FS Devel, Kees Cook On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote: > On Tue, Jul 14, 2015 at 5:53 PM, Dave Chinner <david@fromorbit.com> wrote: >> Why wouldn't you just use an ld preload library to intercept stat() >> calls in userspace and manipulate them there? No kernel changes >> required, no app changes required.... > > In theory I love this approach but, sadly, using LD_PRELOADed > libraries in practice has a lot of problems: > - You can't intercept intra-libc calls, so other libc functions that > might call stat() potentially need to be overridden. > - You can't intercept calls from static binaries, which are common on Sandstorm. > - You can't intercept syscalls made directly via syscall() or inline > assembly, which happen from time to time. > - The library file needs to be mapped into the container somewhere. > Normally the application controls the entire container layout, so this > is awkward. > - Lots of other issues. > > Some of the above can theoretically be solved by using seccomp to > raise SIGSYS on every stat() call and perhaps using a userspace ELF > loader, but this brings its own set of complications and performance > issues and would take a long time to build. > > Note that we would *love* the ability to more cleanly and robustly > filter / rewrite syscalls in userspace, for exactly this sort of > thing. But it seems the available options (ptrace and LD_PRELOAD) are > far too difficult and quirky to use effectively as-is. The latter is very much on my todo list. The performance will suck less than ptrace, but it still won't be fantastic. It'll be extra tricky here because you'll be accessing user memory, too. Kees, have you or any of the Chromium people played with better trapping mechanisms? There's my awful patch set to help enable something better, but I've tabled it until the x86 entry cleanups are done, because it's just too messy right now. --Andy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 1:24 ` Andrew Lutomirski @ 2015-07-15 1:40 ` Kenton Varda 2015-07-16 17:59 ` Kees Cook 0 siblings, 1 reply; 8+ messages in thread From: Kenton Varda @ 2015-07-15 1:40 UTC (permalink / raw) To: Andrew Lutomirski Cc: Dave Chinner, Eric W. Biederman, Drew Fisher, David Renshaw, Linux FS Devel, Kees Cook On Tue, Jul 14, 2015 at 6:24 PM, Andrew Lutomirski <andy@luto.us> wrote: > On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote: >> Note that we would *love* the ability to more cleanly and robustly >> filter / rewrite syscalls in userspace, for exactly this sort of >> thing. But it seems the available options (ptrace and LD_PRELOAD) are >> far too difficult and quirky to use effectively as-is. > > The latter is very much on my todo list. The performance will suck > less than ptrace, but it still won't be fantastic. It'll be extra > tricky here because you'll be accessing user memory, too. > > Kees, have you or any of the Chromium people played with better > trapping mechanisms? There's my awful patch set to help enable > something better, but I've tabled it until the x86 entry cleanups are > done, because it's just too messy right now. For our purposes, I think I really just want to convert `syscall` instructions into regular old calls to some well-defined address where I can put my filtering code. I'm fine with the filter being in userspace where the process can attack it because the purpose of the filter is only compatibility fix-ups, not security enforcement. seccomp-bpf can basically do this by raising SIGSYS, but it seems like things break down when you get into details. E.g. you need the filter itself to be able to make syscalls, so you need seccomp to *not* block the syscall instructions made by the filter, which seems to require that the filter code be loaded in a reliable location in memory, which is kind of impossible. Also, performance. -Kenton ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 1:40 ` Kenton Varda @ 2015-07-16 17:59 ` Kees Cook 0 siblings, 0 replies; 8+ messages in thread From: Kees Cook @ 2015-07-16 17:59 UTC (permalink / raw) To: Kenton Varda Cc: Andrew Lutomirski, Dave Chinner, Eric W. Biederman, Drew Fisher, David Renshaw, Linux FS Devel On Tue, Jul 14, 2015 at 6:40 PM, Kenton Varda <kenton@sandstorm.io> wrote: > On Tue, Jul 14, 2015 at 6:24 PM, Andrew Lutomirski <andy@luto.us> wrote: >> On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@sandstorm.io> wrote: >>> Note that we would *love* the ability to more cleanly and robustly >>> filter / rewrite syscalls in userspace, for exactly this sort of >>> thing. But it seems the available options (ptrace and LD_PRELOAD) are >>> far too difficult and quirky to use effectively as-is. >> >> The latter is very much on my todo list. The performance will suck >> less than ptrace, but it still won't be fantastic. It'll be extra >> tricky here because you'll be accessing user memory, too. >> >> Kees, have you or any of the Chromium people played with better >> trapping mechanisms? There's my awful patch set to help enable >> something better, but I've tabled it until the x86 entry cleanups are >> done, because it's just too messy right now. Not yet, no. So far, using the seccomp ptrace hook (PTRACE_O_TRACESECCOMP) is how we trap weird stuff. It's actually pretty decent as long as you're not doing it A LOT. :) The downside is that changing syscall return values requires per-arch register manipulation, but once that's written, it's not going to change. (The seccomp regression test suite already has all this, FWIW.) > For our purposes, I think I really just want to convert `syscall` > instructions into regular old calls to some well-defined address where > I can put my filtering code. I'm fine with the filter being in > userspace where the process can attack it because the purpose of the > filter is only compatibility fix-ups, not security enforcement. > > seccomp-bpf can basically do this by raising SIGSYS, but it seems like > things break down when you get into details. E.g. you need the filter > itself to be able to make syscalls, so you need seccomp to *not* block > the syscall instructions made by the filter, which seems to require > that the filter code be loaded in a reliable location in memory, which > is kind of impossible. Also, performance. Performance will likely be the limiting factor, but I'm curious about using a ptrace manager and only catching SECCOMP_RET_TRACE seccomp filter actions. Is it only stat you need to manipulate? You can catch the syscall entry with seccomp (no need for SIGSYS), then catch the syscall exit with PTRACE_SYSCALL, check the results, rewrite them, and PTRACE_CONT? Are you needing to catch stat really often (or more than just stat)? -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski 2015-07-15 0:53 ` Dave Chinner @ 2015-07-15 1:12 ` Eric W. Biederman 2015-07-15 1:27 ` Kenton Varda 1 sibling, 1 reply; 8+ messages in thread From: Eric W. Biederman @ 2015-07-15 1:12 UTC (permalink / raw) To: Andrew Lutomirski; +Cc: Kenton Varda, Drew Fisher, David Renshaw, linux-fsdevel Andrew Lutomirski <andy@luto.us> writes: > On Tue, Jul 14, 2015 at 5:35 PM, Kenton Varda <kenton@sandstorm.io> wrote: >> Hi Eric, >> >> I'm curious to know if you think this the following is something that could >> realistically be implemented, or if it's a totally outrageous idea. :) >> >> For Sandstorm, we've found a case where it would be useful if we could >> create a bind-mount in which stat() calls lie and claim all files have UID >> 65534 (as if their real owner wasn't mapped into the user namespace). All >> actual security checks would behave normally; only the result of stat() >> would be affected. >> >> Specifically, the problem is that some apps have checks of the form: "If >> file is owned by my UID, then modify it, otherwise do something else >> reasonable." These checks are wrong in the case of read-only bind mounts, >> but legacy code rarely considers such a possibility. >> >> Making maters worse, it is the case currently that some configurations of >> Sandstorm will have these read-only files owned by the UID running the app >> while others will not. Thus this is an observable difference between servers >> that can affect app behavior, causing them to work fine on some servers but >> break on others, which is something we want to avoid at all costs. We can't >> see any reasonable way to resolve the difference without kernel help, >> unfortunately. >> >> So, it would be convenient if I could trick these apps into behaving >> correctly by asking the kernel to lie to them about the UID. For our use >> case, a boolean flag would work. It could simply say: "For stat() purposes >> on this mount, act as if the uid_map and gid_map are empty, therefore return >> st_uid and st_gid as 65534." Alternatively, I'd also be happy with a "uid=" >> setting that overrides all reported UIDs, which seems possibly more broadly >> useful, but also possibly riskier(?). I'm also happy with restricting this >> to only mounts with MS_RDONLY and MS_NOSUID, if that helps. >> >> It's a hack, but it's simple and would save a bunch of confusion for our >> developers in the long run, I think. > > There was some talk at LSF/MM about more generally allowing remapping > of the view of uids. This would probably be a per-superblock thing > instead of a per-mount-point thing, but some kind of remapping per > mount point seems like a sensible thing to me. Remapping the ids on a filesystem to the ids in the system on a superblock basis is comparatively straight forward and something that is necessary to allow unprivileged users to mount filesystems that have any kind of storage. Basically all that would require is having a user namespace parameter on the superblock. Once overlayfs is handled that will probably give your the per mount flexibility you dream of. As to the original question of remapping ids only in stat. I agree with Dave Chinner use LD_PRELOAD. Maintaining kernel code is costly and that is a hack I don't want to maintain for the rest of eternity it appears to have entirely too little value. Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bind mount that lies to apps about st_uid? 2015-07-15 1:12 ` Eric W. Biederman @ 2015-07-15 1:27 ` Kenton Varda 0 siblings, 0 replies; 8+ messages in thread From: Kenton Varda @ 2015-07-15 1:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Lutomirski, Drew Fisher, David Renshaw, Linux FS Devel On Tue, Jul 14, 2015 at 6:12 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > As to the original question of remapping ids only in stat. I agree with > Dave Chinner use LD_PRELOAD. Maintaining kernel code is costly and that > is a hack I don't want to maintain for the rest of eternity it appears > to have entirely too little value. Fair enough, I figured as much but thought I'd register interest in case it comes up again elsewhere. (But LD_PRELOAD does not work for us, so we'll just have to deal with this inconsistency for now.) -Kenton ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-07-16 17:59 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAOP=4wipj4u_hpXRMh4qWn=ofb-br_=qS8vY8-WjqUrphQNw4w@mail.gmail.com>
2015-07-15 0:41 ` bind mount that lies to apps about st_uid? Andrew Lutomirski
2015-07-15 0:53 ` Dave Chinner
2015-07-15 1:14 ` Kenton Varda
2015-07-15 1:24 ` Andrew Lutomirski
2015-07-15 1:40 ` Kenton Varda
2015-07-16 17:59 ` Kees Cook
2015-07-15 1:12 ` Eric W. Biederman
2015-07-15 1:27 ` Kenton Varda
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.