From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jim Lieb Subject: Re: Re: Re: Re: [PATCH 1/3] switch_creds: Syscall to switch creds for file server ops Date: Thu, 31 Oct 2013 13:39:50 -0700 Message-ID: <1397707.n4PsIJcfRu@jlieb-e6410> References: <1381960919-4542-1-git-send-email-jlieb@panasas.com> <3154208.hemKm7CgkG@jlieb-e6410> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Linux FS Devel , "Eric W. Biederman" , Al Viro , "Theodore Ts'o" , "linux-kernel@vger.kernel.org" , , Jeff Layton To: Andy Lutomirski Return-path: In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Thursday, October 31, 2013 12:48:54 Andy Lutomirski wrote: > On Thu, Oct 31, 2013 at 12:43 PM, Jim Lieb wrote: > > On Thursday, October 31, 2013 12:09:08 Andy Lutomirski wrote: > >> On Thu, Oct 24, 2013 at 1:24 PM, Jim Lieb wrot= e: > >> > On Thursday, October 24, 2013 12:28:15 Andy Lutomirski wrote: > >> >> On Wed, Oct 23, 2013 at 10:59 PM, Eric W. Biederman > >> >>=20 > >> >> wrote: > >> >> > Andy Lutomirski writes: > >> >> >> On 10/16/2013 08:52 PM, Eric W. Biederman wrote: > >> >> >>> Al Viro writes: > >> >> >>>> On Wed, Oct 16, 2013 at 06:18:16PM -0700, Eric W. Biederma= n=20 wrote: > >> >> >>>>> That doesn't look bad but it does need capable(CAP_SETUID= ) && > >> >> >>>>> capable(CAP_SETGID) or possibly something a little more r= efined. > >> >> >>>>=20 > >> >> >>>> D'oh > >> >> >>>>=20 > >> >> >>>>> I don't think we want file descriptor passing to all of a= sudden > >> >> >>>>> become > >> >> >>>>> a grant of privilege, beyond what the passed fd can do. > >> >> >>>>=20 > >> >> >>>> Definitely. And an extra ) to make it compile wouldn't hu= rt > >> >> >>>> either... > >> >> >>>=20 > >> >> >>> There also appears to need to be a check that we don't gain= any > >> >> >>> capabilities. > >> >> >>>=20 > >> >> >>> We also need a check so that you don't gain any capabilitie= s, and > >> >> >>> possibly a few other things. > >> >> >>=20 > >> >> >> Why? I like the user_ns part, but I'm not immediately seein= g the > >> >> >> issue > >> >> >> with capabilities. > >> >> >=20 > >> >> > My reasoning was instead of making this syscall as generic as > >> >> > possible > >> >> > start it out by only allowing the cases Jim cares about and w= orking > >> >> > with > >> >> > a model where you can't gain any permissions you couldn't gai= n > >> >> > otherwise. > >> >> >=20 > >> >> > Although the fd -1 trick to revert to your other existing cre= d seems > >> >> > reasonable. > >> >> >=20 > >> >> >>> So I suspect we want a check something like: > >> >> >>>=20 > >> >> >>> if ((new_cred->securebits !=3D current_cred->securebits) |= | > >> >> >>>=20 > >> >> >>> (new_cred->cap_inheritable !=3D current_cred->cap_inher= itable) > >> >> >>> || > >> >> >>> (new_cred->cap_permitted !=3D current_cred->cap_permitt= ed) || > >> >> >>> (new_cred->cap_effective !=3D current_cred->cap_effecti= ve) || > >> >> >>> (new_cred->cap_bset !=3D current_cred->cap_bset) || > >> >> >>> (new_cred->jit_keyring !=3D current_cred->jit_keyring) = || > >> >> >>> (new_cred->session_keyring !=3D current_cred->session_k= eyring) > >> >> >>> || > >> >> >>> (new_cred->process_keyring !=3D current_cred->process_k= eyring) > >> >> >>> || > >> >> >>> (new_cred->thread_keyring !=3D current_cred->thread_key= ring) || > >> >> >>> (new_cred->request_keyring !=3D current_cred->request_k= eyring) > >> >> >>> || > >> >> >>> (new_cred->security !=3D current_cred->security) || > >> >> >>> (new_cred->user_ns !=3D current_cred->user_ns)) { > >> >> >>> =20 > >> >> >>> return -EPERM; > >> >> >>>=20 > >> >> >>> } > >> >> >>=20 > >> >> >> I *really* don't like the idea of being able to use any old = file > >> >> >> descriptor. I barely care what rights the caller needs to h= ave to > >> >> >> invoke this -- if you're going to pass an fd that grants a > >> >> >> capability > >> >> >> (in the non-Linux sense of the work), please make sure that = the > >> >> >> sender > >> >> >> actually wants that behavior. > >> >> >>=20 > >> >> >> IOW, have a syscall to generate a special fd for this purpos= e.=20 > >> >> >> It's > >> >> >> only a couple lines of code, and I think we'll really regret= it if > >> >> >> we > >> >> >> fsck this up. > >> >> >>=20 > >> >> >> (I will take it as a personal challenge to find at least one > >> >> >> exploitable > >> >> >> privilege escalation in this if an arbitrary fd works.) > >> >> >=20 > >> >> > If you can't switch to a uid or a gid you couldn't switch to > >> >> > otherwise > >> >> > then the worst that can happen is an information leak. And > >> >> > information > >> >> > leaks are rarely directly exploitable. > >> >>=20 > >> >> Here's the attack: > >> >>=20 > >> >> Suppose there's a daemon that uses this in conjunction with > >> >> SCM_RIGHTS. The daemon is effectively root (under the current > >> >> proposal, it has to be). So a client connects, sends a credent= ial fd, > >> >> and the daemon impersonates those credentials. > >> >>=20 > >> >> Now a malicious user obtains *any* fd opened by root. It sends= that > >> >> fd to the daemon. The daemon then impersonates root. We lose.= (It > >> >> can't possibly be hard to obtain an fd with highly privileged f= _cred > >> >> -- I bet that most console programs have stdin like that, for e= xample. > >> >>=20 > >> >> There are probably many setuid programs that will happily open > >> >>=20 > >> >> /dev/null for you, too.) > >> >=20 > >> > In my reply to Eric, I note that I need to add a check that the = fd > >> > passed > >> > is one from switch_creds. With that test, not any fd will do and= the > >> > one > >> > that does has only been able to set fsuid, fsgid, altgroups, and > >> > reduced > >> > (the nfsd set) caps. They can do no more damage than what the o= riginal > >> > switch_creds allowed. The any fd by root no longer applies so u= se > >> > doesn't get much (no escalation). > >> >=20 > >> >> >> Also... real_cred looks confusing. AFAICS it is used *only*= for > >> >> >> knfsd > >> >> >> and faccessat. That is, current userspace can't see it. Bu= t now > >> >> >> you'll > >> >> >> expose various oddities. For example, AFAICS a capability-l= ess > >> >> >> process > >> >> >> that's a userns owner can always use setuid. This will *ove= rwrite* > >> >> >> real_cred. Then you're screwed, especially if this happens = by > >> >> >> accident. > >> >> >=20 > >> >> > And doing in userland what faccessat, and knfsd do in the ker= nel is > >> >> > exactly what is desired here. But maybe there are issues wit= h that. > >> >> >=20 > >> >> >> That being said, Windows has had functions like this for a l= ong > >> >> >> time. > >> >> >> Processes have a primary token and possibly an impersonation= token. > >> >> >> Any > >> >> >> process can call ImpersonateLoggedOnUser (no privilege requi= red) to > >> >> >> impersonate the credentials of a token (which is special kin= d of > >> >> >> fd). > >> >> >> Similarly, any process can call RevertToSelf to undo it. > >> >> >>=20 > >> >> >> Is there any actual problem with allowing completely unprivi= leged > >> >> >> tasks > >> >> >> to switch to one of these magic cred fds? That would avoid = needing > >> >> >> a > >> >> >> "revert" operation. > >> >> >=20 > >> >> > If the permission model is this switching of credentials does= n't get > >> >> > you > >> >> > anything you couldn't get some other way. That would seem to > >> >> > totally > >> >> > rules out unprivileged processes switching these things. > >> >>=20 > >> >> IMO, there are two reasonable models that involve fds carrying = some > >> >> kind of credential. > >> >>=20 > >> >> 1. The fd actually carries the ability to use the credentials. = You > >> >> need to be very careful whom you send these to. The security > >> >> implications are obvious (which is good) and the receiver doesn= 't need > >> >> privilege (which is good, as long as the receiver is careful). > >> >=20 > >> > I test for caps. I think using switch_creds in all forms should > >> > require > >> > privs. I thought of the revert case and although it does simply= return > >> > to > >> > "real" creds, you still need CAP_SETUID and CAP_SETGID to do it.= This > >> > means, of course, if you have really messed up things, you may n= ot be > >> > able to get back home which, although a bad thing for the proces= s, is a > >> > good thing for the system as a whole. > >> >=20 > >> >> 2. The fd is for identification only. But this means that the = fd > >> >> carries the ability to identify as a user. So you *still* have= to be > >> >> very careful about whom you send it to. What you need is an op= eration > >> >> that allows you to identify using the fd without transitively g= ranting > >> >> the recipient the same ability. On networks, this is done by s= igning > >> >> some kind of challenge. The kernel could work the same way, or= there > >> >> could be a new CMSG_IDENTITY that you need an identity fd to se= nd but > >> >> that does not copy that fd to the recipient. > >> >=20 > >> > I am not sure I understand this. CMSG only applies to UNIX_DOMA= IN > >> > sockets > >> > which means that the switch_creds fd test still applies here. I= t is > >> > identification but only for within the same kernel. As for name= spaces, > >> > the > >> > translation was done when the creds fd was created. I suppose i= f it > >> > was > >> > passed across namespace boundaries there could be a problem but = what is > >> > in > >> > the creds structure is the translated fduid,fsgid etc., not the > >> > untranslated one. We are still doing access checks and quota stu= ff with > >> > the translated creds. If one namespace has "fred" as uid 52 and > >> > another > >> > has "mary" as 52, the quota will only be credited to whoever "fr= ed" > >> > really is only if "fred" gets to be "mary" in the second alterna= te > >> > universe... > >>=20 > >> I'm not talking about namespaces here -- I think we're not really = on > >> the same page. Can you describe what you're envisioning doing wit= h > >> these fds? > >=20 > > I have a new version that is now two syscalls and no multiplexing h= as more > > testing etc. to mirror exactly the tests in setfsuid(). I still am > > testing > > but plan to send this new patchset next week for review. > >=20 > > Ok, I may have missed something. What I meant to say is that I'm = using > > the same namespace functions for switch_creds as all the set*uid sy= scalls > > use so what I have should work as well. If one were to pass this f= d via > > CMSG, it could cross a namespace boundary. The changed mappings of= this > > fd would be the same as for any other fd passed across now. Maybe = that > > is irrelevant in this case. > >=20 > > The purpose of the fd is to create a handle to a creds set that a s= erver > > can then efficiently switch to prior to filesystem syscalls where f= suid > > etc. are relevant (mkdir, creat, write, etc.). This is exposing th= e > > override_creds() and revert_creds() to these servers. I do not int= end in > > our server to hand these fds to anyone else. After all, they are u= seless > > for anything other than switch_creds/use_creds. > >=20 > > The server keeps a cache of its known, currently active client cred= s along > > with this fd. The fast path is to lookup the creds and use the fd.= On > > cache misses, it does a switch_creds() and saves the fd in the cach= e. >=20 > So if I understand you correctly, you're not planning on sending this > fd between process at all. In that case, adding a new API that seems > designed for interprocess use and having exactly zero usecases to > think about makes me nervous. Why is it going to use fds again? Correct. They are handles to a set of creds for filesystem calls. I u= se fds=20 because every filp has a pointer to a set of creds that gets swapped in= for=20 various things like coredumps, and in the case of knfsd, impersonating = the nfs=20 client for when it does filesystem morphing ops. In order to currently= do what=20 nfsd_setuser() does we must do: setfsuid();setfsgid(); setgroups(); setcaps(); followed by=20 open/creat/mknod/write followed by setfsuid(); setfsgid(); setgroups(); setcaps(); Each one of these 4 syscalls discards an RCU copy of the creds, both go= ing in=20 and going out. In the first case below, the revert doesn't recycle the= RCU=20 because the open fd still holds a ref. Same applied to the second case= =2E In=20 fact here, no new creds are created either. The creds only get recycle= d when=20 the fd is closed. Using an fd to have a handle on what all these do is the simplest thing= to=20 pass back and forth between user and kernel space. This is what Ted T'= so and=20 Al Viro suggested. This ends up looking like: fd =3D switch_creds(creds struct); followed by open/creat/mknod/write followed by use_creds(-1); The -1 arg is used as I proposed in the earlier round, namely, it tells= =20 use_creds to revert to real creds. Subsequent uses look like: use_creds(cached fd); followed by open/creat/mknod/write followed by use_creds(-1); There is no need to pass the fd because if the other side has the caps = to use=20 the fd, they have the caps to get their own. We are saving overhead. >=20 > Or maybe I should wait to see the updated API before complaining :) >=20 > >> >> >> Another note: I think that there may be issues if the creato= r of a > >> >> >> token > >> >> >> has no_new_privs set and the user doesn't. Imagine a daemon= that > >> >> >> accepts one of these fds, impersonates it, and calls exec. = This > >> >> >> could > >> >> >> be used to escape from no_new_privs land. > >> >> >=20 > >> >> > Which is why I was suggesting that we don't allow changing an= y field > >> >> > in > >> >> > the cred except for uids and gids. > >> >>=20 > >> >> If the daemon impersonates and execs, we still lose. > >> >=20 > >> > I answered this. You only get to impersonate for purposes of fi= le > >> > access > >> > with reduced caps to prevent you from being root as well. Also,= since > >> > these are O_CLOEXEC and switch_creds "magic" fds, this can't hap= pen > >> > because the fd is gone post-exec. > >>=20 > >> If a no_new_privs process authenticates to a daemon and that daemo= n > >> execs, then there's now a dumpable, ptraceable, non-no-new-privs > >> process running as the uid/gid of the no_new_privs process. This = may > >> be dangerous. > >=20 > > Two things. My new patch set now throws an error if the fd is not = one > > returned by switch_creds(). The new syscall here is use_creds() bt= w.=20 > > That fd is opened O_CLOEXEC so the syscall has two guards. First, = it can > > only be one that switch_creds() returned and second, it gets closed= on an > > exec so the no_new_privs process doesn't get it. In addition, > > switch_creds() reduces the effective caps set to the same minimal s= et > > that nfsd_setuser() has. > >=20 > > If someone else chooses to pass this fd via CMSG, whoever gets it h= as > > reduced privs and in order to use it for anything, i.e. use_creds()= , it > > still needs to inherit SETUID and SETGID caps from its parent or it= will > > get an EPERM error. Passing this in "friendly" code defeats the pur= pose > > of the cache above in that we could get fd leaks. If there is anot= her > > flag that could prevent its being passed along via CMSG, I can add = that > > too because using CMSG is well outside the use case. >=20 > I still don't see the point of lowering effective caps, but this is > IMO minor. Are you checking the permitted set on revert? We lower specific filesystem related caps to prevent using the root byp= ass of=20 acess checks, prevent root bypass of quota restrictions, and to correct= ly=20 charge the impersonated user for quota'd resource usage. This is what=20 nfsd_setuser() does. I use the same mask set. I don't check the permitted set associated with the fd because I've alr= eady=20 checked that it is a switch_creds() returned fd (they are immutable). = I do=20 make a check at the beginning of both syscalls for the caps to set uid/= gid,=20 the same test as setfsuid et al. >=20 > --Andy --=20 Jim Lieb Linux Systems Engineer Panasas Inc. "If ease of use was the only requirement, we would all be riding tricyc= les" - Douglas Engelbart 1925=E2=80=932013