From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jim Lieb Subject: Re: Re: Re: Thoughts on credential switching Date: Thu, 27 Mar 2014 13:47:37 -0700 Message-ID: <4001272.s8O7Wrizcc@jlieb-e6410> References: <2487921.a0QJrtikX0@jlieb-e6410> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jeremy Allison , Jeff Layton , Florian Weimer , "Eric W. Biederman" , LSM List , "Serge E. Hallyn" , Kees Cook , Linux FS Devel , "Theodore Ts'o" , "linux-kernel@vger.kernel.org" , To: Andy Lutomirski Return-path: Received: from static-209-166-131-148.expedient.com ([209.166.131.148]:37132 "EHLO natasha.panasas.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1757242AbaC0Urw convert rfc822-to-8bit (ORCPT ); Thu, 27 Mar 2014 16:47:52 -0400 In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thursday, March 27, 2014 12:45:30 Andy Lutomirski wrote: > On Thu, Mar 27, 2014 at 12:30 PM, Jim Lieb wrote: > > Rather than inline, I'm responding in the context of Jeremy's comme= nts but > > I have to answer others as well. It is Jeremy after all who said m= y baby > > was ugly ;). > >=20 > > Jeremy is right about overloading "fd". Maybe I can call it someth= ing > > else > > but an fd (in implementation) has merit because a creds struct hang= s off > > of > > either/or/both task_struct and file but nothing else. Coming up wi= th a > > creds token struct adds/intros another struct to the kernel that ha= s to > > justify its existence. This fd points to an anon inode so the reso= urce > > consumption is pretty low and all the paths for managing it are wel= l > > known and *work*. I'm trying to get across the swamp, not build a = new > > Golden Gate bridge... > >=20 > > As for POSIX, all of the pthreads API was based on CDE threads whos= e > > initial implementation was on Ultrix (BSD 4.2). Process wide was a= ssumed > > because utheeads was a user space hack and setuid was process wide > > because a proc was a just that. Good grief, that kernel was UP... = When > > OSF/1 appeared, the Mach 2.5 kernel just carried that forward with = its > > proc + thread(s) model to be compatible with the old world. In oth= er > > words, we incrementally backed ourselves into a corner. Declaring = POSIX > > now avoids admitting that we didn't see all that far into the futur= e.=20 > > Enough said. These calls are *outside* POSIX. Pthreads in 2014 is > > broken/obsolete. > >=20 > > For the interface, I changed it from what is in the cited lkml belo= w. It > > is>=20 > > now: > > int switch_creds(struct user_creds *creds); >=20 > What is struct user_creds? And why is this called switch_creds? It > doesn't switch anything. Sorry, it was in the cited lkml... It is: struct user_creds { uid_t fsuid; gid_t fsgid; int ngroups; gid_t altgroups[]; }; ngroups is limited by NGROUPS. It is called switch_creds because it does switch them to the contents o= f=20 creds. When you return from switch_creds "bob" your fsuid is "Bob". T= he=20 return value is a handle to those creds so the next time you need to be= "Bob"=20 you can just: use_creds(bob_creds); >=20 > > Behavior is the same except the mux'd syscall idea is gone. Adding= a > > flags arg to this is a useful idea both for current use and future > > proofing the API. But this requires a second call > >=20 > > int use_creds(int fd); > >=20 > > This call does the "use an fd" case but adds -1 to revert to real c= reds.=20 > > Its guts are either an override_creds(fd->file->creds) or a > > revert_creds(). Nice and quick. Note that use_creds(fd) is used i= f I > > have cached the fd/token from switch_creds(). Also nice and quick. > >=20 > > Given that I've done the checking in switch_creds and I don't pass > > anything > > back other than the token/fd and this token/fd is/will be confined = to the > > thread group, use_creds(fd) only needs to validate that it is a > > switch_creds one, not from an arbitrary open(). I do this. >=20 > Not so fast... what if you start privileged, create a cred fd, call > unshare, do a dyntransition, chroot, drop privileges, and call > use_creds? I don't immediately see why this is insecure, but having > it be secure seems to be more or less the same condition as having my > credfd_activate be secure. >=20 > And I still don't see why you need revert at all. Just create a > second token/fd/whatever representing your initial creds and switch t= o > that. The creds you get have also had the capabilities reduced. You can't do= a=20 chroot because it will EPERM. Same with a lot of others. This is a=20 restricted file access jail in itself. We *reduce* privs+caps here. The use_creds has a shortcut of -1 to revert to real creds. So if you = do a=20 series of these, how do you get back to your priv'd state? You can eit= her=20 keep track of that yourself or just pass in -1 to use_creds() and know = you=20 are... >=20 > > I have focused this down to "fsuid" because I intend this for ops t= hat > > file > > perms. Other stuff is out of scope unless someone can come up with= a use > > case and add flag defs... The other variations on the theme uid, e= uid, > > and that other one I don't understand the use for, are for long las= ting > > creds change, i.e. become user "bob" and go off an fork a shell... = I am > > wrapping I/O. > Isn't there euid for that? Right. A crappy interface given that creds are really more than just e= uid or=20 even egid but it does that deed just fine. It is for long term as in "= become=20 Dave and run this program..." I don't change that. I'm dealing with "= As Bob,=20 do a pwrite(s)+fstat" across N cores and M ops per core... >=20 > > I do not like the idea of spinning off a separate proc to do creds = work.=20 > > It doesn't buy anything in performance (everybody is a task to the > > kernel) but it does open a door to scary places. Jeremy and I agre= e that > > this token/fd must stay within the thread group, aka, process. I h= ave > > already (in the newer patchset) tied off inheritance by opening the= anon > > fd with close-on-exec. I think I tied off passing the fd thru a un= ix > > socket but if not, I will in my next submission. This fd/token sho= uld > > stay within the thread group, period. > Maybe I'm uniquely not scared of adding sane interfaces. setuid(2) i= s > insane. Impersonating a token is quite sane and even has lots of > prior art. >=20 > > As to the "get an fd and then do set*id", you have to do this twice > > because > > that fd gets the creds at the time of open, not after fooling aroun= d. I > > am > > trying to avoid multiple RCU cycles, not add more. Second, the abo= ve path > > makes the creds switch atomic because I use the creds swapping > > infrastructure. Popping back up to user space before that *all* hap= pens > > opens all kinds of ptrace+signal+??? holes. >=20 > I assume you're planning on caching these things. So spending some > cycles setting this thing up shouldn't matter much. Indeed. NFS usage patterns as such that the RCU overhead (1 instead of= 6) is=20 more than fine given that the "untar" the user is doing first LOOKUP, L= OOKUP,=20 OPEN followed by a boatload of WRITEs. >=20 > If you want to add a totally separate syscall > setresfsuidgidgroupscaps, be my guest :) It would actually be > generally useful. >=20 > > Note also that I mask caps as well in the override creds including = the > > caps to do things like set*id. That is intentional. This whole id= ea is > > to constrain the thread, not open a door *but* still provide a way = to get > > back home (safely). That is via use_creds(-1). > >=20 > > Some have proposed that we personalize a worker thread (the rpc op > > processor) to the creds of the client user right off. Ganesha only= needs > > to do this user creds work for VFS (kernel local) filesystems. Mos= t of > > our cluster filesystems have apis that allow us to pass creds direc= tly on > > calls. We only need this for that local mounted namespace. The se= rver > > core owns rpc and ops, the backend (FSAL) owns the shim layer. Use= r > > creds are backend... Having a personalized thread complicates the = core. > >=20 > > As I mentioned at LSF, I have another set pending that needs a bit = more > > polish to answer issues from the last cycle. I need to fix the iss= ue of > > handling syscalls that would do their own creds swapping inside the > > switch_creds ... use_creds(-1) region. The patch causes these sysc= alls, > > e.g. setuid() to EPERM. Again, I like this because I want the clie= nt > > creds impersonating thread to only be able to do I/O, not escape in= to the > > wild. >=20 > Eek! You want this for I/O. What if someone else wants it for > something else? Any where does the actual list of what syscalls get > blocked come from? The "blocked" list is twofold. First, any syscall that needs privs tha= t I=20 have explicitly given up here (see the nfs_setuser caps that knfsd mask= s) The second is, with the new patches, any syscall that ends up doing a=20 commit_creds(). That is, in general, all the set*id/groups calls, anyt= hing=20 messing with keys, execve, fork, and security twiddling code. The more= I=20 think about this, it is a good thing. They would return EPERM which fo= r my=20 use case wouldn't happen but for exploit attempts it slams the door wit= h a=20 clean error. >=20 > I think that your patches will get a *lot* simpler if you get rid of > this override_creds and revert stuff and just switch the entire set o= f > creds. No setuid blocking, no BUG, and no need to audit the whole > tree for odd real_creds uses. I really do want override. Otherwise, it is an RCU and I can get into = a=20 position where I can't revert leaving me with the even worse situation = of=20 doing a thread exit because it is no good for anything else anymore. As for someone else coming up with a new use case, sure, we can think a= bout=20 this. One idea has been proposed to have a flags arg to spec the "type= " of uid=20 etc. If there is motivation for that, I could add a flags arg with onl= y one=20 "do fsuid" as its first and, for now, only defined bit. As for the BUG(task->creds !=3D task->real_creds), this should really b= e handled=20 much earlier and more friendly (EPERM) than oops'ing the task. That is= what=20 the promised additional patch does. I did audit the commit_creds calls= to=20 come up with the patch and the list above. >=20 > --Andy --=20 Jim Lieb Linux Systems Engineer Panasas Inc. "If ease of use was the only requirement, we would all be riding tricyc= les" - Douglas Engelbart 1925=E2=80=932013 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html