From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jim Lieb Subject: Re: Re: Thoughts on credential switching Date: Thu, 27 Mar 2014 12:30:04 -0700 Message-ID: <2487921.a0QJrtikX0@jlieb-e6410> References: <20140327070126.41ac75ac@ipyr.poochiereds.net> <20140327182617.GC2526@jeremy-laptop> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jeff Layton , Florian Weimer , Andy Lutomirski , "Eric W. Biederman" , LSM List , "Serge E. Hallyn" , Kees Cook , Linux FS Devel , "Theodore Ts'o" , "linux-kernel@vger.kernel.org" , To: Jeremy Allison Return-path: In-Reply-To: <20140327182617.GC2526@jeremy-laptop> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Rather than inline, I'm responding in the context of Jeremy's comments = but I=20 have to answer others as well. It is Jeremy after all who said my baby= was=20 ugly ;). Jeremy is right about overloading "fd". Maybe I can call it something = else=20 but an fd (in implementation) has merit because a creds struct hangs of= f of=20 either/or/both task_struct and file but nothing else. Coming up with a= creds=20 token struct adds/intros another struct to the kernel that has to justi= fy its=20 existence. This fd points to an anon inode so the resource consumption= is=20 pretty low and all the paths for managing it are well known and *work*.= I'm=20 trying to get across the swamp, not build a new Golden Gate bridge... As for POSIX, all of the pthreads API was based on CDE threads whose in= itial=20 implementation was on Ultrix (BSD 4.2). Process wide was assumed becau= se=20 utheeads was a user space hack and setuid was process wide because a pr= oc was=20 a just that. Good grief, that kernel was UP... When OSF/1 appeared, t= he Mach=20 2.5 kernel just carried that forward with its proc + thread(s) model to= be=20 compatible with the old world. In other words, we incrementally backed= =20 ourselves into a corner. Declaring POSIX now avoids admitting that we = didn't=20 see all that far into the future. Enough said. These calls are *outsi= de*=20 POSIX. Pthreads in 2014 is broken/obsolete. =46or the interface, I changed it from what is in the cited lkml below.= It is=20 now: int switch_creds(struct user_creds *creds); Behavior is the same except the mux'd syscall idea is gone. Adding a f= lags arg=20 to this is a useful idea both for current use and future proofing the A= PI. But=20 this requires a second call int use_creds(int fd); This call does the "use an fd" case but adds -1 to revert to real creds= =2E Its=20 guts are either an override_creds(fd->file->creds) or a revert_creds().= Nice=20 and quick. Note that use_creds(fd) is used if I have cached the fd/tok= en from=20 switch_creds(). Also nice and quick. Given that I've done the checking in switch_creds and I don't pass anyt= hing=20 back other than the token/fd and this token/fd is/will be confined to t= he=20 thread group, use_creds(fd) only needs to validate that it is a switch_= creds=20 one, not from an arbitrary open(). I do this. I have focused this down to "fsuid" because I intend this for ops that = file=20 perms. Other stuff is out of scope unless someone can come up with a u= se case=20 and add flag defs... The other variations on the theme uid, euid, and = that=20 other one I don't understand the use for, are for long lasting creds ch= ange,=20 i.e. become user "bob" and go off an fork a shell... I am wrapping I/O= =2E I do not like the idea of spinning off a separate proc to do creds work= =2E It=20 doesn't buy anything in performance (everybody is a task to the kernel)= but it=20 does open a door to scary places. Jeremy and I agree that this token/f= d must=20 stay within the thread group, aka, process. I have already (in the new= er=20 patchset) tied off inheritance by opening the anon fd with close-on-exe= c. I=20 think I tied off passing the fd thru a unix socket but if not, I will i= n my=20 next submission. This fd/token should stay within the thread group, pe= riod. As to the "get an fd and then do set*id", you have to do this twice bec= ause=20 that fd gets the creds at the time of open, not after fooling around. = I am=20 trying to avoid multiple RCU cycles, not add more. Second, the above p= ath=20 makes the creds switch atomic because I use the creds swapping infrastr= ucture.=20 Popping back up to user space before that *all* happens opens all kinds= of=20 ptrace+signal+??? holes. Note also that I mask caps as well in the override creds including the = caps to=20 do things like set*id. That is intentional. This whole idea is to con= strain=20 the thread, not open a door *but* still provide a way to get back home=20 (safely). That is via use_creds(-1). Some have proposed that we personalize a worker thread (the rpc op proc= essor)=20 to the creds of the client user right off. Ganesha only needs to do th= is user=20 creds work for VFS (kernel local) filesystems. Most of our cluster fil= esystems=20 have apis that allow us to pass creds directly on calls. We only need = this=20 for that local mounted namespace. The server core owns rpc and ops, th= e=20 backend (FSAL) owns the shim layer. User creds are backend... Having = a=20 personalized thread complicates the core. As I mentioned at LSF, I have another set pending that needs a bit more= polish=20 to answer issues from the last cycle. I need to fix the issue of handl= ing=20 syscalls that would do their own creds swapping inside the switch_creds= ...=20 use_creds(-1) region. The patch causes these syscalls, e.g. setuid() t= o=20 EPERM. Again, I like this because I want the client creds impersonatin= g=20 thread to only be able to do I/O, not escape into the wild. Jim On Thursday, March 27, 2014 11:26:17 Jeremy Allison wrote: > On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote: > > On Thu, 27 Mar 2014 14:06:32 +0100 > >=20 > > Florian Weimer wrote: > > > On 03/27/2014 02:02 PM, Jeff Layton wrote: > > > >> This interface does not address the long-term lack of POSIX > > > >> compliance in setuid and friends, which are required to be > > > >> process-global and not thread-specific (as they are on the ker= nel > > > >> side). > > > >>=20 > > > >> glibc works around this by reserving a signal and running set*= id on > > > >> every thread in a special signal handler. This is just crass,= and > > > >> it is likely impossible to restore the original process state = in > > > >> case of partial failure. We really need kernel support to per= form > > > >> the process-wide switch in an all-or-nothing manner. > > > >=20 > > > > I disagree. We're treading new ground here with this syscall. I= t's > > > > not defined by POSIX so we're under no obligation to follow its > > > > silly directives in this regard. Per-process cred switching doe= sn't > > > > really make much sense in this day and age, IMO. Wasn't part of= the > > > > spec was written before threading existed > > >=20 > > > Okay, then we need to add a separate set of system calls. > > >=20 > > > I really, really want to get rid of that signal handler mess in > > > glibc, with its partial failures. > >=20 > > I agree, it's a hack, but hardly anyone these days really wants to > > switch creds on a per-process basis. It's just that we're saddled w= ith > > a spec for those calls that was written before threads really exist= ed. > >=20 > > The kernel syscalls already do the right thing as far as I'm concer= ned. > > What would be nice however is a blessed glibc interface to them > > that didn't involve all of the signal handling stuff. Then samba et= =2E al. > > wouldn't need to call syscall() directly to get at them. >=20 > Amen to that :-). >=20 > However, after talking with Jeff and Jim at CollabSummit, > I was 'encouraged' to make my opinions known on the list. >=20 > To me, calling the creds handle a file descriptor just > feels wrong. IT *isn't* an fd, you can't read/write/poll > on it, and it's only done as a convenience to get the > close-on-exec semantics and the fact that the creds are > already hung off the fd's in kernel space. >=20 > I'd rather any creads call use a different type, even if > it's a typedef of 'int -> creds_handle_t', just to make > it really clear it's *NOT* an fd. >=20 > That way we can also make it clear this thing only has > meaning to a thread group, and SHOULD NOT (and indeed > preferably CAN NOT) be passed between processes. >=20 > Cheers, >=20 > Jeremy. --=20 Jim Lieb Linux Systems Engineer Panasas Inc. "If ease of use was the only requirement, we would all be riding tricyc= les" - Douglas Engelbart 1925=E2=80=932013