From: Jim Lieb
Subject: Re: Re: [5/8] syscall_cred() a system call that receives alternate CREDs
Date: Mon, 8 Apr 2013 11:23:14 -0700
Message-ID: <3325648.XNlCoRUAAr@jlieb-e6410>
Cc: Boaz Harrosh, Steven Whitehouse, Steve Dickson, Jeff Layton, linux-fsdevel, Ganesha NFS List, Frank S Filz, Venkateswararao Jujjuri, DENIEL Philippe
To: "J. Bruce Fields"
In-Reply-To: <20130408144201.GB2169@pad.fieldses.org>

On Monday, April 08, 2013 10:42:02 J. Bruce Fields wrote:
> On Mon, Apr 08, 2013 at 01:36:46PM +0300, Boaz Harrosh wrote:
> > From: Jim Lieb
> >
> > In the current NFS server (Ganesha), many operations become 6 syscalls
> > (or is it 7?):
> >
> > - setfsuid(), setfsgid(), thread_setgroups()
> > - The OP
> > - Revert setfsuid(), setfsgid() to root
> >
> > This is because if we do all these file operations as root, the
> > FS will not account for the quota a user has on created files,
> > data space, and so on.
>
> To make sure I understand, you're saying that:
>
> - the behavior you get out of those 6 syscalls is correct,
> - you just want to be able to do exactly the same thing, but
>   with 1 syscall. (For performance?)
>
> Or is there some other issue?

I have attached the email I sent around on the nfs-ganesha list with a model
API so we know the details.
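For reference, a minimal C sketch of the six-call sequence in the quoted list, assuming Linux with glibc on a 64-bit target (such as x86_64) where SYS_setgroups takes 32-bit gid_t. The names run_with_creds() and fs_op_fn are hypothetical, and thread_setgroups() is modeled as the raw setgroups syscall; this is an illustration, not Ganesha's actual code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>      /* size_t */
#include <sys/fsuid.h>   /* setfsuid(), setfsgid() */
#include <sys/syscall.h> /* SYS_setgroups */
#include <sys/types.h>   /* uid_t, gid_t */
#include <unistd.h>      /* syscall() */

/* Hypothetical callback type standing in for "the OP" (open, mkdir, ...). */
typedef int (*fs_op_fn)(void *arg);

/* Per-thread setgroups: the raw syscall, bypassing glibc's all-threads wrapper.
 * Assumes a 64-bit target where SYS_setgroups takes 32-bit gid_t. */
static int thread_setgroups(size_t n, const gid_t *list)
{
    return (int)syscall(SYS_setgroups, n, list);
}

/* Hypothetical helper: run op(arg) with the given filesystem credentials.
 * Returns op's result, or -1 with errno set if the switch failed. */
int run_with_creds(uid_t uid, gid_t gid, size_t ngroups, const gid_t *groups,
                   fs_op_fn op, void *arg)
{
    uid_t old_uid = (uid_t)setfsuid(uid);            /* call 1 */
    gid_t old_gid = (gid_t)setfsgid(gid);            /* call 2 */

    /* setfsuid()/setfsgid() have no error return. Passing -1 (an invalid id)
     * changes nothing and returns the current value, so use it to verify. */
    if ((uid_t)setfsuid((uid_t)-1) != uid || (gid_t)setfsgid((gid_t)-1) != gid) {
        setfsuid(old_uid);
        setfsgid(old_gid);
        errno = EPERM;
        return -1;
    }

    if (thread_setgroups(ngroups, groups) == -1) {   /* call 3 */
        int err = errno;
        setfsuid(old_uid);
        setfsgid(old_gid);
        errno = err;
        return -1;
    }

    int rc = op(arg);                                /* call 4: the OP */
    int err = errno;

    /* Revert fsuid/fsgid (to root, in a server running as root). As in the
     * quoted list, supplementary groups are left as set. */
    setfsuid(old_uid);                               /* call 5 */
    setfsgid(old_gid);                               /* call 6 */
    errno = err;
    return rc;
}
```

Because setfsuid() and setfsgid() report no errors, the sketch spends two extra calls confirming the switch, each another crossing of the syscall barrier.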
Boaz replied "performance", but there are also race conditions to consider.
If we get a signal (or something else unexpected) partway through the
sequence, what state are we in? Yes, the setfsuid() call back to root can
still be made, but with the masquerading done inside one syscall, any
signals etc. are handled in the context of that user/group, and there is
one syscall to deal with, not a stream. There may be SELinux/AppArmor issues
to deal with too: if we first masquerade the thread and then apply all these
access checks, then as far as the kernel is concerned, it is the
masqueraded user.

> > (Note that permission checking is done by Ganesha core, because
> >
> > We may cache open fd(s) and such not, another topic)
>
> Is there anything we could do to make it possible for you to depend on
> the kernel's permissions checking instead?
>

I concur with Frank's assessment here. There are more instances where
nfs-ganesha makes a syscall as the server than as the masqueraded user; in
the pNFS case, syscalls as the masqueraded user hardly happen at all. We
looked at having the kernel do the permission checks but found that we also
had to do them ourselves, and mixing the two gets seriously messy. For
starters, we really do want to share fds.

> --b.
>
> > We could maybe, with hard work, save the last two calls for reverting
> > to root, but this would force us to audit lots of code, which we are
> > not prepared to do right now. And it would not save us much.
> >
> > [thread_setgroups()]
> > thread_setgroups() is what we use in Ganesha and what the Samba guys use
> > for a per-thread setgroups() call. In the Linux kernel, setgroups is
> > actually always per thread. It is only the POSIX (crap) pthread layer
> > in glibc that intercepts the setgroups() call (and others), iterates over
> > all threads that belong to a process, and calls the native kernel
> > setgroups on each of them. So thread_setgroups() is just the raw syscall,
> > bypassing glibc's processing. We will eventually push this API to glibc.
> > BTW: this is done exactly the same way on FreeBSD, with the same glibc
> > intervention.
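A small C sketch of why the raw syscall matters, under the same assumptions (Linux, glibc, 64-bit target): a worker thread calls setgroups through syscall(), and the calling thread's group count is read before and after. The helper names and the gid 4242 are hypothetical. Calling glibc's setgroups() instead would apply the change to every thread in the process; without CAP_SETGID the raw call simply fails with EPERM and nothing changes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw per-thread setgroups (assumes SYS_setgroups takes 32-bit gid_t). */
static int thread_setgroups(size_t n, const gid_t *list)
{
    return (int)syscall(SYS_setgroups, n, list);
}

/* Hypothetical record of what the worker thread observed. */
struct demo_result {
    int rc;             /* return value of thread_setgroups() in the worker */
    int err;            /* errno if rc == -1, else 0 */
    int worker_ngroups; /* group count the worker sees afterwards */
};

static void *worker(void *p)
{
    struct demo_result *r = p;
    const gid_t one = 4242; /* arbitrary illustrative gid */

    r->rc = thread_setgroups(1, &one);
    r->err = (r->rc == -1) ? errno : 0;
    r->worker_ngroups = getgroups(0, NULL); /* this thread's groups only */
    return NULL;
}

/* Runs the worker and reports the calling thread's group count before and
 * after. With the raw syscall, the two counts should match. */
int demo_per_thread_groups(struct demo_result *r, int *main_before, int *main_after)
{
    pthread_t t;

    *main_before = getgroups(0, NULL);
    if (pthread_create(&t, NULL, worker, r) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    *main_after = getgroups(0, NULL);
    return 0;
}
```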
> >
> > [Proposed]
> > What Jim proposed is a syscall that receives a struct containing
> > the regular syscall parameters plus a creds structure with fsuid/fsgid
> > and a groups array. The kernel will set these, call the original syscall,
> > and revert. This will be done on only a subset of the syscalls: those that
> > are related to filesystems (setfsXid) and that are of interest to us
> > servers.
> >
> > Jim, care to scribble a structure definition?
> >
> > Thanks
> > Boaz

-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

---------- Forwarded message ----------
From: Jim Lieb
Subject: filesystem summit idea
Date: Thu, 31 Jan 2013 13:44:30 -0800
Message-ID: <9007132.QKd3F9o8Qa@jlieb-e6410>

In replying to the creds RFC branch, an idea came to me. What we need is a
syscall for server syscalls.

At first, I thought of doing something like what was done for the *at
calls. That got pretty silly, with some calls needing only an extra flag and
others needing extra args. All of the glibc and ABI pain was a mess I'd
rather not repeat.

How about this idea:

    struct creds;   /* see definition in fsal_types.h */

    /**
     * @brief Syscall entry point for servers that need to masquerade as others
     *
     * This is a privileged syscall.
     *
     * @param syscall_number [IN] syscall number from syscall.h
     * @param syscall_args   [IN] the arguments for that syscall in a vector
     *                            mimicking the syscall prototype.
     * @param creds          [IN] credentials to use. See definition in fsal_types.h
     */
    int server_syscall(int syscall_number, void *syscall_args,
                       struct creds *creds);

This syscall would have its own matching vector of the kernel calls it
supports; maybe this is just a bit in the syscall vector.
The point is that not all calls would be supported, only a small set.

The syscall args would be packaged and managed the way ioctl() does it
now. This costs an extra dereference in syscall processing to validate the
struct and copy the args in/out. The same applies to the creds, except that
instead of being applied to the specific syscall's stack frame, they would
go into the "effective" uid/gid of the thread. We save the back-and-forth
across the syscall barrier at the cost of slightly more overhead per
affected call, which is less than the multiple round trips for
setfsuid()/setfsgid().

As a privileged syscall, it falls outside the set of POSIX-compliant
calls, so we can also bypass things like POSIX lock behavior. It is also
extensible without breaking the bank on syscall numbers or changing ABIs.

Further rationale: the *at calls and handle calls do have more general use
and therefore fit in the set of general syscalls. This one is an enabler
for servers, letting them take over, in user space, tasks that were once
forced into the kernel because of these user-masquerading issues.

Last point: no, I haven't researched what the Samba team has lobbied for,
but I suspect that if they are asking for variant syscalls, as in the *at
case, this approach has lower impact.

Jim
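To make the proposed dispatch concrete, here is a user-space model in C of what server_syscall() might do: check the call number against a small supported set, unpack an ioctl-style argument vector, switch fsuid/fsgid, make the call, and revert. struct server_creds, struct server_args, the supported list, and server_syscall_model() are all hypothetical; supplementary groups are omitted for brevity. Being in user space, the model saves no round trips; it only illustrates the semantics a kernel implementation would provide.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/fsuid.h>   /* setfsuid(), setfsgid() */
#include <sys/syscall.h> /* SYS_* numbers */
#include <sys/types.h>
#include <unistd.h>      /* syscall() */

/* Hypothetical stand-in for struct creds (fsal_types.h); no groups here. */
struct server_creds {
    uid_t fsuid;
    gid_t fsgid;
};

/* Hypothetical ioctl-style argument package: up to six register-sized args. */
struct server_args {
    long a[6];
};

/* The "matching vector": the small set of calls allowed with alternate creds. */
static const long supported[] = { SYS_openat, SYS_mkdirat, SYS_unlinkat, SYS_fchownat };

static int is_supported(long nr)
{
    for (size_t i = 0; i < sizeof supported / sizeof supported[0]; i++)
        if (supported[i] == nr)
            return 1;
    return 0;
}

/* Model of server_syscall(): validate, switch fs creds, call, revert.
 * A kernel version would copy args in with copy_from_user() and apply the
 * creds without extra round trips; this model only shows the logic. */
long server_syscall_model(long nr, const struct server_args *args,
                          const struct server_creds *creds)
{
    if (args == NULL || creds == NULL) {
        errno = EFAULT;
        return -1;
    }
    if (!is_supported(nr)) {
        errno = ENOSYS;
        return -1;
    }

    uid_t old_uid = (uid_t)setfsuid(creds->fsuid);
    gid_t old_gid = (gid_t)setfsgid(creds->fsgid);
    long rc;

    /* Passing -1 changes nothing and returns the current value. */
    if ((uid_t)setfsuid((uid_t)-1) != creds->fsuid ||
        (gid_t)setfsgid((gid_t)-1) != creds->fsgid) {
        rc = -1;
        errno = EPERM;
    } else {
        rc = syscall(nr, args->a[0], args->a[1], args->a[2],
                     args->a[3], args->a[4], args->a[5]);
    }

    int err = errno;
    setfsuid(old_uid);   /* always revert, even on failure */
    setfsgid(old_gid);
    errno = err;
    return rc;
}
```

A caller would package mkdirat(AT_FDCWD, path, 0700) as {AT_FDCWD, (long)path, 0700} and pass SYS_mkdirat; an unsupported number such as SYS_getpid is rejected with ENOSYS.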