From: Jim Lieb <jlieb@panasas.com>
To: "J. Bruce Fields" <bfields@redhat.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>,
Steven Whitehouse <swhiteho@redhat.com>,
Steve Dickson <steved@redhat.com>,
Jeff Layton <jlayton@redhat.com>,
<lsf-pc@lists.linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Ganesha NFS List <nfs-ganesha-devel@lists.sourceforge.net>,
Frank S Filz <ffilz@us.ibm.com>,
Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>,
DENIEL Philippe <philippe.deniel@cea.fr>
Subject: Re: Re: [5/8] syscall_cred() a system call that receives alternate CREDs
Date: Mon, 8 Apr 2013 11:23:14 -0700
Message-ID: <3325648.XNlCoRUAAr@jlieb-e6410>
In-Reply-To: <20130408144201.GB2169@pad.fieldses.org>
[-- Attachment #1: Type: text/plain, Size: 3444 bytes --]
On Monday, April 08, 2013 10:42:02 J. Bruce Fields wrote:
> On Mon, Apr 08, 2013 at 01:36:46PM +0300, Boaz Harrosh wrote:
> > From: Jim Lieb <jlieb@panasas.com>
> >
> > In current NFS Server (Ganesha) lots of operation becomes 6 syscalls
> > (Or is it 7?)
> >
> > - setfsuid(), setfsgid(), thread_setgroups()
> > - The OP
> > - Revert setfsuid(), setfsgid() to root
> >
> > This is because if we do all these file operations as root then
> > FS will not account for the quota a user have on create files,
> > data space, and so on.
>
> To make sure I understand, you're saying that:
>
> - the behavior you get out of those 6 syscalls is correct,
> - you just want to be able to do exactly the same thing, but
> with 1 syscall. (For performance?)
>
> Or is there some other issue?
I have attached the email I sent around on the nfs-ganesha list with a model
api so we know the details.
Boaz replied "performance", but there are also race conditions to consider. If
a signal (or some other interruption) arrives somewhere in the middle of the
sequence, what state are we in? Yes, the setfsuid() call back to root can
still be done, but with masquerading, any signals etc. are handled in the
context of that user/group, and there is one syscall to deal with, not a
stream of them.
There may be SELinux/AppArmor issues to deal with too. If we first masquerade
the thread and then apply all these access checks, then as far as the kernel
is concerned, it is the masqueraded user making the call.
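To make the sequence concrete, here is a rough sketch of what the multi-call
dance looks like from user space today. It is an illustration only, not code
from nfs-ganesha: the helper name mkdir_as_user() and the choice of mkdir()
as "the OP" are made up for the example, and it assumes a server thread that
normally runs as root. Note that setfsuid()/setfsgid() give no useful error
return, so a failure in the middle is hard to detect.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-thread setgroups(): the raw syscall, bypassing glibc's wrapper,
 * which would apply the change to every thread in the process.
 * (On some 32-bit ABIs the full-width variant is SYS_setgroups32.) */
int thread_setgroups(size_t ngroups, const gid_t *groups)
{
	return (int)syscall(SYS_setgroups, ngroups, groups);
}

/* Hypothetical helper: perform one operation as the given user. */
int mkdir_as_user(const char *path, mode_t mode, uid_t uid, gid_t gid,
		  size_t ngroups, const gid_t *groups)
{
	int ret, saved_errno;

	if (thread_setgroups(ngroups, groups) < 0)	/* syscall 1 */
		return -1;
	setfsgid(gid);					/* syscall 2 */
	setfsuid(uid);					/* syscall 3 */

	ret = mkdir(path, mode);			/* syscall 4: the OP */
	saved_errno = errno;

	/* Revert to root. A signal landing anywhere between syscall 1
	 * and here sees a thread in a half-switched state. */
	setfsuid(0);					/* syscall 5 */
	setfsgid(0);					/* syscall 6 */
	thread_setgroups(0, NULL);			/* syscall 7 */

	errno = saved_errno;
	return ret;
}
```

Counting the group list reset, that is seven kernel entries for one mkdir(),
which matches the "6 (or is it 7?)" in Boaz's description.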
>
> > (Note that permission checking is done by Ganesha core, because
> >
> > We may cache open fd(s) and such not, another topic)
>
> Is there anything we could do to make it possible for you to depend on
> the kernel's permissions checking instead?
>
I concur with Frank's assessment here. There are more instances where
nfs-ganesha is doing a syscall as the server than as the masqueraded user. In
the pNFS case, this hardly happens at all. We looked at having the kernel do
the permission checking, but found that we also had to do it ourselves, and
mixing the two gets seriously messy. For starters, we really do want to share
fd's.
> --b.
>
> > We could maybe with hard work save the last two calls for reverting
> > to root, but this will force us to audit lots of code that we are
> > not prepared to do right now. And will not save us much.
> >
> > [thread_setgroups()]
> > thread_setgroups() is what we use at Ganesha and what the Samba guys use
> > for a per-thread setgroups() call. In the Linux Kernel the setgroups is
> > actually always per thread. It is only the POSIX (crap) pthread layer
> > at glibc that intercepts the setgroups() call (and others), Iterates on
> > all threads that belong to a process, and calls the native Kernel
> > setgroups
> > on them. So thread_setgroups() is just the raw syscall bypassing glibc's
> > processing. We will eventually push this API to glibc.
> > BTW: this is done exactly the same on FreeBSD, with same exact glibc
> > intervention.
> >
> > [Proposed]
> > What Jim proposed is a syscall that receives a struct that has
> > the regular syscalls parameters plus the creds structure with fsuid/fsgid
> > and groups array. Kernel will set these in, call the original syscall,
> > and revert. This will be done on only an interested subset of the
> > syscalls that are one - are related to filesystems (setfsXid) and two -
> > are of interest to us Servers.
> >
> > Jim care to scribble a structure definition?
> >
> > Thanks
> > Boaz
--
Jim Lieb
Linux Systems Engineer
Panasas Inc.
[-- Attachment #2: Jim Lieb <jlieb@panasas.com>: filesystem summit idea --]
[-- Type: message/rfc822, Size: 2657 bytes --]
From: Jim Lieb <jlieb@panasas.com>
To: <bharrosh@panasas.com>
Cc: <nfs-ganesha-devel@lists.sourceforge.net>
Subject: filesystem summit idea
Date: Thu, 31 Jan 2013 13:44:30 -0800
Message-ID: <9007132.QKd3F9o8Qa@jlieb-e6410>
In replying to the creds RFC branch, an idea came to me. What we need is a
syscall for server syscalls. At first, I thought of doing something like what
was done for the *at calls. That got pretty silly, with some calls needing
only an extra flag and others needing extra args. All of the glibc and ABI
pain was a mess I'd rather not repeat.
How about this idea:
/**
 * @brief Syscall entry point for servers that need to masquerade as others
 *
 * This is a privileged syscall.
 *
 * @param syscall_number [IN] syscall number from syscall.h
 * @param syscall_args   [IN] the arguments for that syscall in a vector
 *                       mimicking the syscall prototype.
 * @param creds          [IN] credentials to use. See definition in fsal_types.h
 */
int server_syscall(int syscall_number, void *syscall_args,
                   struct creds *creds);
This syscall would have its own matching vector of the kernel calls it
supports. Maybe this is a bit in the syscall vector. The point is that not all
calls would be supported, only a small set.
The syscall args would be packaged and managed the way ioctl does it now. This
is an extra dereference in the syscall processing to validate the struct and
copy the args in/out. The same applies to creds, only instead of applying them
to the specific syscall's stack frame, they would go into the "effective"
uid/gid for the thread.
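As a rough illustration of the shape this could take, the sketch below
emulates the proposed call in user space on top of the existing syscalls.
Everything in it is an assumption made for the example: the struct srv_creds
layout (it is not the definition from fsal_types.h), the names
srv_is_supported() and server_syscall_emul(), the fixed six-slot argument
vector, and the particular set of supported calls. A real kernel
implementation would apply and revert the credentials inside one kernel
entry; this version still makes all the round trips and only shows the
intended semantics. It also assumes the caller normally runs as root.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/fsuid.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical credentials block: fsuid/fsgid plus a groups array. */
struct srv_creds {
	uid_t fsuid;
	gid_t fsgid;
	size_t ngroups;
	const gid_t *groups;
};

/* The "matching vector": only this small set of calls may be wrapped. */
static const long srv_supported[] = { SYS_openat, SYS_mkdirat, SYS_unlinkat };

int srv_is_supported(long nr)
{
	size_t i;

	for (i = 0; i < sizeof(srv_supported) / sizeof(srv_supported[0]); i++)
		if (srv_supported[i] == nr)
			return 1;
	return 0;
}

/* Emulation: validate, masquerade, run the wrapped call, revert. */
long server_syscall_emul(long nr, const long args[6],
			 const struct srv_creds *creds)
{
	long ret;
	int saved_errno;

	if (!srv_is_supported(nr)) {
		errno = ENOSYS;		/* not in the supported set */
		return -1;
	}
	if (args == NULL || creds == NULL) {
		errno = EINVAL;
		return -1;
	}

	/* Raw per-thread setgroups, bypassing glibc's process-wide wrapper. */
	if (syscall(SYS_setgroups, creds->ngroups, creds->groups) < 0)
		return -1;
	setfsgid(creds->fsgid);
	setfsuid(creds->fsuid);

	ret = syscall(nr, args[0], args[1], args[2],
		      args[3], args[4], args[5]);
	saved_errno = errno;

	/* Revert to root before returning to the caller. */
	setfsuid(0);
	setfsgid(0);
	syscall(SYS_setgroups, (size_t)0, (const gid_t *)NULL);

	errno = saved_errno;
	return ret;
}
```

With this layout, a mkdirat(AT_FDCWD, "newdir", 0755) on behalf of a client
would be issued as server_syscall_emul(SYS_mkdirat, args, &creds) with
args = { AT_FDCWD, (long)"newdir", 0755, 0, 0, 0 }. Pointers travel in the
vector as integers, much as they do through ioctl.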
We save the back and forth across the syscall barrier at the cost of slightly
more overhead per affected call, which is still less than the multiple round
trips for setfsuid/gid. As a privileged syscall, it falls outside the set of
"POSIX" compliance, so we can also bypass things like POSIX lock behavior. It
is also expandable without breaking the bank on syscalls or moving ABIs.
Further rationale for this is that the *at calls and handle calls do have more
general use and therefore fit in the set of general syscalls. This one is an
enabler for servers that can take over, in user space, tasks that once had to
live in the kernel because of these user masquerading issues.
Last point: no, I haven't researched what the Samba team has lobbied for, but
I suspect that if they are asking for variant syscalls like the *at case, this
has lower impact.
Jim
--
Jim Lieb
Linux Systems Engineer
Panasas Inc.