From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rob Ross
Subject: Re: openg
Date: Wed, 06 Dec 2006 09:42:47 -0600
Message-ID: <4576E4F7.1040308@mcs.anl.gov>
References: <6.2.3.4.2.20061127213243.04f786c0@cic-mail.lanl.gov>
 <20061128055428.GA29891@infradead.org>
 <20061129090450.GA16296@infradead.org>
 <20061129122313.GG14315@parisc-linux.org>
 <20061129123913.GA15994@infradead.org>
 <4570ACD1.7060800@mcs.anl.gov>
 <4574BF52.6090600@mcs.anl.gov>
 <4575E83F.8090501@mcs.anl.gov>
 <20061206110158.GA3780@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Latchesar Ionkov, Matthew Wilcox, Gary Grider, linux-fsdevel@vger.kernel.org
Return-path:
Received: from mailgw.mcs.anl.gov ([140.221.9.4]:35585 "EHLO mailgw.mcs.anl.gov" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935757AbWLFPmu (ORCPT ); Wed, 6 Dec 2006 10:42:50 -0500
To: Christoph Hellwig
In-Reply-To: <20061206110158.GA3780@infradead.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Christoph Hellwig wrote:
> On Tue, Dec 05, 2006 at 03:44:31PM -0600, Rob Ross wrote:
>> The openg() really just does the lookup and permission checking. The
>> openfh() creates the file descriptor and starts that context if the
>> particular FS tracks that sort of thing.
>
> ...
>
>> Well you've caught me. I don't want to cache the values, because I
>> fundamentally believe that sharing state between clients and servers is
>> braindead (to use Christoph's phrase) in systems of this scale
>> (thousands to tens of thousands of clients). So I don't want locks, so I
>> can't keep the cache consistent, ... So someone else will have to run
>> the tests you propose :)...
>
> Besides the whole ugliness you miss a few points about the fundamental
> architecture of the unix filesystem permission model unfortunately.
>
> Say you want to lookup a path /foo/bar/baz, then the access permission
> is based on the following things:
>
>  - the credentials of the user.  let's only take traditional uid/gid
>    for this example although credentials are much more complex these
>    days
>  - the kind of operation you want to perform
>  - the access permission of the actual object the path points to (inode)
>  - the lookup permission (x bit) for every object on the way to your object
>
> In your proposal sutoc is a simple conversion operation, that means
> openg needs to perform all these access checks and encodes them in the
> fh_t.

This is exactly right and is the intention of the call.

> That means an fh_t must fundamentally be an object that is kept
> in the kernel aka a capability as defined by Henry Levy.  This does imply
> you _do_ need to keep state.

The fh_t is indeed a type of capability. fh_t, properly protected, could
be passed into user space and validated by the file system when
presented back to the file system.

There is state here, clearly. I feel ok about that because we allow
servers to forget that they handed out these fh_ts if they feel like it;
there is no guaranteed lifetime in the current proposal. This allows
servers to come and go without needing to persistently store these.
Likewise, clients can forget them with no real penalty.

This approach is ok because of the use case. Because we expect the fh_t
to be used relatively soon after its creation, servers will not need to
hold onto these long before the openfh() is performed and we're back
into a normal "everyone has a valid fd" use case.

> And because it needs kernel support your
> fh_t is more or less equivalent to a file descriptor with sutoc equivalent
> to a dup variant that really duplicates the backing object instead of just
> the userspace index into it.

Well, a FD has some additional state associated with it (position,
etc.), but yes there are definitely similarities to dup().
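
To make the flow discussed above concrete, here is a rough sketch in
Python. It is a toy in-memory model and not the proposed kernel
interface: ToyServer, Inode, allowed() and forget_all() are hypothetical
names invented for illustration, and only owner/group/other mode bits
are modelled. It assumes openg() does the whole path walk and permission
check and hands back an opaque handle, openfh() turns a handle into
something fd-like without repeating the check, and the server may drop
its handle table at any time.

```python
import secrets
from dataclasses import dataclass

R, W, X = 4, 2, 1  # classic rwx permission bits


@dataclass
class Inode:
    uid: int
    gid: int
    mode: int  # e.g. 0o750


def allowed(inode, uid, gid, want):
    """Traditional uid/gid check against owner, group or other bits."""
    if uid == inode.uid:
        bits = (inode.mode >> 6) & 7
    elif gid == inode.gid:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return bits & want == want


class ToyServer:
    """Hypothetical stand-in for a file server (illustration only)."""

    def __init__(self, inodes):
        self.inodes = inodes  # absolute path -> Inode
        self.handles = {}     # opaque handle -> (path, access); not persistent

    def openg(self, path, uid, gid, want):
        parts = path.strip("/").split("/")
        # Search (x) permission on every directory on the way to the object.
        for i in range(len(parts)):
            prefix = "/" + "/".join(parts[:i])
            if not allowed(self.inodes[prefix], uid, gid, X):
                raise PermissionError(prefix)
        # Requested access on the object itself.
        if not allowed(self.inodes[path], uid, gid, want):
            raise PermissionError(path)
        handle = secrets.token_hex(16)  # unguessable, stands in for fh_t
        self.handles[handle] = (path, want)
        return handle

    def openfh(self, handle):
        # No second permission check: holding the handle is the capability.
        if handle not in self.handles:
            raise LookupError("unknown or forgotten handle; openg() again")
        path, want = self.handles[handle]
        return {"path": path, "access": want, "pos": 0}  # fd-like state

    def forget_all(self):
        # No guaranteed lifetime: the server may drop handles whenever.
        self.handles.clear()
```

In this model one process would call openg() once and pass the handle to
its peers, each of which calls openfh() on its own. A peer that presents
a handle the server has since forgotten simply gets an error and falls
back to a fresh openg().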
> Note somewhat similar open by filehandle APIs like open by inode number
> as used by lustre or the XFS *_by_handle APIs are privileged operations
> because of exactly this problem.

I'm not sure why a properly protected fh_t couldn't be passed back into
user space and handed around, but I'm not a security expert. What am I
missing?

> What according to your mail is the most important bit in this proposal is
> that you think the filehandles should be easily shared with other systems
> in a cluster.  That fact is not mentioned in the actual proposal at all,
> and is in fact the hardest part because of inherent statefulness of
> the API.

The documentation of the calls is complicated by the way POSIX calls are
described. We need to have a second document describing use cases also
available, so that we can avoid misunderstandings as best we can and get
straight to the real issues. Sorry that document wasn't available.

I think I've addressed the statefulness of the API above?

>> What's the etiquette on changing subject lines here? It might be useful
>> to separate the openg() etc. discussion from the readdirplus() etc.
>> discussion.
>
> Changing subject lines is fine.

Thanks.

Rob