From mboxrd@z Thu Jan  1 00:00:00 1970
From: Trond Myklebust <trond.myklebust@fys.uio.no>
Subject: Re: [RFC] Support for stackable file systems on top of nfs
Date: Fri, 11 Nov 2005 14:22:55 -0500
Message-ID: <1131736975.8793.43.camel@lade.trondhjem.org>
References: <OFC1BAB240.43F22A03-ON882570B6.005FEB66-882570B6.006486F1@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: fsdevel <linux-fsdevel@vger.kernel.org>,
	Shaya Potter <spotter@cs.columbia.edu>, nfsv4 <nfsv4@linux-nfs.org>,
	Dave Kleikamp <shaggy@austin.ibm.com>
Return-path: <nfsv4-bounces@linux-nfs.org>
To: Bryan Henderson <hbryan@us.ibm.com>
In-Reply-To: <OFC1BAB240.43F22A03-ON882570B6.005FEB66-882570B6.006486F1@us.ibm.com>
List-Unsubscribe: <http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4>,
	<mailto:nfsv4-request@linux-nfs.org?subject=unsubscribe>
List-Archive: <http://linux-nfs.org/pipermail/nfsv4>
List-Post: <mailto:nfsv4@linux-nfs.org>
List-Help: <mailto:nfsv4-request@linux-nfs.org?subject=help>
List-Subscribe: <http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4>,
	<mailto:nfsv4-request@linux-nfs.org?subject=subscribe>
Sender: nfsv4-bounces@linux-nfs.org
Errors-To: nfsv4-bounces@linux-nfs.org
List-Id: linux-fsdevel.vger.kernel.org

On Fri, 2005-11-11 at 10:18 -0800, Bryan Henderson wrote:
> >> >It should hardly come as a newsflash that remote filesystems are
> >> >inherently different to local filesystems.
> >> 
> >> You'd have to give a pretty specific definition of remote filesystem 
> >> before I'd agree with that.  At its most basic level, remote just means 
> >> distant, and the matter of needing a credential to access a file has 
> >> more to do with the fact that the filesystem is shared than that it is 
> >> distant.
> >
> >Show me a remote filesystem that doesn't have some form of
> >authentication.
> 
> An ordinary NFS filesystem is remote and does not have authentication. I'm 
> sure you mean identification (Identification is saying who is writing the 
> data; authentication is proving it is he).  In NFS, authentication is done 
> by the client operating system and in a Linux client it's totally outside 
> of the filesystem function.

I mean authentication: the act of proving that a given remote procedure
call is being sent on behalf of a given authorised individual.

That is precisely what an RPCSEC_GSS session allows by virtue of a
secure per-user channel which has been set up using some standard strong
authentication method (krb5 being currently the most commonly used such
method).
Even the old and untrusty AUTH_SYS (still the NFS default) does some
limited form of authentication: the server identifies the client, checks
if the client is on a trusted list, then reads off the RPC's user+group
information. It's not particularly secure, but it is authentication.
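
To make the AUTH_SYS case concrete, here is a rough sketch of that
check as a toy C function. It is an illustration only, not the Linux
server code: the struct layouts and the names authsys_call, identity
and authsys_accept are all hypothetical, and matching the client by an
address string stands in for whatever the server's export list really
does.

```c
/* Sketch only: a toy model of the AUTH_SYS check described above.
 * All names here are hypothetical; this is not the Linux nfsd code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct authsys_call {            /* hypothetical: what arrives with one RPC */
    const char *client_addr;     /* where the call came from */
    unsigned int uid;            /* user id asserted by the client */
    unsigned int gid;            /* group id asserted by the client */
};

struct identity {
    unsigned int uid;
    unsigned int gid;
};

/* Returns 0 and fills *out if the client is trusted, -1 otherwise. */
int authsys_accept(const struct authsys_call *call,
                   const char *const trusted[], size_t ntrusted,
                   struct identity *out)
{
    assert(call != NULL && out != NULL);
    for (size_t i = 0; i < ntrusted; i++) {
        if (strcmp(call->client_addr, trusted[i]) == 0) {
            /* Trusted client: take its word for uid and gid. */
            out->uid = call->uid;
            out->gid = call->gid;
            return 0;
        }
    }
    return -1;                   /* unknown client: reject the call */
}
```

The weakness is visible in the sketch: once the client is on the list,
the server believes whatever uid and gid that client puts in the call.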

>   Identification is something local filesystems 
> do as well.  What sets NFS apart here is that the identification happens 
> at physical write (cache clean) time instead of just at open time.  That's 
> not an inherent part of being remote (distant).  In fact, I don't even 
> know a word, other than NFS, for the class of filesystems that have this 
> characteristic.

As long as you're prepared to group filesystems such as AFS/DFS, CIFS,
etc. under the NFS umbrella.

> >It is an inherent feature of shared mmapped files that the pages can
> >be written to by different users.  When the VM finally gets round to
> >flushing them out, all it knows is that this page is dirty.
> >> I'm a little fuzzy on how that works anyway, ...
> 
> You acknowledge this burning question without answering it, and I'd really 
> like to understand.  How do you determine at pageout time what credential 
> to give the NFS server?  I think you said it has to do with credentials 
> cached in the struct file, but the same way you can't attach a credential 
> to the dirty page, you can't attach a struct file to it, right?  And is it 
> just a shared mmap problem, or is it the same thing if multiple users 
> simultaneously write() to the file cache?

Ordinary writes go through the prepare_write()/commit_write() interface,
and so we tag them with the appropriate credentials + state there. We
don't bother to tag the pages with the "PG_dirty" bit 'cos we don't want
the VM to cycle them through the writepage() interface. Instead we track
the page state ourselves.
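
A userspace toy may help show the idea. It is a sketch under loud
assumptions, not the actual NFS client: cred, write_req, inode_writes,
track_write and flush_writes are hypothetical names, and "sending" a
request here just records which uid it would go out under.

```c
/* Sketch only: a userspace toy, not the Linux NFS client.
 * Every name below is hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MAX_REQS 16

struct cred { unsigned int uid; };

struct write_req {               /* one tracked write into a cached page */
    unsigned long page_index;
    unsigned int offset, len;
    const struct cred *cred;     /* who issued the write() */
};

struct inode_writes {            /* private per-file list; no dirty bit used */
    struct write_req reqs[MAX_REQS];
    size_t nreqs;
};

/* Called at commit time: remember the range together with its credential. */
int track_write(struct inode_writes *iw, unsigned long page_index,
                unsigned int offset, unsigned int len,
                const struct cred *cred)
{
    assert(iw != NULL && cred != NULL);
    if (iw->nreqs == MAX_REQS)
        return -1;               /* a real client would flush and retry */
    iw->reqs[iw->nreqs++] =
        (struct write_req){ page_index, offset, len, cred };
    return 0;
}

/* Called at flush time: "send" each request under the credential stored
 * with it. sent_uids must have room for MAX_REQS entries.
 * Returns the number of requests flushed. */
size_t flush_writes(struct inode_writes *iw, unsigned int sent_uids[])
{
    size_t n = iw->nreqs;
    for (size_t i = 0; i < n; i++)
        sent_uids[i] = iw->reqs[i].cred->uid;
    iw->nreqs = 0;
    return n;
}
```

Two users who write() to the same page end up as two entries in the
list, each carrying its own credential, so nothing has to be guessed
when the data finally goes out.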

The only place weirdness can come from is mmap(), since there we are at
the mercy of the limitations of the VM's dirty page tracking.

Cheers,
  Trond