From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 2/3] vfs: Add open by file handle support
Date: Mon, 22 Feb 2010 11:43:43 +0530
Message-ID: <87tyt9n6g8.fsf@linux.vnet.ibm.com>
References: <1266558149-11460-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<1266558149-11460-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
	<AD5A6061-9B7B-4B29-A458-822E781C2588@sun.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: hch@infradead.org, viro@zeniv.linux.org.uk,
	linux-fsdevel@vger.kernel.org
To: Andreas Dilger <adilger@sun.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from e23smtp06.au.ibm.com ([202.81.31.148]:48568 "EHLO
	e23smtp06.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751739Ab0BVGNt (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Mon, 22 Feb 2010 01:13:49 -0500
Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [202.81.31.245])
	by e23smtp06.au.ibm.com (8.14.3/8.13.1) with ESMTP id o1M6DkOs002046
	for <linux-fsdevel@vger.kernel.org>; Mon, 22 Feb 2010 17:13:46 +1100
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])
	by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o1M6Dlic1810456
	for <linux-fsdevel@vger.kernel.org>; Mon, 22 Feb 2010 17:13:47 +1100
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])
	by d23av04.au.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id o1M6Dk4f032431
	for <linux-fsdevel@vger.kernel.org>; Mon, 22 Feb 2010 17:13:47 +1100
In-Reply-To: <AD5A6061-9B7B-4B29-A458-822E781C2588@sun.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat, 20 Feb 2010 11:58:34 -0700, Andreas Dilger <adilger@sun.com> wrote:
> On 2010-02-18, at 22:42, Aneesh Kumar K.V wrote:
> > +static struct dentry *handle_to_dentry(struct path *path,
> > +				struct file_handle *fh)
> > +{
> > +	inode = path->dentry->d_inode;
> > +	if (!S_ISREG(inode->i_mode) &&
> > +		!S_ISDIR(inode->i_mode) &&
> > +		!S_ISLNK(inode->i_mode)) {
> 
> Please follow normal indenting rules, like:
> 
> 	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> 	    !S_ISLNK(inode->i_mode)) {
> 
> > +	/* should we do some check on handle size ?*/
> > +	handle = kmalloc(handle_size, GFP_KERNEL);
> 
> Good question.  kmalloc() allows up to 32MB allocations, so a user  
> running 1024 threads calling this with bad handles could OOM most  
> systems today due to unswappable kernel allocations.  Should there be  
> an upper limit on the handle size, like 4096 bytes or so?  Should some  
> filesystem legitimately need a larger size the kernel can always be  
> changed without problems since the size isn't part of the ABI (it  
> should be an internal sanity check), but I can't imagine what needs to  
> be put into a file handle that would exceed this size.

At this point i will limit it to 4096 bytes.


> 
> > +	dentry = exportfs_decode_fh(path->mnt, (struct fid *)handle,
> > +					handle_size, fh->handle_type,
> > +					vfs_dentry_acceptable, NULL);
> 
> One serious problem I can see with this is that the handle only  
> encodes the ino+generation of the inode, and nothing about the device  
> itself.  NFS handles this by using the fsid to identify the  
> filesystem, but it would be highly confusing to applications if a  
> process with a different CWD gets a different file or an error trying  
> to open the file handle, or if a new filesystem gets mounted and the  
> file handle stops working.  Having the file handle able to positively  
> identify a single inode in the system is the most important property  
> it has, I think.
> 
> Probably one of the important use cases of open_by_handle() is to have  
> one process doing pathname resolution and slave threads doing work on  
> those handles.  Usually daemon threads change their working directory  
> to "/" to avoid pinning some directory.  Even without that, it would  
> be impossible for a process to create handles in 2 different  
> filesystems, which is totally broken:
> 
>           (CWD is /home/adilger)
>           name_to_handle("/usr/src/linux/COPYING", &fh);
> 
>           fd = open_by_handle(&fh, O_RDONLY);
>           (fd = -1; errno = ENOENT or EBADF or whatever)
> 

do_sys_open_by_handle actually takes an fd. I was planning to do an openat
style syscall also. That should handle the above problem. In case of the
new syscall dirfd will be a file descriptor for a file in the
particular filesystem.


> Without the handle identifying the filesystem in some way this is  
> mostly useless.  Encoding the device number would be a simple (though  
> non-robust) way to do this, a better way would be with the filesystem  
> UUID and adding an in-kernel UUID->sb mapping function (which is  
> useful for other things as well).
> 

But i understand this is an good requirement. I will see what best I
can do.

> > +long do_sys_open_by_handle(int dfd, struct file_handle *fh, int  
> > flags)
> > +{
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		/* Allow open by handle only by sysadmin */
> > +		return -EPERM;
> 
> Hmm, I guess this avoids some of the security concerns, but makes it a  
> lot less useful.  I was thinking this could be used for e.g. user NFS  
> serving or such, but if it is limited to root only then you might as  
> well just set up the in-kernel NFSd.  By making the handle hard to  
> forge (e.g. generate random key per superblock, sha1(ino+gen+key) and  
> store that into fh; someone with more security experience can think of  
> a better scheme) then you can reasonably safely dispense with the  
> CAP_SYS_ADMIN check because you can be sure that the proper path  
> traversal has been done by a trusted process and there is no more  
> exposure than unix socket fd passing.

If we make the handle depend on the random key, how do we make sure
that after a server crash when get the same inode with the same handle.
That would be a requirement for an NFS server right? We can definitely
encode device details or UUID details the same way nfs server does in
fh_compose. But i guess making it random. 

What are the issues of being able to guess the file handle ? May be
there are other ways to handle those.


> 
> > +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> > +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> 
> Please use normal indenting style, aligning on the parenthesis.


-aneesh