From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aneesh Kumar K. V" Subject: Re: [RFC PATCH 2/3] vfs: Add open by file handle support Date: Mon, 22 Feb 2010 11:43:43 +0530 Message-ID: <87tyt9n6g8.fsf@linux.vnet.ibm.com> References: <1266558149-11460-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1266558149-11460-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: hch@infradead.org, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org To: Andreas Dilger Return-path: Received: from e23smtp06.au.ibm.com ([202.81.31.148]:48568 "EHLO e23smtp06.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751739Ab0BVGNt (ORCPT ); Mon, 22 Feb 2010 01:13:49 -0500 Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [202.81.31.245]) by e23smtp06.au.ibm.com (8.14.3/8.13.1) with ESMTP id o1M6DkOs002046 for ; Mon, 22 Feb 2010 17:13:46 +1100 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o1M6Dlic1810456 for ; Mon, 22 Feb 2010 17:13:47 +1100 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id o1M6Dk4f032431 for ; Mon, 22 Feb 2010 17:13:47 +1100 In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sat, 20 Feb 2010 11:58:34 -0700, Andreas Dilger wrote: > On 2010-02-18, at 22:42, Aneesh Kumar K.V wrote: > > +static struct dentry *handle_to_dentry(struct path *path, > > + struct file_handle *fh) > > +{ > > + inode = path->dentry->d_inode; > > + if (!S_ISREG(inode->i_mode) && > > + !S_ISDIR(inode->i_mode) && > > + !S_ISLNK(inode->i_mode)) { > > Please follow normal indenting rules, like: > > if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) && > !S_ISLNK(inode->i_mode)) { > > > + /* should we do some check on handle size ?*/ > > + handle = kmalloc(handle_size, GFP_KERNEL); > > Good question. kmalloc() allows up to 32MB allocations, so a user > running 1024 threads calling this with bad handles could OOM most > systems today due to unswappable kernel allocations. Should there be > an upper limit on the handle size, like 4096 bytes or so? Should some > filesystem legitimately need a larger size the kernel can always be > changed without problems since the size isn't part of the ABI (it > should be an internal sanity check), but I can't imagine what needs to > be put into a file handle that would exceed this size. At this point i will limit it to 4096 bytes. > > > + dentry = exportfs_decode_fh(path->mnt, (struct fid *)handle, > > + handle_size, fh->handle_type, > > + vfs_dentry_acceptable, NULL); > > One serious problem I can see with this is that the handle only > encodes the ino+generation of the inode, and nothing about the device > itself. NFS handles this by using the fsid to identify the > filesystem, but it would be highly confusing to applications if a > process with a different CWD gets a different file or an error trying > to open the file handle, or if a new filesystem gets mounted and the > file handle stops working. Having the file handle able to positively > identify a single inode in the system is the most important property > it has, I think. > > Probably one of the important use cases of open_by_handle() is to have > one process doing pathname resolution and slave threads doing work on > those handles. Usually daemon threads change their working directory > to "/" to avoid pinning some directory. Even without that, it would > be impossible for a process to create handles in 2 different > filesystems, which is totally broken: > > (CWD is /home/adilger) > name_to_handle("/usr/src/linux/COPYING", &fh); > > fd = open_by_handle(&fh, O_RDONLY); > (fd = -1; errno = ENOENT or EBADF or whatever) > do_sys_open_by_handle actually takes an fd. I was planning to do an openat style syscall also. That should handle the above problem. In case of the new syscall dirfd will be a file descriptor for a file in the particular filesystem. > Without the handle identifying the filesystem in some way this is > mostly useless. Encoding the device number would be a simple (though > non-robust) way to do this, a better way would be with the filesystem > UUID and adding an in-kernel UUID->sb mapping function (which is > useful for other things as well). > But i understand this is an good requirement. I will see what best I can do. > > +long do_sys_open_by_handle(int dfd, struct file_handle *fh, int > > flags) > > +{ > > + if (!capable(CAP_SYS_ADMIN)) > > + /* Allow open by handle only by sysadmin */ > > + return -EPERM; > > Hmm, I guess this avoids some of the security concerns, but makes it a > lot less useful. I was thinking this could be used for e.g. user NFS > serving or such, but if it is limited to root only then you might as > well just set up the in-kernel NFSd. By making the handle hard to > forge (e.g. generate random key per superblock, sha1(ino+gen+key) and > store that into fh; someone with more security experience can think of > a better scheme) then you can reasonably safely dispense with the > CAP_SYS_ADMIN check because you can be sure that the proper path > traversal has been done by a trusted process and there is no more > exposure than unix socket fd passing. If we make the handle depend on the random key, how do we make sure that after a server crash when get the same inode with the same handle. That would be a requirement for an NFS server right? We can definitely encode device details or UUID details the same way nfs server does in fh_compose. But i guess making it random. What are the issues of being able to guess the file handle ? May be there are other ways to handle those. > > > + if ((!(flags & O_APPEND) || (flags & O_TRUNC)) && > > + (flags & FMODE_WRITE) && IS_APPEND(inode)) { > > Please use normal indenting style, aligning on the parenthesis. -aneesh