From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 2/2] check ATTR_SIZE contraints in inode_change_ok
Date: Wed, 9 Jun 2010 20:06:16 +1000
Message-ID: <20100609100616.GA26335@laptop>
References: <20100601113915.GA4861@lst.de>
 <20100601113937.GB4929@lst.de>
 <20100609073336.GV26335@laptop>
 <20100609094121.GC3393@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig <hch@lst.de>, viro@zeniv.linux.org.uk,
	linux-fsdevel@vger.kernel.org, mfasheh@suse.de
To: Jan Kara <jack@suse.cz>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from cantor.suse.de ([195.135.220.2]:60858 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757113Ab0FIKGU (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 9 Jun 2010 06:06:20 -0400
Content-Disposition: inline
In-Reply-To: <20100609094121.GC3393@quack.suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> On Wed 09-06-10 17:33:36, Nick Piggin wrote:
> > On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> > >  int inode_change_ok(const struct inode *inode, struct iattr *attr)
> > >  {
> > > -	int retval = -EPERM;
> > >  	unsigned int ia_valid = attr->ia_valid;
> > >  
> > > +	/*
> > > +	 * First check size constraints.  These can't be overriden using
> > > +	 * ATTR_FORCE.
> > > +	 */
> > > +	if (attr->ia_mode & ATTR_SIZE) {
> > > +		int error = inode_newsize_ok(inode, attr->ia_size);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > 
> > Hmm, I don't know if we can do this unless you have audited the
> > filesystems (in which case they should be on the cc list of this
> > > patch).
> > 
> > The problem is whether the i_size is valid and stable at this
> > point. And it doesn't help even if you do leave the inode_newsize_ok
> > check inside the vmtruncate part of the fs if the check incorrectly
> > fails here.
> > 
> > ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
> > ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
> > while checking inode_newsize_ok and updating size; gfs2 inside
> > gfs2_trans_begin.
>   That's a good point. For all local filesystems I know, holding i_mutex is
> enough for having stable i_size. But for clustered filesystems it
> definitely isn't. They have to hold cluster locks to be able to reliably
> check current i_size (at least OCFS2 does). Looking at what
> inode_newsize_ok currently does, i_size is only used to decide whether
> we need to check for rlimit or not. So we could falsely miss this
> check (other node is truncating the file below new offset)...

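Yes, or falsely disallow a shrinking truncate if the new size is above
our rlimit (when our stale i_size is still below the new size, so the
truncate looks like an extension to us).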
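
To make both failure modes concrete, here is a rough userspace model of
the size check. It is only a sketch: model_newsize_ok() and its
parameters are made-up names, it ignores s_maxbytes, swapfiles and the
SIGXFSZ signal, and it assumes the check consults the rlimit only when
the cached i_size is below the new size, as Jan describes.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model, not the kernel function: the rlimit is only
 * consulted when the cached i_size says the file would grow.
 */
static int model_newsize_ok(long long cached_i_size, long long new_size,
			    long long rlimit_fsize)
{
	assert(new_size >= 0);

	if (cached_i_size < new_size && new_size > rlimit_fsize)
		return -EFBIG;	/* looks like an extension past the limit */
	return 0;		/* shrinking, or growing within the limit */
}
```

With an rlimit of 1000 and a truncate to 5000: if the file is really
10000 bytes but this node still caches 100, the call fails with -EFBIG
although the truncate shrinks the file. In the opposite case (really
100 bytes, cached 10000) it returns 0 and the rlimit check is missed.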


> Hmm, OK, so
> we really need the cluster lock...
>   BTW: Mark, don't we need the cluster lock also for the permission
> checks in inode_change_ok? Otherwise we could see:
> 	Node1				Node2
> 	chmod("file", 000);
> 					truncate("file", 0)
> 					  inode_change_ok still sees old perms
> 					    -> success
> 
>   And Node1 and Node2 can be fully serialized via some userspace
> synchronization and still hit this so it's not just a race...

That's a good point too, yes. I think if the inode_change_ok check
were moved inside the cluster lock, that would solve that problem
and Christoph's i_size problem here.
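
Roughly what I mean, as a toy single-threaded userspace model rather
than ocfs2 code. Every name here is hypothetical, the "lock" is just a
flag, and I am assuming that taking the cluster lock is what refreshes
the node's cached attributes:

```c
#include <errno.h>

/* Hypothetical cluster-wide truth for one file. */
struct cluster_file {
	int locked;
	unsigned int mode;
	long long size;
};

/* One node's cached, possibly stale, copy of the attributes. */
struct node_cache {
	unsigned int mode;
	long long size;
};

/* Taking the lock refreshes the node's cache (an assumption). */
static void model_cluster_lock(struct cluster_file *f, struct node_cache *c)
{
	f->locked = 1;
	c->mode = f->mode;
	c->size = f->size;
}

static void model_cluster_unlock(struct cluster_file *f)
{
	f->locked = 0;
}

/* Stand-in for the permission check: needs owner write permission. */
static int model_change_ok(const struct node_cache *c)
{
	return (c->mode & 0200) ? 0 : -EPERM;
}

static int model_truncate(struct cluster_file *f, struct node_cache *c,
			  long long new_size, int check_inside_lock)
{
	int err = 0;

	if (!check_inside_lock)
		err = model_change_ok(c);	/* may see stale mode */
	model_cluster_lock(f, c);
	if (check_inside_lock)
		err = model_change_ok(c);	/* sees refreshed mode */
	if (!err) {
		f->size = new_size;
		c->size = new_size;
	}
	model_cluster_unlock(f);
	return err;
}

/* Jan's scenario: Node1 already did chmod 000, Node2 caches 0644. */
static int demo_stale_perms(int check_inside_lock)
{
	struct cluster_file f = { 0, 0000, 4096 };
	struct node_cache node2 = { 0644, 4096 };

	return model_truncate(&f, &node2, 0, check_inside_lock);
}
```

In this model, demo_stale_perms(0) lets the truncate through on the
old permissions, while demo_stale_perms(1) fails with -EPERM because
the check runs only after the cache was refreshed under the lock. The
same ordering would give the size check a stable i_size.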