From mboxrd@z Thu Jan  1 00:00:00 1970
From: Peter Staubach
Subject: Re: i_version changes
Date: Thu, 14 Feb 2008 09:34:21 -0500
Message-ID: <47B4516D.4050304@redhat.com>
References: <20080210073041.GA23529@lst.de>
 <20080212200625.GE18625@fieldses.org>
 <20080213125214.GA12362@lst.de>
 <20080213202611.GM13462@fieldses.org>
 <43290.192.168.1.70.1202937559.squirrel@neil.brown.name>
 <47B361D8.1070708@redhat.com>
 <43087.192.168.1.70.1202940404.squirrel@neil.brown.name>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "J. Bruce Fields", Christoph Hellwig,
 jean-noel.cordenner@bull.net, linux-fsdevel@vger.kernel.org
To: NeilBrown
Received: from mx1.redhat.com ([66.187.233.31]:44083 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751233AbYBNOfE (ORCPT );
 Thu, 14 Feb 2008 09:35:04 -0500
In-Reply-To: <43087.192.168.1.70.1202940404.squirrel@neil.brown.name>
Sender: linux-fsdevel-owner@vger.kernel.org

NeilBrown wrote:
> On Thu, February 14, 2008 8:32 am, Peter Staubach wrote:
>
>> I don't think that this is quite true.  If the file is changed
>> when the NFS server is not running, then the value of i_version
>> which is used when the NFS server starts up again must be
>> different than the value which was previously used when the NFS
>> server was previously running.
>>
>
> As I said, the "NFS has seen this i_version" flag needs to be on
> stable storage, e.g. the lsb of the i_version.  This will ensure that
> any change after NFSD saw the i_version will cause the i_version to
> be updated.
> So I think it can provide correct semantics.
> Precise details:
>   NFSD: when reading i_version
>         take lock
>         tmp = i_version
>         i_version |= 1
>         drop lock
>         return tmp & ~1;
>
>   VFS when making any change:
>         take lock
>         if (i_version & 1) {
>              i_version++;
>              changed=1
>         }
>         drop lock
>         if changed, sync inode
>
>

Yes, this does seem like it would do the job.
It could perhaps be optimized somewhat to avoid lock contention, but I
do think that this would suffice.

>
>> Is the perceived performance hit really going to be as large
>> as suspected?  We already update the time fields fairly often
>> and we don't pay a huge penalty for those, or at least not a
>> penalty that we aren't willing to pay.  Has anyone measured
>> the cost?
>>
>
> Correct NFS semantics require that the i_version be written to disk
> before (or when) the change is committed.  That means lots more inodes
> in the journal.
> If you are already doing data=journal, the hit probably isn't too
> high.(?)
>
>

Correct NFS semantics also require that any modified metadata,
including file times and file size, be written to stable storage.
Isn't this just another piece of modified metadata that would go
hand-in-hand with updated file times?

We should also require that the file mtime change when the
contents of the file are modified.  This should happen whether
or not the clock has ticked.  Unfortunately, to implement this,
we would need file time resolutions that are finer than the
granularity of the system clock.  We could probably get away with
nanosecond resolutions in the file system.

    Thanx...

       ps

> You are right: measuring the cost is important.  However, as we are
> designing a generic filesystem interface, we need to understand the
> cost on multiple filesystems in a variety of configurations ... or
> give the filesystem complete information and let it decide the optimal
> implementation.
>
> Giving the filesystem full information means having an inode_operation
> "nfsd_reads_version" which returns the number to be used as change_id.
>
>
> NeilBrown
>
>
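
For reference, the flag-in-the-lsb scheme quoted above could be sketched
in userspace C roughly as follows.  This is only an illustration under
stated assumptions, not kernel code: struct demo_inode,
demo_inode_init(), demo_sync_inode(), nfsd_read_version() and
vfs_note_change() are hypothetical names, a pthread mutex stands in for
whatever lock would protect the inode, and the sync to stable storage is
reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; not a kernel structure. */
struct demo_inode {
    pthread_mutex_t lock;
    uint64_t i_version;  /* lsb is the "NFSD has seen this value" flag */
    unsigned int syncs;  /* counts simulated writes to stable storage */
};

static void demo_inode_init(struct demo_inode *inode, uint64_t version)
{
    pthread_mutex_init(&inode->lock, NULL);
    inode->i_version = version;
    inode->syncs = 0;
}

/* Placeholder for "sync inode": a real filesystem would write it out. */
static void demo_sync_inode(struct demo_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    inode->syncs++;
    pthread_mutex_unlock(&inode->lock);
}

/*
 * NFSD side: return the version with the flag bit masked off and record
 * that the value has been seen.  The flag itself would also have to
 * reach stable storage; this sketch does not model that.
 */
static uint64_t nfsd_read_version(struct demo_inode *inode)
{
    uint64_t tmp;

    pthread_mutex_lock(&inode->lock);
    tmp = inode->i_version;
    inode->i_version |= 1;
    pthread_mutex_unlock(&inode->lock);
    return tmp & ~(uint64_t)1;
}

/*
 * VFS side: bump the version only if NFSD has seen the current value.
 * Incrementing an odd value yields the next even one, which clears the
 * flag, so later changes skip the update until NFSD reads again.
 * Returns true when the version was bumped and the inode synced.
 */
static bool vfs_note_change(struct demo_inode *inode)
{
    bool changed = false;

    pthread_mutex_lock(&inode->lock);
    if (inode->i_version & 1) {
        inode->i_version++;
        assert((inode->i_version & 1) == 0);
        changed = true;
    }
    pthread_mutex_unlock(&inode->lock);

    if (changed)
        demo_sync_inode(inode);
    return changed;
}
```

Starting from i_version = 4, a read by NFSD reports 4 and leaves 5
stored.  The next change bumps the stored value to 6 and syncs once; a
second change before NFSD reads again does nothing; the next read
reports 6.  Only the first change after each read pays for a sync.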