From: Peter Staubach <staubach@redhat.com>
To: NeilBrown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Christoph Hellwig <hch@lst.de>,
jean-noel.cordenner@bull.net, linux-fsdevel@vger.kernel.org
Subject: Re: i_version changes
Date: Thu, 14 Feb 2008 09:34:21 -0500
Message-ID: <47B4516D.4050304@redhat.com>
In-Reply-To: <43087.192.168.1.70.1202940404.squirrel@neil.brown.name>
NeilBrown wrote:
> On Thu, February 14, 2008 8:32 am, Peter Staubach wrote:
>
>> I don't think that this is quite true. If the file is changed
>> when the NFS server is not running, then the value of i_version
>> which is used when the NFS server starts up again must be
>> different than the value which was previously used when the NFS
>> server was previously running.
>>
>
> As I said, the "NFS has seen this i_version" flag needs to be on
> stable storage, e.g. the lsb of the i_version. This will ensure that
> any change after NFSD saw the i_version will cause the i_version to
> be updated.
> So I think it can provide correct semantics.
> Precise details:
>   NFSD: when reading i_version
>      take lock
>      tmp = i_version
>      i_version |= 1
>      drop lock
>      return tmp & ~1;
>
>   VFS when making any change:
>      take lock
>      if (i_version & 1) {
>          i_version++;
>          changed=1
>      }
>      drop lock
>      if changed, sync inode
>
>
Yes, this does seem like it would do the job. It could perhaps
be optimized somewhat to avoid lock contention, but I do think
that this would suffice.
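
For concreteness, the scheme quoted above can be sketched in user-space C
roughly as follows. This is only an illustration under stated assumptions:
struct sketch_inode, nfsd_read_version(), vfs_note_change() and the
sync_inode() stub are hypothetical names, a pthread mutex stands in for
whatever lock the kernel would use, and persisting the "seen" bit itself
to stable storage is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of an inode. */
struct sketch_inode {
    pthread_mutex_t lock;
    uint64_t i_version;     /* bit 0 = "NFSD has seen this version" */
};

/* Placeholder for "sync inode"; a real filesystem would write it out. */
static void sync_inode(struct sketch_inode *inode)
{
    (void)inode;
}

/* NFSD side: report the change id and mark the version as seen. */
static uint64_t nfsd_read_version(struct sketch_inode *inode)
{
    uint64_t tmp;

    assert(inode != NULL);
    pthread_mutex_lock(&inode->lock);
    tmp = inode->i_version;
    inode->i_version |= 1;              /* set the "seen" flag */
    pthread_mutex_unlock(&inode->lock);
    return tmp & ~(uint64_t)1;          /* hide the flag from the client */
}

/* VFS side: called on any change; returns true if i_version moved. */
static bool vfs_note_change(struct sketch_inode *inode)
{
    bool changed = false;

    assert(inode != NULL);
    pthread_mutex_lock(&inode->lock);
    if (inode->i_version & 1) {
        /* Odd value + 1 gives the next even value: a new version
         * with the "seen" flag cleared in the same step. */
        inode->i_version++;
        changed = true;
    }
    pthread_mutex_unlock(&inode->lock);
    if (changed)
        sync_inode(inode);
    return changed;
}
```

With this layout, changes made while the flag is clear cost nothing extra;
only the first change after NFSD has looked at the value bumps i_version
and forces a sync.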
>
>> Is the perceived performance hit really going to be as large
>> as suspected? We already update the time fields fairly often
>> and we don't pay a huge penalty for those, or at least not a
>> penalty that we aren't willing to pay. Has anyone measured
>> the cost?
>>
>
> Correct NFS semantics require that the i_version be written to disk
> before (or when) the change is committed. That means lots more inodes
> in the journal.
> If you are already doing data=journal, the hit probably isn't too
> high.(?)
>
>
Correct NFS semantics also require that any modified metadata,
including file times and file size, be written to stable
storage. Isn't this just another piece of modified metadata
that would go hand-in-hand with updated file times?

We should also require that the file mtime change whenever the
contents of the file are modified. This should happen whether
or not the clock has ticked. Unfortunately, to implement this,
we would need file time resolutions of finer granularity
than the system clock. We could probably get away with
nanosecond resolution in the file system.
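
One way to picture "mtime changes even if the clock has not ticked" is
the sketch below. It is an assumption for illustration, not something
proposed in this thread: when the clock reading is not later than the
stored mtime, the stored value is stepped forward by one nanosecond.
ts_cmp(), bump_mtime() and SKETCH_NSEC_PER_SEC are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Compare two timestamps; returns <0, 0 or >0 like strcmp(). */
static int ts_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Update *mtime for a modification observed at clock reading *now. */
static void bump_mtime(struct timespec *mtime, const struct timespec *now)
{
    assert(mtime != NULL && now != NULL);

    if (ts_cmp(now, mtime) > 0) {
        /* The clock has moved past the stored value: just use it. */
        *mtime = *now;
        return;
    }

    /* The clock has not ticked since the last update: step the stored
     * value by one nanosecond so the change is still visible. */
    mtime->tv_nsec++;
    if (mtime->tv_nsec >= SKETCH_NSEC_PER_SEC) {
        mtime->tv_nsec = 0;
        mtime->tv_sec++;
    }
}
```

This only works if the on-disk timestamp can hold nanoseconds, which is
the resolution requirement mentioned above.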
Thanx...
ps
> You are right: measuring the cost is important. However as we are
> designing a generic filesystem interface, we need to understand the
> cost on multiple filesystems in a variety of configurations .... or
> give the filesystem complete information and let it decide the optimal
> implementation.
>
> Giving the filesystem full information means having an inode_operation
> "nfsd_reads_version" which returns the number to be used as change_id.
>
>
> NeilBrown
>
>