linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mingming Cao <cmm@us.ibm.com>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, nfsv4@linux-nfs.org
Subject: Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version
Date: Tue, 10 Jul 2007 22:09:08 -0400	[thread overview]
Message-ID: <1184119748.20193.37.camel@localhost.localdomain> (raw)
In-Reply-To: <18068.19667.942363.686858@notabene.brown>

On Wed, 2007-07-11 at 13:21 +1000, Neil Brown wrote:
> On Tuesday July 10, akpm@linux-foundation.org wrote:
> > 
> > Yes, thanks.  It doesn't actually tell us why we want to implement
> > this attribute and it doesn't tell us what the implications of failing
> > to do so are, but I guess we can take that on trust from the NFS guys.
> 
> You would like to think so, but remember NFSv4 was designed by a
> committee :-)
> 
> The 'change' number is used for cache consistency, and as the spec
> makes very strong statements about the 'change' number, it is very
> hard (or impossible) to implement a server correctly without storing a
> change number in stable storage (just one of my grips about V4).
> 
> > 
> > But I suspect the ext4 implementation doesn't actually do this.  afaict we
> > won't update i_version for file overwrites (especially if s_time_gran can
> > indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
> > would be the implications of this?
> 
> The first part sounds like a bug - i_version should really be updated
> by every call to ->commit_write (if that is still what it is called).
> 
> The MAP_SHARED thing is less obvious.  I guess every time we notice
> that the page might have been changed, we need to increment i_version.
> 
> > 
> > And how does the NFS server know that the filesystem implements i_version? 
> > Will a zero-value of i_version have special significance, telling the
> > server to not send this attribute, perhaps?
> 
> That is a very important question.  Zero probably makes sense, but
> what ever it is needs to be agreed and documented.
> And just by-the-way, the server doesn't really have the option of not
> sending the attribute.  If i_version isn't defined, it has to fake
> something using mtime, and hope that is good enough.
> 
> Alternately we could mandate that i_version is always kept up-to-date
> and if a filesystem doesn't have anything to load from storage, it
> just sets it to the current time in nanoseconds.
> 
> That would mean that a client would need to flush it's cache whenever
> the inode fell out of cache on the server, but I don't think we can
> reliably do better than that.
> 
> I think I like that approach.
> 
> So my vote is to increment i_version in common code every time any
> change is made to the file, 

David Chinneer pointed that we need to journal the version number
updates together with the operations that causes the change of the inode
version number, in order to survive server crashes so clients won't see
the counter go backwards.

So increment i_version in fs code is probably the place to ensure the
inode version changes are stored to disk. It's seems update the ext4
inode version in every ext4_mark_inode_dirty() is the easiest way.

> and alloc_inode should initialise it to
> current time, which might be changed by the filesystem before it calls
> unlock_new_inode. 
> ... but doesn't lustre want to control its i_version... so maybe not :-(
> 
> NeilBrown

  reply	other threads:[~2007-07-11  2:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-01  7:37 [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version Mingming Cao
2007-07-02 14:58 ` Mingming Cao
2007-07-03 14:24   ` Trond Myklebust
2007-07-03 21:56     ` Andreas Dilger
2007-07-03 22:15   ` J. Bruce Fields
2007-07-03 23:32     ` Andreas Dilger
2007-07-06 13:51       ` J. Bruce Fields
2007-07-06 22:53         ` Andreas Dilger
2007-07-09 21:16           ` Mingming Cao
2007-07-10 23:30 ` Andrew Morton
2007-07-10 22:09   ` Mingming Cao
2007-07-11  1:22     ` Andrew Morton
2007-07-11  0:19       ` Mingming Cao
2007-07-11  4:22         ` Andrew Morton
2007-07-11  2:27           ` Mingming Cao
2007-07-11 16:57         ` J. Bruce Fields
2007-07-11  3:21       ` Neil Brown
2007-07-11  2:09         ` Mingming Cao [this message]
2007-07-11  5:17           ` Andrew Morton
2007-07-11  3:18             ` Mingming Cao
2007-07-11  6:35               ` Andrew Morton
2007-07-11  3:34         ` Trond Myklebust
2007-07-11 11:41           ` Andreas Dilger
2007-07-11  5:05         ` Neil Brown
2007-07-11  5:22           ` Andrew Morton
2007-07-11 14:28           ` Dave Kleikamp
2007-07-11 20:04             ` J. Bruce Fields
2007-07-12  4:56               ` Andreas Dilger
2007-07-11 17:26         ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1184119748.20193.37.camel@localhost.localdomain \
    --to=cmm@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=nfsv4@linux-nfs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).