linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.com>, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hch@lst.de, neilb@suse.de,
	bfields@fieldses.org, amir73il@gmail.com, jack@suse.de,
	viro@zeniv.linux.org.uk
Subject: Re: [PATCH 01/19] fs: new API for handling inode->i_version
Date: Sun, 17 Dec 2017 08:01:41 -0500	[thread overview]
Message-ID: <1513515701.3529.18.camel@kernel.org> (raw)
In-Reply-To: <87mv2jmhmm.fsf@notabene.neil.brown.name>

On Sat, 2017-12-16 at 15:17 +1100, NeilBrown wrote:
> On Wed, Dec 13 2017, Jeff Layton wrote:
> 
> > On Thu, 2017-12-14 at 09:04 +1100, NeilBrown wrote:
> > > On Wed, Dec 13 2017, Jeff Layton wrote:
> > > 
> > > > +/*
> > > > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> > > > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> > > > + * appear different to observers if there was a change to the inode's data or
> > > > + * metadata since it was last queried.
> > > > + *
> > > > + * It should be considered an opaque value by observers. If it remains the same
> > > > + * since it was last checked, then nothing has changed in the inode. If it's
> > > > + * different then something has changed. Observers cannot infer anything about
> > > > + * the nature or magnitude of the changes from the value, only that the inode
> > > > + * has changed in some fashion.
> > > 
> > > I agree that it "should be" considered opaque, but I have a suspicion
> > > that NFSv4 doesn't consider it opaque.
> > > There is something about write delegations and the server performing a
> > > GETATTR callback to the delegated client so that it can answer GETATTR
> > > from other clients without recalling the delegation.
> > > 
> > > Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains
> > > the text:
> > > 
> > >    o  The client will create a value greater than c that will be used
> > >       for communicating that modified data is held at the client.  Let
> > >       this value be represented by d.
> > > 
> > > "c" here is a 'change' attribute.
> > > 
> > > Then:
> > > 
> > >    While the change attribute is opaque to the client in the sense that
> > >    it has no idea what units of time, if any, the server is counting
> > >    change with, it is not opaque in that the client has to treat it as
> > >    an unsigned integer, and the server has to be able to see the results
> > >    of the client's changes to that integer.  Therefore, the server MUST
> > >    encode the change attribute in network order when sending it to the
> > >    client.  The client MUST decode it from network order to its native
> > >    order when receiving it, and the client MUST encode it in network
> > >    order when sending it to the server.  For this reason, change is
> > >    defined as an unsigned integer rather than an opaque array of bytes.
> > > 
> > > This all suggests that nfsd needs to be certain that "incrementing" the
> > > change id will produce a new changeid, which has not been used before,
> > > and also suggests that nfsd needs to be able to control the changeid
> > > stored after writes that result from a delegation being returned.
> > > 
> > > I'd just like to say that this is one of the most annoying dumb features
> > > of NFSv4, because it is trivial to fix and I suggested a fix before
> > > NFSv4.0 was finalized.  Grumble.
> > > 
> > > Otherwise the patch set looks good.  I haven't gone over the code
> > > closely, the but approach is spot-on.
> > 
> > I don't think we have to do that. There are really only two states with
> > a client holding a write delegation, as far as the server is concerned.
> > Either:
> > 
> > a) the client has done no writes to the file, in which case it'll return
> > the same i_version that the server has when issued a CB_GETATTR
> > 
> > ...or...
> > 
> > b) it has written to the file while holding the delegation, in which
> > case it'll return a different CB_GETATTR to the server
> > 
> > The simplest thing for the server to do is to just increment the change
> > attribute _once_ when it gets back a CB_GETATTR with a different change
> > attr than it has.
> > 
> > That's sufficient to tell another client issuing a a GETATTR that the
> > file has changed without needing to recall the delegation.
> > 
> > Prior to the delegation being returned, the client will send at least
> > one WRITE RPC, and that's enough to ensure that the the next stat will
> > see the thing increase.
> 
> "increment" and "increase" are not words that mean anything for an
> "opaque value".
> NFSd is, presumably, an "observer" of i_version (as it isn't the
> filesytem that controls it), so your text says it must treat i_version as
> opaque.  That means it cannot detect an "increase" (only a change), and
> it certainly cannot "increment" the value.
> 
> I think you need to allow observers to treat i_version as a 64 bit number
> which will monotonically increase.  Any change to the file will result
> in an increment of at least '1'.

Here, I was mostly speaking about NFS in general. I think the above
method is the cheapest/best way to ensure that you don't end up with
reused change attributes, within the confines of the protocol.

With this implementation, it's probably safe enough to make a guarantee
that the value will increase wrt a previously sampled value if there was
a change. I'll have to think about how best to document that.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2017-12-17 13:01 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-13 14:19 [PATCH 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2017-12-13 14:19 ` [PATCH 01/19] fs: new API for handling inode->i_version Jeff Layton
2017-12-13 22:04   ` NeilBrown
2017-12-14  0:27     ` Jeff Layton
2017-12-16  4:17       ` NeilBrown
2017-12-17 13:01         ` Jeff Layton [this message]
2017-12-18 14:03         ` Jeff Layton
2017-12-13 14:20 ` [PATCH 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2017-12-13 21:52   ` Jeff Layton
2017-12-13 22:07     ` NeilBrown
2017-12-13 14:20 ` [PATCH 03/19] fat: convert to new i_version API Jeff Layton
2017-12-13 14:20 ` [PATCH 04/19] affs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 05/19] afs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 06/19] btrfs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 07/19] exofs: switch " Jeff Layton
2017-12-13 14:20 ` [PATCH 08/19] ext2: convert " Jeff Layton
2017-12-18 12:47   ` Jan Kara
2017-12-13 14:20 ` [PATCH 09/19] ext4: " Jeff Layton
2017-12-14 21:52   ` Theodore Ts'o
2017-12-13 14:20 ` [PATCH 10/19] nfs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 11/19] nfsd: " Jeff Layton
2017-12-13 14:20 ` [PATCH 12/19] ocfs2: " Jeff Layton
2017-12-18 12:49   ` Jan Kara
2017-12-13 14:20 ` [PATCH 13/19] ufs: use " Jeff Layton
2017-12-13 14:20 ` [PATCH 14/19] xfs: convert to " Jeff Layton
2017-12-13 22:48   ` Dave Chinner
2017-12-13 23:25     ` Dave Chinner
2017-12-14  0:10       ` Jeff Layton
2017-12-14  2:17         ` Dave Chinner
2017-12-14 11:16           ` Jeff Layton
2017-12-13 14:20 ` [PATCH 15/19] IMA: switch IMA over " Jeff Layton
2017-12-13 14:20 ` [PATCH 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
2017-12-15 12:59   ` Jeff Layton
2017-12-13 14:20 ` [PATCH 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2017-12-13 14:20 ` [PATCH 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2017-12-15 13:03   ` Jeff Layton
2017-12-13 14:20 ` [PATCH 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2017-12-13 15:05 ` [PATCH 00/19] fs: rework and optimize i_version handling in filesystems J. Bruce Fields
2017-12-13 20:14   ` Jeff Layton
2017-12-13 22:10     ` Jeff Layton
2017-12-13 23:03     ` Dave Chinner
2017-12-14  0:02       ` Jeff Layton
2017-12-14 14:14         ` Jeff Layton
2017-12-14 15:14           ` J. Bruce Fields
2017-12-15 15:15             ` Jeff Layton
2017-12-15 15:26               ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1513515701.3529.18.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=hch@lst.de \
    --cc=jack@suse.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.com \
    --cc=neilb@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).