From: "J. Bruce Fields" <bfields@fieldses.org>
To: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: nfsv4@linux-nfs.org, linux-kernel@vger.kernel.org,
	cmm@us.ibm.com, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org
Subject: Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version
Date: Wed, 11 Jul 2007 16:04:30 -0400
Message-ID: <20070711200430.GF4138@fieldses.org>
In-Reply-To: <1184164086.12154.15.camel@kleikamp.austin.ibm.com>

On Wed, Jul 11, 2007 at 09:28:06AM -0500, Dave Kleikamp wrote:
> On Wed, 2007-07-11 at 15:05 +1000, Neil Brown wrote:
> > It just occurred to me:
> > 
> >  If i_version is 64bit, then knfsd would need to be careful when
> >  reading it on a 32bit host.  What are the locking rules?
> 
> How does knfsd use i_version?  I would think that if all it was doing
> was to compare (i_version == previous_version)

That's correct.  (Though it's the client that's doing the comparison,
actually--the server is just reporting the value.)

> then locking wouldn't really matter.  Well, theoretically,
> previous_version could be 0x100000000, and i_version could be
> 0x1ffffffff, knfsd checks the high word, then ext4 updates i_version
> to 0x200000000, then knfsd checks the low word, detecting no change.
> How likely is this?

The choice of upper word in your example is arbitrary, but other than
that I believe your example is essentially the only one.  So this would
only happen when *both*

	- the read of the new value of the low word happens precisely
	  2^32 i_version updates after the word was read on the client's
	  previous cache revalidation, and
	- the value of i_version itself is close enough to a 32-bit
	  boundary that wraparound can happen between the reads of the
	  high and low words.
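
To make the race concrete, here is a minimal sketch that replays Dave's
example in plain userspace C.  The helper torn_read() is hypothetical (it
is not knfsd or ext4 code); it only models a 32-bit host loading the high
word, letting some updates land, and then loading the low word.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a non-atomic 64-bit read on a 32-bit host:
 * the high word is loaded first, `updates_between` increments land,
 * and only then is the low word loaded.
 */
static uint64_t torn_read(uint64_t *counter, uint64_t updates_between)
{
	uint32_t hi = (uint32_t)(*counter >> 32);          /* first load */
	*counter += updates_between;                       /* writer races in */
	uint32_t lo = (uint32_t)(*counter & 0xffffffffu);  /* second load */

	return ((uint64_t)hi << 32) | lo;
}

/* Replays the scenario quoted above. */
void example_torn_read(void)
{
	uint64_t previous_version = 0x100000000ULL;
	uint64_t i_version = 0x1ffffffffULL;

	/* One update lands between the two loads: 0x1ffffffff -> 0x200000000. */
	uint64_t reported = torn_read(&i_version, 1);

	assert(i_version == 0x200000000ULL);
	assert(reported == previous_version);   /* the change goes undetected */
}
```

Away from a 32-bit boundary the same torn read still reports a value that
differs from the previous one, which is why both conditions in the list
are needed.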

> (I don't understand why i_version even needs to be 64 bits in the
> first place.)

A 32-bit i_version could in theory wrap pretty quickly, couldn't it?
That's not a problem in itself--the problem would only arise if two
successive client queries of the change attribute happened a multiple of
2^32 i_version increments apart.
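
As a rough illustration of that condition, the sketch below models a
32-bit counter in userspace C.  The function name change_missed_32() is
made up for this example; it just asks whether a client that saw `seen`
would see the same value again after `updates` increments.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: with a 32-bit change attribute, does a client
 * that last saw `seen` get the same value back after `updates`
 * increments, even though the file did change?
 */
static int change_missed_32(uint32_t seen, uint64_t updates)
{
	uint32_t now = seen + (uint32_t)updates;   /* wraps modulo 2^32 */

	return updates != 0 && now == seen;
}

void example_wrap_32(void)
{
	/* Exactly 2^32 updates, or any multiple of it: change is missed. */
	assert(change_missed_32(5, 1ULL << 32));
	assert(change_missed_32(5, 3ULL << 32));

	/* One update short of 2^32: the values differ, change is seen. */
	assert(!change_missed_32(5, (1ULL << 32) - 1));
}
```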

This is more likely than the previous scenario, but still very unlikely.
I would have guessed that even in situations with a very high rate of
updates and a low rate of client revalidations, the chance of two
revalidations happening exactly 2^32 updates apart would still be no
more than 1 in 2^32.  (Could odd characteristics of the workload, such as
updates that tend to happen in power-of-2 groups, make it any more
likely?)

I'd be happier if ext4 at least allowed the possibility of 64 bits in
the future.  And there's always the chance someone would find a use for
an i_version that was nondecreasing, even if nfs didn't care.

> >  Presumably it is only updated under i_mutex protection, but having to
> >  get i_mutex to read it would seem a little heavy handed.
> 
> How does knfsd protect itself from the inode changing after i_version is
> checked?  Is any locking being done otherwise?

If the client always requests the change attribute before reading, and
the i_version is always updated after data is modified, I think we're
OK.  Admittedly this is a little subtle.
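
A small sketch of why the ordering matters, assuming a simplified model
with one file and one client cache.  All of the names here (srv_file,
cl_cache and the helpers) are hypothetical and stand in for the real
server and client; the interleavings are written out by hand rather than
run concurrently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one file on the server, one cached copy on a client. */
struct srv_file { int data; uint64_t version; };
struct cl_cache { int data; uint64_t version; };

static void srv_write_data(struct srv_file *f, int v) { f->data = v; }
static void srv_bump_version(struct srv_file *f) { f->version++; }
static void cl_get_change_attr(struct cl_cache *c, const struct srv_file *f)
{
	c->version = f->version;
}
static void cl_read_data(struct cl_cache *c, const struct srv_file *f)
{
	c->data = f->data;
}

/* Bad outcome: versions match, so the cache is trusted, but data is stale. */
static int trusted_but_stale(const struct cl_cache *c, const struct srv_file *f)
{
	return c->version == f->version && c->data != f->data;
}

void example_ordering(void)
{
	struct srv_file f = { 1, 0 };
	struct cl_cache c = { 0, 0 };

	/* Client asks for the change attribute first; server bumps it last. */
	srv_write_data(&f, 2);
	cl_get_change_attr(&c, &f);   /* sees the old version */
	cl_read_data(&c, &f);         /* sees the new data */
	srv_bump_version(&f);
	assert(!trusted_but_stale(&c, &f));   /* mismatch forces a refetch */

	/* Client reads the data before the change attribute: unsafe. */
	f = (struct srv_file){ 1, 0 };
	cl_read_data(&c, &f);
	srv_write_data(&f, 2);
	srv_bump_version(&f);
	cl_get_change_attr(&c, &f);
	assert(trusted_but_stale(&c, &f));
}
```

In the safe interleaving the worst case is an unnecessary revalidation:
the client holds new data under an old version number and refetches.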

--b.
