All of lore.kernel.org
 help / color / mirror / Atom feed
From: bfields@fieldses.org (J. Bruce Fields)
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: dsterba@suse.cz, Liu Bo <bo.li.liu@oracle.com>,
	linux-btrfs@vger.kernel.org, fdmanana@suse.com, kzak@redhat.com,
	linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	linux-nfs@vger.kernel.org, chuck.lever@oracle.com,
	mingming.cao@oracle.com
Subject: Re: i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION)
Date: Thu, 25 Jun 2015 14:46:44 -0400	[thread overview]
Message-ID: <20150625184644.GA12300@fieldses.org> (raw)
In-Reply-To: <20150623163241.GA6645@thunk.org>

On Tue, Jun 23, 2015 at 12:32:41PM -0400, Theodore Ts'o wrote:
> On Thu, Jun 18, 2015 at 04:38:56PM +0200, David Sterba wrote:
> > Moving the discussion to fsdevel.
> > 
> > Summary: disabling MS_I_VERSION brings some speedups to btrfs, but the
> > generic 'noiversion' option cannot be used to achieve that. It is
> > processed before it reaches btrfs superblock callback, where
> > MS_I_VERSION is forced.
> > 
> > The proposed fix is to add btrfs-specific i_version/noi_version to btrfs,
> > to which I object.
> 
> I was talking to Mingming about this on today's ext4 conference call,
> and one of the reasons why ext4 turns off i_version update by default
> is because it does a real number on our performance as well --- and
> furthermore, the only real user of the field from what we can tell is
> NFSv4, which not all that many ext4 users actually care about.
> 
> This has caused pain for the nfsv4 folks since it means that they need
> to tell people to use a special mount option for ext4 if they are
> actually using this for nfsv4, and I suspect they won't be all that
> eager to hear that btrfs is going to go the same way.

Yes, thanks for looking into this!

> This however got us thinking --- even in if NFSv4 is depending on
> i_version, it doesn't actually _look_ at that field all that often.

Most clients will query it on every write.  (I just took a quick look at
the code and I believe the Linux client's requesting it immediately
after every write, except in the O_DIRECT and delegated cases.)

> It's only going to look at it in a response to a client's getattr
> call, and that in turn is used to so the client can do its local disk
> cache invalidation if anby of the data blocks of the inode has changed.
> 
> So what if we have a per-inode flag which "don't update I_VERSION",
> which is off by default, but after the i_version has been updated at
> least once, is set, so the i_version field won't be updated again ---
> at least until something has actually looked at the i_version field,
> when the "don't update I_VERSOIN" flag will get cleared again.
> 
> So basically, if we know there are no microphones in the forest, we
> don't need to make the tree fall.  However, if someone has sampled the
> i_version field, then the next time the inode gets updated, we will
> update the i_version field so the NFSv4 client can hear the sound of
> the tree crashing to the forst floor and so it can invalidate its
> local cache of the file.  :-)
> 
> This should significantly improve the performance of using the
> i_version field if the file system is being exported via NFSv4, and if
> NFSv4 is not in use, no one will be looking at the i_version field, so
> the performance impact will be very slight, and thus we could enable
> i_version updates by default for btrfs and ext4.
> 
> And this should make the distribution folks happy, since it will unify
> the behavior of all file systems, and make life easier for users who
> won't need to set certain magic mount options depending on what file
> system they are using and whether they are using NFSv4 or not.
> 
> Does this sound reasonable?

Just to make sure I understand, the logic is something like:

	to read the i_version:

		inode->i_version_seen = true;
		return inode->i_version

	to update the i_version:

		/*
		 * If nobody's seen this value of i_version then we can
		 * keep using it, otherwise we need a new one:
		 */
		if (inode->i_version_seen)
			inode->i_version++;
		inode->i_version_seen = false;

Looks OK to me.  As I say I'd expect i_version_seen == true to end up
being the common case in a lot of v4 workloads, so I'm more skeptical of
the claim of a performance improvement in the v4 case.

Could maintaining the new flag be a significant drag in itself?  If not,
then I guess we're not making things any worse there, so fine.

--b.

WARNING: multiple messages have this Message-ID (diff)
From: bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org (J. Bruce Fields)
To: Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>
Cc: dsterba-AlSwsSmVLrQ@public.gmane.org,
	Liu Bo <bo.li.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	fdmanana-IBi9RG/b67k@public.gmane.org,
	kzak-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	mingming.cao-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org
Subject: Re: i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION)
Date: Thu, 25 Jun 2015 14:46:44 -0400	[thread overview]
Message-ID: <20150625184644.GA12300@fieldses.org> (raw)
In-Reply-To: <20150623163241.GA6645-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>

On Tue, Jun 23, 2015 at 12:32:41PM -0400, Theodore Ts'o wrote:
> On Thu, Jun 18, 2015 at 04:38:56PM +0200, David Sterba wrote:
> > Moving the discussion to fsdevel.
> > 
> > Summary: disabling MS_I_VERSION brings some speedups to btrfs, but the
> > generic 'noiversion' option cannot be used to achieve that. It is
> > processed before it reaches btrfs superblock callback, where
> > MS_I_VERSION is forced.
> > 
> > The proposed fix is to add btrfs-specific i_version/noi_version to btrfs,
> > to which I object.
> 
> I was talking to Mingming about this on today's ext4 conference call,
> and one of the reasons why ext4 turns off i_version update by default
> is because it does a real number on our performance as well --- and
> furthermore, the only real user of the field from what we can tell is
> NFSv4, which not all that many ext4 users actually care about.
> 
> This has caused pain for the nfsv4 folks since it means that they need
> to tell people to use a special mount option for ext4 if they are
> actually using this for nfsv4, and I suspect they won't be all that
> eager to hear that btrfs is going to go the same way.

Yes, thanks for looking into this!

> This however got us thinking --- even in if NFSv4 is depending on
> i_version, it doesn't actually _look_ at that field all that often.

Most clients will query it on every write.  (I just took a quick look at
the code and I believe the Linux client's requesting it immediately
after every write, except in the O_DIRECT and delegated cases.)

> It's only going to look at it in a response to a client's getattr
> call, and that in turn is used to so the client can do its local disk
> cache invalidation if anby of the data blocks of the inode has changed.
> 
> So what if we have a per-inode flag which "don't update I_VERSION",
> which is off by default, but after the i_version has been updated at
> least once, is set, so the i_version field won't be updated again ---
> at least until something has actually looked at the i_version field,
> when the "don't update I_VERSOIN" flag will get cleared again.
> 
> So basically, if we know there are no microphones in the forest, we
> don't need to make the tree fall.  However, if someone has sampled the
> i_version field, then the next time the inode gets updated, we will
> update the i_version field so the NFSv4 client can hear the sound of
> the tree crashing to the forst floor and so it can invalidate its
> local cache of the file.  :-)
> 
> This should significantly improve the performance of using the
> i_version field if the file system is being exported via NFSv4, and if
> NFSv4 is not in use, no one will be looking at the i_version field, so
> the performance impact will be very slight, and thus we could enable
> i_version updates by default for btrfs and ext4.
> 
> And this should make the distribution folks happy, since it will unify
> the behavior of all file systems, and make life easier for users who
> won't need to set certain magic mount options depending on what file
> system they are using and whether they are using NFSv4 or not.
> 
> Does this sound reasonable?

Just to make sure I understand, the logic is something like:

	to read the i_version:

		inode->i_version_seen = true;
		return inode->i_version

	to update the i_version:

		/*
		 * If nobody's seen this value of i_version then we can
		 * keep using it, otherwise we need a new one:
		 */
		if (inode->i_version_seen)
			inode->i_version++;
		inode->i_version_seen = false;

Looks OK to me.  As I say I'd expect i_version_seen == true to end up
being the common case in a lot of v4 workloads, so I'm more skeptical of
the claim of a performance improvement in the v4 case.

Could maintaining the new flag be a significant drag in itself?  If not,
then I guess we're not making things any worse there, so fine.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2015-06-25 18:46 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-17  7:54 [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION Liu Bo
2015-06-17  7:54 ` [RFC PATCH v2 2/2] Btrfs: improve fsync for nocow file Liu Bo
2015-06-17 15:58   ` David Sterba
2015-06-18  3:27     ` Liu Bo
2015-06-24 18:21       ` David Sterba
2015-06-25  2:24         ` Liu Bo
2015-06-25 16:10           ` David Sterba
2015-06-17 15:33 ` [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION David Sterba
2015-06-17 15:52   ` Liu Bo
2015-06-17 17:01     ` David Sterba
2015-06-18  2:46       ` Liu Bo
2015-06-18 14:38         ` i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION) David Sterba
2015-06-19 11:44           ` Karel Zak
2015-06-19 11:44             ` Karel Zak
2015-06-22 20:42           ` Dave Chinner
2015-06-22 20:42             ` Dave Chinner
2015-06-24 17:28             ` David Sterba
2015-06-23 16:32           ` Theodore Ts'o
2015-06-24  8:23             ` Liu Bo
2015-06-24 18:02             ` David Sterba
2015-06-24 23:17               ` Theodore Ts'o
2015-06-24 23:17                 ` Theodore Ts'o
2015-06-24 23:59                 ` Dave Chinner
2015-06-24 23:59                   ` Dave Chinner
2015-06-25 18:46             ` J. Bruce Fields [this message]
2015-06-25 18:46               ` J. Bruce Fields
2015-06-25 22:12               ` Theodore Ts'o
2015-06-26 13:32                 ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150625184644.GA12300@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=bo.li.liu@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=fdmanana@suse.com \
    --cc=kzak@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mingming.cao@oracle.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.