From: Jeff Layton <jlayton@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk,
linux-nfs@vger.kernel.org, bfields@fieldses.org, neilb@suse.de,
jack@suse.de, linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, linux-xfs@vger.kernel.org,
darrick.wong@oracle.com, david@fromorbit.com,
linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com,
dsterba@suse.com, linux-integrity@vger.kernel.org,
zohar@linux.vnet.ibm.com, dmitry.kasatkin@gmail.com,
linux-afs@lists.infradead.org, dhowells@redhat.com,
jaltman@auristor.com
Subject: Re: [PATCH v3 01/19] fs: new API for handling inode->i_version
Date: Mon, 18 Dec 2017 12:46:44 -0500 [thread overview]
Message-ID: <1513619204.3422.33.camel@kernel.org> (raw)
In-Reply-To: <20171218151156.14565-2-jlayton@kernel.org>
On Mon, 2017-12-18 at 10:11 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
>
> Add a documentation blob that explains what the i_version field is, how
> it is expected to work, and how it is currently implemented by various
> filesystems.
>
> We already have inode_inc_iversion. Add several other functions for
> manipulating and accessing the i_version counter. For now, the
> implementation is trivial and basically works the way that all of the
> open-coded i_version accesses work today.
>
> Future patches will convert existing users of i_version to use the new
> API, and then convert the backend implementation to do things more
> efficiently.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> include/linux/fs.h | 15 ----
> include/linux/iversion.h | 205 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 205 insertions(+), 15 deletions(-)
> create mode 100644 include/linux/iversion.h
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 511fbaabf624..76382c24e9d0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode *inode)
> mark_inode_dirty(inode);
> }
>
> -/**
> - * inode_inc_iversion - increments i_version
> - * @inode: inode that need to be updated
> - *
> - * Every time the inode is modified, the i_version field will be incremented.
> - * The filesystem has to be mounted with i_version flag
> - */
> -
> -static inline void inode_inc_iversion(struct inode *inode)
> -{
> - spin_lock(&inode->i_lock);
> - inode->i_version++;
> - spin_unlock(&inode->i_lock);
> -}
> -
Note that this patch is broken, as I neglected to include
linux/iversion.h in the files with the existing callers of
inode_inc_iversion. Fixed in my in-tree version.
> enum file_time_flags {
> S_ATIME = 1,
> S_MTIME = 2,
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> new file mode 100644
> index 000000000000..bb50d27c71f9
> --- /dev/null
> +++ b/include/linux/iversion.h
> @@ -0,0 +1,205 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_IVERSION_H
> +#define _LINUX_IVERSION_H
> +
> +#include <linux/fs.h>
> +
> +/*
> + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> + * appear different to observers if there was a change to the inode's data or
> + * metadata since it was last queried.
> + *
> + * It should be considered an opaque value by observers. If it remains the same
> + * since it was last checked, then nothing has changed in the inode. If it's
> + * different then something has changed. Observers cannot infer anything about
> + * the nature or magnitude of the changes from the value, only that the inode
> + * has changed in some fashion.
> + *
> + * Not all filesystems properly implement the i_version counter. Subsystems that
> + * want to use i_version field on an inode should first check whether the
> + * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
> + *
> + * Those that set SB_I_VERSION will automatically have their i_version counter
> + * incremented on writes to normal files. If the SB_I_VERSION is not set, then
> + * the VFS will not touch it on writes, and the filesystem can use it how it
> + * wishes. Note that the filesystem is always responsible for updating the
> + * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
> + * We consider these sorts of filesystems to have a kernel-managed i_version.
> + *
> + * Note that some filesystems (e.g. NFS and AFS) just use the field to store
> + * a server-provided value (for the most part). For that reason, those
> + * filesystems do not set SB_I_VERSION. These filesystems are considered to
> + * have a self-managed i_version.
> + */
> +
> +/**
> + * inode_set_iversion_raw - set i_version to the specified raw value
> + * @inode: inode to set
> + * @new: new i_version value to set
> + *
> + * Set @inode's i_version field to @new. This function is for use by
> + * filesystems that self-manage the i_version.
> + *
> + * For example, the NFS client stores its NFSv4 change attribute in this way,
> + * and the AFS client stores the data_version from the server here.
> + */
> +static inline void
> +inode_set_iversion_raw(struct inode *inode, const u64 new)
> +{
> + inode->i_version = new;
> +}
> +
> +/**
> + * inode_set_iversion - set i_version to a particular value
> + * @inode: inode to set
> + * @new: new i_version value to set
> + *
> + * Set @inode's i_version field to @new. This function is for filesystems with
> + * a kernel-managed i_version.
> + *
> + * For now, this just does the same thing as the _raw variant.
> + */
> +static inline void
> +inode_set_iversion(struct inode *inode, const u64 new)
> +{
> + inode_set_iversion_raw(inode, new);
> +}
> +
> +/**
> + * inode_set_iversion_queried - set i_version to a particular value and set
> + * flag to indicate that it has been viewed
> + * @inode: inode to set
> + * @new: new i_version value to set
> + *
> + * When loading in an i_version value from a backing store, we typically don't
> + * know whether it was previously viewed before being stored or not. Thus, we
> + * must assume that it was, to ensure that any changes will result in the
> + * value changing.
> + *
> + * This function will set the inode's i_version, and possibly flag the value
> + * as if it has already been viewed at least once.
> + *
> + * For now, this just does what inode_set_iversion does.
> + */
> +static inline void
> +inode_set_iversion_queried(struct inode *inode, const u64 new)
> +{
> + inode_set_iversion(inode, new);
> +}
> +
> +/**
> + * inode_maybe_inc_iversion - increments i_version
> + * @inode: inode with the i_version that should be updated
> + * @force: increment the counter even if it's not necessary
> + *
> + * Every time the inode is modified, the i_version field must be seen to have
> + * changed by any observer.
> + *
> + * In this implementation, we always increment it after taking the i_lock to
> + * ensure that we don't race with other incrementors.
> + *
> + * Returns true if counter was bumped, and false if it wasn't.
> + */
> +static inline bool
> +inode_maybe_inc_iversion(struct inode *inode, bool force)
> +{
> + spin_lock(&inode->i_lock);
> + inode->i_version++;
> + spin_unlock(&inode->i_lock);
> + return true;
> +}
> +
> +/**
> + * inode_inc_iversion - forcibly increment i_version
> + * @inode: inode that needs to be updated
> + *
> + * Forcbily increment the i_version field. This always results in a change to
> + * the observable value.
> + */
> +static inline void
> +inode_inc_iversion(struct inode *inode)
> +{
> + inode_maybe_inc_iversion(inode, true);
> +}
> +
> +/**
> + * inode_iversion_need_inc - is the i_version in need of being incremented?
> + * @inode: inode to check
> + *
> + * Returns whether the inode->i_version counter needs incrementing on the next
> + * change.
> + *
> + * For now, we assume that it always does.
> + */
> +static inline bool
> +inode_iversion_need_inc(struct inode *inode)
> +{
> + return true;
> +}
> +
> +/**
> + * inode_peek_iversion_raw - grab a "raw" iversion value
> + * @inode: inode from which i_version should be read
> + *
> + * Grab a "raw" inode->i_version value and return it. The i_version is not
> + * flagged or converted in any way. This is mostly used to access a self-managed
> + * i_version.
> + *
> + * With those filesystems, we want to treat the i_version as an entirely
> + * opaque value.
> + */
> +static inline u64
> +inode_peek_iversion_raw(const struct inode *inode)
> +{
> + return inode->i_version;
> +}
> +
> +/**
> + * inode_peek_iversion - read i_version without flagging it to be incremented
> + * @inode: inode from which i_version should be read
> + *
> + * Read the inode i_version counter for an inode without registering it as a
> + * query.
> + *
> + * This is typically used by local filesystems that need to store an i_version
> + * on disk. In that situation, it's not necessary to flag it as having been
> + * viewed, as the result won't be used to gauge changes from that point.
> + */
> +static inline u64
> +inode_peek_iversion(const struct inode *inode)
> +{
> + return inode_peek_iversion_raw(inode);
> +}
> +
> +/**
> + * inode_query_iversion - read i_version for later use
> + * @inode: inode from which i_version should be read
> + *
> + * Read the inode i_version counter. This should be used by callers that wish
> + * to store the returned i_version for later comparison. This will guarantee
> + * that a later query of the i_version will result in a different value if
> + * anything has changed.
> + *
> + * This implementation just does a peek.
> + */
> +static inline u64
> +inode_query_iversion(struct inode *inode)
> +{
> + return inode_peek_iversion(inode);
> +}
> +
> +/**
> + * inode_cmp_iversion - check whether the i_version counter has changed
> + * @inode: inode to check
> + * @old: old value to check against its i_version
> + *
> + * Compare an i_version counter with a previous one. Returns 0 if they are
> + * the same or non-zero if they are different.
> + */
> +static inline s64
> +inode_cmp_iversion(const struct inode *inode, const u64 old)
> +{
> + return (s64)inode_peek_iversion(inode) - (s64)old;
> +}
> +#endif
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2017-12-18 17:46 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-18 15:11 [PATCH v3 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2017-12-18 15:11 ` [PATCH v3 01/19] fs: new API for handling inode->i_version Jeff Layton
2017-12-18 17:46 ` Jeff Layton [this message]
2017-12-18 15:11 ` [PATCH v3 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2017-12-18 15:11 ` [PATCH v3 03/19] fat: convert to new i_version API Jeff Layton
2017-12-18 15:11 ` [PATCH v3 04/19] affs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 05/19] afs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 06/19] btrfs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 07/19] exofs: switch " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 08/19] ext2: convert " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 09/19] ext4: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 10/19] nfs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 11/19] nfsd: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 12/19] ocfs2: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 13/19] ufs: use " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 14/19] xfs: convert to " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 15/19] IMA: switch IMA over " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
2017-12-18 16:07 ` Jan Kara
2017-12-18 17:25 ` Jeff Layton
2017-12-18 15:11 ` [PATCH v3 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2017-12-18 15:11 ` [PATCH v3 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2017-12-18 15:11 ` [PATCH v3 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2017-12-18 16:34 ` Jan Kara
2017-12-18 17:22 ` Jeff Layton
2017-12-18 17:36 ` J. Bruce Fields
2017-12-18 19:35 ` Jeff Layton
2017-12-18 22:07 ` Dave Chinner
2017-12-20 14:03 ` Jeff Layton
2017-12-20 16:41 ` Jan Kara
2017-12-21 11:25 ` Jeff Layton
2017-12-21 11:48 ` Jan Kara
2017-12-19 9:29 ` Jan Kara
2017-12-19 17:14 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1513619204.3422.33.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=adilger.kernel@dilger.ca \
--cc=bfields@fieldses.org \
--cc=clm@fb.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=dmitry.kasatkin@gmail.com \
--cc=dsterba@suse.com \
--cc=jack@suse.de \
--cc=jaltman@auristor.com \
--cc=jbacik@fb.com \
--cc=linux-afs@lists.infradead.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-integrity@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=zohar@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).