From: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
To: Jeff Layton <jlayton-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
"J. Bruce Fields"
<bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
neilb-l3A5Bk7waGM@public.gmane.org,
jack-l3A5Bk7waGM@public.gmane.org,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
tytso-3s7WtUTddSA@public.gmane.org,
adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org,
linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
clm-b10kYP2dOMg@public.gmane.org,
jbacik-b10kYP2dOMg@public.gmane.org,
dsterba-IBi9RG/b67k@public.gmane.org,
linux-integrity-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
zohar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
dmitry.kasatkin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jaltman-hRzEac23uH1Wk0Htik3J/w@public.gmane.org
Subject: Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently
Date: Wed, 20 Dec 2017 17:41:50 +0100 [thread overview]
Message-ID: <20171220164150.GF31584@quack2.suse.cz> (raw)
In-Reply-To: <1513778586.4513.18.camel-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
On Wed 20-12-17 09:03:06, Jeff Layton wrote:
> On Tue, 2017-12-19 at 09:07 +1100, Dave Chinner wrote:
> > On Mon, Dec 18, 2017 at 02:35:20PM -0500, Jeff Layton wrote:
> > > [PATCH] SQUASH: add memory barriers around i_version accesses
> >
> > Why explicit memory barriers rather than annotating the operations
> > with the required semantics and getting the barriers the arch
> > requires automatically? I suspect this should be using
> > atomic_read_acquire() and atomic_cmpxchg_release(), because AFAICT
> > the atomic_cmpxchg needs to have release semantics to match the
> > acquire semantics needed for the load of the current value.
> >
> > From include/linux/atomics.h:
> >
> > * For compound atomics performing both a load and a store, ACQUIRE
> > * semantics apply only to the load and RELEASE semantics only to the
> > * store portion of the operation. Note that a failed cmpxchg_acquire
> > * does -not- imply any memory ordering constraints.
> >
> > Memory barriers hurt my brain. :/
> >
> > At minimum, shouldn't the atomic op specific barriers be used rather
> > than full memory barriers? i.e:
> >
>
> They hurt my brain too. Yes, definitely atomic-specific barriers should
> be used here instead, since this is an atomic64_t now.
>
> After going over the docs again...my understanding has always been that
> you primarily need memory barriers to order accesses to different areas
> of memory.
That is correct.
> As Jan and I were discussing in the other thread, i_version is not
> synchronized with anything else. In this code, we're only dealing with a
> single 64-bit word. I don't think there are any races there wrt the API
> itself.
There are not but it is like saying that lock implementation is correct
because the lock state does not get corrupted ;). Who cares about protected
data...
> The "legacy" inode_inc_iversion() however _does_ have implied memory
> barriers from the i_lock. There could be some subtle de-facto ordering
> there, so I think we probably do want some barriers in here if only to
> preserve that. It's not likely to cost much, and may save us tracking
> down some fiddly bugs.
>
> What about this patch? Note that I've only added barriers to
> inode_maybe_inc_iversion. I don't see that we need it for the other
> functions, but please do tell me if I'm wrong there:
>
> --------------------8<---------------------------
>
> [PATCH] SQUASH: add memory barriers around i_version accesses
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/iversion.h | 53 +++++++++++++++++++++++++++++-------------------
> 1 file changed, 32 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> index a9fbf99709df..02187a3bec3b 100644
> --- a/include/linux/iversion.h
> +++ b/include/linux/iversion.h
> @@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 val)
> atomic64_set(&inode->i_version, val);
> }
>
> +/**
> + * inode_peek_iversion_raw - grab a "raw" iversion value
> + * @inode: inode from which i_version should be read
> + *
> + * Grab a "raw" inode->i_version value and return it. The i_version is not
> + * flagged or converted in any way. This is mostly used to access a self-managed
> + * i_version.
> + *
> + * With those filesystems, we want to treat the i_version as an entirely
> + * opaque value.
> + */
> +static inline u64
> +inode_peek_iversion_raw(const struct inode *inode)
> +{
> + return atomic64_read(&inode->i_version);
> +}
> +
> /**
> * inode_set_iversion - set i_version to a particular value
> * @inode: inode to set
> @@ -152,7 +169,16 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> {
> u64 cur, old, new;
>
> - cur = (u64)atomic64_read(&inode->i_version);
> + /*
> + * The i_version field is not strictly ordered with any other inode
> + * information, but the legacy inode_inc_iversion code used a spinlock
> + * to serialize increments.
> + *
> + * This code adds full memory barriers to ensure that any de-facto
> + * ordering with other info is preserved.
> + */
> + smp_mb__before_atomic();
This should be just smp_mb(). __before_atomic() pairs with atomic
operations like atomic_inc(). atomic_read() is completely unordered
operation (happens to be plain memory read on x86) and so __before_atomic()
is not enough.
> + cur = inode_peek_iversion_raw(inode);
> for (;;) {
> /* If flag is clear then we needn't do anything */
> if (!force && !(cur & I_VERSION_QUERIED))
> @@ -162,8 +188,10 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
>
> old = atomic64_cmpxchg(&inode->i_version, cur, new);
> - if (likely(old == cur))
> + if (likely(old == cur)) {
> + smp_mb__after_atomic();
I don't think you need this. Cmpxchg is guaranteed to be full memory
barrier - from Documentation/atomic_t.txt:
- RMW operations that have a return value are fully ordered;
> break;
> + }
> cur = old;
> }
> return true;
...
> @@ -248,7 +259,7 @@ inode_query_iversion(struct inode *inode)
> {
> u64 cur, old, new;
>
> - cur = atomic64_read(&inode->i_version);
> + cur = inode_peek_iversion_raw(inode);
> for (;;) {
> /* If flag is already set, then no need to swap */
> if (cur & I_VERSION_QUERIED)
And here I'd expect smp_mb() after inode_peek_iversion_raw() (actually be
needed only if you are not going to do cmpxchg as that implies barrier as
well). "Safe" use of i_version would be:
Update:
modify inode
inode_maybe_inc_iversion(inode)
Read:
my_version = inode_query_iversion(inode)
get inode data
And you need to make sure 'get inode data' does not get speculatively
evaluated before you actually sample i_version so that you are guaranteed
that if data changes, you will observe larger i_version in the future.
Also please add a comment smp_mb() in inode_maybe_inc_iversion() like:
/* This barrier pairs with the barrier in inode_query_iversion() */
and a similar comment to inode_query_iversion(). Because memory barriers
make sense only in pairs (see SMP BARRIER PAIRING in
Documentation/memory-barriers.txt).
Honza
--
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-12-20 16:41 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-18 15:11 [PATCH v3 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2017-12-18 15:11 ` [PATCH v3 01/19] fs: new API for handling inode->i_version Jeff Layton
2017-12-18 17:46 ` Jeff Layton
2017-12-18 15:11 ` [PATCH v3 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2017-12-18 15:11 ` [PATCH v3 03/19] fat: convert to new i_version API Jeff Layton
2017-12-18 15:11 ` [PATCH v3 04/19] affs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 05/19] afs: " Jeff Layton
[not found] ` <20171218151156.14565-1-jlayton-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-18 15:11 ` [PATCH v3 06/19] btrfs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 12/19] ocfs2: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 15/19] IMA: switch IMA over " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 07/19] exofs: switch " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 08/19] ext2: convert " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 09/19] ext4: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 10/19] nfs: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 11/19] nfsd: " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 13/19] ufs: use " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 14/19] xfs: convert to " Jeff Layton
2017-12-18 15:11 ` [PATCH v3 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
[not found] ` <20171218151156.14565-17-jlayton-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-18 16:07 ` Jan Kara
2017-12-18 17:25 ` Jeff Layton
2017-12-18 15:11 ` [PATCH v3 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2017-12-18 15:11 ` [PATCH v3 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2017-12-18 15:11 ` [PATCH v3 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2017-12-18 16:34 ` Jan Kara
2017-12-18 17:22 ` Jeff Layton
2017-12-18 17:36 ` J. Bruce Fields
2017-12-18 19:35 ` Jeff Layton
2017-12-18 22:07 ` Dave Chinner
2017-12-20 14:03 ` Jeff Layton
[not found] ` <1513778586.4513.18.camel-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-20 16:41 ` Jan Kara [this message]
[not found] ` <20171220164150.GF31584-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-12-21 11:25 ` Jeff Layton
2017-12-21 11:48 ` Jan Kara
2017-12-19 9:29 ` Jan Kara
[not found] ` <20171219092947.GC2277-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-12-19 17:14 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171220164150.GF31584@quack2.suse.cz \
--to=jack-alswssmvlrq@public.gmane.org \
--cc=adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org \
--cc=bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org \
--cc=clm-b10kYP2dOMg@public.gmane.org \
--cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org \
--cc=dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=dmitry.kasatkin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=dsterba-IBi9RG/b67k@public.gmane.org \
--cc=jack-l3A5Bk7waGM@public.gmane.org \
--cc=jaltman-hRzEac23uH1Wk0Htik3J/w@public.gmane.org \
--cc=jbacik-b10kYP2dOMg@public.gmane.org \
--cc=jlayton-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
--cc=linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-integrity-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=neilb-l3A5Bk7waGM@public.gmane.org \
--cc=tytso-3s7WtUTddSA@public.gmane.org \
--cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
--cc=zohar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).