public inbox for linux-kernel@vger.kernel.org
From: Jan Kara <jack@suse.cz>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: brauner@kernel.org, viro@zeniv.linux.org.uk, jack@suse.cz,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH] vfs: elide smp_mb in iversion handling in the common case
Date: Tue, 27 Aug 2024 12:00:45 +0200	[thread overview]
Message-ID: <20240827100045.m3mpko3tvmmjkmvm@quack3> (raw)
In-Reply-To: <20240815083310.3865-1-mjguzik@gmail.com>

On Thu 15-08-24 10:33:10, Mateusz Guzik wrote:
> According to bpftrace on these routines most calls result in cmpxchg,
> which already provides the same guarantee.
> 
> In inode_maybe_inc_iversion the elision is possible because even if the
> wrong value was read due to the now-missing smp_mb fence, the issue is
> going to correct itself after cmpxchg. If it appears cmpxchg won't be
> issued, the fence + reload are there, bringing back the previous behavior.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
> 
> chances are this entire barrier guarantee is of no significance, but I'm
> not signing up to review it

Jeff might have a ready answer here - added to CC. I think the barrier is
needed in principle so that you can guarantee that after a data change you
will be able to observe an i_version change.

> I verified the force flag is not *always* set (but it is set in the most
> common case).

Well, I'm not convinced the more complicated code is really worth it.
'force' will be set when we update timestamps, which happens once per tick
(usually 1-4 ms). So that is the common case on a lightly / moderately
loaded system. On a heavily write(2)-loaded system, 'force' should be
mostly false and, unless you also heavily stat(2) the modified files, the
common path is exactly the "if (!force && !(cur & I_VERSION_QUERIED))"
branch. So saving one smp_mb() per couple of ms (per inode) on a
moderately loaded system doesn't seem like a noticeable win...
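
For reference, the logic under discussion can be modeled outside the kernel
with C11 atomics. This is a simplified sketch, not the kernel code:
atomic_thread_fence() stands in for smp_mb(), atomic_compare_exchange_weak()
for atomic64_try_cmpxchg(), and a single global stands in for
inode->i_version; the constants mirror include/linux/iversion.h.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* As in include/linux/iversion.h: the low bit is the "queried" flag,
 * the counter lives in the remaining bits. */
#define I_VERSION_QUERIED       1ULL
#define I_VERSION_QUERIED_SHIFT 1
#define I_VERSION_INCREMENT     (1ULL << I_VERSION_QUERIED_SHIFT)

/* Stand-in for inode->i_version. */
static _Atomic uint64_t i_version;

/* Model of the patched inode_maybe_inc_iversion(): the explicit fence
 * and reload happen only when the cmpxchg below might be skipped. */
static bool maybe_inc_iversion(bool force)
{
	uint64_t cur = atomic_load_explicit(&i_version, memory_order_relaxed);
	uint64_t new;

	if (!force && !(cur & I_VERSION_QUERIED)) {
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
		cur = atomic_load_explicit(&i_version, memory_order_relaxed);
	}
	do {
		/* If the flag is clear we needn't do anything. */
		if (!force && !(cur & I_VERSION_QUERIED))
			return false;
		/* Bump the counter and clear the queried flag. */
		new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
	} while (!atomic_compare_exchange_weak(&i_version, &cur, new));
	return true;
}

/* Model of the patched inode_query_iversion(): a successful cmpxchg
 * already provides the fence, so the explicit one runs only on the
 * flag-already-set path. */
static uint64_t query_iversion(void)
{
	uint64_t cur = atomic_load_explicit(&i_version, memory_order_relaxed);
	uint64_t new;
	bool fenced = false;

	do {
		if (cur & I_VERSION_QUERIED) {
			if (!fenced)
				atomic_thread_fence(memory_order_seq_cst);
			break;
		}
		fenced = true;
		new = cur | I_VERSION_QUERIED;
	} while (!atomic_compare_exchange_weak(&i_version, &cur, new));
	return cur >> I_VERSION_QUERIED_SHIFT;
}
```

Run sequentially: an unqueried inode skips the increment entirely, a query
sets the QUERIED flag, and the next change both bumps the counter and clears
the flag again, which is why eliding smp_mb() is only attempted when the
flag is observed clear.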

									Honza
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 8aa34870449f..61ae4811270a 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1990,13 +1990,19 @@ bool inode_maybe_inc_iversion(struct inode *inode, bool force)
>  	 * information, but the legacy inode_inc_iversion code used a spinlock
>  	 * to serialize increments.
>  	 *
> -	 * Here, we add full memory barriers to ensure that any de-facto
> -	 * ordering with other info is preserved.
> +	 * We add a full memory barrier to ensure that any de facto ordering
> +	 * with other state is preserved (either implicitly coming from cmpxchg
> +	 * or explicitly from smp_mb if we don't know upfront if we will execute
> +	 * the former).
>  	 *
> -	 * This barrier pairs with the barrier in inode_query_iversion()
> +	 * These barriers pair with inode_query_iversion().
>  	 */
> -	smp_mb();
>  	cur = inode_peek_iversion_raw(inode);
> +	if (!force && !(cur & I_VERSION_QUERIED)) {
> +		smp_mb();
> +		cur = inode_peek_iversion_raw(inode);
> +	}
> +
>  	do {
>  		/* If flag is clear then we needn't do anything */
>  		if (!force && !(cur & I_VERSION_QUERIED))
> @@ -2025,20 +2031,22 @@ EXPORT_SYMBOL(inode_maybe_inc_iversion);
>  u64 inode_query_iversion(struct inode *inode)
>  {
>  	u64 cur, new;
> +	bool fenced = false;
>  
> +	/*
> +	 * Memory barriers (implicit in cmpxchg, explicit in smp_mb) pair with
> +	 * inode_maybe_inc_iversion(), see that routine for more details.
> +	 */
>  	cur = inode_peek_iversion_raw(inode);
>  	do {
>  		/* If flag is already set, then no need to swap */
>  		if (cur & I_VERSION_QUERIED) {
> -			/*
> -			 * This barrier (and the implicit barrier in the
> -			 * cmpxchg below) pairs with the barrier in
> -			 * inode_maybe_inc_iversion().
> -			 */
> -			smp_mb();
> +			if (!fenced)
> +				smp_mb();
>  			break;
>  		}
>  
> +		fenced = true;
>  		new = cur | I_VERSION_QUERIED;
>  	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
>  	return cur >> I_VERSION_QUERIED_SHIFT;
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview: 6+ messages
2024-08-15  8:33 [PATCH] vfs: elide smp_mb in iversion handling in the common case Mateusz Guzik
2024-08-16 10:56 ` Christian Brauner
2024-08-27 10:00 ` Jan Kara [this message]
2024-08-27 10:21   ` Mateusz Guzik
2024-08-27 10:50 ` Jeff Layton
2024-08-27 12:29 ` Jeff Layton
