Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Guo Ren <guoren@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())
Date: Thu, 30 Nov 2023 22:36:43 -0500	[thread overview]
Message-ID: <ZWlUy1wElujRfDLA@gmail.com> (raw)
In-Reply-To: <CAHk-=wgSsUKn0piCv_=7XZh6L07BNQHLH3CX1YUQ0G=MEpRSJA@mail.gmail.com>

On Fri, Dec 01, 2023 at 10:09:01AM +0900, Linus Torvalds wrote:
> On Thu, 30 Nov 2023 at 19:01, Guo Ren <guoren@kernel.org> wrote:
> >
> > That needs the expensive mechanism DynAMO [1], but some power-efficient
> > core lacks the capability. Yes, powerful OoO hardware could virtually
> > satisfy you by a minimum number of retries, but why couldn't we
> > explicitly tell hardware for "prefetchw"?
> 
> Because every single time we've had a prefetch in the kernel, it has
> caused problems. A bit like cpu_relax() - these things get added for
> random hardware where it helps, and then a few years later it turns
> out that it hurts almost everywhere else.
> 
> We've had particular problems with 'prefetch' because it turns out
> that (a) nobody sane uses them so (b) hardware is often buggy. And
> here "buggy" may be just performance (ie "prefetch actually stalls on
> TLB lookup" etc broken behavior that means that prefetch is not even
> remotely like a no-op that just hints to the cache subsystem), but
> sometimes even in actual semantics (ie "prefetch causes spurious
> faulting behavior")
Thanks for sharing your experience, I now know the problem with generic
prefetchw.

But what to do with these codes?
➜  linux git:(master) ✗ grep prefetchw mm/ fs/ kernel/ -r
mm/slub.c:      prefetchw(object + s->offset);
mm/slab.c:      prefetchw(objp);
mm/page_alloc.c:        prefetchw(p);
mm/page_alloc.c:                prefetchw(p + 1);
mm/vmscan.c:#define prefetchw_prev_lru_folio(_folio, _base, _field)
mm/vmscan.c:                    prefetchw(&prev->_field);
mm/vmscan.c:#define prefetchw_prev_lru_folio(_folio, _base, _field) do
mm/vmscan.c:            prefetchw_prev_lru_folio(folio, src, flags);
fs/mpage.c:             prefetchw(&folio->flags);
fs/f2fs/data.c:                 prefetchw(&page->flags);
fs/ext4/readpage.c:             prefetchw(&folio->flags);
kernel/bpf/cpumap.c:                    prefetchw(page);
kernel/locking/qspinlock.c:                     prefetchw(next);
➜  linux git:(master) ✗ grep prefetchw drivers/ -r | wc -l
80
...

> 
> > Advanced hardware would treat cmpxchg as interconnect transactions when
> > cache miss(far atomic), which means L3 cache wouldn't return a unique
> > cacheline even when cmpxchg fails. The cmpxchg loop would continue to
> > read data bypassing the L1/L2 cache, which means every failure cmpxchg
> > is a cache-miss read.
> 
> Honestly, I wouldn't call that "advanced hardware". I would call that
> ridiculous.
Ridiculous Hardware:
When CAS fails, the hardware still keeps "far atomic" mode.

Correct Hardware:
When CAS fails, the hardware should change to "near-atomic," which means
acquiring an exclusive cache line and making progress.

> 
> If the cmpxchg isn't guaranteed to make progress, then the cmpxchg is
> broken. It's really that simple.
I totally agree, and it's a correct guide, Thx.

> 
> It does sound like on your hardware, maybe you just want to make the
> RISC-V cmpxchg function always do a "prefetchw" if the 'sc.d' fails,
> something like
> 
>                         "0:     lr.w %0, %2\n"                          \
>                         "       bne  %0, %z3, 1f\n"                     \
>                         "       sc.w %1, %z4, %2\n"                     \
> -                       "       bnez %1, 0b\n"                          \
> +                       "       beqz %1, 1f\n"                          \
> +                       "       prefetchw %2\n"                         \
> +                       "       j 0b\n"                                 \
>                         "1:\n"                                          \

I modify your code to guarantee the progress of the comparison failure
situation:
Final version (for easy read):
                         "0:     lr.w %0, %2\n"                          \
                         "       bne  %0, %z3, 2f\n"                     \
                         "       sc.w %1, %z4, %2\n"                     \
                         "       beqz %1, 1f\n"                          \
                         "       prefetchw %2\n"                         \
                         "       j 0b\n"                         	 \
                         "2:\n"                                          \
                         "       prefetchw %2\n"                         \
                         "1:\n"                                          \

Diff version:
                         "0:     lr.w %0, %2\n"                          \
 -                       "       bne  %0, %z3, 1f\n"                     \
 +                       "       bne  %0, %z3, 2f\n"                     \
                         "       sc.w %1, %z4, %2\n"                     \
 -                       "       bnez %1, 0b\n"                          \
 +                       "       beqz %1, 1f\n"                          \
 +                       "       prefetchw %2\n"                         \
 +                       "       j 0b\n"                         	 \
 +                       "2:\n"                                          \
 +                       "       prefetchw %2\n"                         \
                         "1:\n"                                          \

> 
> (quick entirely untested hack, you get the idea). A better
> implementation might use "asm goto" and expose the different error
> cases to the compiler so that it can move things around, but I'm not
> convinced it's worth the effort.
> 
> But no, we're *not* adding a prefetchw to generic code just because
> apparently some RISC-V code is doing bad things. You need to keep
> workarounds for RISC-V behavior to RISC-V.
> 
> And yes, the current "retry count" in our lockref implementation comes
> from another "some hardware does bad things for cmpxchg". But that
> workaround at most causes a few extra (regular) ALU instructions, and
> while not optimal, it's at least not going to cause any bigger
> problems.
> 
>            Linus
>

next prev parent reply	other threads:[~2023-12-01  3:36 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30  0:37 [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-10-30 21:53 ` Al Viro
2023-10-30 22:18   ` Linus Torvalds
2023-10-31  0:18     ` Al Viro
2023-10-31  1:53       ` Al Viro
2023-10-31  6:12         ` Al Viro
2023-11-01  6:18           ` Al Viro
2023-11-01  6:20           ` [PATCH 01/15] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-01  6:20             ` [PATCH 02/15] fast_dput(): handle underflows gracefully Al Viro
2023-11-01  6:20             ` [PATCH 03/15] fast_dput(): new rules for refcount Al Viro
2023-11-01  6:20             ` [PATCH 04/15] __dput_to_list(): do decrement of refcount in the caller Al Viro
2023-11-01  6:20             ` [PATCH 05/15] retain_dentry(): lift decrement of ->d_count into callers Al Viro
2023-11-01  6:20             ` [PATCH 06/15] __dentry_kill(): get consistent rules for ->d_count Al Viro
2023-11-01  6:20             ` [PATCH 07/15] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-01  6:20             ` [PATCH 08/15] Call retain_dentry() with refcount 0 Al Viro
2023-11-01  6:20             ` [PATCH 09/15] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-01  8:45               ` Al Viro
2023-11-01 17:30                 ` Linus Torvalds
2023-11-01 18:19                   ` Al Viro
2023-11-10  4:20                     ` lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput()) Al Viro
2023-11-10  5:57                       ` Linus Torvalds
2023-11-10  6:22                         ` Linus Torvalds
2023-11-22  6:29                           ` Guo Ren
2023-11-10  8:19                         ` Al Viro
2023-11-22  7:19                         ` Guo Ren
2023-11-22 17:20                           ` Linus Torvalds
2023-11-22 17:52                             ` Linus Torvalds
2023-11-22 18:05                               ` Linus Torvalds
2023-11-22 19:11                               ` Linus Torvalds
2023-11-29  7:14                                 ` Guo Ren
2023-11-29 12:25                                 ` Guo Ren
2023-11-29 14:42                                   ` Linus Torvalds
2023-11-26 16:39                             ` Guo Ren
2023-11-26 16:51                               ` Linus Torvalds
2023-11-30 10:00                                 ` Guo Ren
2023-12-01  1:09                                   ` Linus Torvalds
2023-12-01  3:36                                     ` Guo Ren [this message]
2023-12-01  5:15                                       ` Linus Torvalds
2023-12-01  7:31                                         ` Guo Ren
2023-11-26 16:51                               ` Guo Ren
2023-11-26 17:06                               ` Linus Torvalds
2023-11-26 17:59                                 ` Linus Torvalds
2023-11-29  9:52                                 ` Guo Ren
2023-11-01  6:20             ` [PATCH 10/15] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-01  6:21             ` [PATCH 11/15] fold dentry_kill() into dput() Al Viro
2023-11-01  6:21             ` [PATCH 12/15] get rid of __dget() Al Viro
2023-11-01  6:21             ` [PATCH 13/15] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-01  6:21             ` [PATCH 14/15] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-01  6:21             ` [PATCH 15/15] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-01  2:22       ` [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-11-01 14:29         ` Benjamin Coddington
2023-11-05 19:54       ` Al Viro
2023-11-05 21:59         ` Al Viro
2023-11-06  5:53         ` Al Viro
2023-11-07  2:08           ` Al Viro
2023-11-09  6:19             ` [RFC][PATCHSET v2] " Al Viro
2023-11-09  6:20               ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Al Viro
2023-11-09  6:20                 ` [PATCH 02/22] switch nfsd_client_rmdir() to use of simple_recursive_removal() Al Viro
2023-11-09 13:42                   ` Christian Brauner
2023-11-09 14:01                   ` Chuck Lever
2023-11-09 18:47                     ` Al Viro
2023-11-09 18:50                       ` Chuck Lever III
2023-11-09  6:20                 ` [PATCH 03/22] coda_flag_children(): cope with dentries turning negative Al Viro
2023-11-09 13:43                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 04/22] dentry: switch the lists of children to hlist Al Viro
2023-11-09 13:48                   ` Christian Brauner
2023-11-09 19:32                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 05/22] centralize killing dentry from shrink list Al Viro
2023-11-09 13:49                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 06/22] get rid of __dget() Al Viro
2023-11-09 13:50                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 07/22] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-09 13:53                   ` Christian Brauner
2023-11-09 20:28                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 08/22] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-09 13:58                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 09/22] fast_dput(): handle underflows gracefully Al Viro
2023-11-09 14:46                   ` Christian Brauner
2023-11-09 20:39                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 10/22] fast_dput(): new rules for refcount Al Viro
2023-11-09 14:54                   ` Christian Brauner
2023-11-09 20:52                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 11/22] __dput_to_list(): do decrement of refcount in the callers Al Viro
2023-11-09 15:21                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 12/22] Make retain_dentry() neutral with respect to refcounting Al Viro
2023-11-09 15:22                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 13/22] __dentry_kill(): get consistent rules for victim's refcount Al Viro
2023-11-09 15:27                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 14/22] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-09 15:53                   ` Christian Brauner
2023-11-09 21:29                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 15/22] Call retain_dentry() with refcount 0 Al Viro
2023-11-09 16:09                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 16/22] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-09 16:17                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 17/22] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-09 17:20                   ` Christian Brauner
2023-11-09 21:45                     ` Al Viro
2023-11-10  9:07                       ` Christian Brauner
2023-11-09 17:39                   ` Linus Torvalds
2023-11-09 18:11                     ` Linus Torvalds
2023-11-09 18:20                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 18/22] fold dentry_kill() into dput() Al Viro
2023-11-09 17:22                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 19/22] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-09 17:29                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 20/22] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-09 17:31                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 21/22] d_prune_aliases(): use a shrink list Al Viro
2023-11-09 17:33                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 22/22] __dentry_kill(): new locking scheme Al Viro
2023-11-10 13:34                   ` Christian Brauner
2023-11-09 13:33                 ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Christian Brauner
2023-10-31  2:25     ` [RFC] simplifying fast_dput(), dentry_kill() et.al Gao Xiang
2023-10-31  2:29       ` Gao Xiang
2023-10-31  3:02       ` Linus Torvalds
2023-10-31  3:13         ` Gao Xiang
2023-10-31  3:26         ` Al Viro
2023-10-31  3:41           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZWlUy1wElujRfDLA@gmail.com \
    --to=guoren@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).