From: Peter Zijlstra <peterz@infradead.org>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org,
	oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org,
	surenb@google.com, akpm@linux-foundation.org, mjguzik@gmail.com,
	brauner@kernel.org, jannh@google.com, mhocko@kernel.org,
	vbabka@suse.cz, shakeel.butt@linux.dev, hannes@cmpxchg.org,
	Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com
Subject: Re: [PATCH v3 tip/perf/core 1/4] mm: introduce mmap_lock_speculation_{start|end}
Date: Wed, 23 Oct 2024 22:10:31 +0200
Message-ID: <20241023201031.GF11151@noisy.programming.kicks-ass.net>
In-Reply-To: <20241010205644.3831427-2-andrii@kernel.org>

On Thu, Oct 10, 2024 at 01:56:41PM -0700, Andrii Nakryiko wrote:
> From: Suren Baghdasaryan <surenb@google.com>
>
> Add helper functions to speculatively perform operations without
> read-locking mmap_lock, expecting that mmap_lock will not be
> write-locked and mm is not modified from under us.
>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> ---
> include/linux/mm_types.h | 3 ++
> include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
> kernel/fork.c | 3 --
> 3 files changed, 63 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8e38bc..5d8cdebd42bc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -887,6 +887,9 @@ struct mm_struct {
> * Roughly speaking, incrementing the sequence number is
> * equivalent to releasing locks on VMAs; reading the sequence
> * number can be part of taking a read lock on a VMA.
> + * Incremented every time mmap_lock is write-locked/unlocked.
> + * Initialized to 0, therefore odd values indicate mmap_lock
> + * is write-locked and even values that it's released.
> *
> * Can be modified under write mmap_lock using RELEASE
> * semantics.
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index de9dc20b01ba..9d23635bc701 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -71,39 +71,84 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> }
>
> #ifdef CONFIG_PER_VMA_LOCK
> +static inline void init_mm_lock_seq(struct mm_struct *mm)
> +{
> + mm->mm_lock_seq = 0;
> +}
> +
> /*
> - * Drop all currently-held per-VMA locks.
> - * This is called from the mmap_lock implementation directly before releasing
> - * a write-locked mmap_lock (or downgrading it to read-locked).
> - * This should normally NOT be called manually from other places.
> - * If you want to call this manually anyway, keep in mind that this will release
> - * *all* VMA write locks, including ones from further up the stack.
> + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> + * or write-unlocked (RELEASE semantics).
> */
> -static inline void vma_end_write_all(struct mm_struct *mm)
> +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
> {
> mmap_assert_write_locked(mm);
> /*
> * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
> * mmap_lock being held.
> - * We need RELEASE semantics here to ensure that preceding stores into
> - * the VMA take effect before we unlock it with this store.
> - * Pairs with ACQUIRE semantics in vma_start_read().
> */
> - smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +
> + if (acquire) {
> + WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> + /*
> + * For ACQUIRE semantics we should ensure no following stores are
> + * reordered to appear before the mm->mm_lock_seq modification.
> + */
> + smp_wmb();

Strictly speaking this isn't ACQUIRE, nor do we care about ACQUIRE here.
This really is about subsequent stores; loads are irrelevant.

> + } else {
> + /*
> + * We need RELEASE semantics here to ensure that preceding stores
> + * into the VMA take effect before we unlock it with this store.
> + * Pairs with ACQUIRE semantics in vma_start_read().
> + */

Again, not strictly true. We don't care about loads. Using RELEASE here
is fine and probably cheaper on a few platforms, but we don't strictly
need/care about RELEASE.

> + smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> + }
> +}

Also, it might be saner to stick closer to the seqcount naming of
things and use two different functions for these two different things.

/* straight up copy of do_raw_write_seqcount_begin() */
static inline void mm_write_seqcount_begin(struct mm_struct *mm)
{
	kcsan_nestable_atomic_begin();
	mm->mm_lock_seq++;
	smp_wmb();
}

/* straight up copy of do_raw_write_seqcount_end() */
static inline void mm_write_seqcount_end(struct mm_struct *mm)
{
	smp_wmb();
	mm->mm_lock_seq++;
	kcsan_nestable_atomic_end();
}

Or better yet, just use seqcount...
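
To make the odd/even convention concrete, here is a minimal userspace
sketch of the two write-side helpers in C11. It is only a model, not kernel
code: struct seq_model and the model_*() names are hypothetical, a release
fence is used as a rough stand-in for smp_wmb(), and the KCSAN annotations
are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for mm->mm_lock_seq. */
struct seq_model {
	atomic_uint seq;
};

/* Writer entry: make the count odd, then order it before later stores. */
static void model_write_begin(struct seq_model *s)
{
	unsigned int v = atomic_load_explicit(&s->seq, memory_order_relaxed);

	/* Writers are assumed to be serialized by an outer lock. */
	assert((v & 1) == 0);
	atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
	/* Assumption: release fence approximates smp_wmb(). */
	atomic_thread_fence(memory_order_release);
}

/* Writer exit: order earlier stores before making the count even again. */
static void model_write_end(struct seq_model *s)
{
	unsigned int v;

	/* Assumption: release fence approximates smp_wmb(). */
	atomic_thread_fence(memory_order_release);
	v = atomic_load_explicit(&s->seq, memory_order_relaxed);
	assert((v & 1) == 1);
	atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
}

/* An odd count means a writer is inside its critical section. */
static bool model_write_in_progress(struct seq_model *s)
{
	return atomic_load_explicit(&s->seq, memory_order_relaxed) & 1;
}
```

In this model the count starts at 0, is odd between begin and end, and has
advanced by two after one complete write section.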
> +
> +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> +{
> + /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> + *seq = smp_load_acquire(&mm->mm_lock_seq);
> + /* Allow speculation if mmap_lock is not write-locked */
> + return (*seq & 1) == 0;
> +}
> +
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> +{
> + /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
> + smp_rmb();
> + return seq == READ_ONCE(mm->mm_lock_seq);
> }

Because there's nothing better than well-known functions with a randomly
different name and interface, I suppose...

Anyway, all the actual code proposed is not wrong. I'm just a bit
annoyed it's a random NIH of seqcount.
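
For reference, the read side quoted above follows the same
check-then-recheck pattern that seqcount readers use. The following is a
minimal userspace sketch of that pattern in C11, under stated assumptions:
struct spec_model and the spec_*() names are hypothetical, the payload field
stands in for whatever the reader inspects, and C11 fences only approximate
smp_rmb() and smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: a sequence count guarding one payload word. */
struct spec_model {
	atomic_uint seq;
	atomic_int payload;
};

/* Sample the count; speculation is allowed only if no writer is active. */
static bool spec_begin(struct spec_model *m, unsigned int *seq)
{
	*seq = atomic_load_explicit(&m->seq, memory_order_acquire);
	return (*seq & 1) == 0;
}

/* Recheck the count; true means no write section ran in between. */
static bool spec_end(struct spec_model *m, unsigned int seq)
{
	/* Assumption: acquire fence approximates smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	return seq == atomic_load_explicit(&m->seq, memory_order_relaxed);
}

/* One complete write section: odd count, update payload, even count. */
static void spec_write(struct spec_model *m, int value)
{
	unsigned int v = atomic_load_explicit(&m->seq, memory_order_relaxed);

	/* Writers are assumed to be serialized by an outer lock. */
	assert((v & 1) == 0);
	atomic_store_explicit(&m->seq, v + 1, memory_order_relaxed);
	/* Assumption: release fence approximates smp_wmb(). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->payload, value, memory_order_relaxed);
	atomic_store_explicit(&m->seq, v + 2, memory_order_release);
}
```

A reader would call spec_begin(), inspect the data, and trust the result
only if spec_end() returns true; otherwise it falls back to taking the lock
or retries.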