All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Yang <richard.weiyang@gmail.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
	akpm@linux-foundation.org, peterz@infradead.org,
	willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com,
	mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
	oliver.sang@intel.com, mgorman@techsingularity.net,
	david@redhat.com, peterx@redhat.com, oleg@redhat.com,
	dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, klarasmodin@gmail.com,
	richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count
Date: Mon, 13 Jan 2025 01:47:29 +0000	[thread overview]
Message-ID: <20250113014729.ms5sdfnhynlamgrk@master> (raw)
In-Reply-To: <CAJuCfpGu4UVXiBaivTVOGNBVVz3rhZ+VY27gT3_R0cTij5fTGw@mail.gmail.com>

On Sat, Jan 11, 2025 at 12:14:47PM -0800, Suren Baghdasaryan wrote:
>On Sat, Jan 11, 2025 at 3:24 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>>
>> On Fri, Jan 10, 2025 at 08:25:58PM -0800, Suren Baghdasaryan wrote:
>>
>> So there were quite a few iterations of the patch and I have not been
>> reading majority of the feedback, so it may be I missed something,
>> apologies upfront. :)
>>

Hi, I am new to memory barriers. Hope not bothering.

>> >  /*
>> >   * Try to read-lock a vma. The function is allowed to occasionally yield false
>> >   * locked result to avoid performance overhead, in which case we fall back to
>> > @@ -710,6 +742,8 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
>> >   */
>> >  static inline bool vma_start_read(struct vm_area_struct *vma)
>> >  {
>> > +     int oldcnt;
>> > +
>> >       /*
>> >        * Check before locking. A race might cause false locked result.
>> >        * We can use READ_ONCE() for the mm_lock_seq here, and don't need
>> > @@ -720,13 +754,19 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>> >       if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
>> >               return false;
>> >
>> > -     if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
>> > +     /*
>> > +      * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited() will fail
>> > +      * because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
>> > +      */
>> > +     if (unlikely(!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt,
>> > +                                                   VMA_REF_LIMIT)))
>> >               return false;
>> >
>>
>> Replacing down_read_trylock() with the new routine loses an acquire
>> fence. That alone is not a problem, but see below.
>
>Hmm. I think this acquire fence is actually necessary. We don't want
>the later vm_lock_seq check to be reordered and happen before we take
>the refcount. Otherwise this might happen:
>
>reader             writer
>if (vm_lock_seq == mm_lock_seq) // check got reordered
>        return false;
>                       vm_refcnt += VMA_LOCK_OFFSET
>                       vm_lock_seq == mm_lock_seq
>                       vm_refcnt -= VMA_LOCK_OFFSET
>if (!__refcount_inc_not_zero_limited())
>        return false;
>
>Both reader's checks will pass and the reader would read-lock a vma
>that was write-locked.
>

Here what we plan to do is define __refcount_inc_not_zero_limited() with
acquire fence, e.g. with atomic_try_cmpxchg_acquire(), right?

>>
>> > +     rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
>> >       /*
>> > -      * Overflow might produce false locked result.
>> > +      * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
>> >        * False unlocked result is impossible because we modify and check
>> > -      * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
>> > +      * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
>> >        * modification invalidates all existing locks.
>> >        *
>> >        * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
>> > @@ -735,9 +775,10 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>> >        * This pairs with RELEASE semantics in vma_end_write_all().
>> >        */
>> >       if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {

One question here is would compiler optimize the read of vm_lock_seq here,
since we have read it at the beginning?

Or with the acquire fence added above, compiler won't optimize it.
Or we should use REACE_ONCE(vma->vm_lock_seq) here?

>>
>> The previous modification of this spot to raw_read_seqcount loses the
>> acquire fence, making the above comment not line up with the code.
>
>Is it? From reading the seqcount code
>(https://elixir.bootlin.com/linux/v6.13-rc3/source/include/linux/seqlock.h#L211):
>
>raw_read_seqcount()
>    seqprop_sequence()
>        __seqprop(s, sequence)
>            __seqprop_sequence()
>                smp_load_acquire()
>
>smp_load_acquire() still provides the acquire fence. Am I missing something?
>
>>
>> I don't know if the stock code (with down_read_trylock()) is correct as
>> is -- looks fine for cursory reading fwiw. However, if it indeed works,
>> the acquire fence stemming from the lock routine is a mandatory part of
>> it afaics.
>>
>> I think the best way forward is to add a new refcount routine which
>> ships with an acquire fence.
>
>I plan on replacing refcount_t usage here with an atomic since, as
>Hillf noted, refcount is not designed to be used for locking. And will
>make sure the down_read_trylock() replacement will provide an acquire
>fence.
>

Hmm.. refcount_t is defined with atomic_t. I am lost why replacing refcount_t
with atomic_t would help.

>>
>> Otherwise I would suggest:
>> 1. a comment above __refcount_inc_not_zero_limited saying there is an
>>    acq fence issued later
>> 2. smp_rmb() slapped between that and seq accesses
>>
>> If the now removed fence is somehow not needed, I think a comment
>> explaining it is necessary.
>>
>> > @@ -813,36 +856,33 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
>> >
>> >  static inline void vma_assert_locked(struct vm_area_struct *vma)
>> >  {
>> > -     if (!rwsem_is_locked(&vma->vm_lock.lock))
>> > +     if (refcount_read(&vma->vm_refcnt) <= 1)
>> >               vma_assert_write_locked(vma);
>> >  }
>> >
>>
>> This now forces the compiler to emit a load from vm_refcnt even if
>> vma_assert_write_locked expands to nothing. iow this wants to hide
>> behind the same stuff as vma_assert_write_locked.
>
>True. I guess I'll have to avoid using vma_assert_write_locked() like this:
>
>static inline void vma_assert_locked(struct vm_area_struct *vma)
>{
>        unsigned int mm_lock_seq;
>
>        VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
>                                          !__is_vma_write_locked(vma,
>&mm_lock_seq), vma);
>}
>
>Will make the change.
>
>Thanks for the feedback!

-- 
Wei Yang
Help you, Help me

  parent reply	other threads:[~2025-01-13  1:47 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-11  4:25 [PATCH v9 00/17] reimplement per-vma lock as a refcount Suren Baghdasaryan
2025-01-11  4:25 ` [PATCH v9 01/17] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2025-01-11  4:25 ` [PATCH v9 02/17] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2025-01-11  4:25 ` [PATCH v9 03/17] mm: mark vma as detached until it's added into vma tree Suren Baghdasaryan
2025-01-11  4:25 ` [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas Suren Baghdasaryan
2025-01-13 11:58   ` Lorenzo Stoakes
2025-01-13 16:31     ` Suren Baghdasaryan
2025-01-13 16:44       ` Lorenzo Stoakes
2025-01-13 16:47       ` Lorenzo Stoakes
2025-01-13 19:09         ` Suren Baghdasaryan
2025-01-14 11:38           ` Lorenzo Stoakes
2025-01-11  4:25 ` [PATCH v9 05/17] mm: mark vmas detached upon exit Suren Baghdasaryan
2025-01-13 12:05   ` Lorenzo Stoakes
2025-01-13 17:02     ` Suren Baghdasaryan
2025-01-13 17:13       ` Lorenzo Stoakes
2025-01-13 19:11         ` Suren Baghdasaryan
2025-01-13 20:32           ` Vlastimil Babka
2025-01-13 20:42             ` Suren Baghdasaryan
2025-01-14 11:36               ` Lorenzo Stoakes
2025-01-11  4:25 ` [PATCH v9 06/17] types: move struct rcuwait into types.h Suren Baghdasaryan
2025-01-13 14:46   ` Lorenzo Stoakes
2025-01-11  4:25 ` [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail Suren Baghdasaryan
2025-01-13 15:25   ` Lorenzo Stoakes
2025-01-13 17:53     ` Suren Baghdasaryan
2025-01-14 11:48       ` Lorenzo Stoakes
2025-01-11  4:25 ` [PATCH v9 08/17] mm: move mmap_init_lock() out of the header file Suren Baghdasaryan
2025-01-13 15:27   ` Lorenzo Stoakes
2025-01-13 17:53     ` Suren Baghdasaryan
2025-01-11  4:25 ` [PATCH v9 09/17] mm: uninline the main body of vma_start_write() Suren Baghdasaryan
2025-01-13 15:52   ` Lorenzo Stoakes
2025-01-11  4:25 ` [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited Suren Baghdasaryan
2025-01-11  6:31   ` Hillf Danton
2025-01-11  9:59     ` Suren Baghdasaryan
2025-01-11 10:00       ` Suren Baghdasaryan
2025-01-11 12:13       ` Hillf Danton
2025-01-11 17:11         ` Suren Baghdasaryan
2025-01-11 23:44           ` Hillf Danton
2025-01-12  0:31             ` Suren Baghdasaryan
2025-01-15  9:39           ` Peter Zijlstra
2025-01-16 10:52             ` Hillf Danton
2025-01-11 12:39   ` David Laight
2025-01-11 17:07     ` Matthew Wilcox
2025-01-11 18:30     ` Paul E. McKenney
2025-01-11 22:19       ` David Laight
2025-01-11 22:50         ` [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited - clang 17.0.1 bug David Laight
2025-01-12 11:37           ` David Laight
2025-01-12 17:56             ` Paul E. McKenney
2025-01-11  4:25 ` [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count Suren Baghdasaryan
2025-01-11 11:24   ` Mateusz Guzik
2025-01-11 20:14     ` Suren Baghdasaryan
2025-01-11 20:16       ` Suren Baghdasaryan
2025-01-11 20:31       ` Mateusz Guzik
2025-01-11 20:58         ` Suren Baghdasaryan
2025-01-11 20:38       ` Vlastimil Babka
2025-01-13  1:47       ` Wei Yang [this message]
2025-01-13  2:25         ` Wei Yang
2025-01-13 21:14           ` Suren Baghdasaryan
2025-01-13 21:08         ` Suren Baghdasaryan
2025-01-15 10:48       ` Peter Zijlstra
2025-01-15 11:13         ` Peter Zijlstra
2025-01-15 15:00           ` Suren Baghdasaryan
2025-01-15 15:35             ` Peter Zijlstra
2025-01-15 15:38               ` Peter Zijlstra
2025-01-15 16:22                 ` Suren Baghdasaryan
2025-01-15 16:00           ` [PATCH] refcount: Strengthen inc_not_zero() Peter Zijlstra
2025-01-16 15:12             ` Suren Baghdasaryan
2025-01-17 15:41             ` Will Deacon
2025-01-27 14:09               ` Will Deacon
2025-01-27 19:21                 ` Suren Baghdasaryan
2025-01-28 23:51                   ` Suren Baghdasaryan
2025-02-06  2:52                     ` [PATCH 1/1] refcount: provide ops for cases when object's memory can be reused Suren Baghdasaryan
2025-02-06 10:41                       ` Vlastimil Babka
2025-02-06  3:03                     ` [PATCH] refcount: Strengthen inc_not_zero() Suren Baghdasaryan
2025-02-13 23:04                       ` Suren Baghdasaryan
2025-01-17 16:13             ` Matthew Wilcox
2025-01-12  2:59   ` [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count Wei Yang
2025-01-12 17:35     ` Suren Baghdasaryan
2025-01-13  0:59       ` Wei Yang
2025-01-13  2:37   ` Wei Yang
2025-01-13 21:16     ` Suren Baghdasaryan
2025-01-13  9:36   ` Wei Yang
2025-01-13 21:18     ` Suren Baghdasaryan
2025-01-15  2:58   ` Wei Yang
2025-01-15  3:12     ` Suren Baghdasaryan
2025-01-15 12:05       ` Wei Yang
2025-01-15 15:01         ` Suren Baghdasaryan
2025-01-16  1:37           ` Wei Yang
2025-01-16  1:41             ` Suren Baghdasaryan
2025-01-16  9:10               ` Wei Yang
2025-01-11  4:25 ` [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline Suren Baghdasaryan
2025-01-13 16:15   ` Lorenzo Stoakes
2025-01-15 10:50   ` Peter Zijlstra
2025-01-15 16:39     ` Suren Baghdasaryan
2025-02-13 22:59       ` Suren Baghdasaryan
2025-01-11  4:26 ` [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma Suren Baghdasaryan
2025-01-13 16:21   ` Lorenzo Stoakes
2025-01-13 16:35     ` Liam R. Howlett
2025-01-13 17:57       ` Suren Baghdasaryan
2025-01-14 11:41         ` Lorenzo Stoakes
2025-01-11  4:26 ` [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call Suren Baghdasaryan
2025-01-13 16:28   ` Lorenzo Stoakes
2025-01-13 17:56     ` Suren Baghdasaryan
2025-01-14 11:45       ` Lorenzo Stoakes
2025-01-11  4:26 ` [PATCH v9 15/17] mm: prepare lock_vma_under_rcu() for vma reuse possibility Suren Baghdasaryan
2025-01-11  4:26 ` [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU Suren Baghdasaryan
2025-01-15  2:27   ` Wei Yang
2025-01-15  3:15     ` Suren Baghdasaryan
2025-01-15  3:58       ` Liam R. Howlett
2025-01-15  5:41         ` Suren Baghdasaryan
2025-01-15  3:59       ` Mateusz Guzik
2025-01-15  5:47         ` Suren Baghdasaryan
2025-01-15  5:51           ` Mateusz Guzik
2025-01-15  6:41             ` Suren Baghdasaryan
2025-01-15  7:58       ` Vlastimil Babka
2025-01-15 15:10         ` Suren Baghdasaryan
2025-02-13 22:56           ` Suren Baghdasaryan
2025-01-15 12:17       ` Wei Yang
2025-01-15 21:46         ` Suren Baghdasaryan
2025-01-11  4:26 ` [PATCH v9 17/17] docs/mm: document latest changes to vm_lock Suren Baghdasaryan
2025-01-13 16:33   ` Lorenzo Stoakes
2025-01-13 17:56     ` Suren Baghdasaryan
2025-01-11  4:52 ` [PATCH v9 00/17] reimplement per-vma lock as a refcount Matthew Wilcox
2025-01-11  9:45   ` Suren Baghdasaryan
2025-01-13 12:14 ` Lorenzo Stoakes
2025-01-13 16:58   ` Suren Baghdasaryan
2025-01-13 17:11     ` Lorenzo Stoakes
2025-01-13 19:00       ` Suren Baghdasaryan
2025-01-14 11:35         ` Lorenzo Stoakes
2025-01-14  1:49   ` Andrew Morton
2025-01-14  2:53     ` Suren Baghdasaryan
2025-01-14  4:09       ` Andrew Morton
2025-01-14  9:09         ` Vlastimil Babka
2025-01-14 10:27           ` Hillf Danton
2025-01-14  9:47         ` Lorenzo Stoakes
2025-01-14 14:59         ` Liam R. Howlett
2025-01-14 15:54           ` Suren Baghdasaryan
2025-01-15 11:34             ` Lorenzo Stoakes
2025-01-15 15:14               ` Suren Baghdasaryan
2025-01-28  5:26 ` Shivank Garg
2025-01-28  5:50   ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250113014729.ms5sdfnhynlamgrk@master \
    --to=richard.weiyang@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=david.laight.linux@gmail.com \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=kernel-team@android.com \
    --cc=klarasmodin@gmail.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@google.com \
    --cc=mjguzik@gmail.com \
    --cc=oleg@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=shakeel.butt@linux.dev \
    --cc=souravpanda@google.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.