All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
	liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
	mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
	mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, lokeshgidra@google.com, minchan@google.com,
	jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
Date: Wed, 18 Dec 2024 10:41:04 +0100	[thread overview]
Message-ID: <20241218094104.GC2354@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CAJuCfpHzsQeejdPPbDdA6B3Wa=-KusnYRUyt1U0WnCRr8OKfGw@mail.gmail.com>

On Tue, Dec 17, 2024 at 08:27:46AM -0800, Suren Baghdasaryan wrote:

> > So I just replied there, and no, I don't think it makes sense. Just put
> > the kmem_cache_free() in vma_refcount_put(), to be done on 0.
> 
> That's very appealing indeed and makes things much simpler. The
> problem I see with that is the case when we detach a vma from the tree
> to isolate it, then do some cleanup and only then free it. That's done
> in vms_gather_munmap_vmas() here:
> https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1240 and we
> even might reattach detached vmas back:
> https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1312. IOW,
> detached state is not final and we can't destroy the object that
> reached this state. 

Urgh, so that's the munmap() path, but arguably when that fails, the
map stays in place.

I think this means you're marking detached too soon; you should only
mark detached once you reach the point of no return.

That said, once you've reached the point of no return; and are about to
go remove the page-tables, you very much want to ensure a lack of
concurrency.

So perhaps waiting for out-standing readers at this point isn't crazy.

Also, I'm having a very hard time reading this maple tree stuff :/
Afaict vms_gather_munmap_vmas() only adds the VMAs to be removed to a
second tree, it does not in fact unlink them from the mm yet.

AFAICT it's vma_iter_clear_gfp() that actually wipes the vmas from the
mm -- and that being able to fail is mind boggling and I suppose is what
gives rise to much of this insanity :/

Anyway, I would expect remove_vma() to be the one that marks it detached
(it's already unreachable through vma_lookup() at this point) and there
you should wait for concurrent readers to bugger off.

> We could change states to: 0=unused (we can free
> the object), 1=detached, 2=attached, etc. but then vma_start_read()
> should do something like refcount_inc_more_than_one() instead of
> refcount_inc_not_zero(). Would you be ok with such an approach?

Urgh, I would strongly suggest ditching refcount_t if we go this route.
The thing is; refcount_t should remain a 'simple' straight forward
interface and not allow people to do the wrong thing. Its not meant to
be the kitchen sink -- we have atomic_t for that.

Anyway, the more common scheme at that point is using -1 for 'free', I
think folio->_mapcount uses that even. For that see:
atomic_add_negative*().

> > Additionally, having vma_end_write() would allow you to put a lockdep
> > annotation in vma_{start,end}_write() -- which was I think the original
> > reason I proposed it a while back, that and having improved clarity when
> > reading the code, since explicitly marking the end of a section is
> > helpful.
> 
> The vma->vmlock_dep_map is tracking vma->vm_refcnt, not the
> vma->vm_lock_seq (similar to how today vma->vm_lock has its lockdep
> tracking that rw_semaphore). If I implement vma_end_write() then it
> will simply be something like:
> 
> void vma_end_write(vma)
> {
>          vma_assert_write_locked(vma);
>          vma->vm_lock_seq = UINT_MAX;
> }
> 
> so, vmlock_dep_map would not be involved.

That's just weird; why would you not track vma_{start,end}_write() with
the exclusive side of the 'rwsem' dep_map ?

> If you want to track vma->vm_lock_seq with a separate lockdep, that
> would be more complicated. Specifically for vma_end_write_all() that
> would require us to call rwsem_release() on all locked vmas, however
> we currently do not track individual locked vmas. vma_end_write_all()
> allows us not to worry about tracking them, knowing that once we do
> mmap_write_unlock() they all will get unlocked with one increment of
> mm->mm_lock_seq. If your suggestion is to replace vma_end_write_all()
> with vma_end_write() and unlock vmas individually across the mm code,
> that would be a sizable effort. If that is indeed your ultimate goal,
> I can do that as a separate project: introduce vma_end_write(),
> gradually add them in required places (not yet sure how complex that
> would be), then retire vma_end_write_all() and add a lockdep for
> vma->vm_lock_seq.

Yeah, so ultimately I think it would be clearer if you explicitly mark
the point where the vma modification is 'done'. But I don't suppose we
have to do that here.

  reply	other threads:[~2024-12-18  9:41 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-16 19:24 [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 01/16] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 02/16] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 03/16] mm: mark vma as detached until it's added into vma tree Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 04/16] mm/nommu: fix the last places where vma is not locked before being attached Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 05/16] types: move struct rcuwait into types.h Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 06/16] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail Suren Baghdasaryan
2024-12-17 11:31   ` Lokesh Gidra
2024-12-17 15:51     ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 07/16] mm: move mmap_init_lock() out of the header file Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 08/16] mm: uninline the main body of vma_start_write() Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 09/16] refcount: introduce __refcount_{add|inc}_not_zero_limited Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count Suren Baghdasaryan
2024-12-16 20:42   ` Peter Zijlstra
2024-12-16 20:53     ` Suren Baghdasaryan
2024-12-16 21:15   ` Peter Zijlstra
2024-12-16 21:53     ` Suren Baghdasaryan
2024-12-16 22:00       ` Peter Zijlstra
2024-12-16 21:37   ` Peter Zijlstra
2024-12-16 21:44     ` Suren Baghdasaryan
2024-12-17 10:30       ` Peter Zijlstra
2024-12-17 16:27         ` Suren Baghdasaryan
2024-12-18  9:41           ` Peter Zijlstra [this message]
2024-12-18 10:06             ` Peter Zijlstra
2024-12-18 15:37               ` Liam R. Howlett
2024-12-18 15:50                 ` Suren Baghdasaryan
2024-12-18 16:18                   ` Peter Zijlstra
2024-12-18 17:36                     ` Suren Baghdasaryan
2024-12-18 17:44                       ` Peter Zijlstra
2024-12-18 17:58                         ` Suren Baghdasaryan
2024-12-18 19:00                           ` Liam R. Howlett
2024-12-18 19:07                             ` Suren Baghdasaryan
2024-12-18 19:29                               ` Suren Baghdasaryan
2024-12-18 19:38                                 ` Liam R. Howlett
2024-12-18 20:00                                   ` Suren Baghdasaryan
2024-12-18 20:38                                     ` Liam R. Howlett
2024-12-18 21:53                                       ` Suren Baghdasaryan
2024-12-18 21:55                                         ` Suren Baghdasaryan
2024-12-19  0:35                                         ` Andrew Morton
2024-12-19  0:47                                           ` Suren Baghdasaryan
2024-12-19  9:13                                         ` Peter Zijlstra
2024-12-19 11:20                                           ` Peter Zijlstra
2024-12-19 16:17                                             ` Suren Baghdasaryan
2024-12-19 17:16                                             ` Liam R. Howlett
2024-12-19 17:42                                               ` Peter Zijlstra
2024-12-19 18:18                                                 ` Liam R. Howlett
2024-12-19 18:46                                                   ` Peter Zijlstra
2024-12-19 18:55                                                     ` Liam R. Howlett
2024-12-20 15:22                                                       ` Suren Baghdasaryan
2024-12-23  3:03                                                       ` Suren Baghdasaryan
2024-12-26 17:12                                                         ` Suren Baghdasaryan
2024-12-19 16:14                                           ` Suren Baghdasaryan
2024-12-19 17:23                                             ` Peter Zijlstra
2024-12-19  8:55                                 ` Peter Zijlstra
2024-12-19 16:08                                   ` Suren Baghdasaryan
2024-12-19  8:53                           ` Peter Zijlstra
2024-12-19 16:08                             ` Suren Baghdasaryan
2024-12-18 15:57                 ` Suren Baghdasaryan
2024-12-18 16:13                 ` Peter Zijlstra
2024-12-18 15:42             ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 11/16] mm: enforce vma to be in detached state before freeing Suren Baghdasaryan
2024-12-16 21:16   ` Peter Zijlstra
2024-12-16 21:18     ` Peter Zijlstra
2024-12-16 21:57       ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 12/16] mm: remove extra vma_numab_state_init() call Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 13/16] mm: introduce vma_ensure_detached() Suren Baghdasaryan
2024-12-17 10:26   ` Peter Zijlstra
2024-12-17 15:58     ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 14/16] mm: prepare lock_vma_under_rcu() for vma reuse possibility Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 15/16] mm: make vma cache SLAB_TYPESAFE_BY_RCU Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 16/16] docs/mm: document latest changes to vm_lock Suren Baghdasaryan
2024-12-16 19:39 ` [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-17 18:42   ` Andrew Morton
2024-12-17 18:49     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241218094104.GC2354@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=kernel-team@android.com \
    --cc=klarasmodin@gmail.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@google.com \
    --cc=mjguzik@gmail.com \
    --cc=oleg@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=shakeel.butt@linux.dev \
    --cc=souravpanda@google.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.