From: Suren Baghdasaryan <surenb@google.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
akpm@linux-foundation.org, david@redhat.com, peterx@redhat.com,
jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org,
paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com,
brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com,
linux@weissschuh.net, willy@infradead.org, osalvador@suse.de,
andrii@kernel.org, ryan.roberts@arm.com,
christophe.leroy@csgroup.eu, tjmercier@google.com,
kaleshsingh@google.com, aha310510@gmail.com,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v6 7/8] fs/proc/task_mmu: read proc/pid/maps under per-vma lock
Date: Thu, 10 Jul 2025 00:03:06 -0700
Message-ID: <CAJuCfpG_dRLVDv1DWveJWS5cQS0ADEVAeBxJ=5MaPQFNEvQ1+g@mail.gmail.com>
In-Reply-To: <CAJuCfpFKNm6CEcfkuy+0o-Qu8xXppCFbOcYVXUFLeg10ztMFPw@mail.gmail.com>
On Wed, Jul 9, 2025 at 10:47 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jul 9, 2025 at 4:12 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Suren Baghdasaryan <surenb@google.com> [250709 11:06]:
> > > On Wed, Jul 9, 2025 at 3:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 7/9/25 16:43, Suren Baghdasaryan wrote:
> > > > > On Wed, Jul 9, 2025 at 1:57 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > >>
> > > > >> On 7/8/25 01:10, Suren Baghdasaryan wrote:
> > > > >> >>> + rcu_read_unlock();
> > > > >> >>> + vma = lock_vma_under_mmap_lock(mm, iter, address);
> > > > >> >>> + rcu_read_lock();
> > > > >> >> OK I guess we hold the RCU lock the whole time as we traverse except when
> > > > >> >> we lock under mmap lock.
> > > > >> > Correct.
> > > > >>
> > > > >> I wonder if it's really necessary? Can't it be done just inside
> > > > >> lock_next_vma()? It would also avoid the unlock/lock dance quoted above.
> > > > >>
> > > > >> Even if we later manage to extend this approach to smaps and employ rcu
> > > > >> locking to traverse the page tables, I'd think it's best to separate and
> > > > >> fine-grain the rcu lock usage for vma iterator and page tables, if only to
> > > > >> avoid too long time under the lock.
> > > > >
> > > > > I thought we would need to be in the same rcu read section while
> > > > > traversing the maple tree using vma_next() but now looking at it,
> > > > > maybe we can indeed enter only while finding and locking the next
> > > > > vma...
> > > > > Liam, would that work? I see struct ma_state containing a node field.
> > > > > Can it be freed from under us if we find a vma, exit rcu read section
> > > > > then re-enter rcu and use the same iterator to find the next vma?
> > > >
> > > > If the rcu protection needs to be contiguous, and patch 8 avoids the issue by
> > > > always doing vma_iter_init() after rcu_read_lock() (but does it really avoid
> > > > the issue or is it why we see the syzbot reports?) then I guess in the code
> > > > quoted above we also need a vma_iter_init() after the rcu_read_lock(),
> > > > because although the iterator was used briefly under mmap_lock protection,
> > > > that was then unlocked and there can be a race before the rcu_read_lock().
> > >
> > > Quite true. So, let's wait for Liam's confirmation and based on his
> > > answer I'll change the patch by either reducing the rcu read section
> > > or adding the missing vma_iter_init() after we switch to mmap_lock.
> >
> > You need to either be under rcu or mmap lock to ensure the node in the
> > maple state hasn't been freed (and potentially, reallocated).
> >
> > So in this case, if at the higher level we can hold the rcu read lock
> > for a series of walks and avoid re-walking the tree, then the
> > performance would be better.
>
> Got it. Thanks for confirming!
>
> >
> > When we return to userspace, then we should drop the rcu read lock and
> > will need to vma_iter_set()/vma_iter_invalidate() on return. I thought
> > this was being done (through vma_iter_init()), but syzbot seems to
> > indicate a path that was missed?
>
> We do that in m_start()/m_stop() by calling
> lock_vma_range()/unlock_vma_range() but I think I have two problems
> here:
> 1. As Vlastimil mentioned I do not reset the iterator when falling
> back to mmap_lock and exiting and then re-entering rcu read section;
> 2. I do not reset the iterator after exiting rcu read section in
> m_stop() and re-entering it in m_start(), so the later call to
> lock_next_vma() might be using an iterator with a node that was freed
> (and possibly reallocated).
>
> >
> > This is the same thing that needed to be done previously with the mmap
> > lock, but now under the rcu lock.
> >
> > I'm not sure how to mitigate the issue with the page table, maybe we
> > guess on the number of vmas that we were doing for 4k blocks of output
> > and just drop/reacquire then. Probably a problem for another day
> > anyways.
> >
> > Also, I think you can change the vma_iter_init() to vma_iter_set(),
> > which is slightly less code under the hood. Vlastimil asked about this
> > and it's probably a better choice.
>
> Ack.
> I'll update my series with these fixes and all comments I received so
> far, will run the reproducers to confirm no issues and repost them
> later today.
I have the patchset ready but would like to test it some more. Will
post it tomorrow.
> Thanks,
> Suren.
>
> >
> > Thanks,
> > Liam
> >
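
For readers following the thread, the two iterator problems listed above
can be illustrated with a userspace toy model. This is only a sketch under
stated assumptions, not the kernel code or the upcoming patch: struct
toy_tree, struct toy_iter, toy_iter_set(), toy_tree_modify(),
toy_iter_next() and toy_demo() are all hypothetical names, and an epoch
counter stands in for maple tree nodes being freed while the reader is
outside its rcu read section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VMAS 8

/* Toy stand-in for the VMA tree. 'epoch' is bumped by every writer, which
 * models tree nodes being freed (and possibly reallocated). */
struct toy_tree {
	unsigned long epoch;
	unsigned long start[TOY_MAX_VMAS];	/* sorted VMA start addresses */
	size_t count;
};

/* Toy stand-in for a VMA iterator: an address plus a cached "node". */
struct toy_iter {
	const struct toy_tree *tree;
	unsigned long addr;		/* next address to search from */
	unsigned long cached_epoch;	/* models the cached node pointer */
	bool has_cache;
};

/* Hypothetical analogue of resetting the iterator: keep the address,
 * drop the cached node so the next walk starts from the root. */
static void toy_iter_set(struct toy_iter *it, unsigned long addr)
{
	it->addr = addr;
	it->has_cache = false;
}

/* Writer: any modification invalidates nodes cached by readers that are
 * outside their read-side section. */
static void toy_tree_modify(struct toy_tree *t)
{
	t->epoch++;
}

/* Find the next VMA start at or after it->addr.
 * Returns 0 and stores the start in *out, 1 at end of tree, or -1 if the
 * cached node is stale (in the kernel this would be a use-after-free). */
static int toy_iter_next(struct toy_iter *it, unsigned long *out)
{
	assert(it->tree != NULL);
	if (it->has_cache && it->cached_epoch != it->tree->epoch)
		return -1;
	for (size_t i = 0; i < it->tree->count; i++) {
		if (it->tree->start[i] >= it->addr) {
			*out = it->tree->start[i];
			it->addr = *out + 1;
			it->cached_epoch = it->tree->epoch;
			it->has_cache = true;
			return 0;
		}
	}
	return 1;
}

/* Runs the scenario; reset_on_reentry selects the buggy or fixed variant. */
static int toy_demo(bool reset_on_reentry)
{
	struct toy_tree tree = {
		.epoch = 0,
		.start = { 0x1000, 0x2000, 0x3000 },
		.count = 3,
	};
	struct toy_iter it = { .tree = &tree };
	unsigned long start;

	toy_iter_set(&it, 0);
	if (toy_iter_next(&it, &start) != 0)	/* first walk, inside the read section */
		return -2;

	/* The read section is exited here (as in m_stop(), or the mmap_lock
	 * fallback) and a writer runs before it is re-entered. */
	toy_tree_modify(&tree);

	if (reset_on_reentry)
		toy_iter_set(&it, it.addr);
	return toy_iter_next(&it, &start);
}
```

In this model toy_demo(false) returns -1, because the iterator is reused
with a stale cached node, while toy_demo(true) returns 0, because the
reset forces a fresh walk from the saved address. The cost of the reset is
the extra walk from the root, which is why holding the read section across
a series of walks is attractive when it is safe to do so.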