From: Dev Jain <dev.jain@arm.com>
To: Barry Song <21cnbao@gmail.com>,
"Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: akpm@linux-foundation.org, axelrasmussen@google.com,
yuanchu@google.com, david@kernel.org, hughd@google.com,
chrisl@kernel.org, kasong@tencent.com, weixugc@google.com,
Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com,
ziy@nvidia.com, kas@kernel.org, willy@infradead.org,
yuzhao@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
anshuman.khandual@arm.com
Subject: Re: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
Date: Wed, 11 Mar 2026 10:22:24 +0530
Message-ID: <2b9146bb-31e9-4bbd-843e-221f13a7097c@arm.com>
In-Reply-To: <CAGsJ_4wmrV0+wxiXAgEWRxBpv1QT9Sm__Pi+yREGvT9YH2n4uQ@mail.gmail.com>
On 11/03/26 9:44 am, Barry Song wrote:
> On Wed, Mar 11, 2026 at 7:32 AM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Tue, Mar 10, 2026 at 4:34 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>>>
>>> On Tue, Mar 10, 2026 at 01:00:09PM +0530, Dev Jain wrote:
>>>> The ptes are all the same w.r.t belonging to the same type of VMA, and
>>>> being marked with uffd-wp or all being not marked. Therefore we can batch
>>>> set uffd-wp markers through install_uffd_wp_ptes_if_needed, and enable
>>>> batched unmapping of folios belonging to uffd-wp VMAs by dropping that
>>>> condition from folio_unmap_pte_batch.
>>>>
>>>> It may happen that we don't batch over the entire folio in one go, in which
>>>> case, we must skip over the current batch. Add a helper to do that -
>>>> page_vma_mapped_walk_jump() will increment the relevant fields of pvmw
>>>> by nr pages.
>>>>
>>>> I think that we can get away with just incrementing pvmw->pte
>>>> and pvmw->address, since looking at the code in page_vma_mapped.c,
>>>> pvmw->pfn and pvmw->nr_pages are used in conjunction, and pvmw->pgoff
>>>> and pvmw->nr_pages (in vma_address_end()) are used in conjunction,
>>>> cancelling out the increment and decrement in the respective fields. But
>>>> let us not rely on the pvmw implementation and keep this simple.
>>>
>>> This isn't simple...
>>>
>>>>
>>>> Export this function to rmap.h to enable future reuse.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>> include/linux/rmap.h | 10 ++++++++++
>>>> mm/rmap.c | 8 +++-----
>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 8dc0871e5f001..1b7720c66ac87 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
>>>>                  spin_unlock(pvmw->ptl);
>>>>  }
>>>>
>>>> +static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
>>>> +                unsigned int nr)
>>>
>>> unsigned long nr_pages... 'nr' is meaningless and you're mixing + matching types
>>> for no reason.
>>>
>>>> +{
>>>> +        pvmw->pfn += nr;
>>>> +        pvmw->nr_pages -= nr;
>>>> +        pvmw->pgoff += nr;
>>>> +        pvmw->pte += nr;
>>>> +        pvmw->address += nr * PAGE_SIZE;
>>>> +}
>>>
>>> I absolutely hate this. It's extremely confusing, especially since you're now
>>> going from looking at 1 page to nr_pages - 1, jump doesn't really mean anything
>>> here, you're losing sight of the batch size and exposing a silly detail to the
>>> caller, and I really don't want to 'export' this at this time.
>>
>> I’m fairly sure I raised the same concern when Dev first suggested this,
>> but somehow it seems my comment was completely overlooked. :-)

I do recall; perhaps I was too lazy to look at the pvmw code :) I should
have looked at this earlier, since it is fairly simple. See below.

>>
>>>
>>> If we must have this, can you please make it static in rmap.c at least for the
>>> time being.
>>>
>>> Or perhaps instead, have a batched variant of page_vma_mapped_walk(), like
>>> page_vma_mapped_walk_batch()?
>>
>> Right now, for non-anon pages we face the same issues, but
>> page_vma_mapped_walk() can skip those PTEs once it finds that
>> nr - 1 PTEs are none.
>>
>> next_pte:
>>         do {
>>                 pvmw->address += PAGE_SIZE;
>>                 if (pvmw->address >= end)
>>                         return not_found(pvmw);
>>                 /* Did we cross page table boundary? */
>>                 if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
>>                         if (pvmw->ptl) {
>>                                 spin_unlock(pvmw->ptl);
>>                                 pvmw->ptl = NULL;
>>                         }
>>                         pte_unmap(pvmw->pte);
>>                         pvmw->pte = NULL;
>>                         pvmw->flags |= PVMW_PGTABLE_CROSSED;
>>                         goto restart;
>>                 }
>>                 pvmw->pte++;
>>         } while (pte_none(ptep_get(pvmw->pte)));
>>
>> The difference now is that swap entries cannot be skipped.
>>
>> If we're trying to find `page_vma_mapped_walk_batch()`, I suppose
>> it could be like this?
>>
>> bool page_vma_mapped_walk_batch(struct page_vma_mapped_walk *pvmw,
>>                                 unsigned long nr)
>> {
>>         ...
>> }
>>
>> static inline bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> {
>>         return page_vma_mapped_walk_batch(pvmw, 1);
>> }
>
> Another approach might be to introduce a flag so that
> page_vma_mapped_walk() knows we are doing batched unmaps
> and can skip nr - 1 swap entries.
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8dc0871e5f00..bf03ae006366 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -856,6 +856,9 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
> /* Look for migration entries rather than present PTEs */
> #define PVMW_MIGRATION (1 << 1)
>
> +/* Batched unmap: skip swap entries. */
> +#define PVMW_BATCH_UNMAP (1 << 2)

Exactly, I had just come up with the same solution when I saw your reply :)
We can name this PVMW_BATCH_PRESENT, with the comment "Look for present
entries". We should also fix the comment above PVMW_MIGRATION to drop
"rather than present PTEs": that is wrong, since by default we currently
also look for softleaf entries.

> +
> /* Result flags */
>
> /* The page is mapped across page table boundary */
>
>
> Thanks
> Barry
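
To illustrate the flag idea discussed above, here is a minimal user-space
sketch, not kernel code. All names in it (model_pte, model_stats,
MODEL_PVMW_BATCH_PRESENT, model_walk_all) are hypothetical stand-ins, and
the model deliberately ignores locking, page table boundaries and the real
check_pte() logic. It only counts how many entries a walk would have to
examine one at a time, with and without a flag that skips swap entries.

```c
#include <stddef.h>

/* Hypothetical stand-in for the state of one PTE slot; not a kernel type. */
enum model_pte { MODEL_PTE_NONE, MODEL_PTE_PRESENT, MODEL_PTE_SWAP };

/* Assumed flag, mirroring the (1 << 2) bit proposed in this thread. */
#define MODEL_PVMW_BATCH_PRESENT (1u << 2)

struct model_stats {
	size_t examined;	/* entries that needed an individual check */
	size_t found;		/* entries that turned out to be present */
};

/*
 * Walk nr_ptes slots. Empty slots are always skipped, as in the
 * pte_none() loop quoted earlier. Swap slots are skipped only when
 * the flag is set; otherwise they are examined and then rejected.
 */
static struct model_stats model_walk_all(const enum model_pte *ptes,
					 size_t nr_ptes, unsigned int flags)
{
	struct model_stats stats = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_ptes; i++) {
		if (ptes[i] == MODEL_PTE_NONE)
			continue;
		if (ptes[i] == MODEL_PTE_SWAP &&
		    (flags & MODEL_PVMW_BATCH_PRESENT))
			continue;
		stats.examined++;
		if (ptes[i] == MODEL_PTE_PRESENT)
			stats.found++;
	}
	return stats;
}
```

For a range of three swap entries followed by one present PTE and one empty
slot, the default mode examines four entries, while the flagged mode examines
only the present one. Both modes find the same single present page.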
Thread overview: 47+ messages
2026-03-10 7:30 [PATCH 0/9] mm/rmap: Optimize anonymous large folio unmapping Dev Jain
2026-03-10 7:30 ` [PATCH 1/9] mm/rmap: make nr_pages signed in try_to_unmap_one Dev Jain
2026-03-10 7:56 ` Lorenzo Stoakes (Oracle)
2026-03-10 8:06 ` David Hildenbrand (Arm)
2026-03-10 8:23 ` Dev Jain
2026-03-10 12:40 ` Matthew Wilcox
2026-03-11 4:54 ` Dev Jain
2026-03-10 7:30 ` [PATCH 2/9] mm/rmap: initialize nr_pages to 1 at loop start " Dev Jain
2026-03-10 8:10 ` Lorenzo Stoakes (Oracle)
2026-03-10 8:31 ` Dev Jain
2026-03-10 8:39 ` Lorenzo Stoakes (Oracle)
2026-03-10 8:43 ` Dev Jain
2026-03-10 7:30 ` [PATCH 3/9] mm/rmap: refactor lazyfree unmap commit path to commit_ttu_lazyfree_folio() Dev Jain
2026-03-10 8:19 ` Lorenzo Stoakes (Oracle)
2026-03-10 8:42 ` Dev Jain
2026-03-19 15:53 ` Lorenzo Stoakes (Oracle)
2026-03-10 7:30 ` [PATCH 4/9] mm/memory: Batch set uffd-wp markers during zapping Dev Jain
2026-03-10 7:30 ` [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs Dev Jain
2026-03-10 8:34 ` Lorenzo Stoakes (Oracle)
2026-03-10 23:32 ` Barry Song
2026-03-11 4:14 ` Barry Song
2026-03-11 4:52 ` Dev Jain [this message]
2026-03-11 4:56 ` Dev Jain
2026-03-10 7:30 ` [PATCH 6/9] mm/swapfile: Make folio_dup_swap batchable Dev Jain
2026-03-10 8:27 ` Kairui Song
2026-03-10 8:46 ` Dev Jain
2026-03-10 8:49 ` Lorenzo Stoakes (Oracle)
2026-03-11 5:42 ` Dev Jain
2026-03-19 15:26 ` Lorenzo Stoakes (Oracle)
2026-03-19 16:47 ` Matthew Wilcox
2026-03-18 0:20 ` kernel test robot
2026-03-10 7:30 ` [PATCH 7/9] mm/swapfile: Make folio_put_swap batchable Dev Jain
2026-03-10 8:29 ` Kairui Song
2026-03-10 8:50 ` Dev Jain
2026-03-10 8:55 ` Lorenzo Stoakes (Oracle)
2026-03-18 1:04 ` kernel test robot
2026-03-10 7:30 ` [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes Dev Jain
2026-03-10 9:38 ` Lorenzo Stoakes (Oracle)
2026-03-11 8:09 ` Dev Jain
2026-03-12 8:19 ` Wei Yang
2026-03-19 15:47 ` Lorenzo Stoakes (Oracle)
2026-04-08 7:14 ` Dev Jain
2026-03-10 7:30 ` [PATCH 9/9] mm/rmap: enable batch unmapping of anonymous folios Dev Jain
2026-03-10 8:02 ` [PATCH 0/9] mm/rmap: Optimize anonymous large folio unmapping Lorenzo Stoakes (Oracle)
2026-03-10 9:28 ` Dev Jain
2026-03-10 12:59 ` Lance Yang
2026-03-11 8:11 ` Dev Jain