From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand <david@redhat.com>, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
chrisl@kernel.org, kasong@tencent.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
ying.huang@intel.com, zhengtangquan@oppo.com,
Lance Yang <ioworker0@gmail.com>
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
Date: Wed, 25 Jun 2025 20:58:12 +0800
Message-ID: <f1ee8679-88b4-4712-b2ed-c3eec179b430@linux.dev>
In-Reply-To: <5db6fb4c-079d-4237-80b3-637565457f39@redhat.com>

On 2025/6/25 20:09, David Hildenbrand wrote:
> On 25.06.25 13:42, Barry Song wrote:
>> On Wed, Jun 25, 2025 at 11:27 PM David Hildenbrand <david@redhat.com>
>> wrote:
>>>
>>> On 25.06.25 13:15, Barry Song wrote:
>>>> On Wed, Jun 25, 2025 at 11:01 PM David Hildenbrand
>>>> <david@redhat.com> wrote:
>>>>>
>>>>> On 25.06.25 12:57, Barry Song wrote:
>>>>>>>>
>>>>>>>> Note that I don't quite understand why we have to batch the whole
>>>>>>>> thing or fall back to individual pages. Why can't we perform other
>>>>>>>> batches that span only some PTEs? What's special about 1 PTE vs.
>>>>>>>> 2 PTEs vs. all PTEs?
>>>>>>>
>>>>>>> That's a good point about the "all-or-nothing" batching logic ;)
>>>>>>>
>>>>>>> It seems the "all-or-nothing" approach is specific to the lazyfree
>>>>>>> use case, which needs to unmap the entire folio for reclamation.
>>>>>>> If that's not possible, it falls back to the single-page slow path.
>>>>>>
>>>>>> Other cases advance the PTE themselves, while try_to_unmap_one()
>>>>>> relies on page_vma_mapped_walk() to advance the PTE. Unless we want
>>>>>> to manually modify pvmw.pte and pvmw.address outside of
>>>>>> page_vma_mapped_walk(), which to me seems like a violation of
>>>>>> layers. :-)
>>>>>
>>>>> Please explain to me why the following is not clearer and better:
>>>>
>>>> This part is much clearer, but that doesn’t necessarily improve the
>>>> overall picture. The main challenge is how to exit the iteration of
>>>> while (page_vma_mapped_walk(&pvmw)).
>>>
>>> Okay, I get what you mean now.
>>>
>>>>
>>>> Right now, we have it laid out quite straightforwardly:
>>>> 	/* We have already batched the entire folio */
>>>> 	if (nr_pages > 1)
>>>> 		goto walk_done;
>>>
>>>
>>> Given that the comment is completely confusing when seeing the
>>> check ... :)
>>>
>>> 	/*
>>> 	 * If we are sure that we batched the entire folio and cleared all
>>> 	 * PTEs, we can just optimize and stop right here.
>>> 	 */
>>> 	if (nr_pages == folio_nr_pages(folio))
>>> 		goto walk_done;
>>>
>>> would make the comment match.
>>
>> Yes, that clarifies it.
>>
>>>
>>>>
>>>> with any nr between 1 and folio_nr_pages(), we have to consider two
>>>> issues:
>>>> 1. How to skip PTE checks inside page_vma_mapped_walk for entries that
>>>> were already handled in the previous batch;
>>>
>>> They are cleared if we reach that point. So the pte_none() checks will
>>> simply skip them?
>>>
>>>> 2. How to break the iteration when this batch has arrived at the end.
>>>
>>> page_vma_mapped_walk() should be doing that?
>>
>> It seems you might have missed the part in my reply that says:
>> "Of course, we could avoid both, but that would mean performing
>> unnecessary checks inside page_vma_mapped_walk()."
>>
>> That’s true for both. But I’m wondering why we’re still doing the check,
>> even when we’re fairly sure they’ve already been cleared or we’ve reached
>> the end :-)
>
> :)
>
>>
>> Somehow, I feel we could combine your cleanup code—which handles a batch
>> size of "nr" between 1 and nr_pages—with the
>> "if (nr_pages == folio_nr_pages(folio)) goto walk_done" check.
>
> Yeah, that's what I was suggesting. It would have to be part of the
> cleanup I think.
>
> I'm still wondering if there is a case where
>
> 	if (nr_pages == folio_nr_pages(folio))
> 		goto walk_done;
>
> would be wrong when dealing with small folios.

We can make the check more explicit to avoid any future trouble ;)

	if (nr_pages > 1 && nr_pages == folio_nr_pages(folio))
		goto walk_done;

It should be safe for small folios.

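To spell out how the three variants of the check discussed above differ,
here is a rough userspace sketch. It is not kernel code: struct folio_model,
model_folio_nr_pages() and the stop_walk_*() helpers are made up for
illustration, nr_pages stands for the number of PTEs batched in the current
step, and the partial-batch case is hypothetical since the current code
batches all or nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio; only the page count matters here. */
struct folio_model {
	unsigned long nr_pages;
};

/* Models folio_nr_pages() for the stand-in type. */
static unsigned long model_folio_nr_pages(const struct folio_model *folio)
{
	return folio->nr_pages;
}

/* Check as written in v4: any batch larger than one PTE ends the walk. */
static bool stop_walk_v4(unsigned long nr_pages)
{
	return nr_pages > 1;
}

/* Check from the review: end the walk once the whole folio was batched. */
static bool stop_walk_full_batch(unsigned long nr_pages,
				 const struct folio_model *folio)
{
	return nr_pages == model_folio_nr_pages(folio);
}

/* Explicit variant: same as above, but never taken for a single PTE. */
static bool stop_walk_explicit(unsigned long nr_pages,
			       const struct folio_model *folio)
{
	return nr_pages > 1 && nr_pages == model_folio_nr_pages(folio);
}

/* Walks through the three cases that came up in the discussion. */
static void check_cases(void)
{
	struct folio_model small = { .nr_pages = 1 };
	struct folio_model large = { .nr_pages = 16 };

	/* Small folio: only the plain equality check takes the early exit. */
	assert(!stop_walk_v4(1));
	assert(stop_walk_full_batch(1, &small));
	assert(!stop_walk_explicit(1, &small));

	/* Large folio, all 16 PTEs batched: all three end the walk. */
	assert(stop_walk_v4(16));
	assert(stop_walk_full_batch(16, &large));
	assert(stop_walk_explicit(16, &large));

	/* Hypothetical partial batch (4 of 16): only the v4 check exits. */
	assert(stop_walk_v4(4));
	assert(!stop_walk_full_batch(4, &large));
	assert(!stop_walk_explicit(4, &large));
}
```

The last two variants differ only for nr_pages == 1: the explicit form
keeps small folios on the existing path instead of taking the early exit.
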
Thanks,
Lance
>
>> In practice, this would let us skip almost all unnecessary checks,
>> except for a few rare corner cases.
>>
>> For those corner cases where "nr" truly falls between 1 and nr_pages,
>> we can just leave them as-is—performing the redundant check inside
>> page_vma_mapped_walk().
>
> I mean, batching mapcount+refcount updates etc. is always a win. If we
> end up doing some unnecessary pte_none() checks, that might be
> suboptimal but mostly noise in contrast to the other stuff we will
> optimize out :)
>
> Agreed that if we can easily avoid these pte_none() checks, we should do
> that. Optimizing that for "nr_pages == folio_nr_pages(folio)" makes sense.
>