From: David Hildenbrand <david@redhat.com>
To: craftfever <craftfever@airmail.cc>,
	Pedro Demarchi Gomes <pedrodemargomes@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item
Date: Mon, 20 Oct 2025 11:38:49 +0200	[thread overview]
Message-ID: <21e3ae2b-c441-4403-aac7-d9b96e61c6cd@redhat.com> (raw)
In-Reply-To: <6e9ca60a-648c-45e9-9580-27f45a4f2a4d@airmail.cc>

On 18.10.25 09:30, craftfever wrote:
> 
> 
> David Hildenbrand wrote:
>> On 16.10.25 03:22, Pedro Demarchi Gomes wrote:
>>> Currently, scan_get_next_rmap_item() walks every page address in a VMA
>>> to locate mergeable pages. This becomes highly inefficient when scanning
>>> large virtual memory areas that contain mostly unmapped regions.
>>>
>>> This patch replaces the per-address lookup with a range walk using
>>> walk_page_range(). The range walker allows KSM to skip over entire
>>> unmapped holes in a VMA, avoiding unnecessary lookups.
>>> This problem was previously discussed in [1].
>>>
>>> [1] https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/
>>>
>>> ---
>>
>> This patch does too much at once, which makes it rather hard to review.
>>
>> As a first step, we should leave most of scan_get_next_rmap_item()
>> alone and only focus on replacing folio_walk with walk_page_range_vma().
>>
>> Follow-up patches could then clean up scan_get_next_rmap_item()
>> -- and boy oh boy, does that function scream for quite some cleanups.
>>
>> This is something minimal based on your v3. I applied plenty more
>> cleanups and I wish we could further shrink the pmd_entry function,
>> but I have to give up for today (well, it's already tomorrow :) ).
>>
>>
>> Briefly tested with the ksm selftests, and my machine did not burn
>> down the building.
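
For context, the core walk_page_range_vma() pattern the patch below relies
on looks roughly like the sketch here; the example_* names are illustrative
placeholders, not part of the patch. The walker only invokes .pmd_entry for
populated PMDs, so unmapped holes never reach the callback, and a non-zero
return value aborts the walk and is propagated back to the caller.

#include <linux/mm.h>
#include <linux/pagewalk.h>

struct example_walk_arg {
	unsigned long first_mapped_addr;
};

/* Invoked only for populated PMDs; unmapped holes are skipped by the walker. */
static int example_pmd_entry(pmd_t *pmdp, unsigned long addr,
			     unsigned long end, struct mm_walk *walk)
{
	struct example_walk_arg *arg = walk->private;

	arg->first_mapped_addr = addr;
	return 1;	/* non-zero stops the walk and is returned to the caller */
}

static const struct mm_walk_ops example_walk_ops = {
	.pmd_entry	= example_pmd_entry,
	.walk_lock	= PGWALK_RDLOCK,
};

/*
 * Returns true and fills *addr if any populated PMD exists in [start, end).
 * The caller must hold the mmap lock, as walk_page_range_vma() asserts it.
 */
static bool example_find_mapped(struct vm_area_struct *vma, unsigned long start,
				unsigned long end, unsigned long *addr)
{
	struct example_walk_arg arg = {};

	if (walk_page_range_vma(vma, start, end, &example_walk_ops, &arg) <= 0)
		return false;
	*addr = arg.first_mapped_addr;
	return true;
}
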
>>
>>
>>   From d971b88056fe3fefe50e5d4fa5b359e8c8331b2c Mon Sep 17 00:00:00 2001
>> From: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
>> Date: Wed, 15 Oct 2025 22:22:36 -0300
>> Subject: [PATCH] ksm: use range-walk function to jump over holes in
>>    scan_get_next_rmap_item
>>
>> Currently, scan_get_next_rmap_item() walks every page address in a VMA
>> to locate mergeable pages. This becomes highly inefficient when scanning
>> large virtual memory areas that contain mostly unmapped regions.
>>
>> This patch replaces the per-address lookup with a range walk using
>> walk_page_range_vma(). The range walker allows KSM to skip over entire
>> unmapped holes in a VMA, avoiding unnecessary lookups.
>> This problem was previously discussed in [1].
>>
>> [1] https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/
>>
>> Reported-by: craftfever <craftfever@airmail.cc>
>> Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io
>> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
>> Co-developed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    mm/ksm.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++-------
>>    1 file changed, 103 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 3aed0478fdcef..8bd2b78c4f869 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2455,6 +2455,94 @@ static bool should_skip_rmap_item(struct folio *folio,
>>        return true;
>>    }
>>
>> +struct ksm_next_page_arg {
>> +    struct folio *folio;
>> +    struct page *page;
>> +    unsigned long addr;
>> +};
>> +
>> +static int ksm_next_page_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end,
>> +        struct mm_walk *walk)
>> +{
>> +    struct ksm_next_page_arg *private = walk->private;
>> +    struct vm_area_struct *vma = walk->vma;
>> +    pte_t *start_ptep = NULL, *ptep, pte;
>> +    struct mm_struct *mm = walk->mm;
>> +    struct folio *folio;
>> +    struct page *page;
>> +    spinlock_t *ptl;
>> +    pmd_t pmd;
>> +
>> +    if (ksm_test_exit(mm))
>> +        return 0;
>> +    cond_resched();
>> +
>> +    pmd = pmdp_get_lockless(pmdp);
>> +    if (!pmd_present(pmd))
>> +        return 0;
>> +
>> +    if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && pmd_leaf(pmd)) {
>> +        ptl = pmd_lock(mm, pmdp);
>> +        pmd = pmdp_get(pmdp);
>> +
>> +        if (!pmd_present(pmd)) {
>> +            goto not_found_unlock;
>> +        } else if (pmd_leaf(pmd)) {
>> +            page = vm_normal_page_pmd(vma, addr, pmd);
>> +            if (!page)
>> +                goto not_found_unlock;
>> +            folio = page_folio(page);
>> +
>> +            if (folio_is_zone_device(folio) || !folio_test_anon(folio))
>> +                goto not_found_unlock;
>> +
>> +            page += ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);
>> +            goto found_unlock;
>> +        }
>> +        spin_unlock(ptl);
>> +    }
>> +
>> +    start_ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
>> +    if (!start_ptep)
>> +        return 0;
>> +
>> +    for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) {
>> +        pte = ptep_get(ptep);
>> +
>> +        if (!pte_present(pte))
>> +            continue;
>> +
>> +        page = vm_normal_page(vma, addr, pte);
>> +        if (!page)
>> +            continue;
>> +        folio = page_folio(page);
>> +
>> +        if (folio_is_zone_device(folio) || !folio_test_anon(folio))
>> +            continue;
>> +        goto found_unlock;
>> +    }
>> +
>> +not_found_unlock:
>> +    spin_unlock(ptl);
>> +    if (start_ptep)
>> +        pte_unmap(start_ptep);
>> +    return 0;
>> +found_unlock:
>> +    folio_get(folio);
>> +    spin_unlock(ptl);
>> +    if (start_ptep)
>> +        pte_unmap(start_ptep);
>> +    private->page = page;
>> +    private->folio = folio;
>> +    private->addr = addr;
>> +    return 1;
>> +}
>> +
>> +static struct mm_walk_ops ksm_next_page_ops = {
>> +    .pmd_entry = ksm_next_page_pmd_entry,
>> +    .walk_lock = PGWALK_RDLOCK,
>> +};
>> +
>>    static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>    {
>>        struct mm_struct *mm;
>> @@ -2542,21 +2630,23 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>                ksm_scan.address = vma->vm_end;
>>
>>            while (ksm_scan.address < vma->vm_end) {
>> +            struct ksm_next_page_arg ksm_next_page_arg;
>>                struct page *tmp_page = NULL;
>> -            struct folio_walk fw;
>>                struct folio *folio;
>> -
>> -            if (ksm_test_exit(mm))
>> -                break;
>> -
>> -            folio = folio_walk_start(&fw, vma, ksm_scan.address, 0);
>> -            if (folio) {
>> -                if (!folio_is_zone_device(folio) &&
>> -                     folio_test_anon(folio)) {
>> -                    folio_get(folio);
>> -                    tmp_page = fw.page;
>> -                }
>> -                folio_walk_end(&fw, vma);
>> +            int found;
>> +
>> +            found = walk_page_range_vma(vma, ksm_scan.address,
>> +                            vma->vm_end,
>> +                            &ksm_next_page_ops,
>> +                            &ksm_next_page_arg);
>> +
>> +            if (found > 0) {
>> +                folio = ksm_next_page_arg.folio;
>> +                tmp_page = ksm_next_page_arg.page;
>> +                ksm_scan.address = ksm_next_page_arg.addr;
>> +            } else {
>> +                VM_WARN_ON_ONCE(found < 0);
>> +                ksm_scan.address = vma->vm_end - PAGE_SIZE;
>>                }
>>
>>                if (tmp_page) {
> 
> 
> %)
> Guys, I'm so sorry, I'm a little confused. Could you please let me know
> by e-mail when the patch (or patches) will be done, so it can be properly
> tested? I'm a little lost in this process: is it ready or not? Thank you)

In general, we consider code ready once it has been reviewed and acked by
a maintainer.

Andrew usually throws patches into mm/mm-unstable ahead of time while
review is still going on. That does not yet mean that the patches will go
upstream; it's merely to give them some initial exposure to build bots etc.

Feel free to test the revised patch I submitted inline, or wait for a v4.

-- 
Cheers

David / dhildenb



Thread overview: 8+ messages
2025-10-16  1:22 [PATCH v3] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item Pedro Demarchi Gomes
2025-10-16 21:07 ` Andrew Morton
2025-10-20  9:42   ` David Hildenbrand
2025-10-17 15:44 ` David Hildenbrand
2025-10-17 22:23 ` David Hildenbrand
     [not found]   ` <6e9ca60a-648c-45e9-9580-27f45a4f2a4d@airmail.cc>
2025-10-20  9:38     ` David Hildenbrand [this message]
2025-10-21  3:00   ` Pedro Demarchi Gomes
2025-10-21 15:25     ` David Hildenbrand
