From mboxrd@z Thu Jan  1 00:00:00 1970
From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman, patches@lists.linux.dev, Pedro Demarchi Gomes,
 David Hildenbrand, craftfever, Chengming Zhou, xu xin, Andrew Morton
Subject: [PATCH 5.10 147/161] ksm: use range-walk function to jump over holes in scan_get_next_rmap_item
Date: Wed, 4 Feb 2026 15:40:10 +0100
Message-ID: <20260204143857.037871981@linuxfoundation.org>
In-Reply-To: <20260204143851.755002596@linuxfoundation.org>
References: <20260204143851.755002596@linuxfoundation.org>
User-Agent: quilt/0.69
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

5.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pedro Demarchi Gomes

[ Upstream commit f5548c318d6520d4fa3c5ed6003eeb710763cbc5 ]

Currently, scan_get_next_rmap_item() walks every page address in a VMA to
locate mergeable pages.  This becomes highly inefficient when scanning
large virtual memory areas that contain mostly unmapped regions, causing
ksmd to use a large amount of CPU without deduplicating many pages.

This patch replaces the per-address lookup with a range walk using
walk_page_range().  The range walker allows KSM to skip over entire
unmapped holes in a VMA, avoiding unnecessary lookups.  This problem was
previously discussed in [1].
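For reviewers less familiar with the pagewalk API: walk_page_range()
invokes the pmd_entry callback only for page-table ranges that are
actually populated (pmd_none() holes are skipped), and a non-zero return
value aborts the walk and is propagated back to the caller.  A minimal
sketch of that contract, independent of this patch (the demo_* names are
illustrative only, not part of the change below):

#include <linux/pagewalk.h>

/* Illustrative: find the start of the first populated PMD range. */
static int demo_pmd_entry(pmd_t *pmdp, unsigned long addr,
			  unsigned long end, struct mm_walk *walk)
{
	unsigned long *first = walk->private;

	*first = addr;	/* first address backed by a non-empty page table */
	return 1;	/* non-zero stops the walk and is returned to the caller */
}

static const struct mm_walk_ops demo_ops = {
	.pmd_entry = demo_pmd_entry,
};

/* Caller must hold mmap_lock; returns 'end' if the range is fully unmapped. */
static unsigned long demo_first_populated(struct mm_struct *mm,
					  unsigned long start, unsigned long end)
{
	unsigned long first = end;

	walk_page_range(mm, start, end, &demo_ops, &first);
	return first;
}

The new ksm_next_page_pmd_entry() in the patch below follows the same
pattern, with the callback additionally taking the page-table lock and
grabbing a reference on the first anonymous page it finds.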
Consider the following test program, which creates a 32 TiB mapping in
the virtual address space but populates only a single page:

#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main()
{
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap() failed\n");
		return -1;
	}

	/* Populate a single page such that we get an anon_vma. */
	*area = 0;

	/* Enable KSM. */
	madvise(area, size, MADV_MERGEABLE);
	pause();
	return 0;
}

$ ./ksm-sparse &
$ echo 1 > /sys/kernel/mm/ksm/run

Without this patch, ksmd uses 100% of the CPU for a long time (more than
1 hour on my test machine) scanning the entire 32 TiB virtual address
space, which contains only one mapped page.  This essentially deadlocks
ksmd, leaving it unable to deduplicate anything of value.

With this patch, ksmd walks only the one mapped page and skips the rest
of the 32 TiB virtual address space, making the scan fast while using
little CPU.

Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com
Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1]
Fixes: 31dbd01f3143 ("ksm: Kernel SamePage Merging")
Signed-off-by: Pedro Demarchi Gomes
Co-developed-by: David Hildenbrand
Signed-off-by: David Hildenbrand
Reported-by: craftfever
Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io
Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand
Cc: Chengming Zhou
Cc: xu xin
Signed-off-by: Andrew Morton
[ change folio to page, replace pmdp_get_lockless with pmd_read_atomic
  and pmdp_get with READ_ONCE(*pmdp) ]
Signed-off-by: Pedro Demarchi Gomes
Signed-off-by: Greg Kroah-Hartman
---
 mm/ksm.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 105 insertions(+), 10 deletions(-)

--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -38,6 +38,7 @@
 #include <linux/freezer.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/pagewalk.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -2223,6 +2224,89 @@ static struct rmap_item *get_next_rmap_i
 	return rmap_item;
 }
 
+struct ksm_next_page_arg {
+	struct page *page;
+	unsigned long addr;
+};
+
+static int ksm_next_page_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end,
+				   struct mm_walk *walk)
+{
+	struct ksm_next_page_arg *private = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *start_ptep = NULL, *ptep, pte;
+	struct mm_struct *mm = walk->mm;
+	struct page *page;
+	spinlock_t *ptl;
+	pmd_t pmd;
+
+	if (ksm_test_exit(mm))
+		return 0;
+
+	cond_resched();
+
+	pmd = pmd_read_atomic(pmdp);
+	if (!pmd_present(pmd))
+		return 0;
+
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && pmd_leaf(pmd)) {
+		ptl = pmd_lock(mm, pmdp);
+		pmd = READ_ONCE(*pmdp);
+
+		if (!pmd_present(pmd)) {
+			goto not_found_unlock;
+		} else if (pmd_leaf(pmd)) {
+			page = vm_normal_page_pmd(vma, addr, pmd);
+			if (!page)
+				goto not_found_unlock;
+
+			if (is_zone_device_page(page) || !PageAnon(page))
+				goto not_found_unlock;
+
+			page += ((addr & (PMD_SIZE - 1)) >> PAGE_SHIFT);
+			goto found_unlock;
+		}
+		spin_unlock(ptl);
+	}
+
+	start_ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	if (!start_ptep)
+		return 0;
+
+	for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) {
+		pte = ptep_get(ptep);
+
+		if (!pte_present(pte))
+			continue;
+
+		page = vm_normal_page(vma, addr, pte);
+		if (!page)
+			continue;
+
+		if (is_zone_device_page(page) || !PageAnon(page))
+			continue;
+		goto found_unlock;
+	}
+
+not_found_unlock:
+	spin_unlock(ptl);
+	if (start_ptep)
+		pte_unmap(start_ptep);
+	return 0;
+found_unlock:
+	get_page(page);
+	spin_unlock(ptl);
+	if (start_ptep)
+		pte_unmap(start_ptep);
+	private->page = page;
+	private->addr = addr;
+	return 1;
+}
+
+static struct mm_walk_ops ksm_next_page_ops = {
+	.pmd_entry = ksm_next_page_pmd_entry,
+};
+
 static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 {
 	struct mm_struct *mm;
@@ -2302,29 +2386,40 @@ next_mm:
 			ksm_scan.address = vma->vm_end;
 
 		while (ksm_scan.address < vma->vm_end) {
+			struct ksm_next_page_arg ksm_next_page_arg;
+			struct page *tmp_page = NULL;
+			int found;
+
 			if (ksm_test_exit(mm))
 				break;
-			*page = follow_page(vma, ksm_scan.address, FOLL_GET);
-			if (IS_ERR_OR_NULL(*page)) {
-				ksm_scan.address += PAGE_SIZE;
-				cond_resched();
-				continue;
+
+			found = walk_page_range_vma(vma, ksm_scan.address,
+						    vma->vm_end,
+						    &ksm_next_page_ops,
+						    &ksm_next_page_arg);
+
+			if (found > 0) {
+				tmp_page = ksm_next_page_arg.page;
+				ksm_scan.address = ksm_next_page_arg.addr;
+			} else {
+				VM_WARN_ON_ONCE(found < 0);
+				ksm_scan.address = vma->vm_end - PAGE_SIZE;
 			}
-			if (PageAnon(*page)) {
-				flush_anon_page(vma, *page, ksm_scan.address);
-				flush_dcache_page(*page);
+			if (tmp_page) {
+				flush_anon_page(vma, tmp_page, ksm_scan.address);
+				flush_dcache_page(tmp_page);
 				rmap_item = get_next_rmap_item(slot,
 					ksm_scan.rmap_list, ksm_scan.address);
 				if (rmap_item) {
 					ksm_scan.rmap_list =
 							&rmap_item->rmap_list;
 					ksm_scan.address += PAGE_SIZE;
+					*page = tmp_page;
 				} else
-					put_page(*page);
+					put_page(tmp_page);
 				mmap_read_unlock(mm);
 				return rmap_item;
 			}
-			put_page(*page);
+
 			ksm_scan.address += PAGE_SIZE;
 			cond_resched();
 		}
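As a quick check of the backport, the reproducer from the changelog can
be paired with the long-standing KSM sysfs counters; with this patch
applied, full_scans should start advancing almost immediately while ksmd
stays near 0% CPU (the paths below are the standard KSM sysfs interface;
exact numbers will vary by machine):

$ ./ksm-sparse &
$ echo 1 > /sys/kernel/mm/ksm/run
$ sleep 30
$ cat /sys/kernel/mm/ksm/full_scans
$ top -b -n 1 -p "$(pgrep ksmd)"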