* [PATCH v3 0/3] Reclaim lazyfree THP without splitting
@ 2024-04-29 13:23 Lance Yang
2024-04-29 13:23 ` [PATCH v3 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Lance Yang @ 2024-04-29 13:23 UTC (permalink / raw)
To: akpm
Cc: willy, maskray, ziy, ryan.roberts, david, 21cnbao, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel, Lance Yang
Hi all,
This series adds support for reclaiming PMD-mapped THP marked as lazyfree
without needing to first split the large folio via split_huge_pmd_address().
When the user no longer requires the pages, they would use madvise(MADV_FREE)
to mark the pages as lazy free. Subsequently, they typically would not write
to that memory again.
During memory reclaim, if we detect that both the large folio and its PMD are
still marked as clean and there are no unexpected references (such as GUP), we
can simply discard the memory lazily, improving the efficiency of memory
reclamation in this case.
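For context, the userspace side of this flow looks roughly like the sketch
below (illustrative only; the 1GiB size and the MADV_HUGEPAGE hint are my
assumptions for the test scenario, not part of this series):

	#define _GNU_SOURCE
	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 1UL << 30;	/* 1GiB of anonymous memory */
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		madvise(buf, len, MADV_HUGEPAGE);	/* hint: back with PMD-mapped THPs */
		memset(buf, 1, len);			/* fault the memory in */

		/*
		 * The pages are no longer needed: mark them lazy free.
		 * Reclaim may later discard them without swapping them
		 * out, as long as they remain clean.
		 */
		madvise(buf, len, MADV_FREE);
		return 0;
	}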
Performance Testing
===================
On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):
--------------------------------------------
| Old | New | Change |
--------------------------------------------
| 0.683426 | 0.049197 | -92.80% |
--------------------------------------------
---
Changes since v2 [2]
====================
- Update the changelog (thanks to David Hildenbrand)
- Support try_to_unmap_one() to unmap PMD-mapped folios
(thanks a lot to David Hildenbrand and Zi Yan)
Changes since v1 [1]
====================
- Update the changelog
- Follow the exact same logic as in try_to_unmap_one() (per David Hildenbrand)
- Remove the extra code from rmap.c (per Matthew Wilcox)
[1] https://lore.kernel.org/linux-mm/20240417141111.77855-1-ioworker0@gmail.com
[2] https://lore.kernel.org/linux-mm/20240422055213.60231-1-ioworker0@gmail.com
Lance Yang (3):
mm/rmap: remove duplicated exit code in pagewalk loop
mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
include/linux/huge_mm.h | 4 ++
mm/huge_memory.c | 117 +++++++++++++++++++++++++++++++++-------
mm/rmap.c | 69 +++++++++++++-----------
3 files changed, 139 insertions(+), 51 deletions(-)
--
2.33.1
* [PATCH v3 1/3] mm/rmap: remove duplicated exit code in pagewalk loop
2024-04-29 13:23 [PATCH v3 0/3] Reclaim lazyfree THP without splitting Lance Yang
@ 2024-04-29 13:23 ` Lance Yang
2024-04-29 13:23 ` [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
2024-04-29 13:23 ` [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
2 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2024-04-29 13:23 UTC (permalink / raw)
To: akpm
Cc: willy, maskray, ziy, ryan.roberts, david, 21cnbao, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel, Lance Yang
Introduce the labels walk_done and walk_done_err as exit points to
eliminate duplicated exit code in the pagewalk loop.
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
mm/rmap.c | 40 +++++++++++++++-------------------------
1 file changed, 15 insertions(+), 25 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 7faa60bc3e4d..7e2575d669a9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1675,9 +1675,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
/* Restore the mlock which got missed */
if (!folio_test_large(folio))
mlock_vma_folio(folio, vma);
- page_vma_mapped_walk_done(&pvmw);
- ret = false;
- break;
+ goto walk_done_err;
}
pfn = pte_pfn(ptep_get(pvmw.pte));
@@ -1715,11 +1713,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
if (!anon) {
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
- if (!hugetlb_vma_trylock_write(vma)) {
- page_vma_mapped_walk_done(&pvmw);
- ret = false;
- break;
- }
+ if (!hugetlb_vma_trylock_write(vma))
+ goto walk_done_err;
if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
hugetlb_vma_unlock_write(vma);
flush_tlb_range(vma,
@@ -1734,8 +1729,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* actual page and drop map count
* to zero.
*/
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done;
}
hugetlb_vma_unlock_write(vma);
}
@@ -1807,9 +1801,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_swapbacked(folio) !=
folio_test_swapcache(folio))) {
WARN_ON_ONCE(1);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done_err;
}
/* MADV_FREE page check */
@@ -1848,23 +1840,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
set_pte_at(mm, address, pvmw.pte, pteval);
folio_set_swapbacked(folio);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done_err;
}
if (swap_duplicate(entry) < 0) {
set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done_err;
}
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done_err;
}
/* See folio_try_share_anon_rmap(): clear PTE first. */
@@ -1872,9 +1858,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
folio_try_share_anon_rmap_pte(folio, subpage)) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
+ goto walk_done_err;
}
if (list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
@@ -1914,6 +1898,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
+ continue;
+walk_done_err:
+ ret = false;
+walk_done:
+ page_vma_mapped_walk_done(&pvmw);
+ break;
}
mmu_notifier_invalidate_range_end(&range);
--
2.33.1
* [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
2024-04-29 13:23 [PATCH v3 0/3] Reclaim lazyfree THP without splitting Lance Yang
2024-04-29 13:23 ` [PATCH v3 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
@ 2024-04-29 13:23 ` Lance Yang
2024-04-29 20:20 ` SeongJae Park
2024-04-29 13:23 ` [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
2 siblings, 1 reply; 9+ messages in thread
From: Lance Yang @ 2024-04-29 13:23 UTC (permalink / raw)
To: akpm
Cc: willy, maskray, ziy, ryan.roberts, david, 21cnbao, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel, Lance Yang
In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
folios, start the pagewalk first, then call split_huge_pmd_address()
to split the folio.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
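A quick sketch of the resulting flow inside the pagewalk loop, simplified
from the mm/rmap.c hunk below:

	while (page_vma_mapped_walk(&pvmw)) {
		if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
			/* Split under the PMD lock the walk already holds... */
			split_huge_pmd_locked(vma, range.start, pvmw.pmd,
					      false, folio);
			pvmw.pmd = NULL;
			spin_unlock(pvmw.ptl);
			flags &= ~TTU_SPLIT_HUGE_PMD;
			continue;	/* ...and restart from the now-PTE-mapped table. */
		}
		/* PTE-level unmap, unchanged. */
	}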
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 42 +++++++++++++++++++++--------------------
mm/rmap.c | 26 +++++++++++++++++++------
3 files changed, 44 insertions(+), 26 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c8d3ec116e29..2daadfcc6776 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -36,6 +36,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr, pgprot_t newprot,
unsigned long cp_flags);
+void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmd, bool freeze, struct folio *folio);
vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8261b5669397..145505a1dd05 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2584,6 +2584,27 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pmd_populate(mm, pmd, pgtable);
}
+void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmd, bool freeze, struct folio *folio)
+{
+ VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
+ VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
+ VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
+ VM_BUG_ON(freeze && !folio);
+
+ /*
+ * When the caller requests to set up a migration entry, we
+ * require a folio to check the PMD against. Otherwise, there
+ * is a risk of replacing the wrong folio.
+ */
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
+ is_pmd_migration_entry(*pmd)) {
+ if (folio && folio != pmd_folio(*pmd))
+ return;
+ __split_huge_pmd_locked(vma, pmd, address, freeze);
+ }
+}
+
void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long address, bool freeze, struct folio *folio)
{
@@ -2595,26 +2616,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptl = pmd_lock(vma->vm_mm, pmd);
-
- /*
- * If caller asks to setup a migration entry, we need a folio to check
- * pmd against. Otherwise we can end up replacing wrong folio.
- */
- VM_BUG_ON(freeze && !folio);
- VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
-
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- /*
- * It's safe to call pmd_page when folio is set because it's
- * guaranteed that pmd is present.
- */
- if (folio && folio != pmd_folio(*pmd))
- goto out;
- __split_huge_pmd_locked(vma, pmd, range.start, freeze);
- }
-
-out:
+ split_huge_pmd_locked(vma, range.start, pmd, freeze, folio);
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(&range);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 7e2575d669a9..e42f436c7ff3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1636,9 +1636,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (flags & TTU_SYNC)
pvmw.flags = PVMW_SYNC;
- if (flags & TTU_SPLIT_HUGE_PMD)
- split_huge_pmd_address(vma, address, false, folio);
-
/*
* For THP, we have to assume the worse case ie pmd for invalidation.
* For hugetlb, it could be much worse if we need to do pud
@@ -1650,6 +1647,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
range.end = vma_address_end(&pvmw);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address, range.end);
+ if (flags & TTU_SPLIT_HUGE_PMD) {
+ range.start = address & HPAGE_PMD_MASK;
+ range.end = (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
+ }
if (folio_test_hugetlb(folio)) {
/*
* If sharing is possible, start and end will be adjusted
@@ -1664,9 +1665,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
mmu_notifier_invalidate_range_start(&range);
while (page_vma_mapped_walk(&pvmw)) {
- /* Unexpected PMD-mapped THP? */
- VM_BUG_ON_FOLIO(!pvmw.pte, folio);
-
/*
* If the folio is in an mlock()d vma, we must not swap it out.
*/
@@ -1678,6 +1676,22 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
goto walk_done_err;
}
+ if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
+ /*
+ * We temporarily have to drop the PTL and start once
+ * again from that now-PTE-mapped page table.
+ */
+ split_huge_pmd_locked(vma, range.start, pvmw.pmd, false,
+ folio);
+ pvmw.pmd = NULL;
+ spin_unlock(pvmw.ptl);
+ flags &= ~TTU_SPLIT_HUGE_PMD;
+ continue;
+ }
+
+ /* Unexpected PMD-mapped THP? */
+ VM_BUG_ON_FOLIO(!pvmw.pte, folio);
+
pfn = pte_pfn(ptep_get(pvmw.pte));
subpage = folio_page(folio, pfn - folio_pfn(folio));
address = pvmw.address;
--
2.33.1
* [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
2024-04-29 13:23 [PATCH v3 0/3] Reclaim lazyfree THP without splitting Lance Yang
2024-04-29 13:23 ` [PATCH v3 1/3] mm/rmap: remove duplicated exit code in pagewalk loop Lance Yang
2024-04-29 13:23 ` [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
@ 2024-04-29 13:23 ` Lance Yang
2024-04-30 8:34 ` Barry Song
2 siblings, 1 reply; 9+ messages in thread
From: Lance Yang @ 2024-04-29 13:23 UTC (permalink / raw)
To: akpm
Cc: willy, maskray, ziy, ryan.roberts, david, 21cnbao, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel, Lance Yang
When the user no longer requires the pages, they would use
madvise(MADV_FREE) to mark the pages as lazy free. Subsequently, they
typically would not write to that memory again.
During memory reclaim, if we detect that both the large folio and its PMD
are still marked as clean and there are no unexpected references
(such as GUP), we can simply discard the memory lazily, improving the
efficiency of memory reclamation in this case.
On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):
--------------------------------------------
| Old | New | Change |
--------------------------------------------
| 0.683426 | 0.049197 | -92.80% |
--------------------------------------------
Suggested-by: Zi Yan <ziy@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 75 +++++++++++++++++++++++++++++++++++++++++
mm/rmap.c | 3 ++
3 files changed, 80 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2daadfcc6776..fd330f72b4f3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -38,6 +38,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long cp_flags);
void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze, struct folio *folio);
+bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmdp, struct folio *folio);
vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 145505a1dd05..d35d526ed48f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2690,6 +2690,81 @@ static void unmap_folio(struct folio *folio)
try_to_unmap_flush();
}
+static bool __discard_trans_pmd_locked(struct vm_area_struct *vma,
+ unsigned long addr, pmd_t *pmdp,
+ struct folio *folio)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int ref_count, map_count;
+ pmd_t orig_pmd = *pmdp;
+ struct mmu_gather tlb;
+ struct page *page;
+
+ if (pmd_dirty(orig_pmd) || folio_test_dirty(folio))
+ return false;
+ if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
+ return false;
+
+ page = pmd_page(orig_pmd);
+ if (unlikely(page_folio(page) != folio))
+ return false;
+
+ tlb_gather_mmu(&tlb, mm);
+ orig_pmd = pmdp_huge_get_and_clear(mm, addr, pmdp);
+ tlb_remove_pmd_tlb_entry(&tlb, pmdp, addr);
+
+ /*
+ * Syncing against concurrent GUP-fast:
+ * - clear PMD; barrier; read refcount
+ * - inc refcount; barrier; read PMD
+ */
+ smp_mb();
+
+ ref_count = folio_ref_count(folio);
+ map_count = folio_mapcount(folio);
+
+ /*
+ * Order reads for folio refcount and dirty flag
+ * (see comments in __remove_mapping()).
+ */
+ smp_rmb();
+
+ /*
+ * If the PMD or folio is redirtied at this point, or if there are
+ * unexpected references, we will give up to discard this folio
+ * and remap it.
+ *
+ * The only folio refs must be one from isolation plus the rmap(s).
+ */
+ if (ref_count != map_count + 1 || folio_test_dirty(folio) ||
+ pmd_dirty(orig_pmd)) {
+ set_pmd_at(mm, addr, pmdp, orig_pmd);
+ return false;
+ }
+
+ folio_remove_rmap_pmd(folio, page, vma);
+ zap_deposited_table(mm, pmdp);
+ add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+ folio_put(folio);
+
+ return true;
+}
+
+bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmdp, struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+ VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
+ return __discard_trans_pmd_locked(vma, addr, pmdp, folio);
+#endif
+
+ return false;
+}
+
static void remap_page(struct folio *folio, unsigned long nr)
{
int i = 0;
diff --git a/mm/rmap.c b/mm/rmap.c
index e42f436c7ff3..ab37af4f47aa 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1677,6 +1677,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
}
if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
+ if (unmap_huge_pmd_locked(vma, range.start, pvmw.pmd,
+ folio))
+ goto walk_done;
/*
* We temporarily have to drop the PTL and start once
* again from that now-PTE-mapped page table.
--
2.33.1
* Re: [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
2024-04-29 13:23 ` [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into " Lance Yang
@ 2024-04-29 20:20 ` SeongJae Park
2024-04-30 2:03 ` Lance Yang
0 siblings, 1 reply; 9+ messages in thread
From: SeongJae Park @ 2024-04-29 20:20 UTC (permalink / raw)
To: Lance Yang
Cc: SeongJae Park, akpm, willy, maskray, ziy, ryan.roberts, david,
21cnbao, mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09,
libang.li, wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel
Hi Lance,
On Mon, 29 Apr 2024 21:23:07 +0800 Lance Yang <ioworker0@gmail.com> wrote:
> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> folios, start the pagewalk first, then call split_huge_pmd_address()
> to split the folio.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
> include/linux/huge_mm.h | 2 ++
> mm/huge_memory.c | 42 +++++++++++++++++++++--------------------
> mm/rmap.c | 26 +++++++++++++++++++------
> 3 files changed, 44 insertions(+), 26 deletions(-)
>
[...]
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 7e2575d669a9..e42f436c7ff3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1636,9 +1636,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> if (flags & TTU_SYNC)
> pvmw.flags = PVMW_SYNC;
>
> - if (flags & TTU_SPLIT_HUGE_PMD)
> - split_huge_pmd_address(vma, address, false, folio);
> -
> /*
> * For THP, we have to assume the worse case ie pmd for invalidation.
> * For hugetlb, it could be much worse if we need to do pud
> @@ -1650,6 +1647,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> range.end = vma_address_end(&pvmw);
> mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
> address, range.end);
> + if (flags & TTU_SPLIT_HUGE_PMD) {
> + range.start = address & HPAGE_PMD_MASK;
> + range.end = (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
> + }
I found that the latest mm-unstable fails one[1] of my build configurations
with the error message below. And 'git bisect' points to this patch.
CC mm/rmap.o
In file included from <command-line>:
.../linux/mm/rmap.c: In function 'try_to_unmap_one':
.../linux/include/linux/compiler_types.h:460:38: error: call to '__compiletime_assert_455' declared with attribute error: BUILD_BUG failed
460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
.../linux/include/linux/compiler_types.h:441:4: note: in definition of macro '__compiletime_assert'
441 | prefix ## suffix(); \
| ^~~~~~
.../linux/include/linux/compiler_types.h:460:2: note: in expansion of macro '_compiletime_assert'
460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
.../linux/include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
.../linux/include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
.../linux/include/linux/huge_mm.h:97:28: note: in expansion of macro 'BUILD_BUG'
97 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
.../linux/include/linux/huge_mm.h:104:34: note: in expansion of macro 'HPAGE_PMD_SHIFT'
104 | #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
| ^~~~~~~~~~~~~~~
.../linux/include/linux/huge_mm.h:103:27: note: in expansion of macro 'HPAGE_PMD_SIZE'
103 | #define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))
| ^~~~~~~~~~~~~~
.../linux/mm/rmap.c:1651:27: note: in expansion of macro 'HPAGE_PMD_MASK'
1651 | range.start = address & HPAGE_PMD_MASK;
| ^~~~~~~~~~~~~~
I haven't looked into the code yet, but it seems this code needs to handle
the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES is undefined? May I ask your opinion?
[1] https://github.com/awslabs/damon-tests/blob/next/corr/tests/build_arm64.sh
Thanks,
SJ
[...]
* Re: [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
2024-04-29 20:20 ` SeongJae Park
@ 2024-04-30 2:03 ` Lance Yang
2024-04-30 2:13 ` Lance Yang
0 siblings, 1 reply; 9+ messages in thread
From: Lance Yang @ 2024-04-30 2:03 UTC (permalink / raw)
To: SeongJae Park
Cc: akpm, willy, maskray, ziy, ryan.roberts, david, 21cnbao, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel
Hey SJ,
Thanks a lot for reporting!
On Tue, Apr 30, 2024 at 4:20 AM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Lance,
>
> On Mon, 29 Apr 2024 21:23:07 +0800 Lance Yang <ioworker0@gmail.com> wrote:
>
> > In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> > folios, start the pagewalk first, then call split_huge_pmd_address()
> > to split the folio.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > include/linux/huge_mm.h | 2 ++
> > mm/huge_memory.c | 42 +++++++++++++++++++++--------------------
> > mm/rmap.c | 26 +++++++++++++++++++------
> > 3 files changed, 44 insertions(+), 26 deletions(-)
> >
> [...]
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 7e2575d669a9..e42f436c7ff3 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1636,9 +1636,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > if (flags & TTU_SYNC)
> > pvmw.flags = PVMW_SYNC;
> >
> > - if (flags & TTU_SPLIT_HUGE_PMD)
> > - split_huge_pmd_address(vma, address, false, folio);
> > -
> > /*
> > * For THP, we have to assume the worse case ie pmd for invalidation.
> > * For hugetlb, it could be much worse if we need to do pud
> > @@ -1650,6 +1647,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > range.end = vma_address_end(&pvmw);
> > mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
> > address, range.end);
> > + if (flags & TTU_SPLIT_HUGE_PMD) {
> > + range.start = address & HPAGE_PMD_MASK;
> > + range.end = (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
> > + }
>
> I found that the latest mm-unstable fails one[1] of my build configurations
> with the error message below. And 'git bisect' points to this patch.
Thanks for taking time to 'git bisect' and identify this bug!
>
> CC mm/rmap.o
> In file included from <command-line>:
> .../linux/mm/rmap.c: In function 'try_to_unmap_one':
> .../linux/include/linux/compiler_types.h:460:38: error: call to '__compiletime_assert_455' declared with attribute error: BUILD_BUG failed
> 460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^
> .../linux/include/linux/compiler_types.h:441:4: note: in definition of macro '__compiletime_assert'
> 441 | prefix ## suffix(); \
> | ^~~~~~
> .../linux/include/linux/compiler_types.h:460:2: note: in expansion of macro '_compiletime_assert'
> 460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^~~~~~~~~~~~~~~~~~~
> .../linux/include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> | ^~~~~~~~~~~~~~~~~~
> .../linux/include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> | ^~~~~~~~~~~~~~~~
> .../linux/include/linux/huge_mm.h:97:28: note: in expansion of macro 'BUILD_BUG'
> 97 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
> | ^~~~~~~~~
> .../linux/include/linux/huge_mm.h:104:34: note: in expansion of macro 'HPAGE_PMD_SHIFT'
> 104 | #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
> | ^~~~~~~~~~~~~~~
> .../linux/include/linux/huge_mm.h:103:27: note: in expansion of macro 'HPAGE_PMD_SIZE'
> 103 | #define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))
> | ^~~~~~~~~~~~~~
> .../linux/mm/rmap.c:1651:27: note: in expansion of macro 'HPAGE_PMD_MASK'
> 1651 | range.start = address & HPAGE_PMD_MASK;
> | ^~~~~~~~~~~~~~
>
> I haven't looked into the code yet, but it seems this code needs to handle
> the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES is undefined? May I ask your opinion?
>
> [1] https://github.com/awslabs/damon-tests/blob/next/corr/tests/build_arm64.sh
I'll fix this bug and rebuild using the config you've provided above.
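Perhaps something along these lines, only computing the PMD-sized range
when THP is compiled in (an untested sketch, just a first thought):

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (flags & TTU_SPLIT_HUGE_PMD) {
			range.start = address & HPAGE_PMD_MASK;
			range.end = (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
		}
	#endif

Without THP there should be no PMD-mapped folio to split here, so the
PTE-sized range computed just above would still cover the invalidation
(an untested assumption on my side).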
Thanks again for reporting!
Lance
>
>
> Thanks,
> SJ
> [...]
* Re: [PATCH v3 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
2024-04-30 2:03 ` Lance Yang
@ 2024-04-30 2:13 ` Lance Yang
0 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2024-04-30 2:13 UTC (permalink / raw)
To: Andrew Morton
Cc: SeongJae Park, willy, maskray, ziy, ryan.roberts, david, 21cnbao,
mhocko, fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel
On Tue, Apr 30, 2024 at 10:03 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> Hey SJ,
>
> Thanks a lot for reporting!
>
> On Tue, Apr 30, 2024 at 4:20 AM SeongJae Park <sj@kernel.org> wrote:
> >
> > Hi Lance,
> >
> > On Mon, 29 Apr 2024 21:23:07 +0800 Lance Yang <ioworker0@gmail.com> wrote:
> >
> > > In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> > > folios, start the pagewalk first, then call split_huge_pmd_address()
> > > to split the folio.
> > >
> > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > > ---
> > > include/linux/huge_mm.h | 2 ++
> > > mm/huge_memory.c | 42 +++++++++++++++++++++--------------------
> > > mm/rmap.c | 26 +++++++++++++++++++------
> > > 3 files changed, 44 insertions(+), 26 deletions(-)
> > >
> > [...]
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 7e2575d669a9..e42f436c7ff3 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1636,9 +1636,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > > if (flags & TTU_SYNC)
> > > pvmw.flags = PVMW_SYNC;
> > >
> > > - if (flags & TTU_SPLIT_HUGE_PMD)
> > > - split_huge_pmd_address(vma, address, false, folio);
> > > -
> > > /*
> > > * For THP, we have to assume the worse case ie pmd for invalidation.
> > > * For hugetlb, it could be much worse if we need to do pud
> > > @@ -1650,6 +1647,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > > range.end = vma_address_end(&pvmw);
> > > mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
> > > address, range.end);
> > > + if (flags & TTU_SPLIT_HUGE_PMD) {
> > > + range.start = address & HPAGE_PMD_MASK;
> > > + range.end = (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
> > > + }
> >
> > I found that the latest mm-unstable fails one[1] of my build configurations
> > with the error message below. And 'git bisect' points to this patch.
>
> Thanks for taking time to 'git bisect' and identify this bug!
>
> >
> > CC mm/rmap.o
> > In file included from <command-line>:
> > .../linux/mm/rmap.c: In function 'try_to_unmap_one':
> > .../linux/include/linux/compiler_types.h:460:38: error: call to '__compiletime_assert_455' declared with attribute error: BUILD_BUG failed
> > 460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> > | ^
> > .../linux/include/linux/compiler_types.h:441:4: note: in definition of macro '__compiletime_assert'
> > 441 | prefix ## suffix(); \
> > | ^~~~~~
> > .../linux/include/linux/compiler_types.h:460:2: note: in expansion of macro '_compiletime_assert'
> > 460 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> > | ^~~~~~~~~~~~~~~~~~~
> > .../linux/include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> > 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> > | ^~~~~~~~~~~~~~~~~~
> > .../linux/include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> > 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> > | ^~~~~~~~~~~~~~~~
> > .../linux/include/linux/huge_mm.h:97:28: note: in expansion of macro 'BUILD_BUG'
> > 97 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
> > | ^~~~~~~~~
> > .../linux/include/linux/huge_mm.h:104:34: note: in expansion of macro 'HPAGE_PMD_SHIFT'
> > 104 | #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
> > | ^~~~~~~~~~~~~~~
> > .../linux/include/linux/huge_mm.h:103:27: note: in expansion of macro 'HPAGE_PMD_SIZE'
> > 103 | #define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))
> > | ^~~~~~~~~~~~~~
> > .../linux/mm/rmap.c:1651:27: note: in expansion of macro 'HPAGE_PMD_MASK'
> > 1651 | range.start = address & HPAGE_PMD_MASK;
> > | ^~~~~~~~~~~~~~
> >
> > I haven't looked into the code yet, but it seems this code needs to handle
> > the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES is undefined? May I ask your opinion?
> >
> > [1] https://github.com/awslabs/damon-tests/blob/next/corr/tests/build_arm64.sh
>
> I'll fix this bug and rebuild using the config you've provided above.
Hey Andrew,
Could you please temporarily drop this series from the mm tree?
I'll fix this bug in the next version.
Thanks,
Lance
>
> Thanks again for reporting!
> Lance
>
> >
> >
> > Thanks,
> > SJ
> > [...]
* Re: [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
2024-04-29 13:23 ` [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list() Lance Yang
@ 2024-04-30 8:34 ` Barry Song
2024-04-30 9:07 ` Lance Yang
0 siblings, 1 reply; 9+ messages in thread
From: Barry Song @ 2024-04-30 8:34 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, willy, maskray, ziy, ryan.roberts, david, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel
On Tue, Apr 30, 2024 at 1:23 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> When the user no longer requires the pages, they would use
> madvise(MADV_FREE) to mark the pages as lazy free. Subsequently, they
> typically would not write to that memory again.
>
> During memory reclaim, if we detect that both the large folio and its PMD
> are still marked as clean and there are no unexpected references
> (such as GUP), we can simply discard the memory lazily, improving the
> efficiency of memory reclamation in this case.
>
> On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
> mem_cgroup_force_empty() results in the following runtimes in seconds
> (shorter is better):
>
> --------------------------------------------
> | Old | New | Change |
> --------------------------------------------
> | 0.683426 | 0.049197 | -92.80% |
> --------------------------------------------
>
> Suggested-by: Zi Yan <ziy@nvidia.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
> include/linux/huge_mm.h | 2 ++
> mm/huge_memory.c | 75 +++++++++++++++++++++++++++++++++++++++++
> mm/rmap.c | 3 ++
> 3 files changed, 80 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2daadfcc6776..fd330f72b4f3 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -38,6 +38,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> unsigned long cp_flags);
> void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
> pmd_t *pmd, bool freeze, struct folio *folio);
> +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> + pmd_t *pmdp, struct folio *folio);
>
> vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
> vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 145505a1dd05..d35d526ed48f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2690,6 +2690,81 @@ static void unmap_folio(struct folio *folio)
> try_to_unmap_flush();
> }
>
> +static bool __discard_trans_pmd_locked(struct vm_area_struct *vma,
> + unsigned long addr, pmd_t *pmdp,
> + struct folio *folio)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + int ref_count, map_count;
> + pmd_t orig_pmd = *pmdp;
> + struct mmu_gather tlb;
> + struct page *page;
> +
> + if (pmd_dirty(orig_pmd) || folio_test_dirty(folio))
> + return false;
> + if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
> + return false;
> +
> + page = pmd_page(orig_pmd);
> + if (unlikely(page_folio(page) != folio))
> + return false;
> +
> + tlb_gather_mmu(&tlb, mm);
> + orig_pmd = pmdp_huge_get_and_clear(mm, addr, pmdp);
> + tlb_remove_pmd_tlb_entry(&tlb, pmdp, addr);
> +
> + /*
> + * Syncing against concurrent GUP-fast:
> + * - clear PMD; barrier; read refcount
> + * - inc refcount; barrier; read PMD
> + */
> + smp_mb();
> +
> + ref_count = folio_ref_count(folio);
> + map_count = folio_mapcount(folio);
> +
> + /*
> + * Order reads for folio refcount and dirty flag
> + * (see comments in __remove_mapping()).
> + */
> + smp_rmb();
> +
> + /*
> + * If the PMD or folio is redirtied at this point, or if there are
> + * unexpected references, we will give up to discard this folio
> + * and remap it.
> + *
> + * The only folio refs must be one from isolation plus the rmap(s).
> + */
> + if (ref_count != map_count + 1 || folio_test_dirty(folio) ||
> + pmd_dirty(orig_pmd)) {
> + set_pmd_at(mm, addr, pmdp, orig_pmd);
> + return false;
> + }
> +
> + folio_remove_rmap_pmd(folio, page, vma);
> + zap_deposited_table(mm, pmdp);
> + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> + folio_put(folio);
> +
> + return true;
> +}
> +
> +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> + pmd_t *pmdp, struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> + VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> + return __discard_trans_pmd_locked(vma, addr, pmdp, folio);
> +#endif
this is weird and huge_memory.c is only built with
CONFIG_TRANSPARENT_HUGEPAGE = y;
mm/Makefile:
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
> +
> + return false;
> +}
> +
> static void remap_page(struct folio *folio, unsigned long nr)
> {
> int i = 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e42f436c7ff3..ab37af4f47aa 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1677,6 +1677,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> }
>
> if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
> + if (unmap_huge_pmd_locked(vma, range.start, pvmw.pmd,
> + folio))
> + goto walk_done;
this is making
mm/rmap.c:1680: undefined reference to `unmap_huge_pmd_locked'
mm/rmap.c:1687: undefined reference to `split_huge_pmd_locked'
> /*
> * We temporarily have to drop the PTL and start once
> * again from that now-PTE-mapped page table.
> --
> 2.33.1
>
* Re: [PATCH v3 3/3] mm/vmscan: avoid split lazyfree THP during shrink_folio_list()
2024-04-30 8:34 ` Barry Song
@ 2024-04-30 9:07 ` Lance Yang
0 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2024-04-30 9:07 UTC (permalink / raw)
To: Barry Song
Cc: akpm, willy, maskray, ziy, ryan.roberts, david, mhocko,
fengwei.yin, zokeefe, shy828301, xiehuan09, libang.li,
wangkefeng.wang, songmuchun, peterx, minchan, linux-mm,
linux-kernel
Hey Barry,
Thanks for taking time to review!
On Tue, Apr 30, 2024 at 4:35 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Apr 30, 2024 at 1:23 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > When the user no longer requires the pages, they would use
> > madvise(MADV_FREE) to mark the pages as lazy free. Subsequently, they
> > typically would not write to that memory again.
> >
> > During memory reclaim, if we detect that both the large folio and its PMD
> > are still marked as clean and there are no unexpected references
> > (such as GUP), we can simply discard the memory lazily, improving the
> > efficiency of memory reclamation in this case.
> >
> > On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
> > mem_cgroup_force_empty() results in the following runtimes in seconds
> > (shorter is better):
> >
> > --------------------------------------------
> > | Old | New | Change |
> > --------------------------------------------
> > | 0.683426 | 0.049197 | -92.80% |
> > --------------------------------------------
> >
> > Suggested-by: Zi Yan <ziy@nvidia.com>
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > include/linux/huge_mm.h | 2 ++
> > mm/huge_memory.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > mm/rmap.c | 3 ++
> > 3 files changed, 80 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 2daadfcc6776..fd330f72b4f3 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -38,6 +38,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > unsigned long cp_flags);
> > void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
> > pmd_t *pmd, bool freeze, struct folio *folio);
> > +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> > + pmd_t *pmdp, struct folio *folio);
> >
> > vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
> > vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 145505a1dd05..d35d526ed48f 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2690,6 +2690,81 @@ static void unmap_folio(struct folio *folio)
> > try_to_unmap_flush();
> > }
> >
> > +static bool __discard_trans_pmd_locked(struct vm_area_struct *vma,
> > + unsigned long addr, pmd_t *pmdp,
> > + struct folio *folio)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + int ref_count, map_count;
> > + pmd_t orig_pmd = *pmdp;
> > + struct mmu_gather tlb;
> > + struct page *page;
> > +
> > + if (pmd_dirty(orig_pmd) || folio_test_dirty(folio))
> > + return false;
> > + if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
> > + return false;
> > +
> > + page = pmd_page(orig_pmd);
> > + if (unlikely(page_folio(page) != folio))
> > + return false;
> > +
> > + tlb_gather_mmu(&tlb, mm);
> > + orig_pmd = pmdp_huge_get_and_clear(mm, addr, pmdp);
> > + tlb_remove_pmd_tlb_entry(&tlb, pmdp, addr);
> > +
> > + /*
> > + * Syncing against concurrent GUP-fast:
> > + * - clear PMD; barrier; read refcount
> > + * - inc refcount; barrier; read PMD
> > + */
> > + smp_mb();
> > +
> > + ref_count = folio_ref_count(folio);
> > + map_count = folio_mapcount(folio);
> > +
> > + /*
> > + * Order reads for folio refcount and dirty flag
> > + * (see comments in __remove_mapping()).
> > + */
> > + smp_rmb();
> > +
> > + /*
> > + * If the PMD or folio is redirtied at this point, or if there are
> > + * unexpected references, we will give up to discard this folio
> > + * and remap it.
> > + *
> > + * The only folio refs must be one from isolation plus the rmap(s).
> > + */
> > + if (ref_count != map_count + 1 || folio_test_dirty(folio) ||
> > + pmd_dirty(orig_pmd)) {
> > + set_pmd_at(mm, addr, pmdp, orig_pmd);
> > + return false;
> > + }
> > +
> > + folio_remove_rmap_pmd(folio, page, vma);
> > + zap_deposited_table(mm, pmdp);
> > + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > + folio_put(folio);
> > +
> > + return true;
> > +}
> > +
> > +bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> > + pmd_t *pmdp, struct folio *folio)
> > +{
> > + VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> > + VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > + VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> > + return __discard_trans_pmd_locked(vma, addr, pmdp, folio);
> > +#endif
>
> this is weird and huge_memory.c is only built with
> CONFIG_TRANSPARENT_HUGEPAGE = y;
>
> mm/Makefile:
> obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
Thanks for pointing that out!
I'll drop the conditional compilation directives :)
>
> > +
> > + return false;
> > +}
> > +
> > static void remap_page(struct folio *folio, unsigned long nr)
> > {
> > int i = 0;
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index e42f436c7ff3..ab37af4f47aa 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1677,6 +1677,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > }
> >
> > if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
> > + if (unmap_huge_pmd_locked(vma, range.start, pvmw.pmd,
> > + folio))
> > + goto walk_done;
>
> this is making
> mm/rmap.c:1680: undefined reference to `unmap_huge_pmd_locked'
> mm/rmap.c:1687: undefined reference to `split_huge_pmd_locked'
You're right!
It's my oversight, and I'll make sure to address it in the next version.
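To fix the undefined references when THP is disabled, providing
!CONFIG_TRANSPARENT_HUGEPAGE stubs for the two helpers in huge_mm.h looks
like the natural route, roughly (untested sketch):

	static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
						 unsigned long address, pmd_t *pmd,
						 bool freeze, struct folio *folio) {}

	static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
						 unsigned long addr, pmd_t *pmdp,
						 struct folio *folio)
	{
		return false;
	}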
Thanks again for the review!
Lance
>
> > /*
> > * We temporarily have to drop the PTL and start once
> > * again from that now-PTE-mapped page table.
> > --
> > 2.33.1
> >