From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org, ziy@nvidia.com, willy@infradead.org,
vbabka@suse.cz, tsbogend@alpha.franken.de, songliubraving@fb.com,
sj@kernel.org, shy828301@gmail.com,
rongwei.wang@linux.alibaba.com, rientjes@google.com,
peterx@redhat.com, pasha.tatashin@soleen.com, minchan@kernel.org,
mhocko@suse.com, mattst88@gmail.com, linmiaohe@huawei.com,
kirill.shutemov@linux.intel.com, jrdr.linux@gmail.com,
jcmvbkbc@gmail.com, James.Bottomley@HansenPartnership.com,
ink@jurassic.park.msu.ru, hughd@google.com, deller@gmx.de,
david@redhat.com, dan.carpenter@oracle.com, ckennelly@google.com,
chris@zankel.net, axelrasmussen@google.com, axboe@kernel.dk,
asml.silence@gmail.com, arnd@arndb.de,
alex.shi@linux.alibaba.com, aarcange@redhat.com,
zokeefe@google.com, akpm@linux-foundation.org
Subject: [merged mm-stable] mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage.patch removed from -mm tree
Date: Sun, 11 Sep 2022 20:28:05 -0700 [thread overview]
Message-ID: <20220912032806.07C0AC433D7@smtp.kernel.org> (raw)
The quilt patch titled
Subject: mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
has been removed from the -mm tree. Its filename was
mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe@google.com>
Subject: mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
Date: Wed, 6 Jul 2022 16:59:26 -0700
When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a hugepage. Note that
SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
file-collapse path, since the latter might identify pte-mapped compound
pages. This is required by MADV_COLLAPSE which necessarily needs to know
what hugepage-aligned/sized regions are already pmd-mapped.
In order to determine if a pmd already maps a hugepage, refactor
mm_find_pmd():
Return mm_find_pmd() to it's pre-commit f72e7dcdd252 ("mm: let mm_find_pmd
fix buggy race with THP fault") behavior. ksm was the only caller that
explicitly wanted a pte-mapping pmd, so open code the pte-mapping logic
there (pmd_present() and pmd_trans_huge() checks).
Undo revert change in commit f72e7dcdd252 ("mm: let mm_find_pmd fix buggy
race with THP fault") that open-coded split_huge_pmd_address() pmd lookup
and use mm_find_pmd() instead.
Link: https://lkml.kernel.org/r/20220706235936.2197195-9-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/trace/events/huge_memory.h | 1
mm/huge_memory.c | 18 --------
mm/internal.h | 2
mm/khugepaged.c | 60 +++++++++++++++++++++------
mm/ksm.c | 10 ++++
mm/rmap.c | 15 ++----
6 files changed, 67 insertions(+), 39 deletions(-)
--- a/include/trace/events/huge_memory.h~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/include/trace/events/huge_memory.h
@@ -11,6 +11,7 @@
EM( SCAN_FAIL, "failed") \
EM( SCAN_SUCCEED, "succeeded") \
EM( SCAN_PMD_NULL, "pmd_null") \
+ EM( SCAN_PMD_MAPPED, "page_pmd_mapped") \
EM( SCAN_EXCEED_NONE_PTE, "exceed_none_pte") \
EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \
EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \
--- a/mm/huge_memory.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/huge_memory.c
@@ -2286,25 +2286,11 @@ out:
void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
bool freeze, struct folio *folio)
{
- pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ pmd_t *pmd = mm_find_pmd(vma->vm_mm, address);
- pgd = pgd_offset(vma->vm_mm, address);
- if (!pgd_present(*pgd))
+ if (!pmd)
return;
- p4d = p4d_offset(pgd, address);
- if (!p4d_present(*p4d))
- return;
-
- pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
- return;
-
- pmd = pmd_offset(pud, address);
-
__split_huge_pmd(vma, pmd, address, freeze, folio);
}
--- a/mm/internal.h~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/internal.h
@@ -187,7 +187,7 @@ extern void reclaim_throttle(pg_data_t *
/*
* in mm/rmap.c:
*/
-extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
/*
* in mm/page_alloc.c
--- a/mm/khugepaged.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/khugepaged.c
@@ -28,6 +28,7 @@ enum scan_result {
SCAN_FAIL,
SCAN_SUCCEED,
SCAN_PMD_NULL,
+ SCAN_PMD_MAPPED,
SCAN_EXCEED_NONE_PTE,
SCAN_EXCEED_SWAP_PTE,
SCAN_EXCEED_SHARED_PTE,
@@ -877,6 +878,45 @@ static int hugepage_vma_revalidate(struc
return SCAN_SUCCEED;
}
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+ unsigned long address,
+ pmd_t **pmd)
+{
+ pmd_t pmde;
+
+ *pmd = mm_find_pmd(mm, address);
+ if (!*pmd)
+ return SCAN_PMD_NULL;
+
+ pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ /* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+ barrier();
+#endif
+ if (!pmd_present(pmde))
+ return SCAN_PMD_NULL;
+ if (pmd_trans_huge(pmde))
+ return SCAN_PMD_MAPPED;
+ if (pmd_bad(pmde))
+ return SCAN_PMD_NULL;
+ return SCAN_SUCCEED;
+}
+
+static int check_pmd_still_valid(struct mm_struct *mm,
+ unsigned long address,
+ pmd_t *pmd)
+{
+ pmd_t *new_pmd;
+ int result = find_pmd_or_thp_or_none(mm, address, &new_pmd);
+
+ if (result != SCAN_SUCCEED)
+ return result;
+ if (new_pmd != pmd)
+ return SCAN_FAIL;
+ return SCAN_SUCCEED;
+}
+
/*
* Bring missing pages in from swap, to complete THP collapse.
* Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -988,9 +1028,8 @@ static int collapse_huge_page(struct mm_
goto out_nolock;
}
- pmd = mm_find_pmd(mm, address);
- if (!pmd) {
- result = SCAN_PMD_NULL;
+ result = find_pmd_or_thp_or_none(mm, address, &pmd);
+ if (result != SCAN_SUCCEED) {
mmap_read_unlock(mm);
goto out_nolock;
}
@@ -1018,7 +1057,8 @@ static int collapse_huge_page(struct mm_
if (result != SCAN_SUCCEED)
goto out_up_write;
/* check if the pmd is still valid */
- if (mm_find_pmd(mm, address) != pmd)
+ result = check_pmd_still_valid(mm, address, pmd);
+ if (result != SCAN_SUCCEED)
goto out_up_write;
anon_vma_lock_write(vma->anon_vma);
@@ -1121,11 +1161,9 @@ static int khugepaged_scan_pmd(struct mm
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- pmd = mm_find_pmd(mm, address);
- if (!pmd) {
- result = SCAN_PMD_NULL;
+ result = find_pmd_or_thp_or_none(mm, address, &pmd);
+ if (result != SCAN_SUCCEED)
goto out;
- }
memset(cc->node_load, 0, sizeof(cc->node_load));
pte = pte_offset_map_lock(mm, pmd, address, &ptl);
@@ -1383,8 +1421,7 @@ void collapse_pte_mapped_thp(struct mm_s
if (!PageHead(hpage))
goto drop_hpage;
- pmd = mm_find_pmd(mm, haddr);
- if (!pmd)
+ if (find_pmd_or_thp_or_none(mm, haddr, &pmd) != SCAN_SUCCEED)
goto drop_hpage;
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1502,8 +1539,7 @@ static void retract_page_tables(struct a
if (vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
mm = vma->vm_mm;
- pmd = mm_find_pmd(mm, addr);
- if (!pmd)
+ if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
continue;
/*
* We need exclusive mmap_lock to retract page table.
--- a/mm/ksm.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/ksm.c
@@ -1134,6 +1134,7 @@ static int replace_page(struct vm_area_s
{
struct mm_struct *mm = vma->vm_mm;
pmd_t *pmd;
+ pmd_t pmde;
pte_t *ptep;
pte_t newpte;
spinlock_t *ptl;
@@ -1148,6 +1149,15 @@ static int replace_page(struct vm_area_s
pmd = mm_find_pmd(mm, addr);
if (!pmd)
goto out;
+ /*
+ * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
+ * without holding anon_vma lock for write. So when looking for a
+ * genuine pmde (in which to find pte), test present and !THP together.
+ */
+ pmde = *pmd;
+ barrier();
+ if (!pmd_present(pmde) || pmd_trans_huge(pmde))
+ goto out;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
addr + PAGE_SIZE);
--- a/mm/rmap.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/rmap.c
@@ -767,13 +767,17 @@ unsigned long page_address_in_vma(struct
return vma_address(page, vma);
}
+/*
+ * Returns the actual pmd_t* where we expect 'address' to be mapped from, or
+ * NULL if it doesn't exist. No guarantees / checks on what the pmd_t*
+ * represents.
+ */
pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
{
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd = NULL;
- pmd_t pmde;
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
@@ -788,15 +792,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm,
goto out;
pmd = pmd_offset(pud, address);
- /*
- * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
- * without holding anon_vma lock for write. So when looking for a
- * genuine pmde (in which to find pte), test present and !THP together.
- */
- pmde = *pmd;
- barrier();
- if (!pmd_present(pmde) || pmd_trans_huge(pmde))
- pmd = NULL;
out:
return pmd;
}
_
Patches currently in -mm which might be from zokeefe@google.com are
mm-shmem-add-flag-to-enforce-shmem-thp-in-hugepage_vma_check.patch
mm-khugepaged-attempt-to-map-file-shmem-backed-pte-mapped-thps-by-pmds.patch
mm-madvise-add-file-and-shmem-support-to-madv_collapse.patch
mm-khugepaged-add-tracepoint-to-hpage_collapse_scan_file.patch
selftests-vm-dedup-thp-helpers.patch
selftests-vm-modularize-thp-collapse-memory-operations.patch
selftests-vm-add-thp-collapse-file-and-tmpfs-testing.patch
selftests-vm-add-thp-collapse-shmem-testing.patch
selftests-vm-add-file-shmem-madv_collapse-selftest-for-cleared-pmd.patch
selftests-vm-add-selftest-for-madv_collapse-of-uffd-minor-memory.patch
reply other threads:[~2022-09-12 3:28 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220912032806.07C0AC433D7@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=James.Bottomley@HansenPartnership.com \
--cc=aarcange@redhat.com \
--cc=alex.shi@linux.alibaba.com \
--cc=arnd@arndb.de \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=axelrasmussen@google.com \
--cc=chris@zankel.net \
--cc=ckennelly@google.com \
--cc=dan.carpenter@oracle.com \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=hughd@google.com \
--cc=ink@jurassic.park.msu.ru \
--cc=jcmvbkbc@gmail.com \
--cc=jrdr.linux@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mattst88@gmail.com \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=mm-commits@vger.kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=rongwei.wang@linux.alibaba.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=songliubraving@fb.com \
--cc=tsbogend@alpha.franken.de \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.