From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,zhengtangquan@oppo.com,yosryahmed@google.com,ying.huang@intel.com,yangyicong@hisilicon.com,will@kernel.org,wangkefeng.wang@huawei.com,tglx@linutronix.de,shahuang@redhat.com,ryan.roberts@arm.com,paul.walmsley@sifive.com,palmer@dabbelt.com,mingo@redhat.com,mfo@canonical.com,mark.rutland@arm.com,lorenzo.stoakes@oracle.com,kirill.shutemov@linux.intel.com,kasong@tencent.com,ioworker0@gmail.com,hpa@zytor.com,gshan@redhat.com,david@redhat.com,dave.hansen@linux.intel.com,chrisl@kernel.org,catalin.marinas@arm.com,bp@alien8.de,baolin.wang@linux.alibaba.com,aou@eecs.berkeley.edu,anshuman.khandual@arm.com,v-songbaohua@oppo.com,akpm@linux-foundation.org
Subject: [merged mm-stable] mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one.patch removed from -mm tree
Date: Sun, 16 Mar 2025 22:11:37 -0700 [thread overview]
Message-ID: <20250317051137.8C4A8C4CEEC@smtp.kernel.org> (raw)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 7492 bytes --]
The quilt patch titled
Subject: mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
has been removed from the -mm tree. Its filename was
mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Barry Song <v-songbaohua@oppo.com>
Subject: mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
Date: Fri, 14 Feb 2025 22:30:12 +1300
Patch series "mm: batched unmap lazyfree large folios during reclamation",
v4.
Commit 735ecdfaf4e8 ("mm/vmscan: avoid split lazyfree THP during
shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in
madvise.c.
However, those folios are still added to the deferred_split list in
try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
one by one.
Firstly, this has rendered the following counter somewhat confusing,
/sys/kernel/mm/transparent_hugepage/hugepages-size/stats/split_deferred
The split_deferred counter was originally designed to track operations
such as partial unmap or madvise of large folios. However, in practice,
most split_deferred cases arise from memory reclamation of aligned
lazyfree mTHPs as observed by Tangquan. This discrepancy has made the
split_deferred counter highly misleading.
Secondly, this approach is slow because it requires iterating through each
PTE and removing the rmap one by one for a large folio. In fact, all PTEs
of a pte-mapped large folio should be unmapped at once, and the entire
folio should be removed from the rmap as a whole.
Thirdly, it also increases the risk of a race condition where lazyfree
folios are incorrectly set back to swapbacked, as a speculative folio_get
may occur in the shrinker's callback.
deferred_split_scan() might call folio_try_get(folio) since we have added
the folio to split_deferred list while removing rmap for the 1st subpage,
and while we are scanning the 2nd to nr_pages PTEs of this folio in
try_to_unmap_one(), the entire mTHP could be transitioned back to
swap-backed because the reference count is incremented, which can make
"ref_count == 1 + map_count" within try_to_unmap_one() false.
/*
* The only page refs must be one from isolation
* plus the rmap(s) (dropped by discard:).
*/
if (ref_count == 1 + map_count &&
(!folio_test_dirty(folio) ||
...
(vma->vm_flags & VM_DROPPABLE))) {
dec_mm_counter(mm, MM_ANONPAGES);
goto discard;
}
This patchset resolves the issue by marking only genuinely dirty folios as
swap-backed, as suggested by David, and transitioning to batched unmapping
of entire folios in try_to_unmap_one(). Consequently, the deferred_split
count drops to zero, and memory reclamation performance improves
significantly — reclaiming 64KiB lazyfree large folios is now 2.5x
faster(The specific data is embedded in the changelog of patch 3/4).
By the way, while the patchset is primarily aimed at PTE-mapped large
folios, Baolin and Lance also found that try_to_unmap_one() handles
lazyfree redirtied PMD-mapped large folios inefficiently — it splits the
PMD into PTEs and iterates over them. This patchset removes the
unnecessary splitting, enabling us to skip redirtied PMD-mapped large
folios 3.5X faster during memory reclamation. (The specific data can be
found in the changelog of patch 4/4).
This patch (of 4):
The refcount may be temporarily or long-term increased, but this does not
change the fundamental nature of the folio already being lazy- freed.
Therefore, we only reset 'swapbacked' when we are certain the folio is
dirty and not droppable.
Link: https://lkml.kernel.org/r/20250214093015.51024-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250214093015.51024-2-21cnbao@gmail.com
Fixes: 6c8e2a256915 ("mm: fix race between MADV_FREE reclaim and blkdev direct IO read")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Chis Li <chrisl@kernel.org> (Google)
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 27 deletions(-)
--- a/mm/rmap.c~mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1963,34 +1963,29 @@ static bool try_to_unmap_one(struct foli
*/
smp_rmb();
- /*
- * The only page refs must be one from isolation
- * plus the rmap(s) (dropped by discard:).
- */
- if (ref_count == 1 + map_count &&
- (!folio_test_dirty(folio) ||
- /*
- * Unlike MADV_FREE mappings, VM_DROPPABLE
- * ones can be dropped even if they've
- * been dirtied.
- */
- (vma->vm_flags & VM_DROPPABLE))) {
- dec_mm_counter(mm, MM_ANONPAGES);
- goto discard;
- }
-
- /*
- * If the folio was redirtied, it cannot be
- * discarded. Remap the page to page table.
- */
- set_pte_at(mm, address, pvmw.pte, pteval);
- /*
- * Unlike MADV_FREE mappings, VM_DROPPABLE ones
- * never get swap backed on failure to drop.
- */
- if (!(vma->vm_flags & VM_DROPPABLE))
+ if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+ /*
+ * redirtied either using the page table or a previously
+ * obtained GUP reference.
+ */
+ set_pte_at(mm, address, pvmw.pte, pteval);
folio_set_swapbacked(folio);
- goto walk_abort;
+ goto walk_abort;
+ } else if (ref_count != 1 + map_count) {
+ /*
+ * Additional reference. Could be a GUP reference or any
+ * speculative reference. GUP users must mark the folio
+ * dirty if there was a modification. This folio cannot be
+ * reclaimed right now either way, so act just like nothing
+ * happened.
+ * We'll come back here later and detect if the folio was
+ * dirtied when the additional reference is gone.
+ */
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ goto walk_abort;
+ }
+ dec_mm_counter(mm, MM_ANONPAGES);
+ goto discard;
}
if (swap_duplicate(entry) < 0) {
_
Patches currently in -mm which might be from v-songbaohua@oppo.com are
reply other threads:[~2025-03-17 5:11 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250317051137.8C4A8C4CEEC@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=aou@eecs.berkeley.edu \
--cc=baolin.wang@linux.alibaba.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=chrisl@kernel.org \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=gshan@redhat.com \
--cc=hpa@zytor.com \
--cc=ioworker0@gmail.com \
--cc=kasong@tencent.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mark.rutland@arm.com \
--cc=mfo@canonical.com \
--cc=mingo@redhat.com \
--cc=mm-commits@vger.kernel.org \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=ryan.roberts@arm.com \
--cc=shahuang@redhat.com \
--cc=tglx@linutronix.de \
--cc=v-songbaohua@oppo.com \
--cc=wangkefeng.wang@huawei.com \
--cc=will@kernel.org \
--cc=yangyicong@hisilicon.com \
--cc=ying.huang@intel.com \
--cc=yosryahmed@google.com \
--cc=zhengtangquan@oppo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.