From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand <david@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Johannes Weiner <hannes@cmpxchg.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	peterx@redhat.com,
	Muhammad Usama Anjum <usama.anjum@collabora.com>,
	Hugh Dickins <hughd@google.com>, Mike Rapoport <rppt@kernel.org>
Subject: [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry
Date: Fri,  2 Jun 2023 19:05:52 -0400	[thread overview]
Message-ID: <20230602230552.350731-5-peterx@redhat.com> (raw)
In-Reply-To: <20230602230552.350731-1-peterx@redhat.com>

For most of the page walk paths, it is always preferable to retry the pmd
when hitting the pmd_trans_unstable() race.  We could treat it as a none
pmd (per the comment above pmd_trans_unstable()), but in most cases we're
not even doing that.  If we're going to fix it anyway, a retry is the most
accurate option.

I've gone over all the pmd_trans_unstable() special cases, and this patch
should cover all the remaining places where we should properly retry on an
unstable pmd.  With ACTION_AGAIN, introduced in 2020, we can easily
achieve that.

These are the call sites that I think should be fixed with it:

*** fs/proc/task_mmu.c:
smaps_pte_range[634]           if (pmd_trans_unstable(pmd))
clear_refs_pte_range[1194]     if (pmd_trans_unstable(pmd))
pagemap_pmd_range[1542]        if (pmd_trans_unstable(pmdp))
gather_pte_stats[1891]         if (pmd_trans_unstable(pmd))
*** mm/memcontrol.c:
mem_cgroup_count_precharge_pte_range[6024] if (pmd_trans_unstable(pmd))
mem_cgroup_move_charge_pte_range[6244] if (pmd_trans_unstable(pmd))
*** mm/memory-failure.c:
hwpoison_pte_range[794]        if (pmd_trans_unstable(pmdp))
*** mm/mempolicy.c:
queue_folios_pte_range[517]    if (pmd_trans_unstable(pmd))
*** mm/madvise.c:
madvise_cold_or_pageout_pte_range[425] if (pmd_trans_unstable(pmd))
madvise_free_pte_range[625]    if (pmd_trans_unstable(pmd))

IIUC most of these may not be a big issue even without a retry, because
they're already not strict (smaps, pte_stats, MADV_COLD, ..): worst case,
e.g., a statistic may be inaccurate, or one 2M chunk fewer gets made cold.
However, some of them could have a functional error without the retry
(e.g. pagemap, where the output buffer can be shifted over the unstable
pmd range, so the pagemap result can be wrong).

Meanwhile, these call sites all look fine and don't need any change:

*** include/linux/pgtable.h:
pmd_devmap_trans_unstable[1418] return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
*** mm/gup.c:
follow_pmd_mask[695]           if (pmd_trans_unstable(pmd))
*** mm/mapping_dirty_helpers.c:
wp_clean_pmd_entry[131]        if (!pmd_trans_unstable(&pmdval))
*** mm/memory.c:
do_anonymous_page[4060]        if (unlikely(pmd_trans_unstable(vmf->pmd)))
*** mm/migrate_device.c:
migrate_vma_insert_page[616]   if (unlikely(pmd_trans_unstable(pmdp)))
*** mm/mincore.c:
mincore_pte_range[116]         if (pmd_trans_unstable(pmd)) {

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/proc/task_mmu.c  | 17 +++++++++++++----
 mm/madvise.c        |  8 ++++++--
 mm/memcontrol.c     |  8 ++++++--
 mm/memory-failure.c |  4 +++-
 mm/mempolicy.c      |  4 +++-
 5 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6259dd432eeb..823eaba5c6bf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -631,8 +631,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		goto out;
+	}
+
 	/*
 	 * The mmap_lock held all the way back in m_start() is what
 	 * keeps khugepaged out of here and from collapsing things
@@ -1191,8 +1194,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
@@ -1539,8 +1544,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		return err;
 	}
 
-	if (pmd_trans_unstable(pmdp))
+	if (pmd_trans_unstable(pmdp)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 	/*
@@ -1888,8 +1895,10 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif
 	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	do {
diff --git a/mm/madvise.c b/mm/madvise.c
index 78cd12581628..0fd81712022c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -424,8 +424,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	}
 
 regular_folio:
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 #endif
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -626,8 +628,10 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
 			goto next;
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ee433be4c3b..15e50f033e41 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6021,8 +6021,10 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		if (get_mctgt_type(vma, addr, *pte, NULL))
@@ -6241,8 +6243,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 retry:
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; addr += PAGE_SIZE) {
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 004a02f44271..c97fb2b7ab4a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -791,8 +791,10 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmdp))
+	if (pmd_trans_unstable(pmdp)) {
+		walk->action = ACTION_AGAIN;
 		goto out;
+	}
 
 	mapped_pte = ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp,
 						addr, &ptl);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f06ca8c18e62..af8907b4aad1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -514,8 +514,10 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 	if (ptl)
 		return queue_folios_pmd(pmd, ptl, addr, end, walk);
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(pmd)) {
+		walk->action = ACTION_AGAIN;
 		return 0;
+	}
 
 	mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
-- 
2.40.1

