public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Alistair Popple <apopple@nvidia.com>, Jan Kara <jack@suse.cz>,
	Jue Wang <juew@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Oscar Salvador <osalvador@suse.de>, Peter Xu <peterx@redhat.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Shakeel Butt <shakeelb@google.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 06/34] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Date: Fri,  9 Jul 2021 15:20:22 +0200	[thread overview]
Message-ID: <20210709131648.396300970@linuxfoundation.org> (raw)
In-Reply-To: <20210709131644.969303901@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

[ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]

Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e.  mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
all indicative of some mapcount difficulty in development here perhaps.
But the !CONFIG_DEBUG_VM path handles the failures correctly and
silently.

I believe the problem is that once a racing unmap has cleared pte or
pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
from try_to_unmap() before the racing task has reached decrementing
mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem.  Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race
tolerated.

Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: upstream TTU_SYNC 0x10 takes the value which
5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed.
It is very tempting to backport that commit (as 5.10 already did) and
make no change here; but on reflection, good as that commit is, I'm
reluctant to include any possible side-effect of it in this series.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/rmap.h |  3 ++-
 mm/huge_memory.c     |  2 +-
 mm/page_vma_mapped.c | 11 +++++++++++
 mm/rmap.c            | 17 ++++++++++++++++-
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d7d6d4eb1794..91ccae946716 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -98,7 +98,8 @@ enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
-	TTU_SPLIT_FREEZE	= 0x100,		/* freeze pte under splitting thp */
+	TTU_SPLIT_FREEZE	= 0x100, /* freeze pte under splitting thp */
+	TTU_SYNC		= 0x200, /* avoid racy checks with PVMW_SYNC */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82ed62775c00..78c1ad5f8109 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2430,7 +2430,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 static void unmap_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
-		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC;
 	bool unmap_success;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 11df03e71288..08e283ad4660 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -208,6 +208,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			pvmw->ptl = NULL;
 		}
 	} else if (!pmd_present(pmde)) {
+		/*
+		 * If PVMW_SYNC, take and drop THP pmd lock so that we
+		 * cannot return prematurely, while zap_huge_pmd() has
+		 * cleared *pmd but not decremented compound_mapcount().
+		 */
+		if ((pvmw->flags & PVMW_SYNC) &&
+		    PageTransCompound(pvmw->page)) {
+			spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+			spin_unlock(ptl);
+		}
 		return false;
 	}
 	if (!map_pte(pvmw))
diff --git a/mm/rmap.c b/mm/rmap.c
index 70872d5b203c..5df055654e63 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1348,6 +1348,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	unsigned long start = address, end;
 	enum ttu_flags flags = (enum ttu_flags)arg;
 
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		return true;
@@ -1723,7 +1732,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 	else
 		rmap_walk(page, &rwc);
 
-	return !page_mapcount(page) ? true : false;
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	return !page_mapcount(page);
 }
 
 /**
-- 
2.30.2




  parent reply	other threads:[~2021-07-09 13:21 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-09 13:20 [PATCH 4.19 00/34] 4.19.197-rc1 review Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 01/34] mm: add VM_WARN_ON_ONCE_PAGE() macro Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 02/34] mm/rmap: remove unneeded semicolon in page_not_mapped() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 03/34] mm/rmap: use page_not_mapped in try_to_unmap() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 04/34] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 05/34] mm/thp: make is_huge_zero_pmd() safe and quicker Greg Kroah-Hartman
2021-07-09 13:20 ` Greg Kroah-Hartman [this message]
2021-07-09 13:20 ` [PATCH 4.19 07/34] mm/thp: fix vma_address() if virtual address below file offset Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 08/34] mm/thp: fix page_address_in_vma() on file THP tails Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 09/34] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 10/34] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 11/34] mm: page_vma_mapped_walk(): use page for pvmw->page Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 12/34] mm: page_vma_mapped_walk(): settle PageHuge on entry Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 13/34] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 14/34] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 15/34] mm: page_vma_mapped_walk(): crossing page table boundary Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 16/34] mm: page_vma_mapped_walk(): add a level of indentation Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 17/34] mm: page_vma_mapped_walk(): use goto instead of while (1) Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 18/34] mm: page_vma_mapped_walk(): get vma_address_end() earlier Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 19/34] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 20/34] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 21/34] mm, futex: fix shared futex pgoff on shmem huge page Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 22/34] scsi: sr: Return appropriate error code when disk is ejected Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 23/34] drm/nouveau: fix dma_address check for CPU/GPU sync Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 24/34] ext4: eliminate bogus error in ext4_data_block_valid_rcu() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 25/34] KVM: SVM: Periodically schedule when unregistering regions on destroy Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 26/34] ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 27/34] kthread_worker: split code for canceling the delayed work timer Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 28/34] kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 29/34] xen/events: reset active flag for lateeoi events later Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 30/34] KVM: SVM: Call SEV Guest Decommission if ASID binding fails Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 31/34] ARM: OMAP: replace setup_irq() by request_irq() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 32/34] clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 33/34] clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 34/34] clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 Greg Kroah-Hartman
2021-07-09 17:11 ` [PATCH 4.19 00/34] 4.19.197-rc1 review Jon Hunter
2021-07-09 21:43 ` Shuah Khan
2021-07-10 10:36 ` Sudip Mukherjee
2021-07-10 13:44 ` Naresh Kamboju
2021-07-10 19:51 ` Guenter Roeck
2021-07-11  7:59 ` Pavel Machek
2021-07-12  0:58 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210709131648.396300970@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=juew@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=rcampbell@nvidia.com \
    --cc=sashal@kernel.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wangyugui@e16-tech.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox