All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Alistair Popple <apopple@nvidia.com>, Jan Kara <jack@suse.cz>,
	Jue Wang <juew@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Oscar Salvador <osalvador@suse.de>, Peter Xu <peterx@redhat.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Shakeel Butt <shakeelb@google.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 07/34] mm/thp: fix vma_address() if virtual address below file offset
Date: Fri,  9 Jul 2021 15:20:23 +0200	[thread overview]
Message-ID: <20210709131648.837306574@linuxfoundation.org> (raw)
In-Reply-To: <20210709131644.969303901@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

[ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]

Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for
try_to_unmap().

vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low:
vma_address() chooses max(start, vma->vm_start), but that decides on the
wrong address, because start has become almost ULONG_MAX.

Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
be safely used from page_mapped_in_vma() and page_address_in_vma() too.

Add vma_address_end() to apply similar care to end address calculation,
in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

An irritation is that their apparent generality breaks down on KSM
pages, which cannot be located by the page->index that page_to_pgoff()
uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte()
for ksm pages") once discovered.  I dithered over the best thing to do
about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
vma_address() and vma_address_end(); though the only place in danger of
using it on them was try_to_unmap_one().

Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured.  try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never
mattered.

Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Note on stable backport: fixed up conflicts on intervening thp_size(),
and mmu_notifier_range initializations; substitute for compound_nr().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/internal.h        | 53 ++++++++++++++++++++++++++++++++------------
 mm/page_vma_mapped.c | 16 +++++--------
 mm/rmap.c            | 14 ++++++------
 3 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 397183c8fe47..3a2e973138d3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -331,27 +331,52 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
 
 /*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
  */
 static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
-	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page);
+	if (pgoff >= vma->vm_pgoff) {
+		address = vma->vm_start +
+			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+		/* Check for address beyond vma (or wrapped through 0?) */
+		if (address < vma->vm_start || address >= vma->vm_end)
+			address = -EFAULT;
+	} else if (PageHead(page) &&
+		   pgoff + (1UL << compound_order(page)) - 1 >= vma->vm_pgoff) {
+		/* Test above avoids possibility of wrap to 0 on 32-bit */
+		address = vma->vm_start;
+	} else {
+		address = -EFAULT;
+	}
+	return address;
 }
 
+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
 static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
-
-	/* page should be within @vma mapping range */
-	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
-
-	return max(start, vma->vm_start);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page) + (1UL << compound_order(page));
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	/* Check for address beyond vma (or wrapped through 0?) */
+	if (address < vma->vm_start || address > vma->vm_end)
+		address = vma->vm_end;
+	return address;
 }
 
 #else /* !CONFIG_MMU */
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 08e283ad4660..bb7488e86bea 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -224,18 +224,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
+		unsigned long end;
+
 		if (check_pte(pvmw))
 			return true;
 next_pte:
 		/* Seek to next pte only makes sense for THP */
 		if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
 			return not_found(pvmw);
+		end = vma_address_end(pvmw->page, pvmw->vma);
 		do {
 			pvmw->address += PAGE_SIZE;
-			if (pvmw->address >= pvmw->vma->vm_end ||
-			    pvmw->address >=
-					__vma_address(pvmw->page, pvmw->vma) +
-					hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+			if (pvmw->address >= end)
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -273,14 +273,10 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 		.vma = vma,
 		.flags = PVMW_SYNC,
 	};
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
 
-	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+	pvmw.address = vma_address(page, vma);
+	if (pvmw.address == -EFAULT)
 		return 0;
-	pvmw.address = max(start, vma->vm_start);
 	if (!page_vma_mapped_walk(&pvmw))
 		return 0;
 	page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/rmap.c b/mm/rmap.c
index 5df055654e63..ff56c460af53 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -686,7 +686,6 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
  */
 unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long address;
 	if (PageAnon(page)) {
 		struct anon_vma *page__anon_vma = page_anon_vma(page);
 		/*
@@ -701,10 +700,8 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 			return -EFAULT;
 	} else
 		return -EFAULT;
-	address = __vma_address(page, vma);
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		return -EFAULT;
-	return address;
+
+	return vma_address(page, vma);
 }
 
 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
@@ -896,7 +893,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	 * We have to assume the worse case ie pmd for invalidation. Note that
 	 * the page can not be free from this function.
 	 */
-	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+	end = vma_address_end(page, vma);
 	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
 
 	while (page_vma_mapped_walk(&pvmw)) {
@@ -1378,7 +1375,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
-	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+	end = PageKsm(page) ?
+			address + PAGE_SIZE : vma_address_end(page, vma);
 	if (PageHuge(page)) {
 		/*
 		 * If sharing is possible, start and end will be adjusted
@@ -1835,6 +1833,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
 
+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
 
 		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
@@ -1889,6 +1888,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(page, vma);
 
+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
 
 		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
-- 
2.30.2




  parent reply	other threads:[~2021-07-09 13:21 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-09 13:20 [PATCH 4.19 00/34] 4.19.197-rc1 review Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 01/34] mm: add VM_WARN_ON_ONCE_PAGE() macro Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 02/34] mm/rmap: remove unneeded semicolon in page_not_mapped() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 03/34] mm/rmap: use page_not_mapped in try_to_unmap() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 04/34] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 05/34] mm/thp: make is_huge_zero_pmd() safe and quicker Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 06/34] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting Greg Kroah-Hartman
2021-07-09 13:20 ` Greg Kroah-Hartman [this message]
2021-07-09 13:20 ` [PATCH 4.19 08/34] mm/thp: fix page_address_in_vma() on file THP tails Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 09/34] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 10/34] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 11/34] mm: page_vma_mapped_walk(): use page for pvmw->page Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 12/34] mm: page_vma_mapped_walk(): settle PageHuge on entry Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 13/34] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 14/34] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 15/34] mm: page_vma_mapped_walk(): crossing page table boundary Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 16/34] mm: page_vma_mapped_walk(): add a level of indentation Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 17/34] mm: page_vma_mapped_walk(): use goto instead of while (1) Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 18/34] mm: page_vma_mapped_walk(): get vma_address_end() earlier Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 19/34] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 20/34] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 21/34] mm, futex: fix shared futex pgoff on shmem huge page Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 22/34] scsi: sr: Return appropriate error code when disk is ejected Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 23/34] drm/nouveau: fix dma_address check for CPU/GPU sync Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 24/34] ext4: eliminate bogus error in ext4_data_block_valid_rcu() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 25/34] KVM: SVM: Periodically schedule when unregistering regions on destroy Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 26/34] ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 27/34] kthread_worker: split code for canceling the delayed work timer Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 28/34] kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 29/34] xen/events: reset active flag for lateeoi events later Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 30/34] KVM: SVM: Call SEV Guest Decommission if ASID binding fails Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 31/34] ARM: OMAP: replace setup_irq() by request_irq() Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 32/34] clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 33/34] clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue Greg Kroah-Hartman
2021-07-09 13:20 ` [PATCH 4.19 34/34] clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 Greg Kroah-Hartman
2021-07-09 17:11 ` [PATCH 4.19 00/34] 4.19.197-rc1 review Jon Hunter
2021-07-09 21:43 ` Shuah Khan
2021-07-10 10:36 ` Sudip Mukherjee
2021-07-10 13:44 ` Naresh Kamboju
2021-07-10 19:51 ` Guenter Roeck
2021-07-11  7:59 ` Pavel Machek
2021-07-12  0:58 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210709131648.837306574@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=juew@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=rcampbell@nvidia.com \
    --cc=sashal@kernel.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wangyugui@e16-tech.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.