From: Sasha Levin <sashal@kernel.org>
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Jerome Glisse <jglisse@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 4.19 01/57] mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
Date: Sun, 4 Nov 2018 08:50:48 -0500
Message-ID: <20181104135144.88324-1-sashal@kernel.org>
From: Andrea Arcangeli <aarcange@redhat.com>
[ Upstream commit d7c3393413fe7e7dc54498ea200ea94742d61e18 ]
Patch series "migrate_misplaced_transhuge_page race conditions".
Aaron found a new instance of the THP MADV_DONTNEED race against
pmdp_clear_flush* variants, that was apparently left unfixed.
While looking into the race found by Aaron, I may have found two more
issues in migrate_misplaced_transhuge_page.
These race conditions would not cause kernel instability, but they'd
corrupt userland data or leave data non zero after MADV_DONTNEED.
I did only minor testing, and I don't expect to be able to reproduce this
(especially the lack of ->invalidate_range before migrate_page_copy,
requires the latest iommu hardware or infiniband to reproduce). The last
patch is noop for x86 and it needs further review from maintainers of
archs that implement flush_cache_range() (not in CC yet).
To avoid confusion: the first patch does not introduce the bug fixed in
the second patch. Even before the pmdp_huge_clear_flush_notify was
removed, the _notify variant was called after migrate_page_copy had
already run.
This patch (of 3):
This is a corollary of ced108037c2aa ("thp: fix MADV_DONTNEED vs. numa
balancing race"), 58ceeb6bec8 ("thp: fix MADV_DONTNEED vs. MADV_FREE
race") and 5b7abeae3af8c ("thp: fix MADV_DONTNEED vs clear soft dirty
race").
When the above three fixes were posted Dave asked
https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com
but apparently this was missed.
The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was introduced
in a54a407fbf7 ("mm: Close races between THP migration and PMD numa
clearing").
The important part of that commit is only the part where the page lock is
not released until the first do_huge_pmd_numa_page() has finished disarming
the pagenuma/protnone.
The addition of pmdp_clear_flush() wasn't beneficial to that commit and
there's no commentary about the addition either.
I guess the pmdp_clear_flush() in that commit was added just in case for
safety, but it ended up introducing the MADV_DONTNEED race condition found
by Aaron.
At that point in time nobody had thought of this kind of MADV_DONTNEED race
conditions yet (they were fixed later) so the code may have looked more
robust by adding the pmdp_clear_flush().
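The interleaving described above can be replayed in a small userspace toy
model. This is only a sketch under loud assumptions: struct toy_pmd,
toy_zap() and the two toy_*_leaves_data() helpers are hypothetical names
invented for illustration, a NULL pointer stands in for a "none" pmd, and
the race is replayed sequentially in one thread rather than with real
locking. It is not the kernel code path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pmd: NULL models a "none" (cleared) entry. */
struct toy_pmd {
	const char *page;
};

/*
 * Models the MADV_DONTNEED side: it peeks at the entry without the lock
 * and skips it when it looks empty. Returns true if it zapped something.
 */
static bool toy_zap(struct toy_pmd *pmd)
{
	if (pmd->page == NULL)
		return false;	/* looks unmapped: nothing to do */
	pmd->page = NULL;	/* otherwise drop the mapping */
	return true;
}

/* Old behaviour: clear the entry, then set the new one. */
bool toy_clear_then_set_leaves_data(void)
{
	struct toy_pmd pmd = { .page = "old" };

	pmd.page = NULL;	/* models the pmdp clear+flush step */
	(void)toy_zap(&pmd);	/* zap lands in the window and skips */
	pmd.page = "new";	/* models set_pmd_at() */

	return pmd.page != NULL;	/* data is still mapped after the zap */
}

/* New behaviour: overwrite the entry, so it is never observed as none. */
bool toy_overwrite_leaves_data(void)
{
	struct toy_pmd pmd = { .page = "old" };

	pmd.page = "new";	/* models set_pmd_at() with no prior clear */
	(void)toy_zap(&pmd);	/* zap sees a populated entry and clears it */

	return pmd.page != NULL;
}
```

In this model the clear-then-set variant leaves data mapped after the zap
has returned, while the overwrite variant does not, since the zap never
observes an empty entry.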
This specific race condition won't destabilize the kernel, but it can
confuse userland because after MADV_DONTNEED the memory won't be zeroed
out.
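For reference, the userland expectation that this race can violate may be
sketched as below: on a private anonymous mapping, reads after a successful
madvise(MADV_DONTNEED) are expected to return zeroes. The helper name
dontneed_reads_zero() is hypothetical, and this is not a reproducer for the
race (which needs a concurrent NUMA migration of a transparent huge page);
it only checks the zero-fill behaviour on a quiescent mapping.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: returns 1 if every byte reads back as zero after
 * MADV_DONTNEED, 0 if stale data is still visible, -1 on mmap/madvise
 * failure.
 */
int dontneed_reads_zero(size_t len)
{
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ok = 1;
	size_t i;

	if (buf == MAP_FAILED)
		return -1;

	/* Dirty the whole range so there is data to discard. */
	memset(buf, 0xaa, len);

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* Every page should now be refaulted as a fresh zero page. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0) {
			ok = 0;
			break;
		}
	}

	munmap(buf, len);
	return ok;
}
```

Calling it with a 2 MiB length covers one x86-64 huge page worth of memory,
although whether the range is actually backed by a THP depends on alignment
and the system's THP settings.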
This also optimizes the code and removes a superfluous TLB flush.
[akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)]
Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/migrate.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 84381b55b2bd..1f634b1563b6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2029,15 +2029,26 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
/*
- * Clear the old entry under pagetable lock and establish the new PTE.
- * Any parallel GUP will either observe the old page blocking on the
- * page lock, block on the page table lock or observe the new page.
- * The SetPageUptodate on the new page and page_add_new_anon_rmap
- * guarantee the copy is visible before the pagetable update.
+ * Overwrite the old entry under pagetable lock and establish
+ * the new PTE. Any parallel GUP will either observe the old
+ * page blocking on the page lock, block on the page table
+ * lock or observe the new page. The SetPageUptodate on the
+ * new page and page_add_new_anon_rmap guarantee the copy is
+ * visible before the pagetable update.
*/
flush_cache_range(vma, mmun_start, mmun_end);
page_add_anon_rmap(new_page, vma, mmun_start, true);
- pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
+ /*
+ * At this point the pmd is numa/protnone (i.e. non present) and the TLB
+ * has already been flushed globally. So no TLB can be currently
+ * caching this non present pmd mapping. There's no need to clear the
+ * pmd before doing set_pmd_at(), nor to flush the TLB after
+ * set_pmd_at(). Clearing the pmd here would introduce a race
+ * condition against MADV_DONTNEED, because MADV_DONTNEED only holds the
+ * mmap_sem for reading. If the pmd is set to NULL at any given time,
+ * MADV_DONTNEED won't wait on the pmd lock and it'll skip clearing this
+ * pmd.
+ */
set_pmd_at(mm, mmun_start, pmd, entry);
update_mmu_cache_pmd(vma, address, &entry);
@@ -2051,7 +2062,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
* No need to double call mmu_notifier->invalidate_range() callback as
* the above pmdp_huge_clear_flush_notify() did already call it.
*/
- mmu_notifier_invalidate_range_only_end(mm, mmun_start, mmun_end);
+ mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
/* Take an "isolate" reference and put new page on the LRU. */
get_page(new_page);
--
2.17.1
Thread overview: 58+ messages
2018-11-04 13:50 Sasha Levin [this message]
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 02/57] mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 03/57] mm: calculate deferred pages after skipping mirrored memory Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 04/57] mm: don't raise MEMCG_OOM event due to failed high-order allocation Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 05/57] mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 06/57] userfaultfd: allow get_mempolicy(MPOL_F_NODE|MPOL_F_ADDR) to trigger userfaults Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 07/57] mm: don't miss the last page because of round-off error Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 08/57] mm: don't warn about large allocations for slab Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 09/57] r8169: fix broken Wake-on-LAN from S5 (poweroff) Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 10/57] powerpc/traps: restore recoverability of machine_check interrupts Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 11/57] powerpc/64/module: REL32 relocation range check Sasha Levin
2018-11-04 13:50 ` [PATCH AUTOSEL 4.19 12/57] powerpc/mm: Fix page table dump to work on Radix Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 13/57] powerpc/mm: fix always true/false warning in slice.c Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 14/57] drm/amd/display: fix bug of accessing invalid memory Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 15/57] Input: wm97xx-ts - fix exit path Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 16/57] powerpc/Makefile: Fix PPC_BOOK3S_64 ASFLAGS Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 17/57] powerpc/eeh: Fix possible null deref in eeh_dump_dev_log() Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 18/57] tty: check name length in tty_find_polling_driver() Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 19/57] tracing/kprobes: Check the probe on unloaded module correctly Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 20/57] drm/nouveau/secboot/acr: fix memory leak Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 21/57] drm/amdgpu/powerplay: fix missing break in switch statements Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 22/57] ARM: imx_v6_v7_defconfig: Select CONFIG_TMPFS_POSIX_ACL Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 23/57] powerpc/nohash: fix undefined behaviour when testing page size support Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 24/57] drm/msm/gpu: fix parameters in function msm_gpu_crashstate_capture Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 25/57] drm/msm/disp/dpu: Use proper define for drm_encoder_init() 'encoder_type' Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 26/57] drm/msm: dpu: Allow planes to extend past active display Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 27/57] powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 28/57] watchdog: lantiq: update register names to better match spec Sasha Levin
2018-11-05 22:26 ` Hauke Mehrtens
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 29/57] drm/omap: fix memory barrier bug in DMM driver Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 30/57] iio: adc: at91: fix wrong channel number in triggered buffer mode Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 31/57] iio: adc: at91: fix acking DRDY irq on simple conversions Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 32/57] drm/amd/display: Raise dispclk value for dce120 by 15% Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 33/57] drm/amd/display: fix gamma not being applied Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 34/57] drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer pointer Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 35/57] media: pci: cx23885: handle adding to list failure Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 36/57] media: coda: don't overwrite h.264 profile_idc on decoder instance Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 37/57] iio: adc: imx25-gcq: Fix leak of device_node in mx25_gcq_setup_cfgs() Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 38/57] MIPS: kexec: Mark CPU offline before disabling local IRQ Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 39/57] powerpc/boot: Ensure _zimage_start is a weak symbol Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 40/57] powerpc/memtrace: Remove memory in chunks Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 41/57] MIPS/PCI: Call pcie_bus_configure_settings() to set MPS/MRRS Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 42/57] staging: erofs: fix a missing endian conversion Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 43/57] serial: 8250_of: Fix for lack of interrupt support Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 44/57] sc16is7xx: Fix for multi-channel stall Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 45/57] media: tvp5150: fix width alignment during set_selection() Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 46/57] powerpc/selftests: Wait all threads to join Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 47/57] staging:iio:ad7606: fix voltage scales Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 48/57] drm: rcar-du: Update Gen3 output limitations Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 49/57] drm/amdgpu: Fix SDMA TO after GPU reset v3 Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 50/57] staging: most: video: fix registration of an empty comp core_component Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 51/57] 9p locks: fix glock.client_id leak in do_lock Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 52/57] udf: Prevent write-unsupported filesystem to be remounted read-write Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 53/57] ARM: dts: imx6ull: keep IMX6UL_ prefix for signals on both i.MX6UL and i.MX6ULL Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 54/57] media: ov5640: fix mode change regression Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 55/57] 9p: clear dangling pointers in p9stat_free Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 56/57] drm/amdgpu: fix integer overflow test in amdgpu_bo_list_create() Sasha Levin
2018-11-04 13:51 ` [PATCH AUTOSEL 4.19 57/57] media: ov5640: fix restore of last mode set Sasha Levin