public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sasha.levin@oracle.com>
To: stable@vger.kernel.org, stable-commits@vger.kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Andi Kleen <andi@firstfloor.org>,
	Dean Nelson <dnelson@redhat.com>, Tony Luck <tony.luck@intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sasha.levin@oracle.com>
Subject: [added to the 4.1 stable tree] mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
Date: Thu, 19 May 2016 00:19:28 -0400	[thread overview]
Message-ID: <1463631606-32540-29-git-send-email-sasha.levin@oracle.com> (raw)
In-Reply-To: <1463631606-32540-1-git-send-email-sasha.levin@oracle.com>

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit f4c18e6f7b5bbb5b528b3334115806b0d76f50f9 ]

The race condition addressed in commit add05cecef80 ("mm: soft-offline:
don't free target page in successful page migration") was not closed
completely, because that can happen not only for soft-offline, but also
for hard-offline.  Consider that a slab page is about to be freed into
buddy pool, and then an uncorrected memory error hits the page just
after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
necessary because the data on the affected page is not consumed.

To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time.  I think it's justified because __PG_HWPOISON
flags is defined to prevent the page from being reused, and setting it
outside the page's alloc-free cycle is a designed behavior (not a bug.)

For recent months, I was annoyed about BUG_ON when soft-offlined page
remains on lru cache list for a while, which is avoided by calling
put_page() instead of putback_lru_page() in page migration's success
path.  This means that this patch reverts a major change from commit
add05cecef80 about the new refcounting rule of soft-offlined pages, so
"reuse window" revives.  This will be closed by a subsequent patch.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/linux/page-flags.h | 10 +++++++---
 mm/huge_memory.c           |  7 +------
 mm/migrate.c               |  5 ++++-
 mm/page_alloc.c            |  4 ++++
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f34e040..41c9384 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -631,15 +631,19 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
 	 1 << PG_private | 1 << PG_private_2 | \
 	 1 << PG_writeback | 1 << PG_reserved | \
 	 1 << PG_slab	 | 1 << PG_swapcache | 1 << PG_active | \
-	 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
+	 1 << PG_unevictable | __PG_MLOCKED | \
 	 __PG_COMPOUND_LOCK)
 
 /*
  * Flags checked when a page is prepped for return by the page allocator.
- * Pages being prepped should not have any flags set.  It they are set,
+ * Pages being prepped should not have these flags set.  It they are set,
  * there has been a kernel bug or struct page corruption.
+ *
+ * __PG_HWPOISON is exceptional because it needs to be kept beyond page's
+ * alloc-free cycle to prevent from reusing the page.
  */
-#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)
+#define PAGE_FLAGS_CHECK_AT_PREP	\
+	(((1 << NR_PAGEFLAGS) - 1) & ~__PG_HWPOISON)
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1 << PG_private | 1 << PG_private_2)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b807a8f..52975eb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1676,12 +1676,7 @@ static void __split_huge_page_refcount(struct page *page,
 		/* after clearing PageTail the gup refcount can be released */
 		smp_mb__after_atomic();
 
-		/*
-		 * retain hwpoison flag of the poisoned tail page:
-		 *   fix for the unsuitable process killed on Guest Machine(KVM)
-		 *   by the memory-failure.
-		 */
-		page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP | __PG_HWPOISON;
+		page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		page_tail->flags |= (page->flags &
 				     ((1L << PG_referenced) |
 				      (1L << PG_swapbacked) |
diff --git a/mm/migrate.c b/mm/migrate.c
index 1d42561..fe71f91 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -950,7 +950,10 @@ out:
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (reason != MR_MEMORY_FAILURE)
+		/* Soft-offlined page shouldn't go through lru cache list */
+		if (reason == MR_MEMORY_FAILURE)
+			put_page(page);
+		else
 			putback_lru_page(page);
 	}
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 872b2ac..5519230 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -962,6 +962,10 @@ static inline int check_new_page(struct page *page)
 		bad_reason = "non-NULL mapping";
 	if (unlikely(atomic_read(&page->_count) != 0))
 		bad_reason = "nonzero _count";
+	if (unlikely(page->flags & __PG_HWPOISON)) {
+		bad_reason = "HWPoisoned (hardware-corrupted)";
+		bad_flags = __PG_HWPOISON;
+	}
 	if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) {
 		bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set";
 		bad_flags = PAGE_FLAGS_CHECK_AT_PREP;
-- 
2.5.0


  parent reply	other threads:[~2016-05-19  4:21 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-19  4:19 [added to the 4.1 stable tree] Revert "usb: hub: do not clear BOS field during reset device" Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] stable: remove artifact created on backport Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] iwlwifi: pcie: lower the debug level for RSA semaphore access Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ASoC: rt5640: Correct the digital interface data select Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] regulator: s2mps11: Fix invalid selector mask and voltages for buck9 Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] libahci: save port map for forced port map Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ata: ahci-platform: Add ports-implemented DT bindings Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] iio: ak8975: Fix NULL pointer exception on early interrupt Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] efi: Fix out-of-bounds read in variable_matches() Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] USB: serial: cp210x: add ID for Link ECU Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] USB: serial: cp210x: add Straizona Focusers device ids Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] [media] v4l2-dv-timings.h: fix polarity for 4k formats Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] MD: make bio mergeable Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Add dock support for ThinkPad X260 Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] workqueue: fix ghost PENDING flag while doing MQ IO Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1() Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/dp/mst: Restore primary hub guid on resume Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] cxl: Keep IRQ mappings on context teardown Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/radeon: fix vertical bars appear on monitor (v2) Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] IB/security: Restrict use of the write() interface Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] mm: vmscan: reclaim highmem zone if buffer_heads is over limit Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] mm: soft-offline: don't free target page in successful page migration Sasha Levin
2016-05-19  4:19 ` Sasha Levin [this message]
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2) Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] atomic_open(): fix the handling of create_error Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] Drivers: hv: ring_buffer.c: fix comment style Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] Drivers: hv_vmbus: Fix signal to host condition Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] powerpc: Fix bad inline asm constraint in create_zero_mask() Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] Minimal fix-up of bad hashing behavior of hash_64() Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] tracing: Don't display trigger file for events that can't be enabled Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/radeon: make sure vertical front porch is at least 1 Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] MAINTAINERS: Remove asterisk from EFI directory names Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ACPICA: Dispatcher: Update thread ID for recursive method calls Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] crypto: hash - Fix page length clamping in hash walk Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] x86/sysfb_efi: Fix valid BAR address range check Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] fs/pnode.c: treat zero mnt_group_id-s as unequal Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] propogate_mnt: Handle the first propogated copy being a slave Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/radeon: fix DP link training issue with second 4K monitor Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] mm, cma: prevent nr_isolated_* counters from going negative Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] get_rock_ridge_filename(): handle malformed NM entries Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Apply fix for white noise on Asus N550JV, too Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix white noise on Asus UX501VW headset Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] Input: max8997-haptic - fix NULL pointer dereference Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] drm/i915: Bail out of pipe config compute loop on LPT Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix broken reconfig Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Asus N750JV external subwoofer fixup Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix white noise on Asus N750JV headphone Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix subwoofer pin on ASUS N751 and N551 Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] ALSA: usb-audio: Yet another Phoneix Audio device quirk Sasha Levin
2016-05-19  4:19 ` [added to the 4.1 stable tree] perf/core: Disable the event on a truncated AUX record Sasha Levin
2016-05-23  6:59   ` Alexander Shishkin
2016-05-30 21:50     ` Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] tools lib traceevent: Do not reassign parg after collapse_tree() Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] workqueue: fix rebind bound workers warning Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] drm/radeon: fix DP mode validation Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] ocfs2: fix SGID not inherited issue Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] ocfs2: fix posix_acl_create deadlock Sasha Levin
2016-05-19  4:20 ` [added to the 4.1 stable tree] nf_conntrack: avoid kernel pointer value leak in slab name Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1463631606-32540-29-git-send-email-sasha.levin@oracle.com \
    --to=sasha.levin@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=dnelson@redhat.com \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=rientjes@google.com \
    --cc=stable-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox