From: Sasha Levin <sasha.levin@oracle.com>
To: stable@vger.kernel.org, stable-commits@vger.kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Andi Kleen <andi@firstfloor.org>, Tony Luck <tony.luck@intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sasha.levin@oracle.com>
Subject: [added to the 4.1 stable tree] mm: soft-offline: don't free target page in successful page migration
Date: Thu, 19 May 2016 00:19:27 -0400 [thread overview]
Message-ID: <1463631606-32540-28-git-send-email-sasha.levin@oracle.com> (raw)
In-Reply-To: <1463631606-32540-1-git-send-email-sasha.levin@oracle.com>
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.
===============
[ Upstream commit add05cecef803f3372c5fc1d2a964171872daf9f ]
Stress testing showed that soft offline events for a process iterating
"mmap-pagefault-munmap" loop can trigger
VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():
Soft offlining page 0x70fe1 at 0x70100008d000
Soft offlining page 0x705fb at 0x70300008d000
page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2
flags: 0x1fffff80800000(hwpoison)
page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
------------[ cut here ]------------
kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
RIP: free_pcppages_bulk+0x52a/0x6f0
Call Trace:
drain_pages_zone+0x3d/0x50
drain_local_pages+0x1d/0x30
on_each_cpu_mask+0x46/0x80
drain_all_pages+0x14b/0x1e0
soft_offline_page+0x432/0x6e0
SyS_madvise+0x73c/0x780
system_call_fastpath+0x12/0x17
Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
RSP <ffff88007a117d28>
---[ end trace 53926436e76d1f35 ]---
When soft offline successfully migrates page, the source page is supposed
to be freed. But there is a race condition where a source page looks
isolated (i.e. the refcount is 0 and the PageHWPoison is set) but
somewhat linked to pcplist. Then another soft offline event calls
drain_all_pages() and tries to free such hwpoisoned page, which is
forbidden.
This odd page state seems to happen due to the race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play
with tweaking drain code as done in commit 9ab3b598d2df "mm: hwpoison:
drop lru_add_drain_all() in __soft_offline_page()", or to change page
freeing code for this soft offline's purpose.
Instead, let's think about the difference between hard offline and soft
offline. There is an interesting difference in how to isolate the in-use
page between these, that is, hard offline marks PageHWPoison of the target
page at first, and doesn't free it by keeping its refcount 1. OTOH, soft
offline tries to free the target page then marks PageHWPoison. This
difference might be the source of complexity and result in bugs like the
above. So making soft offline isolate with keeping refcount can be a
solution for this problem.
We can pass to page migration code the "reason" which shows the caller, so
let's use this more to avoid calling putback_lru_page() when called from
soft offline, which effectively does the isolation for soft offline. With
this change, target pages of soft offline never be reused without changing
migratetype, so this patch also removes the related code.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
mm/memory-failure.c | 22 ----------------------
mm/migrate.c | 9 ++++++---
2 files changed, 6 insertions(+), 25 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e26bc59..7207c16 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1695,20 +1695,7 @@ static int __soft_offline_page(struct page *page, int flags)
if (ret > 0)
ret = -EIO;
} else {
- /*
- * After page migration succeeds, the source page can
- * be trapped in pagevec and actual freeing is delayed.
- * Freeing code works differently based on PG_hwpoison,
- * so there's a race. We need to make sure that the
- * source page should be freed back to buddy before
- * setting PG_hwpoison.
- */
- if (!is_free_buddy_page(page))
- drain_all_pages(page_zone(page));
SetPageHWPoison(page);
- if (!is_free_buddy_page(page))
- pr_info("soft offline: %#lx: page leaked\n",
- pfn);
atomic_long_inc(&num_poisoned_pages);
}
} else {
@@ -1760,14 +1747,6 @@ int soft_offline_page(struct page *page, int flags)
get_online_mems();
- /*
- * Isolate the page, so that it doesn't get reallocated if it
- * was free. This flag should be kept set until the source page
- * is freed and PG_hwpoison on it is set.
- */
- if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
- set_migratetype_isolate(page, true);
-
ret = get_any_page(page, pfn, flags);
put_online_mems();
if (ret > 0) { /* for in-use pages */
@@ -1786,6 +1765,5 @@ int soft_offline_page(struct page *page, int flags)
atomic_long_inc(&num_poisoned_pages);
}
}
- unset_migratetype_isolate(page, MIGRATE_MOVABLE);
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c4841a..1d42561 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -918,7 +918,8 @@ out:
static ICE_noinline int unmap_and_move(new_page_t get_new_page,
free_page_t put_new_page,
unsigned long private, struct page *page,
- int force, enum migrate_mode mode)
+ int force, enum migrate_mode mode,
+ enum migrate_reason reason)
{
int rc = 0;
int *result = NULL;
@@ -949,7 +950,8 @@ out:
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- putback_lru_page(page);
+ if (reason != MR_MEMORY_FAILURE)
+ putback_lru_page(page);
}
/*
@@ -1122,7 +1124,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
pass > 2, mode);
else
rc = unmap_and_move(get_new_page, put_new_page,
- private, page, pass > 2, mode);
+ private, page, pass > 2, mode,
+ reason);
switch(rc) {
case -ENOMEM:
--
2.5.0
next prev parent reply other threads:[~2016-05-19 4:21 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-19 4:19 [added to the 4.1 stable tree] Revert "usb: hub: do not clear BOS field during reset device" Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] stable: remove artifact created on backport Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] iwlwifi: pcie: lower the debug level for RSA semaphore access Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ASoC: rt5640: Correct the digital interface data select Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] regulator: s2mps11: Fix invalid selector mask and voltages for buck9 Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] libahci: save port map for forced port map Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ata: ahci-platform: Add ports-implemented DT bindings Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] iio: ak8975: Fix NULL pointer exception on early interrupt Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] efi: Fix out-of-bounds read in variable_matches() Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] USB: serial: cp210x: add ID for Link ECU Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] USB: serial: cp210x: add Straizona Focusers device ids Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] [media] v4l2-dv-timings.h: fix polarity for 4k formats Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] MD: make bio mergeable Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Add dock support for ThinkPad X260 Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] workqueue: fix ghost PENDING flag while doing MQ IO Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1() Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/dp/mst: Restore primary hub guid on resume Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] cxl: Keep IRQ mappings on context teardown Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/radeon: fix vertical bars appear on monitor (v2) Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] IB/security: Restrict use of the write() interface Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] mm: vmscan: reclaim highmem zone if buffer_heads is over limit Sasha Levin
2016-05-19 4:19 ` Sasha Levin [this message]
2016-05-19 4:19 ` [added to the 4.1 stable tree] mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2) Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] atomic_open(): fix the handling of create_error Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] Drivers: hv: ring_buffer.c: fix comment style Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] Drivers: hv_vmbus: Fix signal to host condition Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] powerpc: Fix bad inline asm constraint in create_zero_mask() Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] Minimal fix-up of bad hashing behavior of hash_64() Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] tracing: Don't display trigger file for events that can't be enabled Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/radeon: make sure vertical front porch is at least 1 Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] MAINTAINERS: Remove asterisk from EFI directory names Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ACPICA: Dispatcher: Update thread ID for recursive method calls Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] crypto: hash - Fix page length clamping in hash walk Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] x86/sysfb_efi: Fix valid BAR address range check Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] fs/pnode.c: treat zero mnt_group_id-s as unequal Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] propogate_mnt: Handle the first propogated copy being a slave Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/radeon: fix DP link training issue with second 4K monitor Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] mm, cma: prevent nr_isolated_* counters from going negative Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] get_rock_ridge_filename(): handle malformed NM entries Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Apply fix for white noise on Asus N550JV, too Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix white noise on Asus UX501VW headset Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] Input: max8997-haptic - fix NULL pointer dereference Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] drm/i915: Bail out of pipe config compute loop on LPT Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix broken reconfig Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Asus N750JV external subwoofer fixup Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix white noise on Asus N750JV headphone Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: hda - Fix subwoofer pin on ASUS N751 and N551 Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] ALSA: usb-audio: Yet another Phoneix Audio device quirk Sasha Levin
2016-05-19 4:19 ` [added to the 4.1 stable tree] perf/core: Disable the event on a truncated AUX record Sasha Levin
2016-05-23 6:59 ` Alexander Shishkin
2016-05-30 21:50 ` Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] tools lib traceevent: Do not reassign parg after collapse_tree() Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] workqueue: fix rebind bound workers warning Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] drm/radeon: fix DP mode validation Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] ocfs2: fix SGID not inherited issue Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] ocfs2: fix posix_acl_create deadlock Sasha Levin
2016-05-19 4:20 ` [added to the 4.1 stable tree] nf_conntrack: avoid kernel pointer value leak in slab name Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1463631606-32540-28-git-send-email-sasha.levin@oracle.com \
--to=sasha.levin@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=kirill@shutemov.name \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=stable-commits@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox