From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Yang Shi <yang.shi@linux.alibaba.com>,
Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.19 05/85] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
Date: Thu, 22 Aug 2019 10:18:38 -0700
Message-ID: <20190822171731.233581237@linuxfoundation.org>
In-Reply-To: <20190822171731.012687054@linuxfoundation.org>
From: Yang Shi <yang.shi@linux.alibaba.com>
commit a53190a4aaa36494f4d7209fd1fcc6f2ee08e0e0 upstream.
When running syzkaller internally, we hit the following bug on a 4.9.x
kernel:
kernel BUG at mm/huge_memory.c:2124!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff880067b34900 task.stack: ffff880068998000
RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
Call Trace:
split_huge_page include/linux/huge_mm.h:100 [inline]
queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:90 [inline]
walk_pgd_range mm/pagewalk.c:116 [inline]
__walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
walk_page_range+0x154/0x370 mm/pagewalk.c:285
queue_pages_range+0x115/0x150 mm/mempolicy.c:694
do_mbind mm/mempolicy.c:1241 [inline]
SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
entry_SYSCALL_64_after_swapgs+0x5d/0xdb
Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP <ffff88006899f980>
with the below test:
uint64_t r[1] = {0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
intptr_t res = 0;
res = syscall(__NR_socket, 0x11, 3, 0x300);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000040 = 0x10000;
*(uint32_t*)0x20000044 = 1;
*(uint32_t*)0x20000048 = 0xc520;
*(uint32_t*)0x2000004c = 1;
syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
*(uint64_t*)0x20000340 = 2;
syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
return 0;
}
In effect, the test does:
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
socket(AF_PACKET, SOCK_RAW, 768) = 3
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
The setsockopt() call allocates compound pages (16 pages in this test)
for the packet tx ring; the mmap() call then invokes packet_mmap() to
map those pages into the user address space it specifies.
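
The 16-page figure follows from the block_size passed to setsockopt().
A minimal userspace sketch of that arithmetic, assuming 4 KiB pages; the
names ring_req, SKETCH_PAGE_SIZE, pages_per_block(), block_order() and
reproducer_req() are hypothetical and only mirror the four fields shown
in the strace output above, they are not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four request fields seen in the strace
 * output (block_size, block_nr, frame_size, frame_nr). */
struct ring_req {
    uint32_t block_size;
    uint32_t block_nr;
    uint32_t frame_size;
    uint32_t frame_nr;
};

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Number of pages needed to back one ring block (rounded up). */
static unsigned int pages_per_block(const struct ring_req *req)
{
    return (req->block_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest power-of-two order whose page count covers one block. */
static unsigned int block_order(const struct ring_req *req)
{
    unsigned int pages = pages_per_block(req);
    unsigned int order = 0;

    while ((1u << order) < pages)
        order++;
    return order;
}

/* The values the reproducer writes at 0x20000040..0x2000004c. */
static struct ring_req reproducer_req(void)
{
    struct ring_req req = { 0x10000, 1, 0xc520, 1 };

    /* A frame has to fit inside a block for the request to make sense. */
    assert(req.frame_size <= req.block_size);
    return req;
}
```

With the reproducer's values, one 65536-byte block spans 16 pages, which
would be a single order-4 compound allocation under the 4 KiB assumption.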
When mbind() is called, it scans the vma to queue pages for migration
to the new node. Since 4.9 does not support THP migration, it splits
any huge page it finds. The packet tx ring compound pages, however, are
not THP and are not even movable, so the BUG above is triggered.
Later kernels do not hit this BUG because of commit d44d363f6578 ("mm:
don't assume anonymous pages have SwapBacked flag"), which removed the
PageSwapBacked check for an unrelated reason.
But there is a deeper issue. According to the semantics of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified
together with MPOL_MF_STRICT, but the kernel was unable to move all
existing pages in the range. The tx ring of the packet socket is
definitely not movable, yet mbind() returns success in this case.
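
The rule quoted above can be written as a small predicate. This is a
sketch only: the SKETCH_ constants repeat the flag values from the uapi
header <linux/mempolicy.h> so the example builds without it, and
strict_move_must_fail() and check_reproducer_flags() are hypothetical
names. It covers just this one rule, not the other mbind() error paths:

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as MPOL_MF_STRICT, MPOL_MF_MOVE and MPOL_MF_MOVE_ALL. */
#define SKETCH_MPOL_MF_STRICT   (1u << 0)
#define SKETCH_MPOL_MF_MOVE     (1u << 1)
#define SKETCH_MPOL_MF_MOVE_ALL (1u << 2)

/*
 * True when the quoted rule requires mbind() to fail with -EIO:
 * a move was requested, strict checking was requested, and at least
 * one existing page in the range could not be moved.
 */
static bool strict_move_must_fail(unsigned int flags, bool all_pages_moved)
{
    bool move = flags & (SKETCH_MPOL_MF_MOVE | SKETCH_MPOL_MF_MOVE_ALL);
    bool strict = flags & SKETCH_MPOL_MF_STRICT;

    return move && strict && !all_pages_moved;
}

/* The reproducer passes flags == 3, i.e. STRICT | MOVE, over a range
 * whose tx ring pages cannot be moved. */
static void check_reproducer_flags(void)
{
    assert(strict_move_must_fail(3, false));
}
```

For the reproducer the predicate is true, so a return value of 0 from
mbind() contradicts the documented behaviour.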
Although most socket files are backed by non-movable pages, XDP may
have movable pages from gup. So checking only the underlying file type
of the vma in vma_migratable() is not sufficient.
Change migrate_page_add() to check whether the page is movable; if it
is not, return -EIO. But do not abort the pte walk immediately, since
some pages may be off the LRU only temporarily, and the other pages
should still be migrated if MPOL_MF_MOVE* is specified. Set the
has_unmovable flag if some pages could not be moved, then return -EIO
from mbind() at the end.
With this change the above test returns -EIO as expected.
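
The "record the failure but keep walking" behaviour can be modelled in
userspace as below. This is a simplified sketch, not the kernel code:
struct model_page, model_page_add() and model_queue_range() are
hypothetical names, the isolatable field stands in for whether the page
could be isolated from the LRU, the page_mapcount() condition of the
real migrate_page_add() is left out, and the model returns -EIO directly
where the kernel propagates the condition up to do_mbind():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MPOL_MF_STRICT (1u << 0) /* same value as MPOL_MF_STRICT */
#define SKETCH_MPOL_MF_MOVE   (1u << 1) /* same value as MPOL_MF_MOVE */

struct model_page {
    bool isolatable; /* could this page be taken off the LRU? */
    bool queued;     /* set once the page is on the migration list */
};

/* Queue one page; report -EIO only when strict checking is requested. */
static int model_page_add(struct model_page *page, unsigned int flags)
{
    if (page->isolatable) {
        page->queued = true;
        return 0;
    }
    if (flags & SKETCH_MPOL_MF_STRICT)
        return -EIO;
    return 0;
}

/*
 * Walk every page in the range. A page that cannot be queued does not
 * stop the walk; it only sets has_unmovable, so the remaining pages
 * are still queued and the error is reported once at the end.
 */
static int model_queue_range(struct model_page *pages, size_t n,
                             unsigned int flags, size_t *nr_queued)
{
    bool has_unmovable = false;
    size_t i;

    assert(pages != NULL || n == 0);
    *nr_queued = 0;

    for (i = 0; i < n; i++) {
        pages[i].queued = false;
        if (model_page_add(&pages[i], flags))
            has_unmovable = true;
        else if (pages[i].queued)
            (*nr_queued)++;
    }
    return has_unmovable ? -EIO : 0;
}
```

In this model a range holding one non-isolatable page between two
isolatable ones still queues both movable pages, and the walk reports
-EIO only when SKETCH_MPOL_MF_STRICT is set.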
[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/mempolicy.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -403,7 +403,7 @@ static const struct mempolicy_operations
},
};
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags);
struct queue_pages {
@@ -463,12 +463,11 @@ static int queue_pages_pmd(pmd_t *pmd, s
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(walk->vma)) {
+ if (!vma_migratable(walk->vma) ||
+ migrate_page_add(page, qp->pagelist, flags)) {
ret = 1;
goto unlock;
}
-
- migrate_page_add(page, qp->pagelist, flags);
} else
ret = -EIO;
unlock:
@@ -532,7 +531,14 @@ static int queue_pages_pte_range(pmd_t *
has_unmovable = true;
break;
}
- migrate_page_add(page, qp->pagelist, flags);
+
+ /*
+ * Do not abort immediately since there may be
+ * temporary off LRU pages in the range. Still
+ * need migrate other LRU pages.
+ */
+ if (migrate_page_add(page, qp->pagelist, flags))
+ has_unmovable = true;
} else
break;
}
@@ -947,7 +953,7 @@ static long do_get_mempolicy(int *policy
/*
* page migration, thp tail pages can be passed.
*/
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
struct page *head = compound_head(page);
@@ -960,8 +966,19 @@ static void migrate_page_add(struct page
mod_node_page_state(page_pgdat(head),
NR_ISOLATED_ANON + page_is_file_cache(head),
hpage_nr_pages(head));
+ } else if (flags & MPOL_MF_STRICT) {
+ /*
+ * Non-movable page may reach here. And, there may be
+ * temporary off LRU pages or non-LRU movable pages.
+ * Treat them as unmovable pages since they can't be
+ * isolated, so they can't be moved at the moment. It
+ * should return -EIO for this case too.
+ */
+ return -EIO;
}
}
+
+ return 0;
}
/* page allocation callback for NUMA node migration */
@@ -1164,9 +1181,10 @@ static struct page *new_page(struct page
}
#else
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
+ return -EIO;
}
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,