From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 62/72] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
Date: Thu, 6 Apr 2017 10:38:49 +0200
Message-ID: <20170406083622.677103310@linuxfoundation.org>
In-Reply-To: <20170406083619.775985942@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
commit c9d398fa237882ea07167e23bcfc5e6847066518 upstream.
I found a race condition that triggers the following bug when
move_pages() and soft offline are called concurrently on a single
hugetlb page.
Soft offlining page 0x119400 at 0x700000000000
BUG: unable to handle kernel paging request at ffffea0011943820
IP: follow_huge_pmd+0x143/0x190
PGD 7ffd2067
PUD 7ffd1067
PMD 0
[61163.582052] Oops: 0000 [#1] SMP
Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:follow_huge_pmd+0x143/0x190
RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
Call Trace:
follow_page_mask+0x270/0x550
SYSC_move_pages+0x4ea/0x8f0
SyS_move_pages+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fc976e03949
RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
CR2: ffffea0011943820
---[ end trace e4f81353a2d23232 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
This bug is triggered when pmd_present() returns true for a non-present
hugetlb entry, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to decide whether a hugetlb entry is present is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reasons and can misjudge the hugetlb state.
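The difference between the two checks can be sketched with a small user-space model. This is only an illustration under stated assumptions, not kernel code: the MODEL_* constants and model_* helpers are hypothetical names, the bit positions follow an x86-style layout, and the sample entry is made up rather than taken from the oops above. It assumes the pmd-level check also accepts the PSE (huge page) bit, while the pte-level check does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-style flag bits, used here for illustration only. */
#define MODEL_PAGE_PRESENT  (UINT64_C(1) << 0)
#define MODEL_PAGE_PSE      (UINT64_C(1) << 7)
#define MODEL_PAGE_PROTNONE (UINT64_C(1) << 8)

/* Models a pmd_present()-style check that tests several bits, including PSE. */
static inline bool model_pmd_present(uint64_t entry)
{
	return (entry & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
			 MODEL_PAGE_PSE)) != 0;
}

/* Models a pte_present()-style check that ignores the PSE bit. */
static inline bool model_pte_present(uint64_t entry)
{
	return (entry & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE)) != 0;
}

/*
 * Hypothetical non-present entry: the PRESENT bit is clear, but the bit
 * that doubles as PSE is set alongside an arbitrary payload.
 */
static inline uint64_t model_nonpresent_entry(void)
{
	return MODEL_PAGE_PSE | (UINT64_C(0x1234) << 12);
}

/* The two checks disagree on the non-present entry and agree otherwise. */
static inline void model_self_check(void)
{
	uint64_t entry = model_nonpresent_entry();

	assert(model_pmd_present(entry));   /* misjudged as present */
	assert(!model_pte_present(entry));  /* correctly non-present */
	assert(model_pmd_present(MODEL_PAGE_PRESENT | MODEL_PAGE_PSE));
	assert(model_pte_present(MODEL_PAGE_PRESENT | MODEL_PAGE_PSE));
}
```

In this model, a caller that trusts the pmd-style check would go on to compute a struct page pointer from an entry that does not encode a page frame, which is the kind of bogus dereference the oops shows.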
Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()")
Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/hugetlb.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4471,6 +4471,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 {
 	struct page *page = NULL;
 	spinlock_t *ptl;
+	pte_t pte;
 retry:
 	ptl = pmd_lockptr(mm, pmd);
 	spin_lock(ptl);
@@ -4480,12 +4481,13 @@ retry:
 	 */
 	if (!pmd_huge(*pmd))
 		goto out;
-	if (pmd_present(*pmd)) {
+	pte = huge_ptep_get((pte_t *)pmd);
+	if (pte_present(pte)) {
 		page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
 		if (flags & FOLL_GET)
 			get_page(page);
 	} else {
-		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
 			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
 			goto retry;