From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Keno Fischer <keno@juliacomputing.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Greg Thelen <gthelen@google.com>,
	Nicholas Piggin <npiggin@gmail.com>, Willy Tarreau <w@1wt.eu>,
	Oleg Nesterov <oleg@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@kernel.org>, Michal Hocko <mhocko@suse.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 12/66] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
Date: Tue, 31 Jan 2017 06:36:16 +0100
Message-ID: <20170131053603.642507024@linuxfoundation.org>
In-Reply-To: <20170131053603.098140622@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Keno Fischer <keno@juliacomputing.com>

commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream.

In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead.  Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.

However, a similar check in huge_memory.c was forgotten.  As a result,
remote memory writes to read-only regions of memory backed by
transparent huge pages cause an infinite loop in the kernel:
handle_mm_fault sets FOLL_COW and returns 0, causing a retry, but
follow_trans_huge_pmd bails out immediately because
`(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true.

While in this state the process is still SIGKILLable, but little else
works (e.g.  no ptrace attach, no other signals).  This is easily
reproduced with the following code (assuming thp are set to always):

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TEST_SIZE (5 * 1024 * 1024)

    int main(void) {
      int status;
      pid_t child;
      int fd = open("/proc/self/mem", O_RDWR);
      void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                        MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
      assert(addr != MAP_FAILED);
      pid_t parent_pid = getpid();
      if ((child = fork()) == 0) {
        void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        assert(addr2 != MAP_FAILED);
        memset(addr2, 'a', TEST_SIZE);
        pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
        return 0;
      }
      assert(child == waitpid(child, &status, 0));
      assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
      return 0;
    }

Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit.  The same pattern exists
in follow_devmap_pmd.  However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/huge_memory.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -772,6 +772,12 @@ struct page *follow_devmap_pmd(struct vm
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
+	/*
+	 * When we COW a devmap PMD entry, we split it into PTEs, so we should
+	 * not be in this function with `flags & FOLL_COW` set.
+	 */
+	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
+
 	if (flags & FOLL_WRITE && !pmd_write(*pmd))
 		return NULL;
 
@@ -1118,6 +1124,16 @@ out_unlock:
 	return ret;
 }
 
+/*
+ * FOLL_FORCE can write to even unwritable pmd's, but only
+ * after we've gone through a COW cycle and they are dirty.
+ */
+static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+{
+	return pmd_write(pmd) ||
+	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+}
+
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr,
 				   pmd_t *pmd,
@@ -1128,7 +1144,7 @@ struct page *follow_trans_huge_pmd(struc
 
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
-	if (flags & FOLL_WRITE && !pmd_write(*pmd))
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
 		goto out;
 
 	/* Avoid dumping huge zero page */
