All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andrea Arcangeli <aarcange@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Hugh Dickins <hughd@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Xu <peterx@redhat.com>, Jason Gunthorpe <jgg@mellanox.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ajay Kaher <akaher@vmware.com>
Subject: [PATCH 4.9 37/42] coredump: fix race condition between collapse_huge_page() and core dumping
Date: Mon,  5 Aug 2019 15:03:03 +0200	[thread overview]
Message-ID: <20190805124929.390092782@linuxfoundation.org> (raw)
In-Reply-To: <20190805124924.788666484@linuxfoundation.org>

From: Andrea Arcangeli <aarcange@redhat.com>

commit 59ea6d06cfa9247b586a695c21f94afa7183af74 upstream.

When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.

If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.

khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.

collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().

Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().

So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump...  as long as the coredump
can invoke page faults without holding the mmap_sem for reading.

This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().

Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Ajay: Just adjusted to apply on v4.9]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mm.h |    4 ++++
 mm/khugepaged.c    |    3 +++
 2 files changed, 7 insertions(+)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1197,6 +1197,10 @@ void unmap_vmas(struct mmu_gather *tlb,
  * followed by taking the mmap_sem for writing before modifying the
  * vmas or anything the coredump pretends not to change from under it.
  *
+ * It also has to be called when mmgrab() is used in the context of
+ * the process, but then the mm_count refcount is transferred outside
+ * the context of the process to run down_write() on that pinned mm.
+ *
  * NOTE: find_extend_vma() called from GUP context is the only place
  * that can modify the "mm" (notably the vm_start/end) under mmap_sem
  * for reading and outside the context of the process, so it is also
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1004,6 +1004,9 @@ static void collapse_huge_page(struct mm
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	down_write(&mm->mmap_sem);
+	result = SCAN_ANY_PROCESS;
+	if (!mmget_still_valid(mm))
+		goto out;
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result)
 		goto out;



  parent reply	other threads:[~2019-08-05 13:06 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 13:02 [PATCH 4.9 00/42] 4.9.188-stable review Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 01/42] ARM: riscpc: fix DMA Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 02/42] ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200 Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 03/42] ARM: dts: rockchip: Make rk3288-veyron-mickeys emmc work again Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 04/42] ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 05/42] ftrace: Enable trampoline when rec count returns back to one Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 06/42] kernel/module.c: Only return -EEXIST for modules that have finished loading Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 07/42] MIPS: lantiq: Fix bitfield masking Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 08/42] dmaengine: rcar-dmac: Reject zero-length slave DMA requests Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 09/42] fs/adfs: super: fix use-after-free bug Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 10/42] btrfs: fix minimum number of chunk errors for DUP Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 11/42] ceph: fix improper use of smp_mb__before_atomic() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 12/42] ceph: return -ERANGE if virtual xattr value didnt fit in buffer Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 13/42] scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 14/42] ACPI: fix false-positive -Wuninitialized warning Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 15/42] be2net: Signal that the device cannot transmit during reconfiguration Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 16/42] x86/apic: Silence -Wtype-limits compiler warnings Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 17/42] x86: math-emu: Hide clang warnings for 16-bit overflow Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 18/42] mm/cma.c: fail if fixed declaration cant be honored Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 19/42] coda: add error handling for fget Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 20/42] coda: fix build using bare-metal toolchain Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 21/42] uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 22/42] drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 23/42] ipc/mqueue.c: only perform resource calculation if user valid Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 24/42] x86/kvm: Dont call kvm_spurious_fault() from .fixup Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 25/42] x86, boot: Remove multiple copy of static function sanitize_boot_params() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 26/42] kbuild: initialize CLANG_FLAGS correctly in the top Makefile Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 27/42] Btrfs: fix incremental send failure after deduplication Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 28/42] mmc: dw_mmc: Fix occasional hang after tuning on eMMC Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 29/42] gpiolib: fix incorrect IRQ requesting of an active-low lineevent Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 30/42] selinux: fix memory leak in policydb_init() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 31/42] s390/dasd: fix endless loop after read unit address configuration Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 32/42] drivers/perf: arm_pmu: Fix failure path in PM notifier Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.9 33/42] xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 34/42] IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 35/42] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 36/42] infiniband: fix race condition between infiniband mlx4, mlx5 driver " Greg Kroah-Hartman
2019-08-05 13:03 ` Greg Kroah-Hartman [this message]
2019-08-05 13:03 ` [PATCH 4.9 38/42] eeprom: at24: make spd world-readable again Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 39/42] Backport minimal compiler_attributes.h to support GCC 9 Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 40/42] include/linux/module.h: copy __init/__exit attrs to init/cleanup_module Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 41/42] objtool: Support GCC 9 cold subfunction naming scheme Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.9 42/42] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Greg Kroah-Hartman
2019-08-05 17:35 ` [PATCH 4.9 00/42] 4.9.188-stable review kernelci.org bot
2019-08-05 20:11 ` Jari Ruusu
2019-08-05 20:18   ` Greg Kroah-Hartman
2019-08-06  4:20     ` Jari Ruusu
2019-08-06  4:20     ` Jari Ruusu
2019-08-06  5:16       ` Greg Kroah-Hartman
2019-11-21 20:54         ` Greg Kroah-Hartman
2019-08-06  1:10 ` shuah
2019-08-06  2:55 ` Naresh Kamboju
2019-08-06 15:49 ` Guenter Roeck
2019-08-06 18:29 ` Jon Hunter
2019-08-06 18:29   ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805124929.390092782@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akaher@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jgg@mellanox.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.