All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.19 31/38] mm: thp: fix MADV_REMOVE deadlock on shmem THP
Date: Mon,  8 Feb 2021 16:01:18 +0100	[thread overview]
Message-ID: <20210208145807.405105895@linuxfoundation.org> (raw)
In-Reply-To: <20210208145806.141056364@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

commit 1c2f67308af4c102b4e1e6cd6f69819ae59408e0 upstream.

Sergey reported deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).

This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd().  The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.

__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.

Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so that
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils
Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/huge_memory.c |   37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2278,7 +2278,7 @@ void __split_huge_pmd(struct vm_area_str
 	spinlock_t *ptl;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long haddr = address & HPAGE_PMD_MASK;
-	bool was_locked = false;
+	bool do_unlock_page = false;
 	pmd_t _pmd;
 
 	mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE);
@@ -2291,7 +2291,6 @@ void __split_huge_pmd(struct vm_area_str
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		was_locked = true;
 		if (page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2300,19 +2299,29 @@ repeat:
 	if (pmd_trans_huge(*pmd)) {
 		if (!page) {
 			page = pmd_page(*pmd);
-			if (unlikely(!trylock_page(page))) {
-				get_page(page);
-				_pmd = *pmd;
-				spin_unlock(ptl);
-				lock_page(page);
-				spin_lock(ptl);
-				if (unlikely(!pmd_same(*pmd, _pmd))) {
-					unlock_page(page);
+			/*
+			 * An anonymous page must be locked, to ensure that a
+			 * concurrent reuse_swap_page() sees stable mapcount;
+			 * but reuse_swap_page() is not used on shmem or file,
+			 * and page lock must not be taken when zap_pmd_range()
+			 * calls __split_huge_pmd() while i_mmap_lock is held.
+			 */
+			if (PageAnon(page)) {
+				if (unlikely(!trylock_page(page))) {
+					get_page(page);
+					_pmd = *pmd;
+					spin_unlock(ptl);
+					lock_page(page);
+					spin_lock(ptl);
+					if (unlikely(!pmd_same(*pmd, _pmd))) {
+						unlock_page(page);
+						put_page(page);
+						page = NULL;
+						goto repeat;
+					}
 					put_page(page);
-					page = NULL;
-					goto repeat;
 				}
-				put_page(page);
+				do_unlock_page = true;
 			}
 		}
 		if (PageMlocked(page))
@@ -2322,7 +2331,7 @@ repeat:
 	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
 	spin_unlock(ptl);
-	if (!was_locked && page)
+	if (do_unlock_page)
 		unlock_page(page);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.



  parent reply	other threads:[~2021-02-08 16:13 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 15:00 [PATCH 4.19 00/38] 4.19.175-rc1 review Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 01/38] USB: serial: cp210x: add pid/vid for WSDA-200-USB Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 02/38] USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 03/38] USB: serial: option: Adding support for Cinterion MV31 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 04/38] elfcore: fix building with clang Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 05/38] Input: i8042 - unbreak Pegatron C15B Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 06/38] rxrpc: Fix deadlock around release of dst cached on udp tunnel Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 07/38] arm64: dts: ls1046a: fix dcfg address range Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 08/38] net: lapb: Copy the skb before sending a packet Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 09/38] net: mvpp2: TCAM entry enable should be written after SRAM data Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 10/38] memblock: do not start bottom-up allocations with kernel_end Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 11/38] USB: gadget: legacy: fix an error code in eth_bind() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.19 12/38] USB: usblp: dont call usb_set_interface if theres a single alt Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 13/38] usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop() Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 14/38] usb: dwc2: Fix endpoint direction check in ep_from_windex Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 15/38] usb: dwc3: fix clock issue during resume in OTG mode Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 16/38] ovl: fix dentry leak in ovl_get_redirect Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 17/38] mac80211: fix station rate table updates on assoc Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 18/38] kretprobe: Avoid re-registration of the same kretprobe earlier Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 19/38] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 20/38] xhci: fix bounce buffer usage for non-sg list case Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 21/38] cifs: report error instead of invalid when revalidating a dentry fails Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 22/38] smb3: Fix out-of-bounds bug in SMB2_negotiate() Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 23/38] mmc: core: Limit retries when analyse of SDIO tuples fails Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 24/38] nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 25/38] KVM: SVM: Treat SVM as unsupported when running as an SEV guest Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 26/38] ARM: footbridge: fix dc21285 PCI configuration accessors Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 27/38] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 28/38] mm: hugetlb: fix a race between freeing and dissolving the page Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 29/38] mm: hugetlb: fix a race between isolating and freeing page Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 30/38] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Greg Kroah-Hartman
2021-02-08 15:01 ` Greg Kroah-Hartman [this message]
2021-02-08 15:01 ` [PATCH 4.19 32/38] x86/build: Disable CET instrumentation in the kernel Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 33/38] x86/apic: Add extra serialization for non-serializing MSRs Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 34/38] Input: xpad - sync supported devices with fork on GitHub Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 35/38] iommu/vt-d: Do not use flush-queue when caching-mode is on Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 36/38] md: Set prev_flush_start and flush_bio in an atomic way Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 37/38] net: ip_tunnel: fix mtu calculation Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.19 38/38] net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Greg Kroah-Hartman
2021-02-08 18:34 ` [PATCH 4.19 00/38] 4.19.175-rc1 review Pavel Machek
2021-02-08 20:42 ` Shuah Khan
2021-02-09 11:00 ` Igor Torrente
2021-02-09 15:02 ` Naresh Kamboju
2021-02-09 18:11 ` Guenter Roeck
2021-02-10  1:16 ` Ross Schmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210208145807.405105895@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.