public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mike Kravetz <mike.kravetz@oracle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Michal Hocko <mhocko@kernel.org>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 4.4 34/52] hugetlb: take PMD sharing into account when flushing tlb/caches
Date: Mon,  6 Dec 2021 15:56:18 +0100	[thread overview]
Message-ID: <20211206145549.058512955@linuxfoundation.org> (raw)
In-Reply-To: <20211206145547.892668902@linuxfoundation.org>

From: Mike Kravetz <mike.kravetz@oracle.com>

commit dff11abe280b47c21b804a8ace318e0638bb9a49 upstream.

When fixing an issue with PMD sharing and migration, it was discovered via
code inspection that other callers of huge_pmd_unshare potentially have an
issue with cache and tlb flushing.

Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst
case ranges for mmu notifiers.  Ensure that this range is flushed if
huge_pmd_unshare succeeds and unmaps a PUD_SUZE area.

Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/hugetlb.c |   53 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 10 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3273,14 +3273,19 @@ void __unmap_hugepage_range(struct mmu_g
 	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
-	const unsigned long mmun_start = start;	/* For mmu_notifiers */
-	const unsigned long mmun_end   = end;	/* For mmu_notifiers */
+	unsigned long mmun_start = start;	/* For mmu_notifiers */
+	unsigned long mmun_end   = end;		/* For mmu_notifiers */
 
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
 	tlb_start_vma(tlb, vma);
+
+	/*
+	 * If sharing possible, alert mmu notifiers of worst case.
+	 */
+	adjust_range_if_pmd_sharing_possible(vma, &mmun_start, &mmun_end);
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
 again:
@@ -3387,12 +3392,23 @@ void unmap_hugepage_range(struct vm_area
 {
 	struct mm_struct *mm;
 	struct mmu_gather tlb;
+	unsigned long tlb_start = start;
+	unsigned long tlb_end = end;
+
+	/*
+	 * If shared PMDs were possibly used within this vma range, adjust
+	 * start/end for worst case tlb flushing.
+	 * Note that we can not be sure if PMDs are shared until we try to
+	 * unmap pages.  However, we want to make sure TLB flushing covers
+	 * the largest possible range.
+	 */
+	adjust_range_if_pmd_sharing_possible(vma, &tlb_start, &tlb_end);
 
 	mm = vma->vm_mm;
 
-	tlb_gather_mmu(&tlb, mm, start, end);
+	tlb_gather_mmu(&tlb, mm, tlb_start, tlb_end);
 	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
-	tlb_finish_mmu(&tlb, start, end);
+	tlb_finish_mmu(&tlb, tlb_start, tlb_end);
 }
 
 /*
@@ -4068,11 +4084,21 @@ unsigned long hugetlb_change_protection(
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long pages = 0;
+	unsigned long f_start = start;
+	unsigned long f_end = end;
+	bool shared_pmd = false;
+
+	/*
+	 * In the case of shared PMDs, the area to flush could be beyond
+	 * start/end.  Set f_start/f_end to cover the maximum possible
+	 * range if PMD sharing is possible.
+	 */
+	adjust_range_if_pmd_sharing_possible(vma, &f_start, &f_end);
 
 	BUG_ON(address >= end);
-	flush_cache_range(vma, address, end);
+	flush_cache_range(vma, f_start, f_end);
 
-	mmu_notifier_invalidate_range_start(mm, start, end);
+	mmu_notifier_invalidate_range_start(mm, f_start, f_end);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
@@ -4083,6 +4109,7 @@ unsigned long hugetlb_change_protection(
 		if (huge_pmd_unshare(mm, &address, ptep)) {
 			pages++;
 			spin_unlock(ptl);
+			shared_pmd = true;
 			continue;
 		}
 		pte = huge_ptep_get(ptep);
@@ -4117,12 +4144,18 @@ unsigned long hugetlb_change_protection(
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
 	 * may have cleared our pud entry and done put_page on the page table:
 	 * once we release i_mmap_rwsem, another task can do the final put_page
-	 * and that page table be reused and filled with junk.
+	 * and that page table be reused and filled with junk.  If we actually
+	 * did unshare a page of pmds, flush the range corresponding to the pud.
 	 */
-	flush_tlb_range(vma, start, end);
-	mmu_notifier_invalidate_range(mm, start, end);
+	if (shared_pmd) {
+		flush_tlb_range(vma, f_start, f_end);
+		mmu_notifier_invalidate_range(mm, f_start, f_end);
+	} else {
+		flush_tlb_range(vma, start, end);
+		mmu_notifier_invalidate_range(mm, start, end);
+	}
 	i_mmap_unlock_write(vma->vm_file->f_mapping);
-	mmu_notifier_invalidate_range_end(mm, start, end);
+	mmu_notifier_invalidate_range_end(mm, f_start, f_end);
 
 	return pages << h->order;
 }



  parent reply	other threads:[~2021-12-06 15:01 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-06 14:55 [PATCH 4.4 00/52] 4.4.294-rc1 review Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 01/52] staging: ion: Prevent incorrect reference counting behavour Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 02/52] USB: serial: option: add Telit LE910S1 0x9200 composition Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 03/52] USB: serial: option: add Fibocom FM101-GL variants Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 04/52] usb: hub: Fix usb enumeration issue due to address0 race Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 05/52] usb: hub: Fix locking issues with address0_mutex Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 06/52] binder: fix test regression due to sender_euid change Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 07/52] ALSA: ctxfi: Fix out-of-range access Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 08/52] staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 09/52] xen: dont continue xenstore initialization in case of errors Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 10/52] xen: detect uninitialized xenbus in xenbus_init Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 11/52] ARM: dts: BCM5301X: Add interrupt properties to GPIO node Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 12/52] ASoC: topology: Add missing rwsem around snd_ctl_remove() calls Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 13/52] net: ieee802154: handle iftypes as u32 Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 14/52] NFSv42: Dont fail clone() unless the OP_CLONE operation failed Greg Kroah-Hartman
2021-12-06 14:55 ` [PATCH 4.4 15/52] ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 16/52] scsi: mpt3sas: Fix kernel panic during drive powercycle test Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 17/52] tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 18/52] tracing: Check pid filtering when creating events Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 19/52] hugetlbfs: flush TLBs correctly after huge_pmd_unshare Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 20/52] proc/vmcore: fix clearing user buffer by properly using clear_user() Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 21/52] NFC: add NCI_UNREG flag to eliminate the race Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 22/52] fuse: fix page stealing Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 23/52] fuse: release pipe buf after last use Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 24/52] shm: extend forced shm destroy to support objects from several IPC nses Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 25/52] xen: sync include/xen/interface/io/ring.h with Xens newest version Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 26/52] xen/blkfront: read response from backend only once Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 27/52] xen/blkfront: dont take local copy of a request from the ring page Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 28/52] xen/blkfront: dont trust the backend response data blindly Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 29/52] xen/netfront: read response from backend only once Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 30/52] xen/netfront: dont read data from request on the ring page Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 31/52] xen/netfront: disentangle tx_skb_freelist Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 32/52] xen/netfront: dont trust the backend response data blindly Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 33/52] tty: hvc: replace BUG_ON() with negative return value Greg Kroah-Hartman
2021-12-06 14:56 ` Greg Kroah-Hartman [this message]
2021-12-06 14:56 ` [PATCH 4.4 35/52] net: return correct error code Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 36/52] platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 37/52] s390/setup: avoid using memblock_enforce_memory_limit Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 38/52] scsi: iscsi: Unblock session then wake up error handler Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 39/52] net: tulip: de4x5: fix the problem that the array lp->phy[8] may be out of bound Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 40/52] net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 41/52] kprobes: Limit max data_size of the kretprobe instances Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 42/52] sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 43/52] sata_fsl: fix warning in remove_proc_entry " Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 44/52] fs: add fget_many() and fput_many() Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 45/52] fget: check that the fd still exists after getting a ref to it Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 46/52] natsemi: xtensa: fix section mismatch warnings Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 47/52] net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 48/52] siphash: use _unaligned version by default Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 49/52] parisc: Fix "make install" on newer debian releases Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 50/52] vgacon: Propagate console boot parameters before calling `vc_resize Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 51/52] tty: serial: msm_serial: Deactivate RX DMA for polling support Greg Kroah-Hartman
2021-12-06 14:56 ` [PATCH 4.4 52/52] serial: pl011: Add ACPI SBSA UART match id Greg Kroah-Hartman
2021-12-06 19:34 ` [PATCH 4.4 00/52] 4.4.294-rc1 review Pavel Machek
2021-12-06 21:58 ` Shuah Khan
2021-12-07 20:39 ` Guenter Roeck
2021-12-08  4:15 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211206145549.058512955@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox