From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, David Hildenbrand <david@redhat.com>,
	Wupeng Ma <mawupeng1@huawei.com>, Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 5.15 53/57] x86/mm/pat: fix VM_PAT handling in COW mappings
Date: Thu, 11 Apr 2024 11:58:01 +0200
Message-ID: <20240411095409.593095914@linuxfoundation.org>
In-Reply-To: <20240411095407.982258070@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Hildenbrand <david@redhat.com>

commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.

PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios.  Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
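
As a rough userspace illustration of the COW behaviour described above
(a sketch only: it uses an ordinary temporary file, not a PFN mapping,
and the function name and /tmp path are made up for this example), a
write through a MAP_PRIVATE mapping ends up in a private copy of the
page, so the mapping no longer refers to what was originally mapped:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'A' to a temporary file, map it MAP_PRIVATE,
 * store 'B' through the mapping, then read the file again.
 * Returns the byte found in the file afterwards, or -1 on error.
 */
static int private_write_leaves_file_byte(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* assumed writable location */
	char byte = 'A';
	char *map;
	int fd, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (pwrite(fd, &byte, 1, 0) != 1)
		goto out;

	map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* Write fault: the mapping gets its own anonymous copy of the page. */
	map[0] = 'B';

	/* The file itself is untouched by the private write. */
	if (pread(fd, &byte, 1, 0) == 1)
		ret = (unsigned char)byte;
	munmap(map, 1);
out:
	close(fd);
	return ret;
}
```

Calling private_write_leaves_file_byte() is expected to return 'A': the
store of 'B' is visible only through the private mapping.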

Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().

In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.

To fix that, let's update follow_phys() to refuse returning anon folios,
and fall back to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.

We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.

For now, let's keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
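
The resulting decision can be modelled in plain userspace C. This is
only a sketch of the branch order, not kernel code: struct model_vma,
the pte_* stand-in fields, the model_* function names and the 4 KiB
MODEL_PAGE_SHIFT are assumptions made for illustration. The COW test
follows the kernel's is_cow_mapping() check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as in include/linux/mm.h; the page shift assumes 4 KiB pages. */
#define VM_SHARED		0x00000008UL
#define VM_MAYWRITE		0x00000020UL
#define MODEL_PAGE_SHIFT	12

/* Hypothetical stand-in for the few VMA properties the decision depends on. */
struct model_vma {
	unsigned long vm_flags;
	unsigned long vm_pgoff;		/* starting PFN stored at remap time */
	bool pte_walk_ok;		/* stand-in: would follow_phys() succeed? */
	uint64_t pte_paddr;		/* stand-in: address found via the PTE */
	unsigned long pte_prot;		/* stand-in: protection found via the PTE */
};

/* A mapping is COW if it is private and may become writable. */
static bool model_is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Same branch order as get_pat_info() in this patch; prot may be NULL. */
static int model_get_pat_info(const struct model_vma *vma, uint64_t *paddr,
			      unsigned long *prot)
{
	if (vma->pte_walk_ok) {
		/* Page tables still describe the original PFN mapping. */
		*paddr = vma->pte_paddr;
		if (prot)
			*prot = vma->pte_prot;
		return 0;
	}
	if (model_is_cow_mapping(vma->vm_flags)) {
		/* The cachemode is not stored anywhere: the fork() path fails. */
		if (prot)
			return -EINVAL;
		/* The untrack path only needs the start address. */
		*paddr = (uint64_t)vma->vm_pgoff << MODEL_PAGE_SHIFT;
		return 0;
	}
	/* Non-COW mapping without a usable PTE: unexpected, fail. */
	return -EINVAL;
}
```

With pte_walk_ok false, a COW VMA and vm_pgoff = 0x1234, a call with
prot == NULL (the untrack_pfn() case) succeeds and yields paddr
0x1234000, while a call with a non-NULL prot (the track_pfn_copy() case)
returns -EINVAL.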

Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():

<--- C reproducer --->
 #include <stdio.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #include <liburing.h>

 int main(void)
 {
         struct io_uring_params p = {};
         int ring_fd;
         size_t size;
         char *map;

         ring_fd = io_uring_setup(1, &p);
         if (ring_fd < 0) {
                 perror("io_uring_setup");
                 return 1;
         }
         size = p.sq_off.array + p.sq_entries * sizeof(unsigned);

         /* Map the submission queue ring MAP_PRIVATE */
         map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                    ring_fd, IORING_OFF_SQ_RING);
         if (map == MAP_FAILED) {
                 perror("mmap");
                 return 1;
         }

         /* We have at least one page. Let's COW it. */
         *map = 0;
         pause();
         return 0;
 }
<--- C reproducer --->

On a system with 16 GiB RAM and swap configured:
 # ./iouring &
 # memhog 16G
 # killall iouring
[  301.552930] ------------[ cut here ]------------
[  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[  301.565725] PKRU: 55555554
[  301.565944] Call Trace:
[  301.566148]  <TASK>
[  301.566325]  ? untrack_pfn+0xf4/0x100
[  301.566618]  ? __warn+0x81/0x130
[  301.566876]  ? untrack_pfn+0xf4/0x100
[  301.567163]  ? report_bug+0x171/0x1a0
[  301.567466]  ? handle_bug+0x3c/0x80
[  301.567743]  ? exc_invalid_op+0x17/0x70
[  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
[  301.568363]  ? untrack_pfn+0xf4/0x100
[  301.568660]  ? untrack_pfn+0x65/0x100
[  301.568947]  unmap_single_vma+0xa6/0xe0
[  301.569247]  unmap_vmas+0xb5/0x190
[  301.569532]  exit_mmap+0xec/0x340
[  301.569801]  __mmput+0x3e/0x130
[  301.570051]  do_exit+0x305/0xaf0
...

Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/mm/pat/memtype.c |   49 ++++++++++++++++++++++++++++++++--------------
 mm/memory.c               |    4 +++
 2 files changed, 39 insertions(+), 14 deletions(-)

--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -989,6 +989,38 @@ static void free_pfn_range(u64 paddr, un
 		memtype_free(paddr, paddr + size);
 }
 
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+		pgprot_t *pgprot)
+{
+	unsigned long prot;
+
+	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+	/*
+	 * We need the starting PFN and cachemode used for track_pfn_remap()
+	 * that covered the whole VMA. For most mappings, we can obtain that
+	 * information from the page tables. For COW mappings, we might now
+	 * suddenly have anon folios mapped and follow_phys() will fail.
+	 *
+	 * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+	 * detect the PFN. If we need the cachemode as well, we're out of luck
+	 * for now and have to fail fork().
+	 */
+	if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+		if (pgprot)
+			*pgprot = __pgprot(prot);
+		return 0;
+	}
+	if (is_cow_mapping(vma->vm_flags)) {
+		if (pgprot)
+			return -EINVAL;
+		*paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+		return 0;
+	}
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+
 /*
  * track_pfn_copy is called when vma that is covering the pfnmap gets
  * copied through copy_page_range().
@@ -999,20 +1031,13 @@ static void free_pfn_range(u64 paddr, un
 int track_pfn_copy(struct vm_area_struct *vma)
 {
 	resource_size_t paddr;
-	unsigned long prot;
 	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
 
 	if (vma->vm_flags & VM_PAT) {
-		/*
-		 * reserve the whole chunk covered by vma. We need the
-		 * starting address and protection from pte.
-		 */
-		if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
-			WARN_ON_ONCE(1);
+		if (get_pat_info(vma, &paddr, &pgprot))
 			return -EINVAL;
-		}
-		pgprot = __pgprot(prot);
+		/* reserve the whole chunk covered by vma. */
 		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
 	}
 
@@ -1087,7 +1112,6 @@ void untrack_pfn(struct vm_area_struct *
 		 unsigned long size)
 {
 	resource_size_t paddr;
-	unsigned long prot;
 
 	if (vma && !(vma->vm_flags & VM_PAT))
 		return;
@@ -1095,11 +1119,8 @@ void untrack_pfn(struct vm_area_struct *
 	/* free the chunk starting from pfn or the whole chunk */
 	paddr = (resource_size_t)pfn << PAGE_SHIFT;
 	if (!paddr && !size) {
-		if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
-			WARN_ON_ONCE(1);
+		if (get_pat_info(vma, &paddr, NULL))
 			return;
-		}
-
 		size = vma->vm_end - vma->vm_start;
 	}
 	free_pfn_range(paddr, size);
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5086,6 +5086,10 @@ int follow_phys(struct vm_area_struct *v
 		goto out;
 	pte = *ptep;
 
+	/* Never return PFNs of anon folios in COW mappings. */
+	if (vm_normal_page(vma, address, pte))
+		goto unlock;
+
 	if ((flags & FOLL_WRITE) && !pte_write(pte))
 		goto unlock;
 


