From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, David Hildenbrand <david@redhat.com>,
Wupeng Ma <mawupeng1@huawei.com>, Ingo Molnar <mingo@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 5.15 53/57] x86/mm/pat: fix VM_PAT handling in COW mappings
Date: Thu, 11 Apr 2024 11:58:01 +0200 [thread overview]
Message-ID: <20240411095409.593095914@linuxfoundation.org> (raw)
In-Reply-To: <20240411095407.982258070@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand <david@redhat.com>
commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Wupeng Ma <mawupeng1@huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/mm/pat/memtype.c | 49 ++++++++++++++++++++++++++++++++--------------
mm/memory.c | 4 +++
2 files changed, 39 insertions(+), 14 deletions(-)
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -989,6 +989,38 @@ static void free_pfn_range(u64 paddr, un
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -999,20 +1031,13 @@ static void free_pfn_range(u64 paddr, un
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1087,7 +1112,6 @@ void untrack_pfn(struct vm_area_struct *
unsigned long size)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1095,11 +1119,8 @@ void untrack_pfn(struct vm_area_struct *
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5086,6 +5086,10 @@ int follow_phys(struct vm_area_struct *v
goto out;
pte = *ptep;
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_page(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
next prev parent reply other threads:[~2024-04-11 10:51 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-11 9:57 [PATCH 5.15 00/57] 5.15.155-rc1 review Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 01/57] net: dsa: fix panic when DSA master device unbinds on shutdown Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 02/57] wifi: ath9k: fix LNA selection in ath_ant_try_scan() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 03/57] batman-adv: Return directly after a failed batadv_dat_select_candidates() in batadv_dat_forward_data() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 04/57] batman-adv: Improve exception handling in batadv_throw_uevent() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 05/57] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 06/57] panic: Flush kernel log buffer at the end Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 07/57] cpuidle: Avoid potential overflow in integer multiplication Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 08/57] arm64: dts: rockchip: fix rk3328 hdmi ports node Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 09/57] arm64: dts: rockchip: fix rk3399 " Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 10/57] ionic: set adminq irq affinity Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 11/57] pstore/zone: Add a null pointer check to the psz_kmsg_read Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 12/57] tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 13/57] net: pcs: xpcs: Return EINVAL in the internal methods Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 14/57] wifi: ath11k: decrease MHI channel buffer length to 8KB Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 15/57] btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 16/57] btrfs: export: handle invalid inode or root reference in btrfs_get_parent() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 17/57] btrfs: send: handle path ref underflow in header iterate_inode_ref() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 18/57] net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 19/57] Bluetooth: btintel: Fix null ptr deref in btintel_read_version Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 20/57] Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 21/57] pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 22/57] sysv: dont call sb_bread() with pointers_lock held Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 23/57] scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 24/57] isofs: handle CDs with bad root inode but good Joliet root directory Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 25/57] media: sta2x11: fix irq handler cast Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 26/57] ALSA: firewire-lib: handle quirk to calculate payload quadlets as data block counter Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 27/57] ext4: add a hint for block bitmap corrupt state in mb_groups Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 28/57] ext4: forbid commit inconsistent quota data when errors=remount-ro Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 29/57] drm/amd/display: Fix nanosec stat overflow Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 30/57] SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 31/57] Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default" Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 32/57] libperf evlist: Avoid out-of-bounds access Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 33/57] block: prevent division by zero in blk_rq_stat_sum() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 34/57] RDMA/cm: add timeout to cm_destroy_id wait Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 35/57] Input: allocate keycode for Display refresh rate toggle Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 36/57] platform/x86: touchscreen_dmi: Add an extra entry for a variant of the Chuwi Vi8 tablet Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 37/57] ktest: force $buildonly = 1 for make_warnings_file test type Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 38/57] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 39/57] tools: iio: replace seekdir() in iio_generic_buffer Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 40/57] usb: typec: tcpci: add generic tcpci fallback compatible Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 41/57] usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 42/57] ASoC: soc-core.c: Skip dummy codec when adding platforms Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 43/57] fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 44/57] drivers/nvme: Add quirks for device 126f:2262 Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 45/57] fbmon: prevent division by zero in fb_videomode_from_videomode() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 46/57] netfilter: nf_tables: release batch on table validation from abort path Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 47/57] netfilter: nf_tables: release mutex after nft_gc_seq_end " Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 48/57] netfilter: nf_tables: discard table flag update with pending basechain deletion Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 49/57] tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 50/57] gcc-plugins/stackleak: Ignore .noinstr.text and .entry.text Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 51/57] gcc-plugins/stackleak: Avoid .head.text section Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 52/57] virtio: reenable config if freezing device failed Greg Kroah-Hartman
2024-04-11 9:58 ` Greg Kroah-Hartman [this message]
2024-04-11 9:58 ` [PATCH 5.15 54/57] randomize_kstack: Improve entropy diffusion Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 55/57] platform/x86: intel-vbtn: Update tablet mode switch at end of probe Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 56/57] Bluetooth: btintel: Fixe build regression Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 57/57] VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() Greg Kroah-Hartman
2024-04-11 17:12 ` [PATCH 5.15 00/57] 5.15.155-rc1 review SeongJae Park
2024-04-11 18:36 ` Easwar Hariharan
2024-04-12 8:27 ` Greg Kroah-Hartman
2024-04-11 19:13 ` Florian Fainelli
2024-04-11 23:46 ` Shuah Khan
2024-04-12 6:40 ` Shreeya Patel
2024-04-12 7:28 ` Ron Economos
2024-04-12 8:03 ` Jon Hunter
2024-04-12 10:25 ` Harshit Mogalapalli
2024-04-12 10:50 ` Greg Kroah-Hartman
2024-04-12 15:57 ` Chuck Lever III
2024-04-12 20:06 ` Calum Mackay
2024-04-12 20:11 ` Harshit Mogalapalli
2024-04-12 20:23 ` Chuck Lever
2024-04-12 21:34 ` Harshit Mogalapalli
2024-04-13 15:56 ` Chuck Lever
2024-04-14 6:13 ` Greg Kroah-Hartman
2024-04-15 13:31 ` Chuck Lever
2024-04-12 18:24 ` Naresh Kamboju
2024-04-12 22:22 ` Kelsey Steele
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240411095409.593095914@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=hpa@zytor.com \
--cc=luto@kernel.org \
--cc=mawupeng1@huawei.com \
--cc=mingo@kernel.org \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.