* (no subject)
@ 2025-04-02 11:37 Lorenzo Stoakes
  2025-04-03 15:14 ` [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() Dan Carpenter
  0 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-04-02 11:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kernel test robot, oe-kbuild, Dan Carpenter, linux-mm,
	linux-kernel, Liam R. Howlett, x86

Bcc:
Subject: Re: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in
 copy_page_range()
Message-ID: <0f94adaf-37a4-4d38-b952-01c2dc474a2c@lucifer.local>
Reply-To:
In-Reply-To: <b21bcd61-faf0-4ad8-b644-99794794594f@redhat.com>

Actually let me +cc a few more so this isn't lost further :P

On Wed, Apr 02, 2025 at 01:32:52PM +0200, David Hildenbrand wrote:
> On 02.04.25 13:19, Lorenzo Stoakes wrote:
> > On Thu, Mar 27, 2025 at 09:59:02AM +0800, kernel test robot wrote:
> > > BCC: lkp@intel.com
> > > CC: oe-kbuild-all@lists.linux.dev
> > > In-Reply-To: <20250325191951.471185-1-david@redhat.com>
> > > References: <20250325191951.471185-1-david@redhat.com>
> > > TO: David Hildenbrand <david@redhat.com>
> > >
> > > Hi David,
> > >
> > > kernel test robot noticed the following build warnings:
> > >
> > > [auto build test WARNING on 38fec10eb60d687e30c8c6b5420d86e8149f7557]
> > >
> > > url:    https://github.com/intel-lab-lkp/linux/commits/David-Hildenbrand/x86-mm-pat-Fix-VM_PAT-handling-when-fork-fails-in-copy_page_range/20250326-032200
> > > base:   38fec10eb60d687e30c8c6b5420d86e8149f7557
> > > patch link:    https://lore.kernel.org/r/20250325191951.471185-1-david%40redhat.com
> > > patch subject: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
> > > :::::: branch date: 31 hours ago
> > > :::::: commit date: 31 hours ago
> > > config: hexagon-randconfig-r073-20250327 (https://download.01.org/0day-ci/archive/20250327/202503270941.IFILyNCX-lkp@intel.com/config)
> > > compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project c2692afc0a92cd5da140dfcdfff7818a5b8ce997)
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <lkp@intel.com>
> > > | Reported-by: Dan Carpenter <error27@gmail.com>
> > > | Closes: https://lore.kernel.org/r/202503270941.IFILyNCX-lkp@intel.com/
> > >
> > > smatch warnings:
> > > mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'.
>
> Huh,
>
> how did the original report not make it into my inbox ? :/

Yeah it's odd... maybe broken script?

>
> Thanks for replying Lorenzo!

NP!

>
> >
> > I have a feeling this is because if ndef __HAVE_PFNMAP_TRACKING you just
> > don't touch pfn at all, but also I see in the new track_pfn_copy() there
> > are code paths where pfn doesn't get set, but you still pass the
> > uninitialised pfn to untrack_pfn_copy()...
>
> If track_pfn_copy() returns 0 and VM_PAT applies, the pfn is set. Otherwise
> (returns an error), we immediately return from copy_page_range().
>
> So once we reach untrack_pfn_copy() ... the PFN was set.
>
> In case of !__HAVE_PFNMAP_TRACKING the pfn is not set and not used.
>
> >
> > I mean it could also be in the case of !(src_vma->vm_flags & VM_PAT) (but &
> > VM_PFNMAP), where we return 0 but still pass pfn to untrack_pfn_copy()...
>
> I assume that's what it is complaining about, and it doesn't figure out that
> the parameter is unused.
>
> So likely it's best to just initialize pfn to 0.
>
> >
> > This is all super icky, we probably want to actually have track_pfn_copy()
> > indicate whether we want to later untrack, not only if there's an error.
>
> Sounds overly-complicated. But having a pfn != 0 might work.
>
> > Will comment accordingly on patch, but I mean I don't like the idea of us
> > just initialising the pfn here, because... what to?... :)
>

Sure, I mean for all of above let's have the debate on the main patch I guess so
it's in one place...

> Stared at that code for too long (and I reached a point where the PAT stuff
> absolutely annoys me).

But, also lol. Can. Relate.

>
> Thanks!
>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 21+ messages in thread
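
The smatch report and the "initialize pfn to 0" idea discussed in the message above can be modelled in user space. What follows is a minimal sketch, not kernel code: struct model_vma, the MODEL_* flag values, the first_pfn field and both model_* functions are hypothetical stand-ins. It assumes the v3 contract described in the thread, namely that track_pfn_copy() writes the pfn only when the source VMA has VM_PAT set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
#define MODEL_VM_PFNMAP 0x1UL
#define MODEL_VM_PAT    0x2UL

struct model_vma {
	unsigned long vm_flags;
	unsigned long first_pfn;	/* hypothetical: first pfn of the mapping */
};

/*
 * Models the v3 track_pfn_copy() contract: return 0 without touching *pfn
 * when the source VMA has no VM_PAT, and write *pfn only once the
 * (here always successful) reservation is in place.
 */
static int model_track_pfn_copy(struct model_vma *dst,
				const struct model_vma *src,
				unsigned long *pfn)
{
	assert(pfn != NULL);

	if (!(src->vm_flags & MODEL_VM_PAT))
		return 0;		/* *pfn is deliberately left alone */

	dst->vm_flags |= MODEL_VM_PAT;
	*pfn = src->first_pfn;
	return 0;
}

/*
 * Models the caller and returns the pfn value that would reach
 * untrack_pfn_copy() on the failure path. Without the "= 0" below, the
 * VM_PFNMAP-without-VM_PAT case would read an indeterminate value, which
 * is the pattern the static checker flags.
 */
static unsigned long model_pfn_seen_on_failure(struct model_vma *dst,
					       const struct model_vma *src)
{
	unsigned long pfn = 0;	/* the initialisation under discussion */

	if (src->vm_flags & MODEL_VM_PFNMAP) {
		if (model_track_pfn_copy(dst, src, &pfn))
			return 0;
	}
	return pfn;
}
```

In this model a VM_PFNMAP VMA without VM_PAT yields a pfn of 0 on the failure path, while a VM_PAT VMA yields the pfn stored by the tracking call. Whether 0 is an acceptable "nothing tracked" marker in the real code is exactly the open question in the thread.
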
* Re: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
@ 2025-03-27  1:59 kernel test robot
  2025-04-02 11:19 ` Lorenzo Stoakes
  0 siblings, 1 reply; 21+ messages in thread
From: kernel test robot @ 2025-03-27  1:59 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20250325191951.471185-1-david@redhat.com>
References: <20250325191951.471185-1-david@redhat.com>
TO: David Hildenbrand <david@redhat.com>

Hi David,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 38fec10eb60d687e30c8c6b5420d86e8149f7557]

url:    https://github.com/intel-lab-lkp/linux/commits/David-Hildenbrand/x86-mm-pat-Fix-VM_PAT-handling-when-fork-fails-in-copy_page_range/20250326-032200
base:   38fec10eb60d687e30c8c6b5420d86e8149f7557
patch link:    https://lore.kernel.org/r/20250325191951.471185-1-david%40redhat.com
patch subject: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
:::::: branch date: 31 hours ago
:::::: commit date: 31 hours ago
config: hexagon-randconfig-r073-20250327 (https://download.01.org/0day-ci/archive/20250327/202503270941.IFILyNCX-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project c2692afc0a92cd5da140dfcdfff7818a5b8ce997)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202503270941.IFILyNCX-lkp@intel.com/

smatch warnings:
mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'.

vim +/pfn +1428 mm/memory.c

c56d1b62cce836 Peter Xu                      2022-05-12  1360  
c78f463649d60f Peter Xu                      2020-10-13  1361  int
c78f463649d60f Peter Xu                      2020-10-13  1362  copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
^1da177e4c3f41 Linus Torvalds                2005-04-16  1363  {
^1da177e4c3f41 Linus Torvalds                2005-04-16  1364  	pgd_t *src_pgd, *dst_pgd;
c78f463649d60f Peter Xu                      2020-10-13  1365  	unsigned long addr = src_vma->vm_start;
c78f463649d60f Peter Xu                      2020-10-13  1366  	unsigned long end = src_vma->vm_end;
c78f463649d60f Peter Xu                      2020-10-13  1367  	struct mm_struct *dst_mm = dst_vma->vm_mm;
c78f463649d60f Peter Xu                      2020-10-13  1368  	struct mm_struct *src_mm = src_vma->vm_mm;
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1369  	struct mmu_notifier_range range;
ebd5c0670e88d9 David Hildenbrand             2025-03-25  1370  	unsigned long next, pfn;
2ec74c3ef2d8c5 Sagi Grimberg                 2012-10-08  1371  	bool is_cow;
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1372  	int ret;
^1da177e4c3f41 Linus Torvalds                2005-04-16  1373  
c56d1b62cce836 Peter Xu                      2022-05-12  1374  	if (!vma_needs_copy(dst_vma, src_vma))
d992895ba2b27c Nicholas Piggin               2005-08-28  1375  		return 0;
d992895ba2b27c Nicholas Piggin               2005-08-28  1376  
c78f463649d60f Peter Xu                      2020-10-13  1377  	if (is_vm_hugetlb_page(src_vma))
bc70fbf269fdff Peter Xu                      2022-05-12  1378  		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
^1da177e4c3f41 Linus Torvalds                2005-04-16  1379  
c78f463649d60f Peter Xu                      2020-10-13  1380  	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
ebd5c0670e88d9 David Hildenbrand             2025-03-25  1381  		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
2ab640379a0ab4 venkatesh.pallipadi@intel.com 2008-12-18  1382  		if (ret)
2ab640379a0ab4 venkatesh.pallipadi@intel.com 2008-12-18  1383  			return ret;
2ab640379a0ab4 venkatesh.pallipadi@intel.com 2008-12-18  1384  	}
2ab640379a0ab4 venkatesh.pallipadi@intel.com 2008-12-18  1385  
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1386  	/*
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1387  	 * We need to invalidate the secondary MMU mappings only when
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1388  	 * there could be a permission downgrade on the ptes of the
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1389  	 * parent mm. And a permission downgrade will only happen if
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1390  	 * is_cow_mapping() returns true.
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1391  	 */
c78f463649d60f Peter Xu                      2020-10-13  1392  	is_cow = is_cow_mapping(src_vma->vm_flags);
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1393  
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1394  	if (is_cow) {
7269f999934b28 Jérôme Glisse                 2019-05-13  1395  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
7d4a8be0c4b2b7 Alistair Popple               2023-01-10  1396  					0, src_mm, addr, end);
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1397  		mmu_notifier_invalidate_range_start(&range);
57efa1fe595769 Jason Gunthorpe               2020-12-14  1398  		/*
57efa1fe595769 Jason Gunthorpe               2020-12-14  1399  		 * Disabling preemption is not needed for the write side, as
57efa1fe595769 Jason Gunthorpe               2020-12-14  1400  		 * the read side doesn't spin, but goes to the mmap_lock.
57efa1fe595769 Jason Gunthorpe               2020-12-14  1401  		 *
57efa1fe595769 Jason Gunthorpe               2020-12-14  1402  		 * Use the raw variant of the seqcount_t write API to avoid
57efa1fe595769 Jason Gunthorpe               2020-12-14  1403  		 * lockdep complaining about preemptibility.
57efa1fe595769 Jason Gunthorpe               2020-12-14  1404  		 */
e727bfd5e73a35 Suren Baghdasaryan            2023-08-04  1405  		vma_assert_write_locked(src_vma);
57efa1fe595769 Jason Gunthorpe               2020-12-14  1406  		raw_write_seqcount_begin(&src_mm->write_protect_seq);
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1407  	}
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1408  
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1409  	ret = 0;
^1da177e4c3f41 Linus Torvalds                2005-04-16  1410  	dst_pgd = pgd_offset(dst_mm, addr);
^1da177e4c3f41 Linus Torvalds                2005-04-16  1411  	src_pgd = pgd_offset(src_mm, addr);
^1da177e4c3f41 Linus Torvalds                2005-04-16  1412  	do {
^1da177e4c3f41 Linus Torvalds                2005-04-16  1413  		next = pgd_addr_end(addr, end);
^1da177e4c3f41 Linus Torvalds                2005-04-16  1414  		if (pgd_none_or_clear_bad(src_pgd))
^1da177e4c3f41 Linus Torvalds                2005-04-16  1415  			continue;
c78f463649d60f Peter Xu                      2020-10-13  1416  		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
c78f463649d60f Peter Xu                      2020-10-13  1417  					    addr, next))) {
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1418  			ret = -ENOMEM;
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1419  			break;
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1420  		}
^1da177e4c3f41 Linus Torvalds                2005-04-16  1421  	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1422  
57efa1fe595769 Jason Gunthorpe               2020-12-14  1423  	if (is_cow) {
57efa1fe595769 Jason Gunthorpe               2020-12-14  1424  		raw_write_seqcount_end(&src_mm->write_protect_seq);
ac46d4f3c43241 Jérôme Glisse                 2018-12-28  1425  		mmu_notifier_invalidate_range_end(&range);
57efa1fe595769 Jason Gunthorpe               2020-12-14  1426  	}
ebd5c0670e88d9 David Hildenbrand             2025-03-25  1427  	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
ebd5c0670e88d9 David Hildenbrand             2025-03-25 @1428  		untrack_pfn_copy(dst_vma, pfn);
cddb8a5c14aa89 Andrea Arcangeli              2008-07-28  1429  	return ret;
^1da177e4c3f41 Linus Torvalds                2005-04-16  1430  }
^1da177e4c3f41 Linus Torvalds                2005-04-16  1431  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
@ 2025-03-25 19:19 David Hildenbrand
  2025-04-02 11:36 ` David Hildenbrand
  2025-04-02 11:59 ` Lorenzo Stoakes
  0 siblings, 2 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-03-25 19:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, David Hildenbrand, xingwei lee, yuxin wang,
	Marius Fleischer, Ingo Molnar, Borislav Petkov, Dan Carpenter,
	Andrew Morton, Linus Torvalds, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Rik van Riel, H. Peter Anvin, Peter Xu

If track_pfn_copy() fails, we already added the dst VMA to the maple
tree. As fork() fails, we'll clean up the maple tree, and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the
PAT information from the page table -- which fails because the page
table was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails. However, the whole idea of "simply" clearing
the VM_PAT flag is shaky as well: if we passed track_pfn_copy() and
performed a reservation, but copying the page tables fails, we'll
simply clear the VM_PAT flag, not properly undoing the reservation ...
which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA will
get removed later, after copy_page_range() succeeded; as VM_PAT is not set
then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be
happy. Note that leaving these page tables in place without a reservation
is not a problem, as we are aborting fork(); this process will never run.

A reproducer can trigger this usually at the first try:

  https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

  [   45.239440] WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
  [   45.241082] Modules linked in: ...
  [   45.249119] CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
  [   45.250598] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  [   45.252181] RIP: 0010:get_pat_info+0xf6/0x110
  ...
  [   45.268513] Call Trace:
  [   45.269003]  <TASK>
  [   45.269425]  ? __warn.cold+0xb7/0x14d
  [   45.270131]  ? get_pat_info+0xf6/0x110
  [   45.270846]  ? report_bug+0xff/0x140
  [   45.271519]  ? handle_bug+0x58/0x90
  [   45.272192]  ? exc_invalid_op+0x17/0x70
  [   45.272935]  ? asm_exc_invalid_op+0x1a/0x20
  [   45.273717]  ? get_pat_info+0xf6/0x110
  [   45.274438]  ? get_pat_info+0x71/0x110
  [   45.275165]  untrack_pfn+0x52/0x110
  [   45.275835]  unmap_single_vma+0xa6/0xe0
  [   45.276549]  unmap_vmas+0x105/0x1f0
  [   45.277256]  exit_mmap+0xf6/0x460
  [   45.277913]  __mmput+0x4b/0x120
  [   45.278512]  copy_process+0x1bf6/0x2aa0
  [   45.279264]  kernel_clone+0xab/0x440
  [   45.279959]  __do_sys_clone+0x66/0x90
  [   45.280650]  do_syscall_64+0x95/0x180

Likely this case was missed in commit d155df53f310 ("x86/mm/pat: clear
VM_PAT if copy_p4d_range failed")

... and instead of undoing the reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h,
one place is more than sufficient -- we should clean that up for the other
functions like track_pfn_remap/untrack_pfn separately.

Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yuxin wang <wang1315768607@163.com>
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
Reported-by: Marius Fleischer <fleischermarius@gmail.com>
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---

v2 -> v3:
* Make some !MMU configs happy by just moving the code into memtype.c

v1 -> v2:
* Avoid a second get_pat_info() [and thereby fix the error checking]
  by passing the pfn from track_pfn_copy() to untrack_pfn_copy()
* Simplify untrack_pfn_copy() by calling untrack_pfn().
* Retested

Not sure if we want to CC stable ... it's really hard to trigger in
sane environments.

---
 arch/x86/mm/pat/memtype.c | 52 +++++++++++++++++++++------------------
 include/linux/pgtable.h   | 28 ++++++++++++++++-----
 kernel/fork.c             |  4 +++
 mm/memory.c               | 11 +++------
 4 files changed, 58 insertions(+), 37 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index feb8cc6a12bf2..d721cc19addbd 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -984,29 +984,42 @@ static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
 	return -EINVAL;
 }
 
-/*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
- *
- * If the vma has a linear pfn mapping for the entire range, we get the prot
- * from pte and reserve the entire vma range with single reserve_pfn_range call.
- */
-int track_pfn_copy(struct vm_area_struct *vma)
+int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
+	const unsigned long vma_size = src_vma->vm_end - src_vma->vm_start;
 	resource_size_t paddr;
-	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
+	int rc;
 
-	if (vma->vm_flags & VM_PAT) {
-		if (get_pat_info(vma, &paddr, &pgprot))
-			return -EINVAL;
-		/* reserve the whole chunk covered by vma. */
-		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
-	}
+	if (!(src_vma->vm_flags & VM_PAT))
+		return 0;
+
+	/*
+	 * Duplicate the PAT information for the dst VMA based on the src
+	 * VMA.
+	 */
+	if (get_pat_info(src_vma, &paddr, &pgprot))
+		return -EINVAL;
+	rc = reserve_pfn_range(paddr, vma_size, &pgprot, 1);
+	if (rc)
+		return rc;
 
+	/* Reservation for the destination VMA succeeded. */
+	vm_flags_set(dst_vma, VM_PAT);
+	*pfn = PHYS_PFN(paddr);
 	return 0;
 }
 
+void untrack_pfn_copy(struct vm_area_struct *dst_vma, unsigned long pfn)
+{
+	untrack_pfn(dst_vma, pfn, dst_vma->vm_end - dst_vma->vm_start, true);
+	/*
+	 * Reservation was freed, any copied page tables will get cleaned
+	 * up later, but without getting PAT involved again.
+	 */
+}
+
 /*
  * prot is passed in as a parameter for the new mapping. If the vma has
  * a linear pfn mapping for the entire range, or no vma is provided,
@@ -1095,15 +1108,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 	}
 }
 
-/*
- * untrack_pfn_clear is called if the following situation fits:
- *
- * 1) while mremapping a pfnmap for a new region,  with the old vma after
- * its pfnmap page table has been removed.  The new vma has a new pfnmap
- * to the same pfn & cache type with VM_PAT set.
- * 2) while duplicating vm area, the new vma fails to copy the pgtable from
- * old vma.
- */
 void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..4c107e17c547e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1508,14 +1508,25 @@ static inline void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 }
 
 /*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
+ * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
+ * tables copied during copy_page_range(). On success, stores the pfn to be
+ * passed to untrack_pfn_copy().
  */
-static inline int track_pfn_copy(struct vm_area_struct *vma)
+static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
 	return 0;
 }
 
+/*
+ * untrack_pfn_copy is called when a VM_PFNMAP VMA failed to copy during
+ * copy_page_range(), but after track_pfn_copy() was already called.
+ */
+static inline void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn)
+{
+}
+
 /*
  * untrack_pfn is called while unmapping a pfnmap for a region.
  * untrack can be called for a specific region indicated by pfn and size or
@@ -1528,8 +1539,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
 }
 
 /*
- * untrack_pfn_clear is called while mremapping a pfnmap for a new region
- * or fails to copy pgtable during duplicate vm area.
+ * untrack_pfn_clear is called in the following cases on a VM_PFNMAP VMA:
+ *
+ * 1) During mremap() on the src VMA after the page tables were moved.
+ * 2) During fork() on the dst VMA, immediately after duplicating the src VMA.
  */
 static inline void untrack_pfn_clear(struct vm_area_struct *vma)
 {
@@ -1540,7 +1553,10 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 			   unsigned long size);
 extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
-extern int track_pfn_copy(struct vm_area_struct *vma);
+extern int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn);
+extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 			unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f32..ca2ca3884f763 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -504,6 +504,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
+	/* track_pfn_copy() will later take care of copying internal state. */
+	if (unlikely(new->vm_flags & VM_PFNMAP))
+		untrack_pfn_clear(new);
+
 	return new;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index fb7b8dc751679..dc8efa1358e94 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1362,12 +1362,12 @@ int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = src_vma->vm_start;
 	unsigned long end = src_vma->vm_end;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	struct mmu_notifier_range range;
+	unsigned long next, pfn;
 	bool is_cow;
 	int ret;
 
@@ -1378,11 +1378,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
-		/*
-		 * We do not free on error cases below as remove_vma
-		 * gets called on error from higher level routine
-		 */
-		ret = track_pfn_copy(src_vma);
+		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
 		if (ret)
 			return ret;
 	}
@@ -1419,7 +1415,6 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 			continue;
 		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
 					    addr, next))) {
-			untrack_pfn_clear(dst_vma);
 			ret = -ENOMEM;
 			break;
 		}
@@ -1429,6 +1424,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		raw_write_seqcount_end(&src_mm->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
 	}
+	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
+		untrack_pfn_copy(dst_vma, pfn);
 	return ret;
 }
 

base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread
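
The reservation lifecycle described in the patch above (VM_PAT left clear on the duplicated VMA, set only after a successful reservation, and undone together with the reservation if the page table copy fails) can be sketched as a small user-space model. Everything here is an assumption made for illustration: the model_* names, the MODEL_* flag values, the reservation counter, and the reserve_fails / copy_fails switches do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
#define MODEL_VM_PFNMAP 0x1UL
#define MODEL_VM_PAT    0x2UL

struct model_vma {
	unsigned long vm_flags;
	unsigned long first_pfn;	/* hypothetical: first pfn of the mapping */
};

/* Hypothetical counter standing in for live PAT memtype reservations. */
static int model_reservations;

/* Like the vm_area_dup() hunk: the copy starts out without VM_PAT. */
static struct model_vma model_vm_area_dup(const struct model_vma *src)
{
	struct model_vma dst = *src;

	if (dst.vm_flags & MODEL_VM_PFNMAP)
		dst.vm_flags &= ~MODEL_VM_PAT;
	return dst;
}

/* Set VM_PAT on dst and report the pfn only if the reservation succeeded. */
static int model_track_pfn_copy(struct model_vma *dst,
				const struct model_vma *src,
				unsigned long *pfn, int reserve_fails)
{
	if (!(src->vm_flags & MODEL_VM_PAT))
		return 0;
	if (reserve_fails)
		return -1;		/* dst keeps VM_PAT clear */

	model_reservations++;
	dst->vm_flags |= MODEL_VM_PAT;
	*pfn = src->first_pfn;
	return 0;
}

/* Undo the reservation first, then drop VM_PAT; no-op without VM_PAT. */
static void model_untrack_pfn_copy(struct model_vma *dst, unsigned long pfn)
{
	(void)pfn;			/* the model has no per-pfn bookkeeping */

	if (!(dst->vm_flags & MODEL_VM_PAT))
		return;
	assert(model_reservations > 0);
	model_reservations--;
	dst->vm_flags &= ~MODEL_VM_PAT;
}

/* Mirrors the control flow of the patched copy_page_range(). */
static int model_copy_page_range(struct model_vma *dst,
				 const struct model_vma *src,
				 int reserve_fails, int copy_fails)
{
	unsigned long pfn = 0;
	int ret;

	if (src->vm_flags & MODEL_VM_PFNMAP) {
		ret = model_track_pfn_copy(dst, src, &pfn, reserve_fails);
		if (ret)
			return ret;
	}

	ret = copy_fails ? -1 : 0;	/* stands in for the page table walk */

	if (ret && (src->vm_flags & MODEL_VM_PFNMAP))
		model_untrack_pfn_copy(dst, pfn);
	return ret;
}
```

In this model both failure paths leave the duplicated VMA without VM_PAT and leave the reservation count unchanged, so a later teardown of that VMA has nothing PAT-related to look up. Only the fully successful path ends with VM_PAT set and one additional reservation.
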

end of thread (newest message: 2025-04-07  7:11 UTC)

Thread overview: 21+ messages
2025-04-02 11:37 Lorenzo Stoakes
2025-04-03 15:14 ` [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() Dan Carpenter
2025-04-03 20:59   ` David Hildenbrand
2025-04-04 11:52     ` Lorenzo Stoakes
2025-04-04 12:20       ` David Hildenbrand
2025-04-04 12:27         ` David Hildenbrand
2025-04-06 17:17           ` Ingo Molnar
2025-04-07  7:11     ` Dan Carpenter
  -- strict thread matches above, loose matches on Subject: below --
2025-03-27  1:59 kernel test robot
2025-04-02 11:19 ` Lorenzo Stoakes
2025-04-02 11:32   ` David Hildenbrand
2025-04-02 11:40     ` Lorenzo Stoakes
2025-03-25 19:19 David Hildenbrand
2025-04-02 11:36 ` David Hildenbrand
2025-04-02 12:32   ` Lorenzo Stoakes
2025-04-03 14:47     ` David Hildenbrand
2025-04-03 14:50       ` Lorenzo Stoakes
2025-04-02 11:59 ` Lorenzo Stoakes
2025-04-02 12:20   ` David Hildenbrand
2025-04-02 12:31     ` Lorenzo Stoakes
2025-04-02 15:19       ` David Hildenbrand
