From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Sean Christopherson <seanjc@google.com>,
Oscar Salvador <osalvador@suse.de>,
Jason Gunthorpe <jgg@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
linux-arm-kernel@lists.infradead.org, x86@kernel.org,
Will Deacon <will@kernel.org>, Gavin Shan <gshan@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Zi Yan <ziy@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Alistair Popple <apopple@nvidia.com>,
Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>
Subject: Re: [PATCH 07/19] mm/fork: Accept huge pfnmap entries
Date: Fri, 9 Aug 2024 13:15:21 -0400 [thread overview]
Message-ID: <ZrZOqbS3bcj52JZP@x1n> (raw)
In-Reply-To: <d7fcec73-16f6-4d54-b334-6450a29e0a1d@redhat.com>
On Fri, Aug 09, 2024 at 06:32:44PM +0200, David Hildenbrand wrote:
> On 09.08.24 18:08, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels. Pud is
> > much easier, the write bit needs to be persisted though for writable and
> > shared pud mappings like PFNMAP ones, otherwise a follow up write in either
> > parent or child process will trigger a write fault.
> >
> > Do the same for pmd level.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > mm/huge_memory.c | 27 ++++++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 6568586b21ab..015c9468eed5 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1375,6 +1375,22 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > pgtable_t pgtable = NULL;
> > int ret = -ENOMEM;
> > + pmd = pmdp_get_lockless(src_pmd);
> > + if (unlikely(pmd_special(pmd))) {
> > + dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > + src_ptl = pmd_lockptr(src_mm, src_pmd);
> > + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > + /*
> > + * No need to recheck the pmd, it can't change with write
> > + * mmap lock held here.
> > + */
> > + if (is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd)) {
> > + pmdp_set_wrprotect(src_mm, addr, src_pmd);
> > + pmd = pmd_wrprotect(pmd);
> > + }
> > + goto set_pmd;
> > + }
> > +
>
> I strongly assume we should be using using vm_normal_page_pmd() instead of
> pmd_page() further below. pmd_special() should be mostly limited to GUP-fast
> and vm_normal_page_pmd().
One thing to mention that it has this:
if (!vma_is_anonymous(dst_vma))
return 0;
So it's only about anonymous below that. In that case I feel like the
pmd_page() is benign, and actually good.
Though what you're saying here made me notice my above check doesn't seem
to be necessary, I mean, "(is_cow_mapping(src_vma->vm_flags) &&
pmd_write(pmd))" can't be true when special bit is set, aka, pfnmaps.. and
if it's writable for CoW it means it's already an anon.
I think I can probably drop that line there, perhaps with a
VM_WARN_ON_ONCE() making sure it won't happen.
>
> Again, we should be doing this similar to how we handle PTEs.
>
> I'm a bit confused about the "unlikely(!pmd_trans_huge(pmd)" check, below:
> what else should we have here if it's not a migration entry but a present
> entry?
I had a feeling that it was just a safety belt since the 1st day of thp
when Andrea worked that out, so that it'll work with e.g. file truncation
races.
But with current code it looks like it's only anonymous indeed, so looks
not possible at least from that pov.
Thanks,
>
> Likely this function needs a bit of rework.
>
> --
> Cheers,
>
> David / dhildenb
>
--
Peter Xu
next prev parent reply other threads:[~2024-08-09 17:15 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-09 16:08 [PATCH 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-09 16:08 ` [PATCH 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-09 16:34 ` David Hildenbrand
2024-08-09 17:16 ` Peter Xu
2024-08-09 18:06 ` David Hildenbrand
2024-08-09 16:08 ` [PATCH 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-09 16:34 ` David Hildenbrand
2024-08-14 12:38 ` Jason Gunthorpe
2024-08-09 16:08 ` [PATCH 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-14 12:40 ` Jason Gunthorpe
2024-08-14 15:23 ` Peter Xu
2024-08-14 15:53 ` Jason Gunthorpe
2024-08-09 16:08 ` [PATCH 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-14 12:40 ` Jason Gunthorpe
2024-08-09 16:08 ` [PATCH 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-09 16:23 ` David Hildenbrand
2024-08-09 16:59 ` Peter Xu
2024-08-14 12:42 ` Jason Gunthorpe
2024-08-14 15:34 ` Peter Xu
2024-08-14 12:41 ` Jason Gunthorpe
2024-08-09 16:08 ` [PATCH 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-09 16:32 ` David Hildenbrand
2024-08-09 17:15 ` Peter Xu [this message]
2024-08-09 17:59 ` David Hildenbrand
2024-08-12 18:29 ` Peter Xu
2024-08-12 18:50 ` David Hildenbrand
2024-08-12 19:05 ` Peter Xu
2024-08-09 16:08 ` [PATCH 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-14 13:09 ` Jason Gunthorpe
2024-08-14 15:43 ` Peter Xu
2024-08-09 16:08 ` [PATCH 09/19] mm: New follow_pfnmap API Peter Xu
2024-08-14 13:19 ` Jason Gunthorpe
2024-08-14 18:24 ` Peter Xu
2024-08-14 22:14 ` Jason Gunthorpe
2024-08-15 15:41 ` Peter Xu
2024-08-15 16:16 ` Jason Gunthorpe
2024-08-15 17:21 ` Peter Xu
2024-08-15 17:24 ` Jason Gunthorpe
2024-08-15 18:52 ` Peter Xu
2024-08-16 23:12 ` Sean Christopherson
2024-08-17 11:05 ` David Hildenbrand
2024-08-21 19:10 ` Peter Xu
2024-08-09 16:09 ` [PATCH 10/19] KVM: Use " Peter Xu
2024-08-09 17:23 ` Axel Rasmussen
2024-08-12 18:58 ` Peter Xu
2024-08-12 22:47 ` Axel Rasmussen
2024-08-12 23:44 ` Sean Christopherson
2024-08-14 13:15 ` Jason Gunthorpe
2024-08-14 14:23 ` Sean Christopherson
2024-08-09 16:09 ` [PATCH 11/19] s390/pci_mmio: " Peter Xu
2024-08-09 16:09 ` [PATCH 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-09 16:09 ` [PATCH 13/19] vfio: " Peter Xu
2024-08-14 13:20 ` Jason Gunthorpe
2024-08-09 16:09 ` [PATCH 14/19] acrn: " Peter Xu
2024-08-09 16:09 ` [PATCH 15/19] mm/access_process_vm: " Peter Xu
2024-08-09 16:09 ` [PATCH 16/19] mm: Remove follow_pte() Peter Xu
2024-08-09 16:09 ` [PATCH 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-09 16:09 ` [PATCH 18/19] mm/arm64: " Peter Xu
2024-08-09 16:09 ` [PATCH 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-14 13:25 ` Jason Gunthorpe
2024-08-14 16:08 ` Alex Williamson
2024-08-14 16:24 ` Jason Gunthorpe
[not found] ` <20240809160909.1023470-7-peterx@redhat.com>
2024-08-09 16:20 ` [PATCH 06/19] mm/pagewalk: Check pfnmap early for folio_walk_start() David Hildenbrand
2024-08-09 16:54 ` Peter Xu
2024-08-09 17:25 ` David Hildenbrand
2024-08-09 21:37 ` Peter Xu
2024-08-14 13:05 ` Jason Gunthorpe
2024-08-16 9:30 ` David Hildenbrand
2024-08-16 14:21 ` Peter Xu
2024-08-16 17:38 ` Jason Gunthorpe
2024-08-21 18:42 ` Peter Xu
2024-08-16 17:56 ` David Hildenbrand
2024-08-19 12:19 ` Jason Gunthorpe
2024-08-19 14:19 ` Sean Christopherson
2024-08-09 18:12 ` [PATCH 00/19] mm: Support huge pfnmaps David Hildenbrand
2024-08-14 12:37 ` Jason Gunthorpe
2024-08-14 14:35 ` Sean Christopherson
2024-08-14 14:42 ` Paolo Bonzini
2024-08-14 14:43 ` Jason Gunthorpe
2024-08-14 20:54 ` Sean Christopherson
2024-08-14 22:00 ` Sean Christopherson
2024-08-14 22:10 ` Jason Gunthorpe
2024-08-14 23:36 ` Oliver Upton
2024-08-14 23:27 ` Oliver Upton
2024-08-14 23:38 ` Oliver Upton
2024-08-15 0:23 ` Sean Christopherson
2024-08-15 19:20 ` Peter Xu
2024-08-16 3:05 ` Kefeng Wang
2024-08-16 14:33 ` Peter Xu
2024-08-19 13:14 ` Kefeng Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZrZOqbS3bcj52JZP@x1n \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=apopple@nvidia.com \
--cc=axelrasmussen@google.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=gshan@redhat.com \
--cc=jgg@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=osalvador@suse.de \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yan.y.zhao@intel.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).