From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan <gshan@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alistair Popple <apopple@nvidia.com>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
	Zi Yan <ziy@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start()
Date: Wed, 28 Aug 2024 10:24:05 -0400
Message-ID: <Zs8zBT1aDh1v9Eje@x1n>
In-Reply-To: <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com>

On Wed, Aug 28, 2024 at 09:44:04AM +0200, David Hildenbrand wrote:
> On 26.08.24 22:43, Peter Xu wrote:
> > Teach folio_walk_start() to recognize special pmd/pud mappings, and fail
> > them properly, as there is no folio backing them.
> > 
> > Cc: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/pagewalk.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index cd79fb3b89e5..12be5222d70e 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -753,7 +753,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> >   		fw->pudp = pudp;
> >   		fw->pud = pud;
> > -		if (!pud_present(pud) || pud_devmap(pud)) {
> > +		if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
> >   			spin_unlock(ptl);
> >   			goto not_found;
> >   		} else if (!pud_leaf(pud)) {
> > @@ -783,7 +783,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> >   		fw->pmdp = pmdp;
> >   		fw->pmd = pmd;
> > -		if (pmd_none(pmd)) {
> > +		if (pmd_none(pmd) || pmd_special(pmd)) {
> >   			spin_unlock(ptl);
> >   			goto not_found;
> >   		} else if (!pmd_leaf(pmd)) {
> 
> As raised, this is not the right way to do it. You should follow what
> CONFIG_ARCH_HAS_PTE_SPECIAL and vm_normal_page() does.
> 
> It's even spelled out in vm_normal_page_pmd() that at the time it was
> introduced there was no pmd_special(), so there was no way to handle that.

I can try to do something like that, but even so it'd be a mostly cosmetic
change, and AFAICT there's no real functional difference.
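
For reference, the pte-side pattern you're pointing at in vm_normal_page()
looks roughly like this (trimmed, and quoting from memory, so treat it as a
sketch rather than the exact code):

  if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
          if (likely(!pte_special(pte)))
                  goto check_pfn;
          if (vma->vm_ops && vma->vm_ops->find_special_page)
                  return vma->vm_ops->find_special_page(vma, addr);
          if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                  return NULL;
          if (is_zero_pfn(pfn))
                  return NULL;
          if (pte_devmap(pte))
                  return NULL;
          print_bad_pte(vma, addr, pte, NULL);
          return NULL;
  }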

Meanwhile, see the comments below.

> 
> 
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index f0cf5d02b4740..272445e9db147 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -672,15 +672,29 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  {
>         unsigned long pfn = pmd_pfn(pmd);
> -       /*
> -        * There is no pmd_special() but there may be special pmds, e.g.
> -        * in a direct-access (dax) mapping, so let's just replicate the
> -        * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
> -        */

This one is a good point; I overlooked this comment, which is now obsolete.
I can either refine this patch or add one patch on top to at least update
the comment.

> +       if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) {

We don't yet have CONFIG_ARCH_HAS_PMD_SPECIAL, but I get your point.

> +               if (likely(!pmd_special(pmd)))
> +                       goto check_pfn;
> +               if (vma->vm_ops && vma->vm_ops->find_special_page)
> +                       return vma->vm_ops->find_special_page(vma, addr);

Why would we ever need this?  So far it would be purely a waste of cycles.
I think it's better to leave that out until either xen/gntdev.c or some new
driver starts to use it, rather than keeping dead code around.

> +               if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +                       return NULL;
> +               if (is_huge_zero_pmd(pmd))
> +                       return NULL;

This one is also meaningless until we first make the huge zero pmd carry
the special bit, which does sound like it's outside the scope of this
series.
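
If we did decide to do that, I'd expect it to be roughly a one-liner where
the huge zero folio gets installed, something like (hypothetical and
untested; paraphrasing set_huge_zero_folio() from memory):

  entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
  entry = pmd_mkhuge(pmd_mkspecial(entry));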

> +               if (pmd_devmap(pmd))
> +                       /* See vm_normal_page() */
> +                       return NULL;

When would it ever be pmd_devmap() if it's already pmd_special()?

> +               return NULL;

And see this one: it comes after

  if (xxx)
      return NULL;
  if (yyy)
      return NULL;
  if (zzz)
      return NULL;
  return NULL;

Hmm?  If every branch returns NULL anyway, what's the difference from
simply checking pmd_special() and returning NULL?
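
IOW, with the current very limited use of the special bit, AFAICS the whole
block above could just as well collapse into something like this (a sketch
only, given the config symbol doesn't exist yet):

  if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) {
          if (unlikely(pmd_special(pmd)))
                  return NULL;
          goto check_pfn;
  }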

> +       }
> +
> +       /* !CONFIG_ARCH_HAS_PMD_SPECIAL case follows: */
> +
>         if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
>                 if (vma->vm_flags & VM_MIXEDMAP) {
>                         if (!pfn_valid(pfn))
>                                 return NULL;
> +                       if (is_huge_zero_pmd(pmd))
> +                               return NULL;

I'd rather not touch this part, as this series doesn't change anything for
MIXEDMAP yet.

>                         goto out;
>                 } else {
>                         unsigned long off;
> @@ -692,6 +706,11 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>                 }
>         }
> +       /*
> +        * For historical reasons, these might not have pmd_special() set,
> +        * so we'll check them manually, in contrast to vm_normal_page().
> +        */
> +check_pfn:
>         if (pmd_devmap(pmd))
>                 return NULL;
>         if (is_huge_zero_pmd(pmd))
> 
> 
> 
> We should then look into mapping huge zeropages also with pmd_special.
> pmd_devmap we'll leave alone until removed. But that's independent of your series.

This does look reasonable, to match what we do with the pte zeropage.
Could you remind me what the benefit would be of switching to the special
bit for pmd zero pages?

> 
> I wonder if CONFIG_ARCH_HAS_PTE_SPECIAL is sufficient and we don't need additional
> CONFIG_ARCH_HAS_PMD_SPECIAL.

The hope is that we can always reuse the pte's special bit so that it works
the same for pmd/pud.

For now we require the arch to select ARCH_SUPPORTS_HUGE_PFNMAP to declare
that "pmd/pud have the same special bit defined".
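
E.g. on x86 the pmd/pud helpers can simply reuse _PAGE_SPECIAL; patch 01/19
defines them along these lines (from memory, so please double-check the
patch itself):

  static inline bool pmd_special(pmd_t pmd)
  {
          return pmd_flags(pmd) & _PAGE_SPECIAL;
  }

  static inline pmd_t pmd_mkspecial(pmd_t pmd)
  {
          return pmd_set_flags(pmd, _PAGE_SPECIAL);
  }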

> 
> As I said, if you need someone to add vm_normal_page_pud(), I can handle that.

I'm pretty confused about why we'd need that for this series alone.

If you prefer to have vm_normal_page_pud() defined and to check
pud_special() there, I can do that.  But again, I don't yet see how that
makes a functional difference given the so far very limited usage of the
special bit, and I wonder whether we can instead do that on top once it
becomes necessary (i.e., once there's a functional requirement for it).
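
FWIW, I'd imagine such a helper would be little more than this (untested
sketch; the helper name follows your suggestion, and the highest_memmap_pfn
check mirrors vm_normal_page_pmd()):

  struct page *vm_normal_page_pud(struct vm_area_struct *vma,
                                  unsigned long addr, pud_t pud)
  {
          unsigned long pfn = pud_pfn(pud);

          if (unlikely(pud_special(pud)))
                  return NULL;
          if (unlikely(pfn > highest_memmap_pfn))
                  return NULL;
          return pfn_to_page(pfn);
  }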

Thanks,

-- 
Peter Xu


