From: Peter Xu <peterx@redhat.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan <gshan@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alistair Popple <apopple@nvidia.com>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
	Zi Yan <ziy@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 00/19] mm: Support huge pfnmaps
Date: Wed, 28 Aug 2024 10:41:40 -0400	[thread overview]
Message-ID: <Zs83JJhFY9S-Gxqc@x1n> (raw)
In-Reply-To: <CACw3F52_LtLzRD479piaFJSePjA-DKG08o-hGT-f8R5VV94S=Q@mail.gmail.com>

On Tue, Aug 27, 2024 at 05:42:21PM -0700, Jiaqi Yan wrote:
> On Tue, Aug 27, 2024 at 3:57 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Aug 27, 2024 at 03:36:07PM -0700, Jiaqi Yan wrote:
> > > Hi Peter,
> >
> > Hi, Jiaqi,
> >
> > > I am curious if there is any work needed for unmap_mapping_range? If a
> > > driver hugely remap_pfn_range()ed at 1G granularity, can the driver
> > > unmap at PAGE_SIZE granularity? For example, when handling a PFN that is
> >
> > Yes it can, but it'll invoke split_huge_pud(), which currently defaults to
> > removing the whole pud (right now it only covers DAX mappings or huge
> > pfnmaps; it won't cover anonymous memory if that comes later, for example).
> >
> > In that case it'll rely on the driver providing a proper fault() /
> > huge_fault() to refault things back at smaller sizes when the range is
> > accessed again.
> 
> I see, so the driver needs to drive the recovery process, and code
> needs to be in the driver.
> 
> But it seems to me the recovery process will be more or less the same
> across different drivers? In that case, does it make sense for
> memory_failure to do the common work for all drivers?
> 
> Instead of removing the whole pud, can the driver or memory_failure do
> something similar to a non-struct-page version of split_huge_page, so
> the driver doesn't need to re-fault good pages back?

I think we can, it's just that we don't yet have a valid use case.

DAX is definitely fault-able.

As for the new huge pfnmap, vfio is currently the only user, and vfio only
needs to either zap everything or map everything.  So there's no real need
for what you described yet.  Meanwhile it's also faultable, so if / when
needed it should hopefully still do the job properly.

I believe it's not a common requirement for most of the other drivers
either, as most of them don't even support fault(), afaiu.
remap_pfn_range() can start to use huge mappings, but I'd expect most of
those drivers aren't ready for arbitrary teardown of their MMIO mappings.

It does sound doable to me once there's a real need for what you're
describing, but I don't think I understand the use case well yet.
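
If it helps to make the refault idea concrete, here's a rough driver-side
sketch (not the actual vfio code; my_dev_pfn() is a made-up helper, and it
assumes the current huge_fault(vmf, order) signature plus the existing
vmf_insert_pfn_pmd() / vmf_insert_pfn() APIs):

#include <linux/mm.h>
#include <linux/huge_mm.h>
#include <linux/pfn_t.h>

/* Hypothetical helper: device PFN backing @addr within this linear pfnmap. */
static unsigned long my_dev_pfn(struct vm_area_struct *vma, unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
}

static vm_fault_t my_drv_huge_fault(struct vm_fault *vmf, unsigned int order)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
	unsigned long pfn = my_dev_pfn(vma, addr);

	/* Re-install a PMD-sized mapping if it still fits and aligns. */
	if (order == PMD_SHIFT - PAGE_SHIFT &&
	    addr >= vma->vm_start && addr + PMD_SIZE <= vma->vm_end &&
	    IS_ALIGNED(PFN_PHYS(pfn), PMD_SIZE))
		return vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, PFN_DEV),
					  vmf->flags & FAULT_FLAG_WRITE);

	/* PUD-sized or misaligned: let the core retry at a smaller size. */
	return VM_FAULT_FALLBACK;
}

static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
	/* Final PTE-sized refault after the huge mapping was zapped. */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      my_dev_pfn(vmf->vma, vmf->address));
}

static const struct vm_operations_struct my_drv_vm_ops = {
	.fault		= my_drv_fault,
	.huge_fault	= my_drv_huge_fault,
};

So after split_huge_pud() zaps the whole range, the next access simply comes
back through huge_fault() (or fault()) and gets re-mapped at whatever size
still fits.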

> 
> 
> >
> > > poisoned in the 1G mapping, it would be great if the mapping could be
> > > split into 2M mappings + 4k mappings, so only the single poisoned PFN
> > > is lost. (Pretty much like the past proposal* to use HGM** to improve
> > > hugetlb's memory failure handling).
> >
> > Note that we're only talking about MMIO mappings here, in which case the
> > PFN doesn't even have a struct page, so the whole poison idea shouldn't
> > apply, afaiu.
> 
> Yes, there won't be any struct page. Ankit proposed this patchset* for
> handling poisoning. I wonder: if someday the vfio-nvgrace-gpu-pci
> driver adopts your change via the new remap_pfn_range (installing
> PMD/PUD instead of PTE), while memory_failure_pfn still calls
> unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT, PAGE_SIZE,
> 0), can it somehow just work with no re-fault needed?
> 
> * https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@nvidia.com/#t

I see now, interesting.  Thanks for the link.

In that nvgpu use case, one way is to do what you said: we could enhance
the pmd/pud split for pfnmaps, but maybe that's overkill.

I saw that the nvgpu driver will need a fault() anyway to detect poisoned
PFNs.  It then also seems feasible that, once nvgrace_gpu_vfio_pci_fault()
supports huge pfnmaps, it will need to detect whether the whole faulting
range contains any poisoned PFN, and return VM_FAULT_FALLBACK if so
(rather than VM_FAULT_HWPOISON).

E.g., when 4K out of a 2M mapping is poisoned, we'll zap the 2M completely.
When an access happens, as long as the accessed 4K is not the poisoned 4K,
huge_fault() should still detect that a 4K range within the 2M is poisoned,
so it won't install a pmd but will return VM_FAULT_FALLBACK; then fault()
will see that the accessed 4K range is not poisoned and install a pte.
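
Sticking with the hypothetical my_drv_* sketch from earlier in this mail
(my_drv_pfn_poisoned() stands in for whatever bookkeeping the driver uses to
track bad PFNs; this is not the actual nvgrace code), the two fault paths
would then grow checks roughly like:

/* Hypothetical bookkeeping: has this device PFN been reported as poisoned? */
static bool my_drv_pfn_poisoned(unsigned long pfn);

static bool my_drv_range_poisoned(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		if (my_drv_pfn_poisoned(pfn + i))
			return true;
	return false;
}

static vm_fault_t my_drv_huge_fault(struct vm_fault *vmf, unsigned int order)
{
	unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
	unsigned long pfn = my_dev_pfn(vmf->vma, addr);

	/*
	 * Any poison inside the would-be huge range?  Don't install a huge
	 * entry; VM_FAULT_FALLBACK makes the core retry at a smaller size,
	 * eventually reaching ->fault() for the individual 4k pages.
	 */
	if (my_drv_range_poisoned(pfn, 1UL << order))
		return VM_FAULT_FALLBACK;

	/* Clean range: same alignment checks + pmd insert as the sketch above. */
	if (order == PMD_SHIFT - PAGE_SHIFT &&
	    IS_ALIGNED(PFN_PHYS(pfn), PMD_SIZE))
		return vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, PFN_DEV),
					  vmf->flags & FAULT_FLAG_WRITE);

	return VM_FAULT_FALLBACK;
}

static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
	unsigned long pfn = my_dev_pfn(vmf->vma, vmf->address);

	/* Only an access that actually hits the bad 4k page gets SIGBUS. */
	if (my_drv_pfn_poisoned(pfn))
		return VM_FAULT_HWPOISON;

	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}

That way only the poisoned 4k page is really lost; everything else in the 2M
range comes back via PTE-sized refaults.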

Thanks,

-- 
Peter Xu



Thread overview: 69+ messages
2024-08-26 20:43 [PATCH v2 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-26 20:43 ` [PATCH v2 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-26 20:43 ` [PATCH v2 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-26 20:43 ` [PATCH v2 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-26 20:43 ` [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Peter Xu
2024-08-28  7:44   ` David Hildenbrand
2024-08-28 14:24     ` Peter Xu
2024-08-28 15:30       ` David Hildenbrand
2024-08-28 19:45         ` Peter Xu
2024-08-28 23:46           ` Jason Gunthorpe
2024-08-29  6:35             ` David Hildenbrand
2024-08-29 18:45               ` Peter Xu
2024-08-29 15:10           ` David Hildenbrand
2024-08-29 18:49             ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-29 15:10   ` David Hildenbrand
2024-08-29 18:26     ` Peter Xu
2024-08-29 19:44       ` David Hildenbrand
2024-08-29 20:01         ` Peter Xu
2024-09-02  7:58   ` Yan Zhao
2024-09-03 21:23     ` Peter Xu
2024-09-09 22:25       ` Andrew Morton
2024-09-09 22:43         ` Peter Xu
2024-09-09 23:15           ` Andrew Morton
2024-09-10  0:08             ` Peter Xu
2024-09-10  2:52               ` Yan Zhao
2024-09-10 12:16                 ` Peter Xu
2024-09-11  2:16                   ` Yan Zhao
2024-09-11 14:34                     ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-26 20:43 ` [PATCH v2 09/19] mm: New follow_pfnmap API Peter Xu
2024-08-26 20:43 ` [PATCH v2 10/19] KVM: Use " Peter Xu
2024-08-26 20:43 ` [PATCH v2 11/19] s390/pci_mmio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-26 20:43 ` [PATCH v2 13/19] vfio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 14/19] acrn: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 15/19] mm/access_process_vm: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 16/19] mm: Remove follow_pte() Peter Xu
2024-09-01  4:33   ` Yu Zhao
2024-09-01 13:39     ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-26 20:43 ` [PATCH v2 18/19] mm/arm64: " Peter Xu
2025-03-19 22:22   ` Keith Busch
2025-03-19 22:46     ` Peter Xu
2025-03-19 22:53       ` Keith Busch
2024-08-26 20:43 ` [PATCH v2 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-27 22:36 ` [PATCH v2 00/19] mm: Support huge pfnmaps Jiaqi Yan
2024-08-27 22:57   ` Peter Xu
2024-08-28  0:42     ` Jiaqi Yan
2024-08-28  0:46       ` Jiaqi Yan
2024-08-28 14:24       ` Jason Gunthorpe
2024-08-28 16:10         ` Jiaqi Yan
2024-08-28 23:49           ` Jason Gunthorpe
2024-08-29 19:21             ` Jiaqi Yan
2024-09-04 15:52               ` Jason Gunthorpe
2024-09-04 16:38                 ` Jiaqi Yan
2024-09-04 16:43                   ` Jason Gunthorpe
2024-09-04 16:58                     ` Jiaqi Yan
2024-09-04 17:00                       ` Jason Gunthorpe
2024-09-04 17:07                         ` Jiaqi Yan
2024-09-09  3:56                           ` Ankit Agrawal
2024-08-28 14:41       ` Peter Xu [this message]
2024-08-28 16:23         ` Jiaqi Yan
2024-09-09  4:03 ` Ankit Agrawal
2024-09-09 15:03   ` Peter Xu
