From: Jiaqi Yan <jiaqiyan@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Gavin Shan <gshan@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Alistair Popple <apopple@nvidia.com>,
kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Sean Christopherson <seanjc@google.com>,
Oscar Salvador <osalvador@suse.de>,
Borislav Petkov <bp@alien8.de>, Zi Yan <ziy@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Alex Williamson <alex.williamson@redhat.com>,
ankita@nvidia.com
Subject: Re: [PATCH v2 00/19] mm: Support huge pfnmaps
Date: Thu, 29 Aug 2024 12:21:39 -0700
Message-ID: <CACw3F52dyiAyo1ijKfLUGLbh+kquwoUhGMwg4-RObSDvqxreJw@mail.gmail.com>
In-Reply-To: <20240828234958.GE3773488@nvidia.com>
On Wed, Aug 28, 2024 at 4:50 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Aug 28, 2024 at 09:10:34AM -0700, Jiaqi Yan wrote:
> > On Wed, Aug 28, 2024 at 7:24 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Aug 27, 2024 at 05:42:21PM -0700, Jiaqi Yan wrote:
> > >
> > > > Instead of removing the whole pud, can driver or memory_failure do
> > > > something similar to non-struct-page-version of split_huge_page? So
> > > > driver doesn't need to re-fault good pages back?
> > >
> > > It would be far nicer if we didn't have to poke a hole in a 1G mapping
> > > just for memory failure reporting.
> >
> > If I follow this, which of the following sounds better? 1. remove pud
> > and rely on the driver to re-fault PFNs that it knows are not poisoned
> > (what Peter suggested), or 2. keep the pud and allow access to both
> > good and bad PFNs.
>
> In practice I think people will need 2, as breaking up a 1G mapping
> just because a few bits are bad will destroy the VM performance.
>
Totally agreed.
> For this the expectation would be for the VM to co-operate and not
> keep causing memory failures, or perhaps for the platform to spare in
> good memory somehow.
Yes, whether a VM gets into a memory-error-consumption loop
maliciously or accidentally, a reasonable VMM should have means to
detect and break it.
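
One way a VMM could detect such a loop is to count memory-error events per guest inside a time window. The following is only a sketch under my own assumptions: `struct mce_tracker`, `mce_loop_detected()`, the one-second window and the threshold of 3 are all hypothetical and not taken from any existing VMM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy values, chosen only for illustration. */
#define MCE_WINDOW_NS	1000000000ULL	/* length of the counting window */
#define MCE_THRESHOLD	3		/* events tolerated per window */

/* Hypothetical per-guest state kept by the VMM. */
struct mce_tracker {
	uint64_t window_start_ns;
	unsigned int count;
};

/*
 * Record one memory-error event at time now_ns. Returns true once the
 * guest has exceeded MCE_THRESHOLD events inside the current window,
 * at which point the VMM could pause or terminate the guest.
 */
static bool mce_loop_detected(struct mce_tracker *t, uint64_t now_ns)
{
	/* Start a fresh window on the first event or after expiry. */
	if (t->count == 0 || now_ns - t->window_start_ns > MCE_WINDOW_NS) {
		t->window_start_ns = now_ns;
		t->count = 0;
	}
	t->count++;
	return t->count > MCE_THRESHOLD;
}
```

What the VMM does once this returns true is a separate policy decision.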
>
> > Or provide some knob (configured by ?) so that kernel + driver can
> > switch between the two?
>
> This is also sounding reasonable, especially if we need some
> alternative protocol to signal userspace about the failed memory
> besides fault and SIGBUS.
To clarify, what I have in mind is a knob, say one named
"sysctl_enable_hard_offline", configured by userspace.
Applied to Ankit's memory_failure_pfn patch[*], it would look like this:

static int memory_failure_pfn(unsigned long pfn, int flags)
{
	struct interval_tree_node *node;
	int res = MF_FAILED;
	LIST_HEAD(tokill);

	mutex_lock(&pfn_space_lock);
	for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
	     node = interval_tree_iter_next(node, pfn, pfn)) {
		struct pfn_address_space *pfn_space =
			container_of(node, struct pfn_address_space, node);

		if (pfn_space->ops)
			pfn_space->ops->failure(pfn_space, pfn);

		collect_procs_pgoff(NULL, pfn_space->mapping, pfn, &tokill);

		if (sysctl_enable_hard_offline)
			unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT,
					    PAGE_SIZE, 0);

		res = MF_RECOVERED;
	}
	mutex_unlock(&pfn_space_lock);

	if (res == MF_FAILED)
		return action_result(pfn, MF_MSG_PFN_MAP, res);

	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
	kill_procs(&tokill, true, false, pfn, flags);

	return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
}

I think we still want to attempt to SIGBUS userspace, regardless of
whether we do unmap_mapping_range or not.
[*] https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@nvidia.com/#t
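
The intended effect of the knob can be modelled in a few lines of userspace C. This is only a sketch: `enable_hard_offline`, `struct mf_outcome` and `handle_pfn_failure()` are hypothetical names used for illustration, not kernel symbols, and the real path would go through collect_procs_pgoff(), unmap_mapping_range() and kill_procs() as in memory_failure_pfn() above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed sysctl; set by userspace. */
static bool enable_hard_offline;

/* What happened to the poisoned PFN (illustrative only). */
struct mf_outcome {
	bool unmapped;	/* PFN was zapped from the mapping */
	bool sigbus;	/* userspace was sent SIGBUS */
};

/*
 * Model of the proposed policy: the unmap is conditional on the knob,
 * while the SIGBUS is attempted either way.
 */
static struct mf_outcome handle_pfn_failure(bool pfn_is_mapped)
{
	struct mf_outcome out = { false, false };

	/* No registered address space maps the PFN: the MF_FAILED case. */
	if (!pfn_is_mapped)
		return out;

	if (enable_hard_offline)
		out.unmapped = true;

	out.sigbus = true;
	return out;
}
```

With the knob off, the mapping stays intact and userspace still gets the SIGBUS; with the knob on, both happen.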
>
> Jason