Re: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy

linux-mm.kvack.org archive mirror

From: Jiaqi Yan <jiaqiyan@google.com>
To: jane.chu@oracle.com
Cc: nao.horiguchi@gmail.com, linmiaohe@huawei.com,
	tony.luck@intel.com,  wangkefeng.wang@huawei.com,
	akpm@linux-foundation.org, osalvador@suse.de,
	 rientjes@google.com, duenwen@google.com, jthoughton@google.com,
	 jgg@nvidia.com, ankita@nvidia.com, peterx@redhat.com,
	linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy
Date: Tue, 15 Oct 2024 16:45:49 -0700
Message-ID: <CACw3F50aJRgR4SCpW6wNJZ6hLt6dmPe3bipfgh9eYbN3GARc-A@mail.gmail.com>
In-Reply-To: <aa42865e-faab-4199-b80b-8fd15aae3ed7@oracle.com>

On Fri, Oct 11, 2024 at 11:28 AM <jane.chu@oracle.com> wrote:
>
> On 10/10/2024 4:21 PM, Jiaqi Yan wrote:
>
> > On Mon, Oct 7, 2024 at 10:24 AM <jane.chu@oracle.com> wrote:
> >> On 10/3/2024 4:51 PM, Jiaqi Yan wrote:
> >>> soned page (sub- or huge-) will eventually be isolated, because,
> >>> The code here is "global policy". The "per-VMA policy", proposed in
> >>> 0/2 but code not sent, should be able to support isolation + offline
> >>> at some point (all VMAs are gone and page becomes free).
> >> "per-VMA policy" sounds interesting.
> >>>> Another thing I'm curious at is whether you have tested with real
> >>>> hardware UE - the one that triggers MCE.  When a real UE is consumed by
> >>> Yes, with our workload. Can you share more about what is the "training
> >>> process"? Is it something to train memory or screen memory errors?
> >> The cover letter mentioned "Machine Learning (ML) workloads", so I used
> >> it as an example.
> > Got you. In that case, if the ML workload (running in a VM) wants to
> > do what you described, wouldn't losing 1G hugetlb page due to kernel
> > offline make the VM/workload even harder to execute recover logic?
>
> Indeed.
>
> As the user application got more sophisticated on recovering from
> poison, what about making the kernel to do the heavy lifting?

I think there are two questions here.

First, if userspace claims it has sufficient or sophisticated recovery
ability (assume we trust it), can it take full control of what happens
to the hardware-poisoned memory page it **owns**?
My answer is yes. I believe the kernel has limited ability to do
memory failure recovery (MFR) optimally for every userspace workload.
The kernel's current hard offline behavior has also made userspace
recovery harder, so userspace deserves a say in MFR.

Second, what is the granularity of the control? This patch applies the
control to every process. What about making it controllable only by
the userspace process that owns the memory page? The kernel can still
do the heavy lifting (hard offline, set HWPoison) **after** the owning
process relinquishes control or exits.
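
To make the per-owner idea concrete, here is a minimal userspace
sketch of the decision it implies. It is a toy model only: `mfr_action`,
`page_owner`, and `mfr_decide` are hypothetical names made up for
illustration, not existing kernel symbols and not code from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum mfr_action {
	MFR_DEFER_TO_OWNER,	/* leave the page alone; the owner recovers */
	MFR_HARD_OFFLINE,	/* kernel does hard offline and sets HWPoison */
};

struct page_owner {
	bool alive;		/* owning process has not exited */
	bool claims_mfr;	/* owner currently claims control of MFR */
};

/*
 * Decide what happens to a poisoned page under the assumed per-owner
 * policy: the kernel steps back only while a live owner claims control.
 * A NULL owner models a page that no process owns.
 */
static enum mfr_action mfr_decide(const struct page_owner *owner)
{
	if (owner != NULL && owner->alive && owner->claims_mfr)
		return MFR_DEFER_TO_OWNER;
	return MFR_HARD_OFFLINE;
}
```

In this model the kernel's existing behavior stays the default, and it
takes over again as soon as the owner gives up control or goes away.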

Another way to "disable hard offline but still set HWPoison" that I
can think of is to make the HWPoison flag apply at page_size level,
instead of always setting it on the compound head. At least from
hugetlb's perspective, is that a good idea?
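
One reading of the granularity question, as a toy userspace model: if
the flag lives only on the compound head, a single bad base page makes
the whole huge page unusable, while a per-base-page flag loses only the
bad one. Everything below (`toy_hugepage`, the `toy_*` helpers, the 512
subpage count for a 2M page with 4K base pages) is an assumption made
up for illustration, not how hugetlb tracks poison today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SUBPAGES 512	/* assumed: 2M huge page / 4K base pages */

/* Hypothetical model of one huge page; not a kernel structure. */
struct toy_hugepage {
	bool head_poisoned;			/* flag on the compound head */
	bool sub_poisoned[TOY_NR_SUBPAGES];	/* flag per base page */
};

/* Record a memory error on base page idx under both schemes. */
static void toy_mark_poison(struct toy_hugepage *hp, size_t idx)
{
	assert(idx < TOY_NR_SUBPAGES);
	hp->head_poisoned = true;
	hp->sub_poisoned[idx] = true;
}

/* Head-level flag: any poison makes every base page unusable. */
static size_t toy_usable_head_flag(const struct toy_hugepage *hp)
{
	return hp->head_poisoned ? 0 : TOY_NR_SUBPAGES;
}

/* Per-base-page flag: only the poisoned base pages are lost. */
static size_t toy_usable_subpage_flag(const struct toy_hugepage *hp)
{
	size_t i, usable = 0;

	for (i = 0; i < TOY_NR_SUBPAGES; i++)
		if (!hp->sub_poisoned[i])
			usable++;
	return usable;
}
```

With one poisoned base page, the head-level count drops to 0 while the
per-base-page count drops by exactly one.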

>
> Something like by way of userfaultfd,  kernel provides a new/clean
> hugetlb page, copied over good data from the clean subpages and then
> present the clean hugetlb page to user process with indication that
> subpage x is a substitute of the poisoned old subpage x, hence its data
> might need a refill?  I am not sure how exactly to pull this through as
> the even is not a page-fault, but just wondering whether something like
> this is possible.
>
> thanks,
>
> -jane
>
> >
> >> -jane
> >>
