From: Andi Kleen <ak@linux.intel.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: nao.horiguchi@gmail.com, linmiaohe@huawei.com,
	jane.chu@oracle.com, osalvador@suse.de, muchun.song@linux.dev,
	akpm@linux-foundation.org, shuah@kernel.org, corbet@lwn.net,
	rientjes@google.com, duenwen@google.com, fvdl@google.com,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v4 0/4] Userspace controls soft-offline pages
Date: Sat, 22 Jun 2024 09:49:12 -0700
Message-ID: <ZncAiH-SWmGQY5so@tassilo>
In-Reply-To: <CACw3F51QadqESg2a8Lb_A+PCH7TH0W8BqwNKCyOX4nyeeP1wAw@mail.gmail.com>

On Fri, Jun 21, 2024 at 04:53:41PM -0700, Jiaqi Yan wrote:
> Thanks for your comment, Andi.
> 
> On Thu, Jun 20, 2024 at 3:53 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > Jiaqi Yan <jiaqiyan@google.com> writes:
> >
> > > Correctable memory errors are very common on servers with large
> > > amount of memory, and are corrected by ECC, but with two
> > > pain points to users:
> > > 1. Correction usually happens on the fly and adds latency overhead
> > > 2. Not-fully-proved theory states excessive correctable memory
> > >    errors can develop into uncorrectable memory error.
> >
> > This patchkit is amusing (or maybe sad) because it basically tries to
> > reconstruct the original soft offline design using a user space daemon
> > instead of doing policy badly in the kernel.
> 
> Some clarifications. I don't intend to reconstruct. I think this
> patchset can also be treated as "patch some missing places so that
> kernel doesn't soft offline behind the back of userspace daemon".
> I agree with you (IIUC) that the policy for corrected memory errors
> should exist in userspace. But the situation is that some behaviors in
> the kernel don't respect that (they either have a reason to not
> respect, or just forget to respect). enable_soft_offline is basically
> the big button in userspace to block these kernel violators.

It would be better to disable them earlier, before they waste work
tracking things unnecessarily.  But yes, it's a step in the right direction.
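
For illustration only, here is roughly what the "big button" could look
like from a userspace daemon. This is a minimal sketch, not code from the
series: the path /proc/sys/vm/enable_soft_offline and the 1 = allow /
0 = block convention are assumptions (the thread only names the sysctl
enable_soft_offline), and both helper functions are hypothetical.

```python
from pathlib import Path

# Assumed location of the knob; the thread only gives the sysctl name.
SYSCTL_PATH = Path("/proc/sys/vm/enable_soft_offline")


def set_soft_offline(enabled: bool, path: Path = SYSCTL_PATH) -> None:
    """Write "1" (allow) or "0" (block) to the sysctl file.

    Assumes the 1/0 convention; on a real system this needs root.
    """
    path.write_text("1\n" if enabled else "0\n")


def soft_offline_enabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True if the file currently holds "1"."""
    return path.read_text().strip() == "1"
```

A daemon that wants to own the corrected-error policy would call
set_soft_offline(False) once at startup. The path parameter is there so
the helpers can be exercised against a scratch file without root.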

> 
> >
> > You can still have it by enabling CONFIG_X86_MCELOG_LEGACY and
> > use http://www.mcelog.org or an equivalent daemon of your choosing
> > that listens to /dev/mcelog.
> 
> If I didn't miss anything important in
> https://github.com/andikleen/mcelog and
> arch/x86/kernel/cpu/mce/dev-mcelog.c, I don't think /dev/mcelog works
> on ARM platforms where CPER is used to convey hw errors from platform
> to OS.

Yes, and not even on AMD.

-Andi

