All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kyle Meyer <kyle.meyer@hpe.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>,
	akpm@linux-foundation.org, david@redhat.com, tony.luck@intel.com,
	bp@alien8.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, nao.horiguchi@gmail.com,
	jane.chu@oracle.com, osalvador@suse.de
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
Date: Mon, 25 Aug 2025 11:09:06 -0500	[thread overview]
Message-ID: <aKyKort2opfQYqgA@hpe.com> (raw)
In-Reply-To: <14a0dd45-388d-7a32-5ee5-44e60277271a@huawei.com>

On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
> On 2025/8/22 8:24, Jiaqi Yan wrote:
> > On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >>
> >> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> >>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >>>>
> >>>> Calling action_result() on already poisoned pages causes issues:
> >>>>
> >>>> * The amount of hardware corrupted memory is incorrectly incremented.
> >>>> * NUMA node memory failure statistics are incorrectly updated.
> >>>> * Redundant "already poisoned" messages are printed.
> >>>
> >>> All agreed.
> >>>
> >>>>
> >>>> Do not call action_result() on already poisoned pages and drop unused
> >>>> MF_MSG_ALREADY_POISONED.
> >>>
> >>> Hi Kyle,
> >>>
> >>> Patch looks great to me, just one thought...
> 
> Thanks both.
> 
> >>>
> >>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> >>> but changing action_result for MF_MSG_ALREADY_POISONED?
> >>> - don't num_poisoned_pages_inc(pfn)
> >>> - don't update_per_node_mf_stats(pfn, result)
> >>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> >>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> >>> in memory_failure and try_memory_failure_hugetlb
> >>
> >> I did consider that approach but I was concerned about passing
> >> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> >> bit misleading.
> > 
> > Based on my reading the documentation for MF_* in static const char
> > *action_name[]...
> > 
> > Yeah, for file mapped pages, kernel may not have hole-punched or
> > truncated it from the file mapping (shmem and hugetlbfs for example)
> > but that still considered as MF_RECOVERED, so touching a page with
> > HWPoison flag doesn't mean that page was failed to be recovered
> > previously.
> > 
> > For pages intended to be taken out of the buddy system, touching a
> > page with HWPoison flag does imply it isn't isolated and hence
> > MF_FAILED.
> 
> There should be other cases that memory_failure failed to isolate the
> hwpoisoned pages at first time due to various reasons.
> 
> > 
> > In summary, seeing the HWPoison flag again doesn't necessarily
> > indicate what the recovery result was previously; it only indicate
> > kernel won't re-attempt to recover?
> 
> Yes, kernel won't re-attempt to or just cannot recover.
> 
> > 
> >>
> >> How about introducing a new MF action result? Maybe MF_NONE? The message could
> >> look something like:
> > 
> > Adding MF_NONE sounds fine to me, as long as we correctly document its
> > meaning, which can be subtle.
> 
> Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
> as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".

OK, would you like a separate MF action result for each case? Maybe
MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?

MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
used when kill_accessing_process() returns -EHWPOISON.

The log messages could look like...

Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
	and
Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed

Thanks,
Kyle Meyer

  reply	other threads:[~2025-08-25 16:10 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-21 16:44 [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36   ` Kyle Meyer
2025-08-22  0:24     ` Jiaqi Yan
2025-08-25  3:04       ` Miaohe Lin
2025-08-25 16:09         ` Kyle Meyer [this message]
2025-08-25 22:36           ` jane.chu
2025-08-26  1:56             ` Kyle Meyer
2025-08-26 17:24               ` jane.chu
2025-08-26 19:27                 ` Kyle Meyer
2025-08-26 21:22                   ` Jiaqi Yan
2025-08-27  8:06                     ` jane.chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKyKort2opfQYqgA@hpe.com \
    --to=kyle.meyer@hpe.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=jane.chu@oracle.com \
    --cc=jiaqiyan@google.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=osalvador@suse.de \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=tony.luck@intel.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.