All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kyle Meyer <kyle.meyer@hpe.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: akpm@linux-foundation.org, david@redhat.com, tony.luck@intel.com,
	bp@alien8.de, linmiaohe@huawei.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, nao.horiguchi@gmail.com, jane.chu@oracle.com,
	osalvador@suse.de
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
Date: Thu, 21 Aug 2025 14:36:11 -0500	[thread overview]
Message-ID: <aKd1K3ueTacGTf1W@hpe.com> (raw)
In-Reply-To: <CACw3F53KmKRJyH+ajicyDUgGbPZT=U3VE4n+Jt3E62BxEiiCGA@mail.gmail.com>

On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >
> > Calling action_result() on already poisoned pages causes issues:
> >
> > * The amount of hardware corrupted memory is incorrectly incremented.
> > * NUMA node memory failure statistics are incorrectly updated.
> > * Redundant "already poisoned" messages are printed.
> 
> All agreed.
> 
> >
> > Do not call action_result() on already poisoned pages and drop unused
> > MF_MSG_ALREADY_POISONED.
> 
> Hi Kyle,
> 
> Patch looks great to me, just one thought...
> 
> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> but changing action_result for MF_MSG_ALREADY_POISONED?
> - don't num_poisoned_pages_inc(pfn)
> - don't update_per_node_mf_stats(pfn, result)
> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> in memory_failure and try_memory_failure_hugetlb

I did consider that approach but I was concerned about passing
MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
bit misleading.

How about introducing a new MF action result? Maybe MF_NONE? The message could
look something like:

Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None

> This way, all the MF recovery result kernel logs out will be sitting
> in one place, action_result, instead of scattering around all over the
> place.

That sounds better to me.
 
> >
> > Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > ---
> >  include/linux/mm.h      | 1 -
> >  include/ras/ras_event.h | 1 -
> >  mm/memory-failure.c     | 3 ---
> >  3 files changed, 5 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1ae97a0b8ec7..09ce81ef7afc 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> >         MF_MSG_BUDDY,
> >         MF_MSG_DAX,
> >         MF_MSG_UNSPLIT_THP,
> > -       MF_MSG_ALREADY_POISONED,
> >         MF_MSG_UNKNOWN,
> >  };
> >
> > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> > index c8cd0f00c845..f62a52f5bd81 100644
> > --- a/include/ras/ras_event.h
> > +++ b/include/ras/ras_event.h
> > @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> >         EM ( MF_MSG_BUDDY, "free buddy page" )                          \
> >         EM ( MF_MSG_DAX, "dax page" )                                   \
> >         EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
> > -       EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )              \
> >         EMe ( MF_MSG_UNKNOWN, "unknown page" )
> >
> >  /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index e2e685b971bb..7839ec83bc1d 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> >         [MF_MSG_BUDDY]                  = "free buddy page",
> >         [MF_MSG_DAX]                    = "dax page",
> >         [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
> > -       [MF_MSG_ALREADY_POISONED]       = "already poisoned",
> >         [MF_MSG_UNKNOWN]                = "unknown page",
> >  };
> >
> > @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >                 if (flags & MF_ACTION_REQUIRED) {
> >                         folio = page_folio(p);
> >                         res = kill_accessing_process(current, folio_pfn(folio), flags);
> > -                       action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> >                 }
> >                 return res;
> >         } else if (res == -EBUSY) {
> > @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> >                         res = kill_accessing_process(current, pfn, flags);
> >                 if (flags & MF_COUNT_INCREASED)
> >                         put_page(p);
> > -               action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> >                 goto unlock_mutex;
> >         }
> >
> > --
> > 2.50.1
> >
> >

  reply	other threads:[~2025-08-21 19:37 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-21 16:44 [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36   ` Kyle Meyer [this message]
2025-08-22  0:24     ` Jiaqi Yan
2025-08-25  3:04       ` Miaohe Lin
2025-08-25 16:09         ` Kyle Meyer
2025-08-25 22:36           ` jane.chu
2025-08-26  1:56             ` Kyle Meyer
2025-08-26 17:24               ` jane.chu
2025-08-26 19:27                 ` Kyle Meyer
2025-08-26 21:22                   ` Jiaqi Yan
2025-08-27  8:06                     ` jane.chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKd1K3ueTacGTf1W@hpe.com \
    --to=kyle.meyer@hpe.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=jane.chu@oracle.com \
    --cc=jiaqiyan@google.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=osalvador@suse.de \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=tony.luck@intel.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.