* [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
@ 2025-08-21 16:44 Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
0 siblings, 1 reply; 12+ messages in thread
From: Kyle Meyer @ 2025-08-21 16:44 UTC (permalink / raw)
To: akpm, david, tony.luck, bp, linmiaohe, linux-mm, linux-kernel,
linux-edac
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
nao.horiguchi, jane.chu, osalvador, Kyle Meyer
Calling action_result() on already poisoned pages causes issues:
* The amount of hardware corrupted memory is incorrectly incremented.
* NUMA node memory failure statistics are incorrectly updated.
* Redundant "already poisoned" messages are printed.
Do not call action_result() on already poisoned pages and drop unused
MF_MSG_ALREADY_POISONED.
Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
---
include/linux/mm.h | 1 -
include/ras/ras_event.h | 1 -
mm/memory-failure.c | 3 ---
3 files changed, 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..09ce81ef7afc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4005,7 +4005,6 @@ enum mf_action_page_type {
MF_MSG_BUDDY,
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
- MF_MSG_ALREADY_POISONED,
MF_MSG_UNKNOWN,
};
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index c8cd0f00c845..f62a52f5bd81 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_BUDDY, "free buddy page" ) \
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
- EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e2e685b971bb..7839ec83bc1d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
[MF_MSG_BUDDY] = "free buddy page",
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
- [MF_MSG_ALREADY_POISONED] = "already poisoned",
[MF_MSG_UNKNOWN] = "unknown page",
};
@@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
if (flags & MF_ACTION_REQUIRED) {
folio = page_folio(p);
res = kill_accessing_process(current, folio_pfn(folio), flags);
- action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
}
return res;
} else if (res == -EBUSY) {
@@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
res = kill_accessing_process(current, pfn, flags);
if (flags & MF_COUNT_INCREASED)
put_page(p);
- action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
goto unlock_mutex;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-21 16:44 [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages Kyle Meyer
@ 2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36 ` Kyle Meyer
0 siblings, 1 reply; 12+ messages in thread
From: Jiaqi Yan @ 2025-08-21 18:23 UTC (permalink / raw)
To: Kyle Meyer
Cc: akpm, david, tony.luck, bp, linmiaohe, linux-mm, linux-kernel,
linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, nao.horiguchi, jane.chu, osalvador
On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> Calling action_result() on already poisoned pages causes issues:
>
> * The amount of hardware corrupted memory is incorrectly incremented.
> * NUMA node memory failure statistics are incorrectly updated.
> * Redundant "already poisoned" messages are printed.
All agreed.
>
> Do not call action_result() on already poisoned pages and drop unused
> MF_MSG_ALREADY_POISONED.
Hi Kyle,
Patch looks great to me, just one thought...
Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
but changing action_result for MF_MSG_ALREADY_POISONED?
- don't num_poisoned_pages_inc(pfn)
- don't update_per_node_mf_stats(pfn, result)
- still pr_err("%#lx: recovery action for %s: %s\n", ...)
- meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
in memory_failure and try_memory_failure_hugetlb
This way, all the MF recovery result kernel logs out will be sitting
in one place, action_result, instead of scattering around all over the
place.
>
> Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
> include/linux/mm.h | 1 -
> include/ras/ras_event.h | 1 -
> mm/memory-failure.c | 3 ---
> 3 files changed, 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..09ce81ef7afc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> MF_MSG_BUDDY,
> MF_MSG_DAX,
> MF_MSG_UNSPLIT_THP,
> - MF_MSG_ALREADY_POISONED,
> MF_MSG_UNKNOWN,
> };
>
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index c8cd0f00c845..f62a52f5bd81 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> EM ( MF_MSG_BUDDY, "free buddy page" ) \
> EM ( MF_MSG_DAX, "dax page" ) \
> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
> - EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
> EMe ( MF_MSG_UNKNOWN, "unknown page" )
>
> /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e2e685b971bb..7839ec83bc1d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> [MF_MSG_BUDDY] = "free buddy page",
> [MF_MSG_DAX] = "dax page",
> [MF_MSG_UNSPLIT_THP] = "unsplit thp",
> - [MF_MSG_ALREADY_POISONED] = "already poisoned",
> [MF_MSG_UNKNOWN] = "unknown page",
> };
>
> @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> if (flags & MF_ACTION_REQUIRED) {
> folio = page_folio(p);
> res = kill_accessing_process(current, folio_pfn(folio), flags);
> - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> }
> return res;
> } else if (res == -EBUSY) {
> @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> res = kill_accessing_process(current, pfn, flags);
> if (flags & MF_COUNT_INCREASED)
> put_page(p);
> - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> goto unlock_mutex;
> }
>
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-21 18:23 ` Jiaqi Yan
@ 2025-08-21 19:36 ` Kyle Meyer
2025-08-22 0:24 ` Jiaqi Yan
0 siblings, 1 reply; 12+ messages in thread
From: Kyle Meyer @ 2025-08-21 19:36 UTC (permalink / raw)
To: Jiaqi Yan
Cc: akpm, david, tony.luck, bp, linmiaohe, linux-mm, linux-kernel,
linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, nao.horiguchi, jane.chu, osalvador
On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >
> > Calling action_result() on already poisoned pages causes issues:
> >
> > * The amount of hardware corrupted memory is incorrectly incremented.
> > * NUMA node memory failure statistics are incorrectly updated.
> > * Redundant "already poisoned" messages are printed.
>
> All agreed.
>
> >
> > Do not call action_result() on already poisoned pages and drop unused
> > MF_MSG_ALREADY_POISONED.
>
> Hi Kyle,
>
> Patch looks great to me, just one thought...
>
> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> but changing action_result for MF_MSG_ALREADY_POISONED?
> - don't num_poisoned_pages_inc(pfn)
> - don't update_per_node_mf_stats(pfn, result)
> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> in memory_failure and try_memory_failure_hugetlb
I did consider that approach but I was concerned about passing
MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
bit misleading.
How about introducing a new MF action result? Maybe MF_NONE? The message could
look something like:
Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
> This way, all the MF recovery result kernel logs out will be sitting
> in one place, action_result, instead of scattering around all over the
> place.
That sounds better to me.
> >
> > Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > ---
> > include/linux/mm.h | 1 -
> > include/ras/ras_event.h | 1 -
> > mm/memory-failure.c | 3 ---
> > 3 files changed, 5 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1ae97a0b8ec7..09ce81ef7afc 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> > MF_MSG_BUDDY,
> > MF_MSG_DAX,
> > MF_MSG_UNSPLIT_THP,
> > - MF_MSG_ALREADY_POISONED,
> > MF_MSG_UNKNOWN,
> > };
> >
> > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> > index c8cd0f00c845..f62a52f5bd81 100644
> > --- a/include/ras/ras_event.h
> > +++ b/include/ras/ras_event.h
> > @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> > EM ( MF_MSG_BUDDY, "free buddy page" ) \
> > EM ( MF_MSG_DAX, "dax page" ) \
> > EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
> > - EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
> > EMe ( MF_MSG_UNKNOWN, "unknown page" )
> >
> > /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index e2e685b971bb..7839ec83bc1d 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> > [MF_MSG_BUDDY] = "free buddy page",
> > [MF_MSG_DAX] = "dax page",
> > [MF_MSG_UNSPLIT_THP] = "unsplit thp",
> > - [MF_MSG_ALREADY_POISONED] = "already poisoned",
> > [MF_MSG_UNKNOWN] = "unknown page",
> > };
> >
> > @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> > if (flags & MF_ACTION_REQUIRED) {
> > folio = page_folio(p);
> > res = kill_accessing_process(current, folio_pfn(folio), flags);
> > - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > }
> > return res;
> > } else if (res == -EBUSY) {
> > @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> > res = kill_accessing_process(current, pfn, flags);
> > if (flags & MF_COUNT_INCREASED)
> > put_page(p);
> > - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > goto unlock_mutex;
> > }
> >
> > --
> > 2.50.1
> >
> >
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-21 19:36 ` Kyle Meyer
@ 2025-08-22 0:24 ` Jiaqi Yan
2025-08-25 3:04 ` Miaohe Lin
0 siblings, 1 reply; 12+ messages in thread
From: Jiaqi Yan @ 2025-08-22 0:24 UTC (permalink / raw)
To: Kyle Meyer, linmiaohe
Cc: akpm, david, tony.luck, bp, linux-mm, linux-kernel, linux-edac,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
nao.horiguchi, jane.chu, osalvador
On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> > On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > >
> > > Calling action_result() on already poisoned pages causes issues:
> > >
> > > * The amount of hardware corrupted memory is incorrectly incremented.
> > > * NUMA node memory failure statistics are incorrectly updated.
> > > * Redundant "already poisoned" messages are printed.
> >
> > All agreed.
> >
> > >
> > > Do not call action_result() on already poisoned pages and drop unused
> > > MF_MSG_ALREADY_POISONED.
> >
> > Hi Kyle,
> >
> > Patch looks great to me, just one thought...
> >
> > Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> > but changing action_result for MF_MSG_ALREADY_POISONED?
> > - don't num_poisoned_pages_inc(pfn)
> > - don't update_per_node_mf_stats(pfn, result)
> > - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> > - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> > in memory_failure and try_memory_failure_hugetlb
>
> I did consider that approach but I was concerned about passing
> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> bit misleading.
Based on my reading the documentation for MF_* in static const char
*action_name[]...
Yeah, for file mapped pages, kernel may not have hole-punched or
truncated it from the file mapping (shmem and hugetlbfs for example)
but that still considered as MF_RECOVERED, so touching a page with
HWPoison flag doesn't mean that page was failed to be recovered
previously.
For pages intended to be taken out of the buddy system, touching a
page with HWPoison flag does imply it isn't isolated and hence
MF_FAILED.
In summary, seeing the HWPoison flag again doesn't necessarily
indicate what the recovery result was previously; it only indicate
kernel won't re-attempt to recover?
>
> How about introducing a new MF action result? Maybe MF_NONE? The message could
> look something like:
Adding MF_NONE sounds fine to me, as long as we correctly document its
meaning, which can be subtle.
Let's see what Miaohe's thoughts are.
>
> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
>
> > This way, all the MF recovery result kernel logs out will be sitting
> > in one place, action_result, instead of scattering around all over the
> > place.
>
> That sounds better to me.
>
> > >
> > > Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> > > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > > ---
> > > include/linux/mm.h | 1 -
> > > include/ras/ras_event.h | 1 -
> > > mm/memory-failure.c | 3 ---
> > > 3 files changed, 5 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 1ae97a0b8ec7..09ce81ef7afc 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> > > MF_MSG_BUDDY,
> > > MF_MSG_DAX,
> > > MF_MSG_UNSPLIT_THP,
> > > - MF_MSG_ALREADY_POISONED,
> > > MF_MSG_UNKNOWN,
> > > };
> > >
> > > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> > > index c8cd0f00c845..f62a52f5bd81 100644
> > > --- a/include/ras/ras_event.h
> > > +++ b/include/ras/ras_event.h
> > > @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> > > EM ( MF_MSG_BUDDY, "free buddy page" ) \
> > > EM ( MF_MSG_DAX, "dax page" ) \
> > > EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
> > > - EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
> > > EMe ( MF_MSG_UNKNOWN, "unknown page" )
> > >
> > > /*
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index e2e685b971bb..7839ec83bc1d 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> > > [MF_MSG_BUDDY] = "free buddy page",
> > > [MF_MSG_DAX] = "dax page",
> > > [MF_MSG_UNSPLIT_THP] = "unsplit thp",
> > > - [MF_MSG_ALREADY_POISONED] = "already poisoned",
> > > [MF_MSG_UNKNOWN] = "unknown page",
> > > };
> > >
> > > @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> > > if (flags & MF_ACTION_REQUIRED) {
> > > folio = page_folio(p);
> > > res = kill_accessing_process(current, folio_pfn(folio), flags);
> > > - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > > }
> > > return res;
> > > } else if (res == -EBUSY) {
> > > @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> > > res = kill_accessing_process(current, pfn, flags);
> > > if (flags & MF_COUNT_INCREASED)
> > > put_page(p);
> > > - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> > > goto unlock_mutex;
> > > }
> > >
> > > --
> > > 2.50.1
> > >
> > >
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-22 0:24 ` Jiaqi Yan
@ 2025-08-25 3:04 ` Miaohe Lin
2025-08-25 16:09 ` Kyle Meyer
0 siblings, 1 reply; 12+ messages in thread
From: Miaohe Lin @ 2025-08-25 3:04 UTC (permalink / raw)
To: Jiaqi Yan, Kyle Meyer
Cc: akpm, david, tony.luck, bp, linux-mm, linux-kernel, linux-edac,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
nao.horiguchi, jane.chu, osalvador
On 2025/8/22 8:24, Jiaqi Yan wrote:
> On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>
>> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
>>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>
>>>> Calling action_result() on already poisoned pages causes issues:
>>>>
>>>> * The amount of hardware corrupted memory is incorrectly incremented.
>>>> * NUMA node memory failure statistics are incorrectly updated.
>>>> * Redundant "already poisoned" messages are printed.
>>>
>>> All agreed.
>>>
>>>>
>>>> Do not call action_result() on already poisoned pages and drop unused
>>>> MF_MSG_ALREADY_POISONED.
>>>
>>> Hi Kyle,
>>>
>>> Patch looks great to me, just one thought...
Thanks both.
>>>
>>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
>>> but changing action_result for MF_MSG_ALREADY_POISONED?
>>> - don't num_poisoned_pages_inc(pfn)
>>> - don't update_per_node_mf_stats(pfn, result)
>>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
>>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
>>> in memory_failure and try_memory_failure_hugetlb
>>
>> I did consider that approach but I was concerned about passing
>> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
>> bit misleading.
>
> Based on my reading the documentation for MF_* in static const char
> *action_name[]...
>
> Yeah, for file mapped pages, kernel may not have hole-punched or
> truncated it from the file mapping (shmem and hugetlbfs for example)
> but that still considered as MF_RECOVERED, so touching a page with
> HWPoison flag doesn't mean that page was failed to be recovered
> previously.
>
> For pages intended to be taken out of the buddy system, touching a
> page with HWPoison flag does imply it isn't isolated and hence
> MF_FAILED.
There should be other cases that memory_failure failed to isolate the
hwpoisoned pages at first time due to various reasons.
>
> In summary, seeing the HWPoison flag again doesn't necessarily
> indicate what the recovery result was previously; it only indicate
> kernel won't re-attempt to recover?
Yes, kernel won't re-attempt to or just cannot recover.
>
>>
>> How about introducing a new MF action result? Maybe MF_NONE? The message could
>> look something like:
>
> Adding MF_NONE sounds fine to me, as long as we correctly document its
> meaning, which can be subtle.
Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
Thanks.
.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-25 3:04 ` Miaohe Lin
@ 2025-08-25 16:09 ` Kyle Meyer
2025-08-25 22:36 ` jane.chu
0 siblings, 1 reply; 12+ messages in thread
From: Kyle Meyer @ 2025-08-25 16:09 UTC (permalink / raw)
To: Miaohe Lin
Cc: Jiaqi Yan, akpm, david, tony.luck, bp, linux-mm, linux-kernel,
linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, nao.horiguchi, jane.chu, osalvador
On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
> On 2025/8/22 8:24, Jiaqi Yan wrote:
> > On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >>
> >> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> >>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >>>>
> >>>> Calling action_result() on already poisoned pages causes issues:
> >>>>
> >>>> * The amount of hardware corrupted memory is incorrectly incremented.
> >>>> * NUMA node memory failure statistics are incorrectly updated.
> >>>> * Redundant "already poisoned" messages are printed.
> >>>
> >>> All agreed.
> >>>
> >>>>
> >>>> Do not call action_result() on already poisoned pages and drop unused
> >>>> MF_MSG_ALREADY_POISONED.
> >>>
> >>> Hi Kyle,
> >>>
> >>> Patch looks great to me, just one thought...
>
> Thanks both.
>
> >>>
> >>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> >>> but changing action_result for MF_MSG_ALREADY_POISONED?
> >>> - don't num_poisoned_pages_inc(pfn)
> >>> - don't update_per_node_mf_stats(pfn, result)
> >>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> >>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> >>> in memory_failure and try_memory_failure_hugetlb
> >>
> >> I did consider that approach but I was concerned about passing
> >> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> >> bit misleading.
> >
> > Based on my reading the documentation for MF_* in static const char
> > *action_name[]...
> >
> > Yeah, for file mapped pages, kernel may not have hole-punched or
> > truncated it from the file mapping (shmem and hugetlbfs for example)
> > but that still considered as MF_RECOVERED, so touching a page with
> > HWPoison flag doesn't mean that page was failed to be recovered
> > previously.
> >
> > For pages intended to be taken out of the buddy system, touching a
> > page with HWPoison flag does imply it isn't isolated and hence
> > MF_FAILED.
>
> There should be other cases that memory_failure failed to isolate the
> hwpoisoned pages at first time due to various reasons.
>
> >
> > In summary, seeing the HWPoison flag again doesn't necessarily
> > indicate what the recovery result was previously; it only indicate
> > kernel won't re-attempt to recover?
>
> Yes, kernel won't re-attempt to or just cannot recover.
>
> >
> >>
> >> How about introducing a new MF action result? Maybe MF_NONE? The message could
> >> look something like:
> >
> > Adding MF_NONE sounds fine to me, as long as we correctly document its
> > meaning, which can be subtle.
>
> Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
> as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
OK, would you like a separate MF action result for each case? Maybe
MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
used when kill_accessing_process() returns -EHWPOISON.
The log messages could look like...
Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
and
Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
Thanks,
Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-25 16:09 ` Kyle Meyer
@ 2025-08-25 22:36 ` jane.chu
2025-08-26 1:56 ` Kyle Meyer
0 siblings, 1 reply; 12+ messages in thread
From: jane.chu @ 2025-08-25 22:36 UTC (permalink / raw)
To: Kyle Meyer, Miaohe Lin
Cc: Jiaqi Yan, akpm, david, tony.luck, bp, linux-mm, linux-kernel,
linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, nao.horiguchi, osalvador
On 8/25/2025 9:09 AM, Kyle Meyer wrote:
> On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
>> On 2025/8/22 8:24, Jiaqi Yan wrote:
>>> On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>
>>>> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
>>>>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>>>
>>>>>> Calling action_result() on already poisoned pages causes issues:
>>>>>>
>>>>>> * The amount of hardware corrupted memory is incorrectly incremented.
>>>>>> * NUMA node memory failure statistics are incorrectly updated.
>>>>>> * Redundant "already poisoned" messages are printed.
Assuming this means that the numbers reported from
/sys/devices/system/node/node*/memory_failure/*
do not match certain expectation, right?
If so, could you clarify what is the expectation?
>>>>>
>>>>> All agreed.
>>>>>
>>>>>>
>>>>>> Do not call action_result() on already poisoned pages and drop unused
>>>>>> MF_MSG_ALREADY_POISONED.
>>>>>
>>>>> Hi Kyle,
>>>>>
>>>>> Patch looks great to me, just one thought...
>>
>> Thanks both.
>>
>>>>>
>>>>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
>>>>> but changing action_result for MF_MSG_ALREADY_POISONED?
>>>>> - don't num_poisoned_pages_inc(pfn)
>>>>> - don't update_per_node_mf_stats(pfn, result)
>>>>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
>>>>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
>>>>> in memory_failure and try_memory_failure_hugetlb
>>>>
>>>> I did consider that approach but I was concerned about passing
>>>> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
>>>> bit misleading.
>>>
>>> Based on my reading the documentation for MF_* in static const char
>>> *action_name[]...
>>>
>>> Yeah, for file mapped pages, kernel may not have hole-punched or
>>> truncated it from the file mapping (shmem and hugetlbfs for example)
>>> but that still considered as MF_RECOVERED, so touching a page with
>>> HWPoison flag doesn't mean that page was failed to be recovered
>>> previously.
>>>
>>> For pages intended to be taken out of the buddy system, touching a
>>> page with HWPoison flag does imply it isn't isolated and hence
>>> MF_FAILED.
>>
>> There should be other cases that memory_failure failed to isolate the
>> hwpoisoned pages at first time due to various reasons.
>>
>>>
>>> In summary, seeing the HWPoison flag again doesn't necessarily
>>> indicate what the recovery result was previously; it only indicate
>>> kernel won't re-attempt to recover?
>>
>> Yes, kernel won't re-attempt to or just cannot recover.
>>
>>>
>>>>
>>>> How about introducing a new MF action result? Maybe MF_NONE? The message could
>>>> look something like:
>>>
>>> Adding MF_NONE sounds fine to me, as long as we correctly document its
>>> meaning, which can be subtle.
>>
>> Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
>> as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
>
> OK, would you like a separate MF action result for each case? Maybe
> MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
>
> MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
> used when kill_accessing_process() returns -EHWPOISON.
>
> The log messages could look like...
>
> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
> and
> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
Agreed with Miaohe that "None" won't work.
"Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
we need to understand why "Failed" doesn't work for your usecase.
"Failed" means process is killed but page is not successfully isolated
which applies to MF_MSG_ALREADY_POISONED case as well.
thanks!
-jane
>
> Thanks,
> Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-25 22:36 ` jane.chu
@ 2025-08-26 1:56 ` Kyle Meyer
2025-08-26 17:24 ` jane.chu
0 siblings, 1 reply; 12+ messages in thread
From: Kyle Meyer @ 2025-08-26 1:56 UTC (permalink / raw)
To: jane.chu
Cc: Miaohe Lin, Jiaqi Yan, akpm, david, tony.luck, bp, linux-mm,
linux-kernel, linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, nao.horiguchi, osalvador
On Mon, Aug 25, 2025 at 03:36:54PM -0700, jane.chu@oracle.com wrote:
> On 8/25/2025 9:09 AM, Kyle Meyer wrote:
> > On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
> > > On 2025/8/22 8:24, Jiaqi Yan wrote:
> > > > On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > >
> > > > > On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> > > > > > On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > > > >
> > > > > > > Calling action_result() on already poisoned pages causes issues:
> > > > > > >
> > > > > > > * The amount of hardware corrupted memory is incorrectly incremented.
> > > > > > > * NUMA node memory failure statistics are incorrectly updated.
> > > > > > > * Redundant "already poisoned" messages are printed.
>
> Assuming this means that the numbers reported from
> /sys/devices/system/node/node*/memory_failure/*
> do not match certain expectation, right?
>
> If so, could you clarify what is the expectation?
Sure, and please let me know if I'm mistaken.
Here's the description of total:
What: /sys/devices/system/node/nodeX/memory_failure/total
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
The total number of raw poisoned pages (pages containing
corrupted data due to memory errors) on a NUMA node.
That should emit the number of poisoned pages on NUMA node X. That's
incremented each time update_per_node_mf_stats() is called.
Here's the description of failed:
What: /sys/devices/system/node/nodeX/memory_failure/failed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages are
failed by memory error recovery attempt. This usually means
a key recovery operation failed.
That should emit the number of poisoned pages on NUMA node X that could
not be recovered because the attempt failed. That's incremented each time
update_per_node_mf_stats() is called with MF_FAILED.
We're currently calling action_result() with MF_FAILED each time we encounter
a poisoned page (note: the huge page path is a bit different, we only call
action_result() with MF_FAILED when MF_ACTION_REQUIRED is set). That, IMO,
breaks the descriptions. We already incremented the per NUMA node MF statistics
to account for that poisoned page.
> > > > > >
> > > > > > All agreed.
> > > > > >
> > > > > > >
> > > > > > > Do not call action_result() on already poisoned pages and drop unused
> > > > > > > MF_MSG_ALREADY_POISONED.
> > > > > >
> > > > > > Hi Kyle,
> > > > > >
> > > > > > Patch looks great to me, just one thought...
> > >
> > > Thanks both.
> > >
> > > > > >
> > > > > > Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> > > > > > but changing action_result for MF_MSG_ALREADY_POISONED?
> > > > > > - don't num_poisoned_pages_inc(pfn)
> > > > > > - don't update_per_node_mf_stats(pfn, result)
> > > > > > - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> > > > > > - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> > > > > > in memory_failure and try_memory_failure_hugetlb
> > > > >
> > > > > I did consider that approach but I was concerned about passing
> > > > > MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> > > > > bit misleading.
> > > >
> > > > Based on my reading the documentation for MF_* in static const char
> > > > *action_name[]...
> > > >
> > > > Yeah, for file mapped pages, kernel may not have hole-punched or
> > > > truncated it from the file mapping (shmem and hugetlbfs for example)
> > > > but that still considered as MF_RECOVERED, so touching a page with
> > > > HWPoison flag doesn't mean that page was failed to be recovered
> > > > previously.
> > > >
> > > > For pages intended to be taken out of the buddy system, touching a
> > > > page with HWPoison flag does imply it isn't isolated and hence
> > > > MF_FAILED.
> > >
> > > There should be other cases that memory_failure failed to isolate the
> > > hwpoisoned pages at first time due to various reasons.
> > >
> > > >
> > > > In summary, seeing the HWPoison flag again doesn't necessarily
> > > > indicate what the recovery result was previously; it only indicate
> > > > kernel won't re-attempt to recover?
> > >
> > > Yes, kernel won't re-attempt to or just cannot recover.
> > >
> > > >
> > > > >
> > > > > How about introducing a new MF action result? Maybe MF_NONE? The message could
> > > > > look something like:
> > > >
> > > > Adding MF_NONE sounds fine to me, as long as we correctly document its
> > > > meaning, which can be subtle.
> > >
> > > Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
> > > as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
> >
> > OK, would you like a separate MF action result for each case? Maybe
> > MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
> >
> > MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
> > used when kill_accessing_process() returns -EHWPOISON.
> >
> > The log messages could look like...
> >
> > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
> > and
> > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
>
> Agreed with Miaohe that "None" won't work.
What action is M-F() taking to recover already poisoned pages that don't have
MF_ACTION_REQUIRED set?
> "Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
> we need to understand why "Failed" doesn't work for your usecase.
> "Failed" means process is killed but page is not successfully isolated which
> applies to MF_MSG_ALREADY_POISONED case as well.
So that accessing process is killed. Why call action_result() with MF_FAILED?
Doesn't that indicate we poisoned another page and the recovery attempt failed?
Thanks,
Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-26 1:56 ` Kyle Meyer
@ 2025-08-26 17:24 ` jane.chu
2025-08-26 19:27 ` Kyle Meyer
0 siblings, 1 reply; 12+ messages in thread
From: jane.chu @ 2025-08-26 17:24 UTC (permalink / raw)
To: Kyle Meyer
Cc: Miaohe Lin, Jiaqi Yan, akpm, david, tony.luck, bp, linux-mm,
linux-kernel, linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, nao.horiguchi, osalvador
On 8/25/2025 6:56 PM, Kyle Meyer wrote:
> On Mon, Aug 25, 2025 at 03:36:54PM -0700, jane.chu@oracle.com wrote:
>> On 8/25/2025 9:09 AM, Kyle Meyer wrote:
>>> On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
>>>> On 2025/8/22 8:24, Jiaqi Yan wrote:
>>>>> On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>>>
>>>>>> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
>>>>>>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>>>>>
>>>>>>>> Calling action_result() on already poisoned pages causes issues:
>>>>>>>>
>>>>>>>> * The amount of hardware corrupted memory is incorrectly incremented.
>>>>>>>> * NUMA node memory failure statistics are incorrectly updated.
>>>>>>>> * Redundant "already poisoned" messages are printed.
>>
>> Assuming this means that the numbers reported from
>> /sys/devices/system/node/node*/memory_failure/*
>> do not match certain expectation, right?
>>
>> If so, could you clarify what is the expectation?
>
> Sure, and please let me know if I'm mistaken.
>
> Here's the description of total:
>
> What: /sys/devices/system/node/nodeX/memory_failure/total
> Date: January 2023
> Contact: Jiaqi Yan <jiaqiyan@google.com>
> Description:
> The total number of raw poisoned pages (pages containing
> corrupted data due to memory errors) on a NUMA node.
>
> That should emit the number of poisoned pages on NUMA node X. That's
> incremented each time update_per_node_mf_stats() is called.
>
> Here's the description of failed:
>
> What: /sys/devices/system/node/nodeX/memory_failure/failed
> Date: January 2023
> Contact: Jiaqi Yan <jiaqiyan@google.com>
> Description:
> Of the raw poisoned pages on a NUMA node, how many pages are
> failed by memory error recovery attempt. This usually means
> a key recovery operation failed.
>
> That should emit the number of poisoned pages on NUMA node X that could
> not be recovered because the attempt failed. That's incremented each time
> update_per_node_mf_stats() is called with MF_FAILED.
>
> We're currently calling action_result() with MF_FAILED each time we encounter
> a poisoned page (note: the huge page path is a bit different, we only call
> action_result() with MF_FAILED when MF_ACTION_REQUIRED is set). That, IMO,
> breaks the descriptions. We already incremented the per NUMA node MF statistics
> to account for that poisoned page.
Thanks! My reading is that these numbers are best as hints, I won't
take them literately. As you alluded below, kill_accessing_process is
applied only if MF_ACTION_REQUIRED is set, though the page is already
marked poisoned. Besides, there can be bug that renders a poisoned page
not being properly isolated nor being properly categorized. If you're
looking for something precise, is there another way? maybe from firmware?
>
>>>>>>>
>>>>>>> All agreed.
>>>>>>>
>>>>>>>>
>>>>>>>> Do not call action_result() on already poisoned pages and drop unused
>>>>>>>> MF_MSG_ALREADY_POISONED.
>>>>>>>
>>>>>>> Hi Kyle,
>>>>>>>
>>>>>>> Patch looks great to me, just one thought...
>>>>
>>>> Thanks both.
>>>>
>>>>>>>
>>>>>>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
>>>>>>> but changing action_result for MF_MSG_ALREADY_POISONED?
>>>>>>> - don't num_poisoned_pages_inc(pfn)
>>>>>>> - don't update_per_node_mf_stats(pfn, result)
>>>>>>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
>>>>>>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
>>>>>>> in memory_failure and try_memory_failure_hugetlb
>>>>>>
>>>>>> I did consider that approach but I was concerned about passing
>>>>>> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
>>>>>> bit misleading.
>>>>>
>>>>> Based on my reading the documentation for MF_* in static const char
>>>>> *action_name[]...
>>>>>
>>>>> Yeah, for file mapped pages, kernel may not have hole-punched or
>>>>> truncated it from the file mapping (shmem and hugetlbfs for example)
>>>>> but that still considered as MF_RECOVERED, so touching a page with
>>>>> HWPoison flag doesn't mean that page was failed to be recovered
>>>>> previously.
>>>>>
>>>>> For pages intended to be taken out of the buddy system, touching a
>>>>> page with HWPoison flag does imply it isn't isolated and hence
>>>>> MF_FAILED.
>>>>
>>>> There should be other cases that memory_failure failed to isolate the
>>>> hwpoisoned pages at first time due to various reasons.
>>>>
>>>>>
>>>>> In summary, seeing the HWPoison flag again doesn't necessarily
>>>>> indicate what the recovery result was previously; it only indicate
>>>>> kernel won't re-attempt to recover?
>>>>
>>>> Yes, kernel won't re-attempt to or just cannot recover.
>>>>
>>>>>
>>>>>>
>>>>>> How about introducing a new MF action result? Maybe MF_NONE? The message could
>>>>>> look something like:
>>>>>
>>>>> Adding MF_NONE sounds fine to me, as long as we correctly document its
>>>>> meaning, which can be subtle.
>>>>
>>>> Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
>>>> as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
>>>
>>> OK, would you like a separate MF action result for each case? Maybe
>>> MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
>>>
>>> MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
>>> used when kill_accessing_process() returns -EHWPOISON.
>>>
>>> The log messages could look like...
>>>
>>> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
>>> and
>>> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
>>
>> Agreed with Miaohe that "None" won't work.
>
> What action is M-F() taking to recover already poisoned pages that don't have
> MF_ACTION_REQUIRED set?
The action taken toward poisoned page not under MF_ACTION_REQUIRED is
typically isolation, that is, remove the pte or mark the pte as poisoned
special swap entry, so that subsequent page fault is given a chance to
deliver a SIGBUS. That said, things might fail during the process, like
encountering GUP pinned THP page.>
>> "Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
>> we need to understand why "Failed" doesn't work for your usecase.
>> "Failed" means process is killed but page is not successfully isolated which
>> applies to MF_MSG_ALREADY_POISONED case as well.
>
> So that accessing process is killed. Why call action_result() with MF_FAILED?
> Doesn't that indicate we poisoned another page and the recovery attempt failed?
What I recall is that, "recovery" is reserved for page that is clean,
isolated, and even by chance, unmapped. "failed" is reserved for page
that has been(or might not?) removed from the page table, page might be
dirty, certainly mapped, etc. A SIGBUS doesn't make recovery an
automatic success.
Others please correct me if I'm mistaken.
thanks,
-jane
>
> Thanks,
> Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-26 17:24 ` jane.chu
@ 2025-08-26 19:27 ` Kyle Meyer
2025-08-26 21:22 ` Jiaqi Yan
0 siblings, 1 reply; 12+ messages in thread
From: Kyle Meyer @ 2025-08-26 19:27 UTC (permalink / raw)
To: jane.chu
Cc: Miaohe Lin, Jiaqi Yan, akpm, david, tony.luck, bp, linux-mm,
linux-kernel, linux-edac, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, nao.horiguchi, osalvador, russ.anderson
On Tue, Aug 26, 2025 at 10:24:07AM -0700, jane.chu@oracle.com wrote:
>
> On 8/25/2025 6:56 PM, Kyle Meyer wrote:
> > On Mon, Aug 25, 2025 at 03:36:54PM -0700, jane.chu@oracle.com wrote:
> > > On 8/25/2025 9:09 AM, Kyle Meyer wrote:
> > > > On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
> > > > > On 2025/8/22 8:24, Jiaqi Yan wrote:
> > > > > > On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> > > > > > > > On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > > > > > >
> > > > > > > > > Calling action_result() on already poisoned pages causes issues:
> > > > > > > > >
> > > > > > > > > * The amount of hardware corrupted memory is incorrectly incremented.
> > > > > > > > > * NUMA node memory failure statistics are incorrectly updated.
> > > > > > > > > * Redundant "already poisoned" messages are printed.
> > >
> > > Assuming this means that the numbers reported from
> > > /sys/devices/system/node/node*/memory_failure/*
> > > do not match certain expectation, right?
> > >
> > > If so, could you clarify what is the expectation?
> >
> > Sure, and please let me know if I'm mistaken.
> >
> > Here's the description of total:
> >
> > What: /sys/devices/system/node/nodeX/memory_failure/total
> > Date: January 2023
> > Contact: Jiaqi Yan <jiaqiyan@google.com>
> > Description:
> > The total number of raw poisoned pages (pages containing
> > corrupted data due to memory errors) on a NUMA node.
> >
> > That should emit the number of poisoned pages on NUMA node X. That's
> > incremented each time update_per_node_mf_stats() is called.
> >
> > Here's the description of failed:
> >
> > What: /sys/devices/system/node/nodeX/memory_failure/failed
> > Date: January 2023
> > Contact: Jiaqi Yan <jiaqiyan@google.com>
> > Description:
> > Of the raw poisoned pages on a NUMA node, how many pages are
> > failed by memory error recovery attempt. This usually means
> > a key recovery operation failed.
> >
> > That should emit the number of poisoned pages on NUMA node X that could
> > not be recovered because the attempt failed. That's incremented each time
> > update_per_node_mf_stats() is called with MF_FAILED.
> >
> > We're currently calling action_result() with MF_FAILED each time we encounter
> > a poisoned page (note: the huge page path is a bit different, we only call
> > action_result() with MF_FAILED when MF_ACTION_REQUIRED is set). That, IMO,
> > breaks the descriptions. We already incremented the per NUMA node MF statistics
> > to account for that poisoned page.
>
> Thanks! My reading is that these numbers are best as hints, I won't take
> them literately. As you alluded below, kill_accessing_process is applied
> only if MF_ACTION_REQUIRED is set, though the page is already marked
> poisoned. Besides, there can be bug that renders a poisoned page not being
> properly isolated nor being properly categorized. If you're looking for
> something precise, is there another way? maybe from firmware?
Firmware records the number of memory errors that have been detected and
reported, but it doesn't record how Linux responded to those memory errors.
Checking the ring buffer, the amount of hardware corrupted memory, and the
per NUMA node memory failure statistics is a simple way to determine how Linux
responded.
Since commit b8b9488d50b7, that has become unreliable. The same memory error
may be reported by multiple sources and now each report increments the amount of
hardware corrupted memory and the per NUMA node memory failure statistics. Isn't
that a regression?
The per NUMA node memory failure statistics might not always be 100% accurate,
but this issue seems preventable.
> > > > > > > >
> > > > > > > > All agreed.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Do not call action_result() on already poisoned pages and drop unused
> > > > > > > > > MF_MSG_ALREADY_POISONED.
> > > > > > > >
> > > > > > > > Hi Kyle,
> > > > > > > >
> > > > > > > > Patch looks great to me, just one thought...
> > > > >
> > > > > Thanks both.
> > > > >
> > > > > > > >
> > > > > > > > Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> > > > > > > > but changing action_result for MF_MSG_ALREADY_POISONED?
> > > > > > > > - don't num_poisoned_pages_inc(pfn)
> > > > > > > > - don't update_per_node_mf_stats(pfn, result)
> > > > > > > > - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> > > > > > > > - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> > > > > > > > in memory_failure and try_memory_failure_hugetlb
> > > > > > >
> > > > > > > I did consider that approach but I was concerned about passing
> > > > > > > MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> > > > > > > bit misleading.
> > > > > >
> > > > > > Based on my reading the documentation for MF_* in static const char
> > > > > > *action_name[]...
> > > > > >
> > > > > > Yeah, for file mapped pages, kernel may not have hole-punched or
> > > > > > truncated it from the file mapping (shmem and hugetlbfs for example)
> > > > > > but that still considered as MF_RECOVERED, so touching a page with
> > > > > > HWPoison flag doesn't mean that page was failed to be recovered
> > > > > > previously.
> > > > > >
> > > > > > For pages intended to be taken out of the buddy system, touching a
> > > > > > page with HWPoison flag does imply it isn't isolated and hence
> > > > > > MF_FAILED.
> > > > >
> > > > > There should be other cases that memory_failure failed to isolate the
> > > > > hwpoisoned pages at first time due to various reasons.
> > > > >
> > > > > >
> > > > > > In summary, seeing the HWPoison flag again doesn't necessarily
> > > > > > indicate what the recovery result was previously; it only indicate
> > > > > > kernel won't re-attempt to recover?
> > > > >
> > > > > Yes, kernel won't re-attempt to or just cannot recover.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > How about introducing a new MF action result? Maybe MF_NONE? The message could
> > > > > > > look something like:
> > > > > >
> > > > > > Adding MF_NONE sounds fine to me, as long as we correctly document its
> > > > > > meaning, which can be subtle.
> > > > >
> > > > > Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
> > > > > as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
> > > >
> > > > OK, would you like a separate MF action result for each case? Maybe
> > > > MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
> > > >
> > > > MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
> > > > used when kill_accessing_process() returns -EHWPOISON.
> > > >
> > > > The log messages could look like...
> > > >
> > > > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
> > > > and
> > > > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
> > >
> > > Agreed with Miaohe that "None" won't work.
> >
> > What action is M-F() taking to recover already poisoned pages that don't have
> > MF_ACTION_REQUIRED set?
>
> The action taken toward poisoned page not under MF_ACTION_REQUIRED is
> typically isolation, that is, remove the pte or mark the pte as poisoned
> special swap entry, so that subsequent page fault is given a chance to
> deliver a SIGBUS. That said, things might fail during the process, like
> encountering GUP pinned THP page.>
> > > "Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
> > > we need to understand why "Failed" doesn't work for your usecase.
> > > "Failed" means process is killed but page is not successfully isolated which
> > > applies to MF_MSG_ALREADY_POISONED case as well.
> >
> > So that accessing process is killed. Why call action_result() with MF_FAILED?
> > Doesn't that indicate we poisoned another page and the recovery attempt failed?
>
> What I recall is that, "recovery" is reserved for page that is clean,
> isolated, and even by chance, unmapped. "failed" is reserved for page that
> has been(or might not?) removed from the page table, page might be dirty,
> certainly mapped, etc. A SIGBUS doesn't make recovery an automatic success.
>
> Others please correct me if I'm mistaken.
Thank you very much for taking the time to explain everything.
Thanks,
Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-26 19:27 ` Kyle Meyer
@ 2025-08-26 21:22 ` Jiaqi Yan
2025-08-27 8:06 ` jane.chu
0 siblings, 1 reply; 12+ messages in thread
From: Jiaqi Yan @ 2025-08-26 21:22 UTC (permalink / raw)
To: Kyle Meyer, jane.chu, Miaohe Lin, russ.anderson
Cc: akpm, david, tony.luck, bp, linux-mm, linux-kernel, linux-edac,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
nao.horiguchi, osalvador
On Tue, Aug 26, 2025 at 12:28 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> On Tue, Aug 26, 2025 at 10:24:07AM -0700, jane.chu@oracle.com wrote:
> >
> > On 8/25/2025 6:56 PM, Kyle Meyer wrote:
> > > On Mon, Aug 25, 2025 at 03:36:54PM -0700, jane.chu@oracle.com wrote:
> > > > On 8/25/2025 9:09 AM, Kyle Meyer wrote:
> > > > > On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
> > > > > > On 2025/8/22 8:24, Jiaqi Yan wrote:
> > > > > > > On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> > > > > > > > > On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Calling action_result() on already poisoned pages causes issues:
> > > > > > > > > >
> > > > > > > > > > * The amount of hardware corrupted memory is incorrectly incremented.
> > > > > > > > > > * NUMA node memory failure statistics are incorrectly updated.
> > > > > > > > > > * Redundant "already poisoned" messages are printed.
> > > >
> > > > Assuming this means that the numbers reported from
> > > > /sys/devices/system/node/node*/memory_failure/*
> > > > do not match certain expectation, right?
> > > >
> > > > If so, could you clarify what is the expectation?
> > >
> > > Sure, and please let me know if I'm mistaken.
> > >
> > > Here's the description of total:
> > >
> > > What: /sys/devices/system/node/nodeX/memory_failure/total
> > > Date: January 2023
> > > Contact: Jiaqi Yan <jiaqiyan@google.com>
> > > Description:
> > > The total number of raw poisoned pages (pages containing
> > > corrupted data due to memory errors) on a NUMA node.
> > >
> > > That should emit the number of poisoned pages on NUMA node X. That's
> > > incremented each time update_per_node_mf_stats() is called.
> > >
> > > Here's the description of failed:
> > >
> > > What: /sys/devices/system/node/nodeX/memory_failure/failed
> > > Date: January 2023
> > > Contact: Jiaqi Yan <jiaqiyan@google.com>
> > > Description:
> > > Of the raw poisoned pages on a NUMA node, how many pages are
> > > failed by memory error recovery attempt. This usually means
> > > a key recovery operation failed.
> > >
> > > That should emit the number of poisoned pages on NUMA node X that could
> > > not be recovered because the attempt failed. That's incremented each time
> > > update_per_node_mf_stats() is called with MF_FAILED.
> > >
> > > We're currently calling action_result() with MF_FAILED each time we encounter
> > > a poisoned page (note: the huge page path is a bit different, we only call
> > > action_result() with MF_FAILED when MF_ACTION_REQUIRED is set). That, IMO,
> > > breaks the descriptions. We already incremented the per NUMA node MF statistics
> > > to account for that poisoned page.
> >
> > Thanks! My reading is that these numbers are best as hints, I won't take
> > them literately. As you alluded below, kill_accessing_process is applied
> > only if MF_ACTION_REQUIRED is set, though the page is already marked
> > poisoned. Besides, there can be bug that renders a poisoned page not being
> > properly isolated nor being properly categorized. If you're looking for
> > something precise, is there another way? maybe from firmware?
>
> Firmware records the number of memory errors that have been detected and
> reported, but it doesn't record how Linux responded to those memory errors.
>
> Checking the ring buffer, the amount of hardware corrupted memory, and the
> per NUMA node memory failure statistics is a simple way to determine how Linux
> responded.
>
> Since commit b8b9488d50b7, that has become unreliable. The same memory error
> may be reported by multiple sources and now each report increments the amount of
To me this multiple counting is an existing issue, and what Kyle
originally targeted. It seems to me that as long as the
mf_action_page_type is MF_MSG_ALREADY_POISONED (or HWPoison flag is
set), action_result shouldn't invoke statistic related ops, and
mf_results just become irrelevant (wrt
/sys/devices/system/node/nodeX/memory_failure/*), right? IWO, we can
decouple the two issues:
1. how to update statistics correctly?
2. what log msg to show for recovery result, MF_FAILED or something else/new?
While Kyle fixing issue #1, can we just keep MF_FAILED as is?
> hardware corrupted memory and the per NUMA node memory failure statistics. Isn't
> that a regression?
>
> The per NUMA node memory failure statistics might not always be 100% accurate,
> but this issue seems preventable.
>
> > > > > > > > >
> > > > > > > > > All agreed.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Do not call action_result() on already poisoned pages and drop unused
> > > > > > > > > > MF_MSG_ALREADY_POISONED.
> > > > > > > > >
> > > > > > > > > Hi Kyle,
> > > > > > > > >
> > > > > > > > > Patch looks great to me, just one thought...
> > > > > >
> > > > > > Thanks both.
> > > > > >
> > > > > > > > >
> > > > > > > > > Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> > > > > > > > > but changing action_result for MF_MSG_ALREADY_POISONED?
> > > > > > > > > - don't num_poisoned_pages_inc(pfn)
> > > > > > > > > - don't update_per_node_mf_stats(pfn, result)
> > > > > > > > > - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> > > > > > > > > - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> > > > > > > > > in memory_failure and try_memory_failure_hugetlb
> > > > > > > >
> > > > > > > > I did consider that approach but I was concerned about passing
> > > > > > > > MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
> > > > > > > > bit misleading.
> > > > > > >
> > > > > > > Based on my reading the documentation for MF_* in static const char
> > > > > > > *action_name[]...
> > > > > > >
> > > > > > > Yeah, for file mapped pages, kernel may not have hole-punched or
> > > > > > > truncated it from the file mapping (shmem and hugetlbfs for example)
> > > > > > > but that still considered as MF_RECOVERED, so touching a page with
> > > > > > > HWPoison flag doesn't mean that page was failed to be recovered
> > > > > > > previously.
> > > > > > >
> > > > > > > For pages intended to be taken out of the buddy system, touching a
> > > > > > > page with HWPoison flag does imply it isn't isolated and hence
> > > > > > > MF_FAILED.
> > > > > >
> > > > > > There should be other cases that memory_failure failed to isolate the
> > > > > > hwpoisoned pages at first time due to various reasons.
> > > > > >
> > > > > > >
> > > > > > > In summary, seeing the HWPoison flag again doesn't necessarily
> > > > > > > indicate what the recovery result was previously; it only indicate
> > > > > > > kernel won't re-attempt to recover?
> > > > > >
> > > > > > Yes, kernel won't re-attempt to or just cannot recover.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > How about introducing a new MF action result? Maybe MF_NONE? The message could
> > > > > > > > look something like:
> > > > > > >
> > > > > > > Adding MF_NONE sounds fine to me, as long as we correctly document its
> > > > > > > meaning, which can be subtle.
> > > > > >
> > > > > > Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
> > > > > > as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
> > > > >
> > > > > OK, would you like a separate MF action result for each case? Maybe
> > > > > MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
> > > > >
> > > > > MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
> > > > > used when kill_accessing_process() returns -EHWPOISON.
> > > > >
> > > > > The log messages could look like...
> > > > >
> > > > > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
> > > > > and
> > > > > Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
> > > >
> > > > Agreed with Miaohe that "None" won't work.
> > >
> > > What action is M-F() taking to recover already poisoned pages that don't have
> > > MF_ACTION_REQUIRED set?
> >
> > The action taken toward poisoned page not under MF_ACTION_REQUIRED is
> > typically isolation, that is, remove the pte or mark the pte as poisoned
> > special swap entry, so that subsequent page fault is given a chance to
> > deliver a SIGBUS. That said, things might fail during the process, like
> > encountering GUP pinned THP page.>
> > > > "Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
> > > > we need to understand why "Failed" doesn't work for your usecase.
> > > > "Failed" means process is killed but page is not successfully isolated which
> > > > applies to MF_MSG_ALREADY_POISONED case as well.
> > >
> > > So that accessing process is killed. Why call action_result() with MF_FAILED?
> > > Doesn't that indicate we poisoned another page and the recovery attempt failed?
> >
> > What I recall is that, "recovery" is reserved for page that is clean,
> > isolated, and even by chance, unmapped. "failed" is reserved for page that
> > has been(or might not?) removed from the page table, page might be dirty,
> > certainly mapped, etc. A SIGBUS doesn't make recovery an automatic success.
> >
> > Others please correct me if I'm mistaken.
>
> Thank you very much for taking the time to explain everything.
>
> Thanks,
> Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
2025-08-26 21:22 ` Jiaqi Yan
@ 2025-08-27 8:06 ` jane.chu
0 siblings, 0 replies; 12+ messages in thread
From: jane.chu @ 2025-08-27 8:06 UTC (permalink / raw)
To: Jiaqi Yan, Kyle Meyer, Miaohe Lin, russ.anderson
Cc: akpm, david, tony.luck, bp, linux-mm, linux-kernel, linux-edac,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
nao.horiguchi, osalvador
On 8/26/2025 2:22 PM, Jiaqi Yan wrote:
> On Tue, Aug 26, 2025 at 12:28 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>
>> On Tue, Aug 26, 2025 at 10:24:07AM -0700, jane.chu@oracle.com wrote:
>>>
>>> On 8/25/2025 6:56 PM, Kyle Meyer wrote:
>>>> On Mon, Aug 25, 2025 at 03:36:54PM -0700, jane.chu@oracle.com wrote:
>>>>> On 8/25/2025 9:09 AM, Kyle Meyer wrote:
>>>>>> On Mon, Aug 25, 2025 at 11:04:43AM +0800, Miaohe Lin wrote:
>>>>>>> On 2025/8/22 8:24, Jiaqi Yan wrote:
>>>>>>>> On Thu, Aug 21, 2025 at 12:36 PM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
>>>>>>>>>> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Calling action_result() on already poisoned pages causes issues:
>>>>>>>>>>>
>>>>>>>>>>> * The amount of hardware corrupted memory is incorrectly incremented.
>>>>>>>>>>> * NUMA node memory failure statistics are incorrectly updated.
>>>>>>>>>>> * Redundant "already poisoned" messages are printed.
>>>>>
>>>>> Assuming this means that the numbers reported from
>>>>> /sys/devices/system/node/node*/memory_failure/*
>>>>> do not match certain expectation, right?
>>>>>
>>>>> If so, could you clarify what is the expectation?
>>>>
>>>> Sure, and please let me know if I'm mistaken.
>>>>
>>>> Here's the description of total:
>>>>
>>>> What: /sys/devices/system/node/nodeX/memory_failure/total
>>>> Date: January 2023
>>>> Contact: Jiaqi Yan <jiaqiyan@google.com>
>>>> Description:
>>>> The total number of raw poisoned pages (pages containing
>>>> corrupted data due to memory errors) on a NUMA node.
>>>>
>>>> That should emit the number of poisoned pages on NUMA node X. That's
>>>> incremented each time update_per_node_mf_stats() is called.
>>>>
>>>> Here's the description of failed:
>>>>
>>>> What: /sys/devices/system/node/nodeX/memory_failure/failed
>>>> Date: January 2023
>>>> Contact: Jiaqi Yan <jiaqiyan@google.com>
>>>> Description:
>>>> Of the raw poisoned pages on a NUMA node, how many pages are
>>>> failed by memory error recovery attempt. This usually means
>>>> a key recovery operation failed.
>>>>
>>>> That should emit the number of poisoned pages on NUMA node X that could
>>>> not be recovered because the attempt failed. That's incremented each time
>>>> update_per_node_mf_stats() is called with MF_FAILED.
>>>>
>>>> We're currently calling action_result() with MF_FAILED each time we encounter
>>>> a poisoned page (note: the huge page path is a bit different, we only call
>>>> action_result() with MF_FAILED when MF_ACTION_REQUIRED is set). That, IMO,
>>>> breaks the descriptions. We already incremented the per NUMA node MF statistics
>>>> to account for that poisoned page.
>>>
>>> Thanks! My reading is that these numbers are best as hints, I won't take
>>> them literately. As you alluded below, kill_accessing_process is applied
>>> only if MF_ACTION_REQUIRED is set, though the page is already marked
>>> poisoned. Besides, there can be bug that renders a poisoned page not being
>>> properly isolated nor being properly categorized. If you're looking for
>>> something precise, is there another way? maybe from firmware?
>>
>> Firmware records the number of memory errors that have been detected and
>> reported, but it doesn't record how Linux responded to those memory errors.
>>
>> Checking the ring buffer, the amount of hardware corrupted memory, and the
>> per NUMA node memory failure statistics is a simple way to determine how Linux
>> responded.
>>
>> Since commit b8b9488d50b7, that has become unreliable. The same memory error
>> may be reported by multiple sources and now each report increments the amount of
>
> To me this multiple counting is an existing issue, and what Kyle
> originally targeted. It seems to me that as long as the
> mf_action_page_type is MF_MSG_ALREADY_POISONED (or HWPoison flag is
> set), action_result shouldn't invoke statistic related ops, and
> mf_results just become irrelevant (wrt
> /sys/devices/system/node/nodeX/memory_failure/*), right? IWO, we can
> decouple the two issues:
> 1. how to update statistics correctly?
> 2. what log msg to show for recovery result, MF_FAILED or something else/new?
>
> While Kyle fixing issue #1, can we just keep MF_FAILED as is?
>
Sounds good to me.
thanks,
-jane
>> hardware corrupted memory and the per NUMA node memory failure statistics. Isn't
>> that a regression?
>>
>> The per NUMA node memory failure statistics might not always be 100% accurate,
>> but this issue seems preventable.
>>
>>>>>>>>>>
>>>>>>>>>> All agreed.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do not call action_result() on already poisoned pages and drop unused
>>>>>>>>>>> MF_MSG_ALREADY_POISONED.
>>>>>>>>>>
>>>>>>>>>> Hi Kyle,
>>>>>>>>>>
>>>>>>>>>> Patch looks great to me, just one thought...
>>>>>>>
>>>>>>> Thanks both.
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
>>>>>>>>>> but changing action_result for MF_MSG_ALREADY_POISONED?
>>>>>>>>>> - don't num_poisoned_pages_inc(pfn)
>>>>>>>>>> - don't update_per_node_mf_stats(pfn, result)
>>>>>>>>>> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
>>>>>>>>>> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
>>>>>>>>>> in memory_failure and try_memory_failure_hugetlb
>>>>>>>>>
>>>>>>>>> I did consider that approach but I was concerned about passing
>>>>>>>>> MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The message is a
>>>>>>>>> bit misleading.
>>>>>>>>
>>>>>>>> Based on my reading the documentation for MF_* in static const char
>>>>>>>> *action_name[]...
>>>>>>>>
>>>>>>>> Yeah, for file mapped pages, kernel may not have hole-punched or
>>>>>>>> truncated it from the file mapping (shmem and hugetlbfs for example)
>>>>>>>> but that still considered as MF_RECOVERED, so touching a page with
>>>>>>>> HWPoison flag doesn't mean that page was failed to be recovered
>>>>>>>> previously.
>>>>>>>>
>>>>>>>> For pages intended to be taken out of the buddy system, touching a
>>>>>>>> page with HWPoison flag does imply it isn't isolated and hence
>>>>>>>> MF_FAILED.
>>>>>>>
>>>>>>> There should be other cases that memory_failure failed to isolate the
>>>>>>> hwpoisoned pages at first time due to various reasons.
>>>>>>>
>>>>>>>>
>>>>>>>> In summary, seeing the HWPoison flag again doesn't necessarily
>>>>>>>> indicate what the recovery result was previously; it only indicate
>>>>>>>> kernel won't re-attempt to recover?
>>>>>>>
>>>>>>> Yes, kernel won't re-attempt to or just cannot recover.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> How about introducing a new MF action result? Maybe MF_NONE? The message could
>>>>>>>>> look something like:
>>>>>>>>
>>>>>>>> Adding MF_NONE sounds fine to me, as long as we correctly document its
>>>>>>>> meaning, which can be subtle.
>>>>>>>
>>>>>>> Adding a new MF action result sounds good to me. But IMHO MF_NONE might not be that suitable
>>>>>>> as kill_accessing_process might be called to kill proc in this case, so it's not "NONE".
>>>>>>
>>>>>> OK, would you like a separate MF action result for each case? Maybe
>>>>>> MF_ALREADY_POISONED and MF_ALREADY_POISONED_KILLED?
>>>>>>
>>>>>> MF_ALREADY_POISONED can be the default and MF_ALREADY_POISONED_KILLED can be
>>>>>> used when kill_accessing_process() returns -EHWPOISON.
>>>>>>
>>>>>> The log messages could look like...
>>>>>>
>>>>>> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None
>>>>>> and
>>>>>> Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: Process killed
>>>>>
>>>>> Agreed with Miaohe that "None" won't work.
>>>>
>>>> What action is M-F() taking to recover already poisoned pages that don't have
>>>> MF_ACTION_REQUIRED set?
>>>
>>> The action taken toward poisoned page not under MF_ACTION_REQUIRED is
>>> typically isolation, that is, remove the pte or mark the pte as poisoned
>>> special swap entry, so that subsequent page fault is given a chance to
>>> deliver a SIGBUS. That said, things might fail during the process, like
>>> encountering GUP pinned THP page.>
>>>>> "Process killed" sounds okay for MF_MSG_ALREADY_POISONED, but
>>>>> we need to understand why "Failed" doesn't work for your usecase.
>>>>> "Failed" means process is killed but page is not successfully isolated which
>>>>> applies to MF_MSG_ALREADY_POISONED case as well.
>>>>
>>>> So that accessing process is killed. Why call action_result() with MF_FAILED?
>>>> Doesn't that indicate we poisoned another page and the recovery attempt failed?
>>>
>>> What I recall is that, "recovery" is reserved for page that is clean,
>>> isolated, and even by chance, unmapped. "failed" is reserved for page that
>>> has been(or might not?) removed from the page table, page might be dirty,
>>> certainly mapped, etc. A SIGBUS doesn't make recovery an automatic success.
>>>
>>> Others please correct me if I'm mistaken.
>>
>> Thank you very much for taking the time to explain everything.
>>
>> Thanks,
>> Kyle Meyer
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-27 8:07 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-21 16:44 [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages Kyle Meyer
2025-08-21 18:23 ` Jiaqi Yan
2025-08-21 19:36 ` Kyle Meyer
2025-08-22 0:24 ` Jiaqi Yan
2025-08-25 3:04 ` Miaohe Lin
2025-08-25 16:09 ` Kyle Meyer
2025-08-25 22:36 ` jane.chu
2025-08-26 1:56 ` Kyle Meyer
2025-08-26 17:24 ` jane.chu
2025-08-26 19:27 ` Kyle Meyer
2025-08-26 21:22 ` Jiaqi Yan
2025-08-27 8:06 ` jane.chu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).