[PATCH] mm/memory-failure: fix infinite UCE for VM

linux-mm.kvack.org archive mirror

* [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
@ 2025-08-06  2:05 Jinjiang Tu
  2025-08-06  3:05 ` Miaohe Lin
  2025-08-06 12:39 ` David Hildenbrand
  0 siblings, 2 replies; 10+ messages in thread
From: Jinjiang Tu @ 2025-08-06  2:05 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, xueshuai, david, ziy, osalvador,
	linux-mm
  Cc: wangkefeng.wang, tujinjiang

When memory_failure() is called for an already hwpoisoned pfn,
kill_accessing_process() will be called to kill the current task. However,
if the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
the vma in walk_page_test() and return 0.

Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
with recovered clean pages"), kill_accessing_process() would return EFAULT.
On x86, the current task would then be killed in kill_me_maybe().

However, after this commit, kill_accessing_process() simply returns 0,
which means the UCE is treated as handled properly even though it actually
is not. In that case, the user task will trigger the UCE infinitely.

To fix it, add a .test_walk callback to hwpoison_walk_ops so that all vmas
are scanned.

Fixes: aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes with recovered clean pages")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/memory-failure.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b91a33fb6c69..66b0c359d447 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -847,9 +847,16 @@ static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
 #define hwpoison_hugetlb_range	NULL
 #endif

+static int hwpoison_test_walk(unsigned long start, unsigned long end,
+			     struct mm_walk *walk)
+{
+	return 0;
+}
+
 static const struct mm_walk_ops hwpoison_walk_ops = {
 	.pmd_entry = hwpoison_pte_range,
 	.hugetlb_entry = hwpoison_hugetlb_range,
+	.test_walk = hwpoison_test_walk,
 	.walk_lock = PGWALK_RDLOCK,
 };

-- 
2.43.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06  2:05 [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn Jinjiang Tu
@ 2025-08-06  3:05 ` Miaohe Lin
  2025-08-06  3:24   ` Jinjiang Tu
  2025-08-06 12:39 ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Miaohe Lin @ 2025-08-06  3:05 UTC (permalink / raw)
  To: Jinjiang Tu
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, david, ziy,
	osalvador, linux-mm

On 2025/8/6 10:05, Jinjiang Tu wrote:
> When memory_failure() is called for an already hwpoisoned pfn,
> kill_accessing_process() will be called to kill current task. However, if

Thanks for your patch.

> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
> the vma in walk_page_test() and return 0.
> 
> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
> with recovered clean pages"), kill_accessing_process() will return EFAULT.

I'm not sure but pfn_to_online_page should return NULL for VM_PFNMAP pages?
So memory_failure_dev_pagemap should handle these pages?

> For x86, the current task will be killed in kill_me_maybe().
> 
> However, after this commit, kill_accessing_process() simply returns 0,
> that means UCE is handled properly, but it doesn't actually. In such case,
> the user task will trigger UCE infinitely.

Did you ever trigger this loop?

Thanks.
.


* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06  3:05 ` Miaohe Lin
@ 2025-08-06  3:24   ` Jinjiang Tu
  2025-08-06 12:41     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-08-06  3:24 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, david, ziy,
	osalvador, linux-mm

On 2025/8/6 11:05, Miaohe Lin wrote:
> On 2025/8/6 10:05, Jinjiang Tu wrote:
>> When memory_failure() is called for an already hwpoisoned pfn,
>> kill_accessing_process() will be called to kill current task. However, if
> Thanks for your patch.
>
>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
>> the vma in walk_page_test() and return 0.
>>
>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
>> with recovered clean pages"), kill_accessing_process() will return EFAULT.
> I'm not sure but pfn_to_online_page should return NULL for VM_PFNMAP pages?
> So memory_failure_dev_pagemap should handle these pages?

remap_pfn_range() can also be called for pfns that have a struct page. IIUC, VM_PFNMAP
means we should assume the pfn doesn't have a struct page, but it can have one.

>
>> For x86, the current task will be killed in kill_me_maybe().
>>
>> However, after this commit, kill_accessing_process() simply returns 0,
>> that means UCE is handled properly, but it doesn't actually. In such case,
>> the user task will trigger UCE infinitely.
> Did you ever trigger this loop?

Yes. Our test follows these steps:
1) create a user task that allocates a clean anonymous page, without accessing it.
2) use einj to inject a UCE into the page.
3) create a task devmem that uses /dev/mem to map the pfn and keeps accessing it.

/dev/mem uses remap_pfn_range() to map the pfn.

When task devmem first accesses the pfn, a UCE is triggered, and memory_failure()
succeeds in isolating the page because it is a clean user page. But task devmem isn't killed.

When task devmem accesses the pfn again, kill_accessing_process() is called since the pfn is already hwpoisoned.
But it fails to kill the accessing task.

Theoretically, the same issue exists if several tasks share a pfn range mapped by remap_pfn_range().
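
For illustration only, step 3 above could look roughly like the userspace
sketch below. The helper names (devmem_offset, devmem_map_pfn, devmem_touch)
are hypothetical and not taken from the actual test program; the sketch
assumes root privileges and a kernel that permits mapping system RAM through
/dev/mem (kernels built with CONFIG_STRICT_DEVMEM generally refuse this).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte offset into /dev/mem that corresponds to a page frame number. */
off_t devmem_offset(uint64_t pfn, long page_size)
{
	assert(page_size > 0);
	return (off_t)(pfn * (uint64_t)page_size);
}

/*
 * Map one page of physical memory read-only through /dev/mem.
 * Returns MAP_FAILED on any error. Hypothetical helper for illustration.
 */
void *devmem_map_pfn(uint64_t pfn)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *p;
	int fd;

	if (page_size <= 0)
		return MAP_FAILED;

	fd = open("/dev/mem", O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED, fd,
		 devmem_offset(pfn, page_size));
	close(fd);	/* the mapping stays valid after the fd is closed */
	return p;
}

/*
 * "Keep accessing it": read the first byte of the page repeatedly.
 * On a poisoned pfn, each read may raise the UCE again.
 */
unsigned long devmem_touch(const volatile unsigned char *page,
			   unsigned long rounds)
{
	unsigned long sum = 0;

	for (unsigned long i = 0; i < rounds; i++)
		sum += page[0];
	return sum;
}
```

A caller would pass the pfn of the injected page to devmem_map_pfn(), check
the result against MAP_FAILED, and then call devmem_touch() in a loop.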

>
> Thanks.
> .


* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06  2:05 [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn Jinjiang Tu
  2025-08-06  3:05 ` Miaohe Lin
@ 2025-08-06 12:39 ` David Hildenbrand
  2025-08-07 11:06   ` Jinjiang Tu
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-08-06 12:39 UTC (permalink / raw)
  To: Jinjiang Tu, linmiaohe, nao.horiguchi, akpm, xueshuai, ziy,
	osalvador, linux-mm
  Cc: wangkefeng.wang

On 06.08.25 04:05, Jinjiang Tu wrote:
> When memory_failure() is called for an already hwpoisoned pfn,
> kill_accessing_process() will be called to kill current task. However, if
> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
> the vma in walk_page_test() and return 0.

That check is dubious and it's on my todo list to revisit that, because 
it doesn't make sense for COW VM_PFNMAP mappings.

But I am curious, is this about having an anon page in a COW VM_PFNMAP, 
or on "what kind of page" do you get the fault?

-- 
Cheers,

David / dhildenb



* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06  3:24   ` Jinjiang Tu
@ 2025-08-06 12:41     ` David Hildenbrand
  2025-08-07 11:13       ` Jinjiang Tu
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-08-06 12:41 UTC (permalink / raw)
  To: Jinjiang Tu, Miaohe Lin
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, ziy, osalvador,
	linux-mm

On 06.08.25 05:24, Jinjiang Tu wrote:
> 
> On 2025/8/6 11:05, Miaohe Lin wrote:
>> On 2025/8/6 10:05, Jinjiang Tu wrote:
>>> When memory_failure() is called for an already hwpoisoned pfn,
>>> kill_accessing_process() will be called to kill current task. However, if
>> Thanks for your patch.
>>
>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
>>> the vma in walk_page_test() and return 0.
>>>
>>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
>>> with recovered clean pages"), kill_accessing_process() will return EFAULT.
>> I'm not sure but pfn_to_online_page should return NULL for VM_PFNMAP pages?
>> So memory_failure_dev_pagemap should handle these pages?
> 
> We could call remap_pfn_range() for those pfns with struct page. IIUC, VM_PFNMAP
> means we should assume the pfn doesn't have struct page, but it can have.
> 
>>> For x86, the current task will be killed in kill_me_maybe().
>>>
>>> However, after this commit, kill_accessing_process() simply returns 0,
>>> that means UCE is handled properly, but it doesn't actually. In such case,
>>> the user task will trigger UCE infinitely.
>> Did you ever trigger this loop?
> 
> Yes. Our test is as follow steps:
> 1) create a user task allocates a clean anonymous page, without accessing it.
> 2) use einj to inject UCE for the page
> 3) create task devmem to use /dev/mem to map the pfn and keep accessing it.

What is the use case for that? It sounds extremely questionable.

-- 
Cheers,

David / dhildenb



* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06 12:39 ` David Hildenbrand
@ 2025-08-07 11:06   ` Jinjiang Tu
  2025-08-08  8:08     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-08-07 11:06 UTC (permalink / raw)
  To: David Hildenbrand, linmiaohe, nao.horiguchi, akpm, xueshuai, ziy,
	osalvador, linux-mm
  Cc: wangkefeng.wang


On 2025/8/6 20:39, David Hildenbrand wrote:
> On 06.08.25 04:05, Jinjiang Tu wrote:
>> When memory_failure() is called for an already hwpoisoned pfn,
>> kill_accessing_process() will be called to kill current task. 
>> However, if
>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
>> the vma in walk_page_test() and return 0.
>
> That check is dubious and it's on my todo list to revisit that, 
> because it doesn't make sense for COW VM_PFNMAP mappings.
>
> But I am curious, is this about having an anon page in a COW 
> VM_PFNMAP, or on "what kind of page" do you get the fault?
The pfn is a RAM pfn that is mapped with remap_pfn_range(). There is no 
COW.


* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-06 12:41     ` David Hildenbrand
@ 2025-08-07 11:13       ` Jinjiang Tu
  2025-08-08  8:21         ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-08-07 11:13 UTC (permalink / raw)
  To: David Hildenbrand, Miaohe Lin
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, ziy, osalvador,
	linux-mm


On 2025/8/6 20:41, David Hildenbrand wrote:
> On 06.08.25 05:24, Jinjiang Tu wrote:
>>
>> On 2025/8/6 11:05, Miaohe Lin wrote:
>>> On 2025/8/6 10:05, Jinjiang Tu wrote:
>>>> When memory_failure() is called for an already hwpoisoned pfn,
>>>> kill_accessing_process() will be called to kill current task. 
>>>> However, if
>>> Thanks for your patch.
>>>
>>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will 
>>>> skip
>>>> the vma in walk_page_test() and return 0.
>>>>
>>>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
>>>> processes
>>>> with recovered clean pages"), kill_accessing_process() will return 
>>>> EFAULT.
>>> I'm not sure but pfn_to_online_page should return NULL for VM_PFNMAP 
>>> pages?
>>> So memory_failure_dev_pagemap should handle these pages?
>>
>> We could call remap_pfn_range() for those pfns with struct page. 
>> IIUC, VM_PFNMAP
>> means we should assume the pfn doesn't have struct page, but it can 
>> have.
>>
>>>> For x86, the current task will be killed in kill_me_maybe().
>>>>
>>>> However, after this commit, kill_accessing_process() simply 
>>>> returns 0,
>>>> that means UCE is handled properly, but it doesn't actually. In 
>>>> such case,
>>>> the user task will trigger UCE infinitely.
>>> Did you ever trigger this loop?
>>
>> Yes. Our test is as follow steps:
>> 1) create a user task allocates a clean anonymous page, without 
>> accessing it.
>> 2) use einj to inject UCE for the page
>> 3) create task devmem to use /dev/mem to map the pfn and keep 
>> accessing it.
>
> What is the use case for that? It sounds extremely questionable.
>
This case is only a test, and it is strange indeed.

But consider another case: a driver may map the same RAM pfn into several processes with remap_pfn_range().
If the first task triggers a UCE when accessing the pfn, that task will be killed. But the other tasks can't be killed
and will trigger the UCE infinitely.




* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-07 11:06   ` Jinjiang Tu
@ 2025-08-08  8:08     ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-08-08  8:08 UTC (permalink / raw)
  To: Jinjiang Tu, linmiaohe, nao.horiguchi, akpm, xueshuai, ziy,
	osalvador, linux-mm
  Cc: wangkefeng.wang

On 07.08.25 13:06, Jinjiang Tu wrote:
> 
> On 2025/8/6 20:39, David Hildenbrand wrote:
>> On 06.08.25 04:05, Jinjiang Tu wrote:
>>> When memory_failure() is called for an already hwpoisoned pfn,
>>> kill_accessing_process() will be called to kill current task.
>>> However, if
>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
>>> the vma in walk_page_test() and return 0.
>>
>> That check is dubious and it's on my todo list to revisit that,
>> because it doesn't make sense for COW VM_PFNMAP mappings.
>>
>> But I am curious, is this about having an anon page in a COW
>> VM_PFNMAP, or on "what kind of page" do you get the fault?
> The pfn is a RAM pfn that is mapped with remap_pfn_range(). There is no
> COW.
> 

Ah, and that works because in check_hwpoisoned_entry() we are only using 
pte_pfn().

-- 
Cheers,

David / dhildenb



* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-07 11:13       ` Jinjiang Tu
@ 2025-08-08  8:21         ` David Hildenbrand
  2025-08-09  1:23           ` Jinjiang Tu
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-08-08  8:21 UTC (permalink / raw)
  To: Jinjiang Tu, Miaohe Lin
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, ziy, osalvador,
	linux-mm

On 07.08.25 13:13, Jinjiang Tu wrote:
> 
> On 2025/8/6 20:41, David Hildenbrand wrote:
>> On 06.08.25 05:24, Jinjiang Tu wrote:
>>>
>>> On 2025/8/6 11:05, Miaohe Lin wrote:
>>>> On 2025/8/6 10:05, Jinjiang Tu wrote:
>>>>> When memory_failure() is called for an already hwpoisoned pfn,
>>>>> kill_accessing_process() will be called to kill current task. 
>>>>> However, if
>>>> Thanks for your patch.
>>>>
>>>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will 
>>>>> skip
>>>>> the vma in walk_page_test() and return 0.
>>>>>
>>>>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
>>>>> processes
>>>>> with recovered clean pages"), kill_accessing_process() will return 
>>>>> EFAULT.
>>>> I'm not sure but pfn_to_online_page should return NULL for VM_PFNMAP 
>>>> pages?
>>>> So memory_failure_dev_pagemap should handle these pages?
>>>
>>> We could call remap_pfn_range() for those pfns with struct page. 
>>> IIUC, VM_PFNMAP
>>> means we should assume the pfn doesn't have struct page, but it can 
>>> have.
>>>
>>>>> For x86, the current task will be killed in kill_me_maybe().
>>>>>
>>>>> However, after this commit, kill_accessing_process() simply 
>>>>> returns 0,
>>>>> that means UCE is handled properly, but it doesn't actually. In 
>>>>> such case,
>>>>> the user task will trigger UCE infinitely.
>>>> Did you ever trigger this loop?
>>>
>>> Yes. Our test is as follow steps:
>>> 1) create a user task allocates a clean anonymous page, without 
>>> accessing it.
>>> 2) use einj to inject UCE for the page
>>> 3) create task devmem to use /dev/mem to map the pfn and keep 
>>> accessing it.
>>
>> What is the use case for that? It sounds extremely questionable.
>>
> This case is only for test, and is strange indeed.
> 
> But considering another case, a driver may map same RAM pfn to several processes with remap_pfn_range().
> If the first task triggers UCE when accessing the pfn, the task will be killed. But the other tasks couldn't be killed
> and triggers UCE infinitely.

Yes, the "anon page" example is confusing though. We really just want to 
test here if the PFN is mapped. And I would agree that your patch is 
correct in that case.

For memory poisoning handling you really need a "struct page". 
struct-less memory is only handled in special ways for DAX (see 
pfn_to_online_page() logic in memory_failure()).

So what you describe here really only works when a process uses 
remap_pfn_range() to VM_PFNMAP a struct-page-backed PFN.

Likely your patch description should be:

"
mm/memory-failure: fix infinite UCE for VM_PFNMAP'ed page

When memory_failure() is called for an already hardware poisoned page,
kill_accessing_process() will conditionally send a SIGBUS to the current 
(triggering) process if it still maps the page.

However, in case the page is not ordinarily mapped, but was mapped 
through remap_pfn_range(), kill_accessing_process() would not identify 
it as mapped even though hwpoison_pte_range() would be prepared to 
handle it, because walk_page_range() will skip VM_PFNMAP as default in 
walk_page_test().

walk_page_range() will return 0, assuming "not mapped" and the SIGBUS 
will be skipped. In this case, the user task will trigger UCE infinitely 
because it will not receive a SIGBUS on access and simply retry.

Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
processes with recovered clean pages"), kill_accessing_process() would 
return EFAULT in that case, and on x86, the current task would be killed 
in kill_me_maybe().

Let's fix it by adding our custom .test_walk callback that will also
process VM_PFNMAP VMAs.
"
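
The skip described above can be modelled in a few lines of userspace C. This
is only a simplified sketch: every name prefixed with model_ or MODEL_ is
hypothetical, the flag value is a stand-in, and the real walk_page_test() and
hwpoison_pte_range() do considerably more. It only shows that a .test_walk
callback returning 0 makes the walk visit a VM_PFNMAP VMA, where the pfn
comparison can then match.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag bit for this model; not the kernel's VM_PFNMAP value. */
#define MODEL_VM_PFNMAP 0x1UL

struct model_vma {
	unsigned long flags;		/* VMA flags */
	unsigned long mapped_pfn;	/* the single pfn this toy VMA maps */
};

struct model_walk_ops {
	/* Optional override: return 0 to walk the VMA, 1 to skip it. */
	int (*test_walk)(const struct model_vma *vma);
};

/* Default policy: skip VM_PFNMAP VMAs unless test_walk says otherwise. */
int model_walk_page_test(const struct model_vma *vma,
			 const struct model_walk_ops *ops)
{
	assert(vma != NULL && ops != NULL);
	if (ops->test_walk)
		return ops->test_walk(vma);
	return (vma->flags & MODEL_VM_PFNMAP) ? 1 : 0;
}

/* Mirrors the patch: accept every VMA, including VM_PFNMAP ones. */
int model_hwpoison_test_walk(const struct model_vma *vma)
{
	(void)vma;
	return 0;
}

/*
 * Returns 1 if the poisoned pfn is found mapped, 0 otherwise.
 * A skipped VMA reports 0, i.e. "not mapped", so no SIGBUS would follow.
 */
int model_walk(const struct model_vma *vma,
	       const struct model_walk_ops *ops,
	       unsigned long poisoned_pfn)
{
	if (model_walk_page_test(vma, ops))
		return 0;
	return vma->mapped_pfn == poisoned_pfn;
}
```

With test_walk left NULL, model_walk() reports 0 for a MODEL_VM_PFNMAP VMA
that maps the poisoned pfn; with model_hwpoison_test_walk installed, it
reports 1.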

-- 
Cheers,

David / dhildenb

* Re: [PATCH] mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  2025-08-08  8:21         ` David Hildenbrand
@ 2025-08-09  1:23           ` Jinjiang Tu
  0 siblings, 0 replies; 10+ messages in thread
From: Jinjiang Tu @ 2025-08-09  1:23 UTC (permalink / raw)
  To: David Hildenbrand, Miaohe Lin
  Cc: wangkefeng.wang, nao.horiguchi, akpm, xueshuai, ziy, osalvador,
	linux-mm


On 2025/8/8 16:21, David Hildenbrand wrote:
> On 07.08.25 13:13, Jinjiang Tu wrote:
>>
>> On 2025/8/6 20:41, David Hildenbrand wrote:
>>> On 06.08.25 05:24, Jinjiang Tu wrote:
>>>>
>>>> On 2025/8/6 11:05, Miaohe Lin wrote:
>>>>> On 2025/8/6 10:05, Jinjiang Tu wrote:
>>>>>> When memory_failure() is called for an already hwpoisoned pfn,
>>>>>> kill_accessing_process() will be called to kill current task. 
>>>>>> However, if
>>>>> Thanks for your patch.
>>>>>
>>>>>> the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() 
>>>>>> will skip
>>>>>> the vma in walk_page_test() and return 0.
>>>>>>
>>>>>> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
>>>>>> processes
>>>>>> with recovered clean pages"), kill_accessing_process() will 
>>>>>> return EFAULT.
>>>>> I'm not sure but pfn_to_online_page should return NULL for 
>>>>> VM_PFNMAP pages?
>>>>> So memory_failure_dev_pagemap should handle these pages?
>>>>
>>>> We could call remap_pfn_range() for those pfns with struct page. 
>>>> IIUC, VM_PFNMAP
>>>> means we should assume the pfn doesn't have struct page, but it can 
>>>> have.
>>>>
>>>>>> For x86, the current task will be killed in kill_me_maybe().
>>>>>>
>>>>>> However, after this commit, kill_accessing_process() simply 
>>>>>> returns 0,
>>>>>> that means UCE is handled properly, but it doesn't actually. In 
>>>>>> such case,
>>>>>> the user task will trigger UCE infinitely.
>>>>> Did you ever trigger this loop?
>>>>
>>>> Yes. Our test is as follow steps:
>>>> 1) create a user task allocates a clean anonymous page, without 
>>>> accessing it.
>>>> 2) use einj to inject UCE for the page
>>>> 3) create task devmem to use /dev/mem to map the pfn and keep 
>>>> accessing it.
>>>
>>> What is the use case for that? It sounds extremely questionable.
>>>
>> This case is only for test, and is strange indeed.
>>
>> But considering another case, a driver may map same RAM pfn to 
>> several processes with remap_pfn_range().
>> If the first task triggers UCE when accessing the pfn, the task will 
>> be killed. But the other tasks couldn't be killed
>> and triggers UCE infinitely.
>
> Yes, the "anon page" example is confusing though. We really just want 
> to test here if the PFN is mapped. And I would agree that your patch 
> is correct in that case.
>
> For memory poisoning handling you really need a "struct page". 
> struct-less memory is only handled in special ways for DAX (see 
> pfn_to_online_page() logic in memory_failure()).
>
> So what you describe here really only works when a process uses 
> remap_pfn_range() to VM_PFNMAP a struct-page-backed PFN.
>
>
> Likely your patch description should be:
>
> "
> mm/memory-failure: fix infinite UCE for VM_PFNMAP'ed page
>
> When memory_failure() is called for an already hardware poisoned page,
> kill_accessing_process() will conditionally send a SIGBUS to the 
> current (triggering) process if it still maps the page.
>
> However, in case the page is not ordinarily mapped, but was mapped 
> through remap_pfn_range(), kill_accessing_process() would not identify 
> it as mapped even though hwpoison_pte_range() would be prepared to 
> handle it, because walk_page_range() will skip VM_PFNMAP as default in 
> walk_page_test().
>
> walk_page_range() will return 0, assuming "not mapped" and the SIGBUS 
> will be skipped. In this case, the user task will trigger UCE 
> infinitely because it will not receive a SIGBUS on access and simply 
> retry.
>
>
> Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to 
> processes with recovered clean pages"), kill_accessing_process() would 
> return EFAULT in that case, and on x86, the current task would be 
> killed in kill_me_maybe().
>
> Let's fix it by adding our custom .test_walk callback that will also
> process VM_PFNMAP VMAs.
> "
>
Thanks, I will update the patch description to emphasize that the pfn is backed by a struct page.

