From: Lance Yang <lance.yang@linux.dev>
To: Miaohe Lin <linmiaohe@huawei.com>, qiuxu.zhuo@intel.com
Cc: Longlong Xia <xialonglong2025@163.com>,
nao.horiguchi@gmail.com, akpm@linux-foundation.org,
wangkefeng.wang@huawei.com, xu.xin16@zte.com.cn,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Longlong Xia <xialonglong@kylinos.cn>,
david@redhat.com
Subject: Re: [PATCH RFC 1/1] mm/ksm: Add recovery mechanism for memory failures
Date: Sat, 11 Oct 2025 20:57:54 +0800
Message-ID: <3954ac60-e818-42a0-b114-c2a09d34572b@linux.dev>
In-Reply-To: <077882e3-f69f-44f3-aa74-b325721beb42@linux.dev>
Cc Qiuxu
On 2025/10/11 17:38, Lance Yang wrote:
>
>
> On 2025/10/11 17:23, Miaohe Lin wrote:
>> On 2025/10/11 15:52, Lance Yang wrote:
>>> @Miaohe
>>>
>>> I'd like to raise a concern about a potential hardware failure :)
>>
>> Thanks for your thought.
>>
>>>
>>> My tests show that if the shared zeropage (or huge zeropage) gets marked
>>> with HWpoison, the kernel continues to install it for new mappings.
>>> Surprisingly, it does not kill the accessing process ...
>>
>> Have you investigated the cause? If user space writes to the shared zeropage,
>> it will trigger COW and a new page will be installed. After that, reading
>> the newly allocated page won't trigger a memory error. In this case, it
>> does not kill the accessing process.
>
> Not a write, just a read :)
>
>>
>>>
>>> The concern is, once the page is no longer zero-filled due to the hardware
>>> failure, what will happen? Would this lead to silent data corruption for
>>> applications that expect to read zeros?
>>
>> IMHO, once the page is no longer zero-filled due to the hardware failure,
>> any later read will trigger a memory error and memory_failure should
>> handle that.
>
> I've only tested injecting an error on the shared zeropage using corrupt-pfn:
>
> echo $PFN > /sys/kernel/debug/hwpoison/corrupt-pfn
>
> But no memory error was triggered on a subsequent read ...
>
> Anyway, I'm trying to explore other ways to simulate hardware failure :)
>
> Thanks,
> Lance
>
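
For anyone wanting to reproduce the corrupt-pfn test quoted above, the $PFN
of the shared zeropage has to come from somewhere. The following is only a
rough sketch of one way to look it up, not necessarily how the test was run.
It assumes a private anonymous mapping that is only read gets backed by the
shared zeropage, and that the caller has CAP_SYS_ADMIN (without it,
/proc/self/pagemap reports the PFN field as zero). The helper names are made
up for illustration.

```python
import ctypes
import mmap
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1   # pagemap bits 0-54: PFN, valid when present
PRESENT_BIT = 1 << 63      # pagemap bit 63: page present in RAM


def decode_pagemap_entry(entry):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    pfn = (entry & PFN_MASK) if present else None
    return present, pfn


def read_pagemap_entry(vaddr, pid="self"):
    """Read the raw pagemap entry covering virtual address vaddr."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)   # one 8-byte entry per page
        data = f.read(8)
    if len(data) != 8:
        raise OSError("short read from pagemap")
    return struct.unpack("=Q", data)[0]    # native-endian u64


def read_only_anon_page_pfn():
    """Read-fault a private anonymous page and report its (present, pfn).

    Assumption: the read fault installs the shared zeropage, so the
    returned PFN is the zeropage PFN on this system.
    """
    m = mmap.mmap(-1, PAGE_SIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        _ = m[0]                           # read only, never write
        buf = ctypes.c_char.from_buffer(m)
        vaddr = ctypes.addressof(buf)
        del buf                            # drop the export so close() works
        return decode_pagemap_entry(read_pagemap_entry(vaddr))
    finally:
        m.close()
```

Calling read_only_anon_page_pfn() as root should return (True, pfn); a PFN
of 0 most likely means the PFN was hidden for lack of privilege. Writing to
the mapping before the lookup would trigger COW, so the PFN would then be
that of a freshly allocated page rather than the zeropage.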
Thread overview: 18+ messages
2025-10-09 7:00 [PATCH RFC 0/1] mm/ksm: Add recovery mechanism for memory failures Longlong Xia
2025-10-09 7:00 ` [PATCH RFC 1/1] " Longlong Xia
2025-10-09 12:13 ` Lance Yang
2025-10-11 7:52 ` Lance Yang
2025-10-11 9:23 ` Miaohe Lin
2025-10-11 9:38 ` Lance Yang
2025-10-11 12:57 ` Lance Yang [this message]
2025-10-13 3:39 ` Miaohe Lin
2025-10-13 4:42 ` Lance Yang
2025-10-13 9:15 ` Lance Yang
2025-10-13 9:25 ` David Hildenbrand
2025-10-13 9:46 ` Balbir Singh
2025-10-13 11:00 ` Lance Yang
2025-10-13 11:13 ` David Hildenbrand
2025-10-13 11:18 ` Lance Yang
2025-10-11 3:25 ` Miaohe Lin
2025-10-13 20:10 ` [PATCH RFC] " Markus Elfring
2025-10-09 18:57 ` [PATCH RFC 0/1] " David Hildenbrand