From: Lance Yang <lance.yang@linux.dev>
To: david@redhat.com
Cc: Longlong Xia <xialonglong2025@163.com>,
	nao.horiguchi@gmail.com, akpm@linux-foundation.org,
	wangkefeng.wang@huawei.com, xu.xin16@zte.com.cn,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Longlong Xia <xialonglong@kylinos.cn>,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, Miaohe Lin <linmiaohe@huawei.com>,
	qiuxu.zhuo@intel.com
Subject: Re: [PATCH RFC 1/1] mm/ksm: Add recovery mechanism for memory failures
Date: Mon, 13 Oct 2025 17:15:09 +0800
Message-ID: <bd374ac3-05a2-41ae-8043-cc3575fb13c0@linux.dev>
In-Reply-To: <f12dfacb-05dd-4b22-90eb-fcc1a8ed552b@linux.dev>

@David

Cc: MM CORE folks

On 2025/10/13 12:42, Lance Yang wrote:
[...]

Cool. Hardware error injection with EINJ was the way to go!

I just ran some tests on the shared zero page (both regular and huge) and
found some tricky behavior:

1) When a hardware error is injected into the zeropage, a process that
attempts to read from a mapping backed by it is correctly killed with
SIGBUS.

2) However, even after the error is detected, the kernel continues to
install the known-poisoned zeropage for new anonymous mappings ...
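
For anyone wanting to reproduce the read side, here is a minimal sketch of
the kind of reader involved. It is not the exact program behind the logs
below, and the function names are hypothetical. It relies on a read fault
on never-written private anonymous memory being served by the shared
zeropage, and assumes a 2 MiB PMD size (as on x86-64) for the huge case.
Whether the huge zeropage is actually used depends on the system's THP
settings.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized huge pages, as on x86-64. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Map `len` bytes of private anonymous memory and read one byte without
 * ever writing to it, so the read fault can be served by the shared
 * zeropage. Returns the byte read (expected 0), or -1 if mmap() fails.
 */
int read_untouched_anon_byte(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int value;

	if (p == MAP_FAILED)
		return -1;

	/* volatile forces a real load, so the read fault is taken. */
	value = *(volatile unsigned char *)p;

	munmap(p, len);
	return value;
}

/*
 * Same idea for a THP-eligible range: over-allocate, align to the huge
 * page size, hint MADV_HUGEPAGE, then read. The hint is best-effort;
 * if THP is unavailable the read falls back to a regular page.
 */
int read_untouched_thp_byte(void)
{
	size_t len = 2 * HPAGE_SIZE;
	unsigned char *raw = mmap(NULL, len, PROT_READ,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned char *aligned;
	int value;

	if (raw == MAP_FAILED)
		return -1;

	aligned = (unsigned char *)(((uintptr_t)raw + HPAGE_SIZE - 1) &
				    ~(uintptr_t)(HPAGE_SIZE - 1));
	(void)madvise(aligned, HPAGE_SIZE, MADV_HUGEPAGE);

	value = *(volatile unsigned char *)aligned;

	munmap(raw, len);
	return value;
}
```

On a healthy system both helpers return 0. With an error injected into the
backing zeropage, the load itself is where the SIGBUS would be delivered.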


For the shared zeropage:
```
[Mon Oct 13 16:29:02 2025] mce: Uncorrected hardware memory error in user-access at 29b8cf5000
[Mon Oct 13 16:29:02 2025] Memory failure: 0x29b8cf5: Sending SIGBUS to read_zeropage:13767 due to hardware memory corruption
[Mon Oct 13 16:29:02 2025] Memory failure: 0x29b8cf5: recovery action for already poisoned page: Failed
```
And for the shared huge zeropage:
```
[Mon Oct 13 16:35:34 2025] mce: Uncorrected hardware memory error in user-access at 1e1e00000
[Mon Oct 13 16:35:34 2025] Memory failure: 0x1e1e00: Sending SIGBUS to read_huge_zerop:13891 due to hardware memory corruption
[Mon Oct 13 16:35:34 2025] Memory failure: 0x1e1e00: recovery action for already poisoned page: Failed
```

Since we've identified an uncorrectable hardware error on such a critical,
singleton page, should we be doing something more?

Thanks,
Lance


