From: Lance Yang <lance.yang@linux.dev>
To: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: ziy@nvidia.com, baolin.wang@linux.alibaba.com,
akpm@linux-foundation.org, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, lorenzo.stoakes@oracle.com,
nao.horiguchi@gmail.com, farrah.chen@intel.com,
jiaqiyan@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com,
linmiaohe@huawei.com, david@redhat.com
Subject: Re: [PATCH v2 1/1] mm: prevent poison consumption when splitting THP
Date: Sat, 11 Oct 2025 17:09:14 +0800
Message-ID: <ba6767a4-8211-45b0-bf37-5a0bb303866d@linux.dev>
In-Reply-To: <20251011075520.320862-1-qiuxu.zhuo@intel.com>
On 2025/10/11 15:55, Qiuxu Zhuo wrote:
> When performing memory error injection on a THP (Transparent Huge Page)
> mapped to userspace on an x86 server, the kernel panics with the following
> trace. The expected behavior is to terminate the affected process instead
> of panicking the kernel, as the x86 Machine Check code can recover from an
> in-userspace #MC.
>
> mce: [Hardware Error]: CPU 0: Machine Check Exception: f Bank 3: bd80000000070134
> mce: [Hardware Error]: RIP 10:<ffffffff8372f8bc> {memchr_inv+0x4c/0xf0}
> mce: [Hardware Error]: TSC afff7bbff88a ADDR 1d301b000 MISC 80 PPIN 1e741e77539027db
> mce: [Hardware Error]: PROCESSOR 0:d06d0 TIME 1758093249 SOCKET 0 APIC 0 microcode 80000320
> mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> Kernel panic - not syncing: Fatal local machine check
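As a side note for readers of the log, the Bank 3 status word can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM (the S and AR bits are only defined on processors that support software error recovery). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded flags; not a kernel structure. */
struct mci_status_fields {
	bool val;    /* bit 63: register contents are valid */
	bool over;   /* bit 62: an earlier error was overwritten */
	bool uc;     /* bit 61: uncorrected error */
	bool en;     /* bit 60: error reporting enabled */
	bool miscv;  /* bit 59: MISC register is valid */
	bool addrv;  /* bit 58: ADDR register is valid */
	bool pcc;    /* bit 57: processor context corrupted */
	bool s;      /* bit 56: signaled via machine check */
	bool ar;     /* bit 55: recovery action required */
	uint16_t mca_code; /* bits 15:0: MCA error code */
};

/* Split a raw status value into the fields above (assumed layout). */
static struct mci_status_fields decode_mci_status(uint64_t status)
{
	struct mci_status_fields f = {
		.val   = (status >> 63) & 1,
		.over  = (status >> 62) & 1,
		.uc    = (status >> 61) & 1,
		.en    = (status >> 60) & 1,
		.miscv = (status >> 59) & 1,
		.addrv = (status >> 58) & 1,
		.pcc   = (status >> 57) & 1,
		.s     = (status >> 56) & 1,
		.ar    = (status >> 55) & 1,
		.mca_code = (uint16_t)(status & 0xffff),
	};

	assert(f.mca_code == (status & 0xffff));
	return f;
}
```

For 0xbd80000000070134 this gives UC=1, PCC=0, S=1, AR=1 with MCA code 0x0134, i.e. an uncorrected error that requires a recovery action but did not corrupt the processor context. That matches the commit message: the error class is recoverable in principle, and the panic comes from where the load happened (memchr_inv() in the kernel), not from the error type.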
>
> The root cause of this panic is that handling a memory failure triggered by
> an in-userspace #MC necessitates splitting the THP. The splitting process
> employs a mechanism, implemented in try_to_map_unused_to_zeropage(), which
> reads the sub-pages of the THP to identify zero-filled pages. However,
> reading the sub-pages results in a second in-kernel #MC, occurring before
> the initial memory_failure() completes, ultimately leading to a kernel
> panic. See the kernel panic call trace below, annotated with the two #MCs.
>
>   First Machine Check occurs // [1]
>     memory_failure()         // [2]
>       try_to_split_thp_page()
>         split_huge_page()
>           split_huge_page_to_list_to_order()
>             __folio_split()  // [3]
>               remap_page()
>                 remove_migration_ptes()
>                   remove_migration_pte()
>                     try_to_map_unused_to_zeropage()  // [4]
>                       memchr_inv()                   // [5]
>                         Second Machine Check occurs  // [6]
>                           Kernel panic
>
> [1] Triggered by accessing a hardware-poisoned THP in userspace, which is
> typically recoverable by terminating the affected process.
>
> [2] Call folio_set_has_hwpoisoned() before try_to_split_thp_page().
>
> [3] Pass the RMP_USE_SHARED_ZEROPAGE remap flag to remap_page().
>
> [4] Try to map the unused THP to zeropage.
>
> [5] Re-access sub-pages of the hw-poisoned THP in the kernel.
>
> [6] Triggered in-kernel, leading to a kernel panic.
>
> In Step[2], memory_failure() sets the poisoned flag on the sub-page of the
> THP by TestSetPageHWPoison() before calling try_to_split_thp_page().
>
> As suggested by David Hildenbrand, fix this panic by not accessing the
> poisoned sub-page of the THP during zeropage identification, while
> continuing to scan unaffected sub-pages of the THP for possible zeropage
> mapping. This prevents a second in-kernel #MC that would cause kernel
> panic in Step[4].
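To make the idea concrete, here is a minimal userspace sketch of the approach described above, not the patch itself. It assumes a made-up `struct subpage` with a `hwpoison` flag standing in for PageHWPoison(), and a plain byte loop standing in for memchr_inv(); all names and the 4096-byte sub-page size are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGE_SIZE 4096 /* assumed base page size for this model */

/* Hypothetical stand-in for one THP sub-page; not the kernel's struct page. */
struct subpage {
	bool hwpoison;                    /* models the per-page poison flag */
	unsigned char data[SUBPAGE_SIZE]; /* models the page contents */
};

/* True if every byte is zero (plays the role of memchr_inv() in the trace). */
static bool is_zero_filled(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Count the sub-pages that could be remapped to the shared zeropage.
 * The poison flag is checked first, so the contents of a poisoned
 * sub-page are never read; the remaining sub-pages are still scanned.
 */
static size_t count_zeropage_candidates(const struct subpage *pages, size_t n)
{
	size_t candidates = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].hwpoison)
			continue; /* skip without touching pages[i].data */
		if (is_zero_filled(pages[i].data, SUBPAGE_SIZE))
			candidates++;
	}
	return candidates;
}
```

The ordering of the two checks is the whole point: in this model a poisoned sub-page is rejected on its flag alone, which is what avoids the second, in-kernel #MC, while the unaffected sub-pages still get their zeropage scan.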
>
> [ Credits to Andrew Zaborowski <andrew.zaborowski@intel.com> for his
> original fix that prevents passing the RMP_USE_SHARED_ZEROPAGE flag
> to remap_page() in Step[3] if the THP has the has_hwpoisoned flag set,
> avoiding access to the entire THP for zero-page identification. ]
>
Thanks for the fix!
But one thing is missing: a "Fixes:" tag here. And also add:
Cc: <stable@vger.kernel.org>
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> ---
Well, I think this fix should work ;)
Acked-by: Lance Yang <lance.yang@linux.dev>