From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA15A2DC33D; Mon, 27 Oct 2025 19:29:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761593350; cv=none; b=bZh6YgZjXEYBNhNfu4Be6pSX/0HPoCsqLh3PegQEHBDBRUNEXwoRTsaDv5KMlTBBrtkpHvN3vGjakv4LhgRjRq2amsiDoKlHTo8tOJwC0Y/f5aY4HWn55m0NqogUDsTxepwizMXYP4XXqbLKmFW2ZL7EAlIIoSILvKyFd2FDlN0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761593350; c=relaxed/simple; bh=dfyAeGWCN/PCwQ4+Woi+eUbA4V9bFHbCh/4xYG/peG0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NQ/QcS7ukfdR1YMZBKthbhnekhPCf38U5pJXdT1OVB5Ysk7PL/Iaexka4rJK1Qcr2IENDXf885i4i9VBMcTvLFjBk6je8ah40ypFMl3D3yWpdFneIADbcIUrHMvozunqbtg2n17pgvRM/O16NOD9+DkzA0oK5EV+wkj4BzzJd+w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=hLrgZePP; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="hLrgZePP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 16B2DC4CEF1; Mon, 27 Oct 2025 19:29:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1761593350; bh=dfyAeGWCN/PCwQ4+Woi+eUbA4V9bFHbCh/4xYG/peG0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hLrgZePP+x7lATv8MrerdDk6K0whi/gWV5rd3BE1D6oMExu5l/xq93s3CqYBPPD2n +Qt3RIuZpZxW27vJhncmDHJEjEoL4vj4mxjSK9vsBbhaBE3vfRyYOsRgXtYGj2lSnn UxXGRJjKZEfGDAj1QET7xu0+rpaHPQCv14rHkDwo= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Qiuxu Zhuo , Farrah Chen , David Hildenbrand , Lance Yang , Wei Yang , Zi Yan , Miaohe Lin , Barry Song , Dev Jain , Jiaqi Yan , Liam Howlett , Lorenzo Stoakes , "Luck, Tony" , Mariano Pache , Naoya Horiguchi , Ryan Roberts , Andrew Morton Subject: [PATCH 6.17 105/184] mm: prevent poison consumption when splitting THP Date: Mon, 27 Oct 2025 19:36:27 +0100 Message-ID: <20251027183517.745158535@linuxfoundation.org> X-Mailer: git-send-email 2.51.1 In-Reply-To: <20251027183514.934710872@linuxfoundation.org> References: <20251027183514.934710872@linuxfoundation.org> User-Agent: quilt/0.69 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 6.17-stable review patch. If anyone has any objections, please let me know. ------------------ From: Qiuxu Zhuo commit 841a8bfcbad94bb1ba60f59ce34f75259074ae0d upstream. When performing memory error injection on a THP (Transparent Huge Page) mapped to userspace on an x86 server, the kernel panics with the following trace. The expected behavior is to terminate the affected process instead of panicking the kernel, as the x86 Machine Check code can recover from an in-userspace #MC. mce: [Hardware Error]: CPU 0: Machine Check Exception: f Bank 3: bd80000000070134 mce: [Hardware Error]: RIP 10: {memchr_inv+0x4c/0xf0} mce: [Hardware Error]: TSC afff7bbff88a ADDR 1d301b000 MISC 80 PPIN 1e741e77539027db mce: [Hardware Error]: PROCESSOR 0:d06d0 TIME 1758093249 SOCKET 0 APIC 0 microcode 80000320 mce: [Hardware Error]: Run the above through 'mcelog --ascii' mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel Kernel panic - not syncing: Fatal local machine check The root cause of this panic is that handling a memory failure triggered by an in-userspace #MC necessitates splitting the THP. The splitting process employs a mechanism, implemented in try_to_map_unused_to_zeropage(), which reads the pages in the THP to identify zero-filled pages. However, reading the pages in the THP results in a second in-kernel #MC, occurring before the initial memory_failure() completes, ultimately leading to a kernel panic. See the kernel panic call trace on the two #MCs. First Machine Check occurs // [1] memory_failure() // [2] try_to_split_thp_page() split_huge_page() split_huge_page_to_list_to_order() __folio_split() // [3] remap_page() remove_migration_ptes() remove_migration_pte() try_to_map_unused_to_zeropage() // [4] memchr_inv() // [5] Second Machine Check occurs // [6] Kernel panic [1] Triggered by accessing a hardware-poisoned THP in userspace, which is typically recoverable by terminating the affected process. [2] Call folio_set_has_hwpoisoned() before try_to_split_thp_page(). [3] Pass the RMP_USE_SHARED_ZEROPAGE remap flag to remap_page(). [4] Try to map the unused THP to zeropage. [5] Re-access pages in the hw-poisoned THP in the kernel. [6] Triggered in-kernel, leading to a panic kernel. In Step[2], memory_failure() sets the poisoned flag on the page in the THP by TestSetPageHWPoison() before calling try_to_split_thp_page(). As suggested by David Hildenbrand, fix this panic by not accessing to the poisoned page in the THP during zeropage identification, while continuing to scan unaffected pages in the THP for possible zeropage mapping. This prevents a second in-kernel #MC that would cause kernel panic in Step[4]. Thanks to Andrew Zaborowski for his initial work on fixing this issue. Link: https://lkml.kernel.org/r/20251015064926.1887643-1-qiuxu.zhuo@intel.com Link: https://lkml.kernel.org/r/20251011075520.320862-1-qiuxu.zhuo@intel.com Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") Signed-off-by: Qiuxu Zhuo Reported-by: Farrah Chen Suggested-by: David Hildenbrand Acked-by: David Hildenbrand Tested-by: Farrah Chen Tested-by: Qiuxu Zhuo Acked-by: Lance Yang Reviewed-by: Wei Yang Acked-by: Zi Yan Reviewed-by: Miaohe Lin Cc: Barry Song Cc: Dev Jain Cc: Jiaqi Yan Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: "Luck, Tony" Cc: Mariano Pache Cc: Miaohe Lin Cc: Naoya Horiguchi Cc: Ryan Roberts Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 3 +++ mm/migrate.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -4120,6 +4120,9 @@ static bool thp_underused(struct folio * if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1) return false; + if (folio_contain_hwpoisoned_page(folio)) + return false; + for (i = 0; i < folio_nr_pages(folio); i++) { if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) { if (++num_zero_pages > khugepaged_max_ptes_none) --- a/mm/migrate.c +++ b/mm/migrate.c @@ -302,8 +302,9 @@ static bool try_to_map_unused_to_zeropag struct page *page = folio_page(folio, idx); pte_t newpte; - if (PageCompound(page)) + if (PageCompound(page) || PageHWPoison(page)) return false; + VM_BUG_ON_PAGE(!PageAnon(page), page); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(pte_present(old_pte), page);