From mboxrd@z Thu Jan 1 00:00:00 1970
From: Breno Leitao
Date: Fri, 26 Jun 2026 08:33:16 -0700
Subject: [PATCH v10 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20260626-ecc_panic-v10-2-6dacb8ad024d@debian.org>
References: <20260626-ecc_panic-v10-0-6dacb8ad024d@debian.org>
In-Reply-To: <20260626-ecc_panic-v10-0-6dacb8ad024d@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Shuah Khan, Naoya Horiguchi, Jonathan Corbet, Shuah Khan,
 "Liam R. Howlett", lance.yang@linux.dev, Steven Rostedt,
 Masami Hiramatsu, Mathieu Desnoyers, "Liam R. Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Breno Leitao, linux-trace-kernel@vger.kernel.org, kernel-team@meta.com
X-Mailer: b4 0.14.3
X-Debian-User: leitao

get_any_page() collapses every HWPoisonHandlable() rejection into a single
-EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() -> retry path.
That is correct for the transient case (a userspace folio briefly off LRU
during migration or compaction, which a later shake can drag back), but wrong
for stable kernel-owned pages: slab, page-table, large-kmalloc and PG_reserved
pages will never become HWPoisonHandlable(), so the retry loop is wasted work
and the final -EIO loses the "this is structurally unrecoverable" information.
memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
panic-on-unrecoverable sysctl deliberately does not act on.

Introduce is_kernel_owned_page(), a small predicate that positively
identifies pages the hwpoison handler cannot recover from:

  is_kernel_owned_page(p) := PageReserved(p) ||
                             PageSlab(head) ||
                             PageTable(head) ||
                             PageLargeKmalloc(head)

where head = compound_head(p).

PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the page
directly. The slab, page-table and large-kmalloc page-type bits are only
stored on the head page, so those tests resolve the compound head first,
then re-read compound_head(page) afterwards: a concurrent split or compound
free that moves head invalidates the just-read flags and the loop retries.
The lookup still takes no refcount, mirroring the rest of get_any_page();
the recheck closes the common split race, and a residual free->alloc->free
in the same window can only mis-tag a genuinely poisoned page, never
reclassify a handlable one.

No MF_SOFT_OFFLINE / page_has_movable_ops() opt-out is needed: a
movable_ops page is always PageOffline or PageZsmalloc, whose page_type is
mutually exclusive with slab, page-table and large-kmalloc, and it never
carries PG_reserved, so it can never match any of the checks above.

The list is intentionally not exhaustive. vmalloc and kernel-stack pages,
for example, do not carry a page_type bit and would need a different
oracle; they keep going through the existing retry path unchanged. This is
the smallest set we can identify with certainty by page type.

Wire the helper into the top of get_any_page() to short-circuit those pages
before the retry loop runs. On a hit, drop the caller's MF_COUNT_INCREASED
reference (if any) and return -ENOTRECOVERABLE straight away. Pages outside
the helper's positive list still take the existing retry path and return
-EIO, leaving operator-visible behaviour for those cases unchanged.
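
For readers who want to poke at the control flow outside the kernel, the
predicate and the short-circuit described above can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions: struct model_page, the MODEL_* flag bits and the model_*()
helpers are hypothetical stand-ins for struct page, the page-flag tests,
compound_head() and put_page(); none of them are kernel APIs, and the model
is single-threaded, so the recheck loop never actually retries here.

```c
/* Userspace model only; every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ENOTRECOVERABLE
#define ENOTRECOVERABLE 131	/* Linux value, fallback for other libcs */
#endif

enum {
	MODEL_RESERVED      = 1u << 0,	/* per-page flag, tested directly */
	MODEL_SLAB          = 1u << 1,	/* page-type bits: head page only */
	MODEL_TABLE         = 1u << 2,
	MODEL_LARGE_KMALLOC = 1u << 3,
};

struct model_page {
	unsigned int flags;
	struct model_page *head;	/* NULL when the page is its own head */
};

/* Stand-in for compound_head(): a tail resolves to its head page. */
static struct model_page *model_compound_head(struct model_page *page)
{
	assert(page != NULL);
	return page->head ? page->head : page;
}

/* Mirrors the shape of is_kernel_owned_page(): test, then recheck the head. */
static bool model_is_kernel_owned(struct model_page *page)
{
	struct model_page *head;
	bool kernel_owned;

	if (page->flags & MODEL_RESERVED)
		return true;

	do {
		head = model_compound_head(page);
		kernel_owned = (head->flags & (MODEL_SLAB | MODEL_TABLE |
					       MODEL_LARGE_KMALLOC)) != 0;
		/* If the head moved meanwhile, the flags just read are stale. */
	} while (head != model_compound_head(page));

	return kernel_owned;
}

/*
 * Models only the new short-circuit at the top of get_any_page().
 * Returns -ENOTRECOVERABLE on a hit; 0 here means "fall through to the
 * existing retry path", which this sketch does not model.
 */
static int model_get_any_page(struct model_page *page, bool count_increased,
			      int *refcount)
{
	if (model_is_kernel_owned(page)) {
		if (count_increased)
			(*refcount)--;	/* stands in for put_page(p) */
		return -ENOTRECOVERABLE;
	}
	return 0;
}
```

In this model a tail page whose head carries MODEL_SLAB is reported as
kernel-owned even though the tail itself has no flag set, which is the
reason the real helper resolves the compound head before testing the
page-type bits.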

Extend the unhandlable-page pr_err() to fire for either errno and update
the get_hwpoison_page() kerneldoc to document the new return.

memory_failure() still folds every negative return into MF_MSG_GET_HWPOISON
via its existing "else if (res < 0)" branch, so this patch on its own only
changes the errno that soft_offline_page() can propagate to its callers. A
follow-up wires -ENOTRECOVERABLE through memory_failure() and reports
MF_MSG_KERNEL for the unrecoverable cases, which is what the
panic_on_unrecoverable_memory_failure sysctl observes.

Suggested-by: David Hildenbrand
Suggested-by: Lance Yang
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f4d3e6e20e13f..d08fbd0d8c39f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1325,6 +1325,36 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover:
+ * pages owned by kernel internals with no userspace mapping to unmap, no
+ * file mapping to invalidate, and no migration target.
+ */
+static inline bool is_kernel_owned_page(struct page *page)
+{
+	struct page *head;
+	bool kernel_owned;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first. The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view. A free->alloc->free in the same window is still
+	 * possible but closing it would require taking a reference here.
+	 */
+retry:
+	head = compound_head(page);
+	kernel_owned = PageSlab(head) || PageTable(head) ||
+		       PageLargeKmalloc(head);
+	if (head != compound_head(page))
+		goto retry;
+	return kernel_owned;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1371,6 +1401,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (is_kernel_owned_page(p)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1418,7 +1461,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1475,7 +1518,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         is_kernel_owned_page() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
-- 
2.53.0-Meta