From: Breno Leitao
Date: Tue, 30 Jun 2026 05:46:05 -0700
Subject: [PATCH v10 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
X-Mailing-List: linux-doc@vger.kernel.org
Message-Id: <20260630-ecc_panic-v10-2-c6ed5b62eea2@debian.org>
References: <20260630-ecc_panic-v10-0-c6ed5b62eea2@debian.org>
In-Reply-To: <20260630-ecc_panic-v10-0-c6ed5b62eea2@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
    Shuah Khan, Naoya Horiguchi, Jonathan Corbet, Shuah Khan,
    "Liam R. Howlett", lance.yang@linux.dev, Steven Rostedt,
    Masami Hiramatsu, Mathieu Desnoyers, "Liam R. Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Breno Leitao, linux-trace-kernel@vger.kernel.org,
    kernel-team@meta.com
X-Mailer: b4 0.14.3

get_any_page() collapses every HWPoisonHandlable() rejection into a
single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() ->
retry path. That is correct for the transient case (a userspace folio
briefly off LRU during migration or compaction, which a later shake can
drag back), but wrong for stable kernel-owned pages: slab, page-table,
large-kmalloc and PG_reserved pages will never become
HWPoisonHandlable(), so the retry loop is wasted work and the final
-EIO loses the "this is structurally unrecoverable" information.
memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
panic-on-unrecoverable sysctl deliberately does not act on.

Introduce is_kernel_owned_page(), a small predicate that positively
identifies pages the hwpoison handler cannot recover from:

    is_kernel_owned_page(p) := PageReserved(p) ||
                               PageSlab(head) ||
                               PageTable(head) ||
                               PageLargeKmalloc(head)

where head = compound_head(p). PG_reserved is a per-page flag
(PF_NO_COMPOUND) and is tested on the page directly. The slab,
page-table and large-kmalloc page-type bits are only stored on the head
page, so those tests resolve the compound head first, then re-read
compound_head(page) afterwards: a concurrent split or compound free
that moves head invalidates the just-read flags and the loop retries.
The lookup still takes no refcount, mirroring the rest of
get_any_page(); the recheck closes the common split race, and a
residual free->alloc->free in the same window can only mis-tag a
genuinely poisoned page, never reclassify a handlable one.

No MF_SOFT_OFFLINE / page_has_movable_ops() opt-out is needed: a
movable_ops page is always PageOffline or PageZsmalloc, whose page_type
is mutually exclusive with slab, page-table and large-kmalloc, and it
never carries PG_reserved, so it can never match any of the checks
above.

The list is intentionally not exhaustive. vmalloc and kernel-stack
pages, for example, do not carry a page_type bit and would need a
different oracle; they keep going through the existing retry path
unchanged. This is the smallest set we can identify with certainty by
page type.

Wire the helper into the top of get_any_page() to short-circuit those
pages before the retry loop runs. On a hit, drop the caller's
MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
straight away. Pages outside the helper's positive list still take the
existing retry path and return -EIO, leaving operator-visible behaviour
for those cases unchanged.
Extend the unhandlable-page pr_err() to fire for either errno and update
the get_hwpoison_page() kerneldoc to document the new return.

memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so this
patch on its own only changes the errno that soft_offline_page() can
propagate to its callers. A follow-up wires -ENOTRECOVERABLE through
memory_failure() and reports MF_MSG_KERNEL for the unrecoverable cases,
which is what the panic_on_unrecoverable_memory_failure sysctl observes.

Suggested-by: David Hildenbrand
Suggested-by: Lance Yang
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f4d3e6e20e13f..087658484e242 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1325,6 +1325,38 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover:
+ * pages owned by kernel internals with no userspace mapping to unmap, no
+ * file mapping to invalidate, and no migration target.
+ */
+static inline bool is_kernel_owned_page(struct page *page)
+{
+	struct page *head;
+	bool kernel_owned;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first. The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view. A residual free->alloc->free cannot be closed here
+	 * (frozen slab and large-kmalloc pages cannot be pinned), but is
+	 * harmless: where a wrong verdict could panic, memory_failure() has
+	 * already set PageHWPoison, which bars the page from the allocator.
+	 */
+retry:
+	head = compound_head(page);
+	kernel_owned = PageSlab(head) || PageTable(head) ||
+		       PageLargeKmalloc(head);
+	if (head != compound_head(page))
+		goto retry;
+	return kernel_owned;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1371,6 +1403,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (is_kernel_owned_page(p)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1418,7 +1463,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1475,7 +1520,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         is_kernel_owned_page() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
-- 
2.53.0-Meta
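
For readers who want to reason about the new return value outside the
kernel tree, the short-circuit in get_any_page() can be modelled in user
space roughly as below. This is only a sketch under stated assumptions,
not the patch's implementation: struct model_page and every model_* name
are hypothetical stand-ins for struct page and the page-flag helpers,
the shake_page() retry loop is collapsed into a single type check, and
the model is single-threaded, so the compound-head recheck always
passes on the first iteration.

```c
/*
 * User-space model only, NOT kernel code. struct model_page and every
 * model_* name are hypothetical stand-ins for struct page and the
 * kernel's page-flag helpers.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ENOTRECOVERABLE
#define ENOTRECOVERABLE 131	/* Linux value, for libcs that lack it */
#endif

enum model_type {
	MODEL_LRU,		/* handlable userspace page */
	MODEL_SLAB,
	MODEL_PGTABLE,
	MODEL_LARGE_KMALLOC,
	MODEL_OTHER,		/* e.g. vmalloc: not on the positive list */
};

struct model_page {
	struct model_page *head;	/* compound head; self for a head page */
	enum model_type type;		/* only meaningful on the head */
	bool reserved;			/* per-page, like PG_reserved */
	int refcount;			/* only meaningful on the head */
};

static struct model_page *model_compound_head(struct model_page *p)
{
	assert(p != NULL && p->head != NULL);
	return p->head;
}

/* Models put_page(): the reference is accounted on the head. */
static void model_put_page(struct model_page *p)
{
	model_compound_head(p)->refcount--;
}

/* Models is_kernel_owned_page(): per-page flag first, then head type. */
bool model_is_kernel_owned(struct model_page *p)
{
	struct model_page *head;
	bool owned;

	if (p->reserved)
		return true;

	/* Read the type via the head, then confirm the head did not move. */
	do {
		head = model_compound_head(p);
		owned = head->type == MODEL_SLAB ||
			head->type == MODEL_PGTABLE ||
			head->type == MODEL_LARGE_KMALLOC;
	} while (head != model_compound_head(p));

	return owned;
}

/* Models the errno contract of get_any_page() after this patch. */
int model_get_any_page(struct model_page *p, bool count_increased)
{
	if (model_is_kernel_owned(p)) {
		if (count_increased)
			model_put_page(p);	/* drop the caller's reference */
		return -ENOTRECOVERABLE;
	}

	/* Stand-in for the shake_page() / retry loop, collapsed to one check. */
	if (model_compound_head(p)->type == MODEL_LRU) {
		if (!count_increased)
			model_compound_head(p)->refcount++;
		return 1;
	}

	if (count_increased)
		model_put_page(p);
	return -EIO;
}
```

In this model a tail page whose head is MODEL_SLAB yields
-ENOTRECOVERABLE, a MODEL_OTHER page (the vmalloc-like case the commit
message leaves on the existing path) still yields -EIO, and a reserved
page passed in with count_increased set has its reference dropped before
returning.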