From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,zhiw@nvidia.com,xueshuai@linux.alibaba.com,vsethi@nvidia.com,vbabka@suse.cz,u.kleine-koenig@baylibre.com,tony.luck@intel.com,targupta@nvidia.com,surenb@google.com,smita.koralahallichannabasappa@amd.com,rppt@kernel.org,peterz@infradead.org,nao.horiguchi@gmail.com,mochs@nvidia.com,mhocko@suse.com,mchehab@kernel.org,lorenzo.stoakes@oracle.com,linmiaohe@huawei.com,liam.howlett@oracle.com,lenb@kernel.org,kwankhede@nvidia.com,kevin.tian@intel.com,Jonathan.Cameron@huawei.com,jgg@nvidia.com,ira.weiny@intel.com,guohanjun@huawei.com,david@redhat.com,cjia@nvidia.com,bp@alien8.de,aniketa@nvidia.com,ankita@nvidia.com,akpm@linux-foundation.org
Subject: [merged mm-stable] mm-handle-poisoning-of-pfn-without-struct-pages.patch removed from -mm tree
Date: Sun, 16 Nov 2025 17:34:47 -0800 [thread overview]
Message-ID: <20251117013448.2F691C4CEF5@smtp.kernel.org> (raw)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 11265 bytes --]
The quilt patch titled
Subject: mm: handle poisoning of pfn without struct pages
has been removed from the -mm tree. Its filename was
mm-handle-poisoning-of-pfn-without-struct-pages.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ankit Agrawal <ankita@nvidia.com>
Subject: mm: handle poisoning of pfn without struct pages
Date: Sun, 2 Nov 2025 18:44:33 +0000
Poison (or ECC) errors can be very common on a large size cluster. The
kernel MM currently does not handle ECC errors / poison on a memory region
that is not backed by struct pages. If a memory region mapped using
remap_pfn_range() for example, but not added to the kernel, MM will not
have associated struct pages. Add a new mechanism to handle memory
failure on such memory.
Make kernel MM expose a function to allow modules managing the device
memory to register the device memory SPA and the address space associated
it. MM maintains this information as an interval tree. On poison, MM can
search for the range that the poisoned PFN belong and use the
address_space to determine the mapping VMA.
In this implementation, kernel MM follows the following sequence that is
largely similar to the memory_failure() handler for struct page backed
memory:
1. memory_failure() is triggered on reception of a poison error. An
absence of struct page is detected and consequently
memory_failure_pfn() is executed.
2. memory_failure_pfn() collects the processes mapped to the PFN.
3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
faulty PFN using kill_procs().
Note that there is one primary difference versus the handling of the
poison on struct pages, which is to skip unmapping to the faulty PFN.
This is done to handle the huge PFNMAP support added recently [1] that
enables VM_PFNMAP vmas to map at PMD or PUD level. A poison to a PFN
mapped in such as way would need breaking the PMD/PUD mapping into PTEs
that will get mirrored into the S2. This can greatly increase the cost of
table walks and have a major performance impact.
Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20251102184434.2406-3-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
MAINTAINERS | 1
include/linux/memory-failure.h | 17 +++
include/linux/mm.h | 1
include/ras/ras_event.h | 1
mm/Kconfig | 1
mm/memory-failure.c | 145 ++++++++++++++++++++++++++++++-
6 files changed, 165 insertions(+), 1 deletion(-)
diff --git a/include/linux/memory-failure.h a/include/linux/memory-failure.h
new file mode 100644
--- /dev/null
+++ a/include/linux/memory-failure.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMORY_FAILURE_H
+#define _LINUX_MEMORY_FAILURE_H
+
+#include <linux/interval_tree.h>
+
+struct pfn_address_space;
+
+struct pfn_address_space {
+ struct interval_tree_node node;
+ struct address_space *mapping;
+};
+
+int register_pfn_address_space(struct pfn_address_space *pfn_space);
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space);
+
+#endif /* _LINUX_MEMORY_FAILURE_H */
--- a/include/linux/mm.h~mm-handle-poisoning-of-pfn-without-struct-pages
+++ a/include/linux/mm.h
@@ -4285,6 +4285,7 @@ enum mf_action_page_type {
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
MF_MSG_ALREADY_POISONED,
+ MF_MSG_PFN_MAP,
MF_MSG_UNKNOWN,
};
--- a/include/ras/ras_event.h~mm-handle-poisoning-of-pfn-without-struct-pages
+++ a/include/ras/ras_event.h
@@ -375,6 +375,7 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
+ EM ( MF_MSG_PFN_MAP, "non struct page pfn" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
/*
--- a/MAINTAINERS~mm-handle-poisoning-of-pfn-without-struct-pages
+++ a/MAINTAINERS
@@ -11557,6 +11557,7 @@ M: Miaohe Lin <linmiaohe@huawei.com>
R: Naoya Horiguchi <nao.horiguchi@gmail.com>
L: linux-mm@kvack.org
S: Maintained
+F: include/linux/memory-failure.h
F: mm/hwpoison-inject.c
F: mm/memory-failure.c
--- a/mm/Kconfig~mm-handle-poisoning-of-pfn-without-struct-pages
+++ a/mm/Kconfig
@@ -741,6 +741,7 @@ config MEMORY_FAILURE
depends on ARCH_SUPPORTS_MEMORY_FAILURE
bool "Enable recovery from hardware memory errors"
select RAS
+ select INTERVAL_TREE
help
Enables code to recover from some memory failures on systems
with MCA recovery. This allows a system to continue running
--- a/mm/memory-failure.c~mm-handle-poisoning-of-pfn-without-struct-pages
+++ a/mm/memory-failure.c
@@ -38,6 +38,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/memory-failure.h>
#include <linux/page-flags.h>
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
@@ -154,6 +155,10 @@ static const struct ctl_table memory_fai
}
};
+static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
+
+static DEFINE_MUTEX(pfn_space_lock);
+
/*
* Return values:
* 1: the page is dissolved (if needed) and taken off from buddy,
@@ -885,6 +890,7 @@ static const char * const action_page_ty
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
[MF_MSG_ALREADY_POISONED] = "already poisoned page",
+ [MF_MSG_PFN_MAP] = "non struct page pfn",
[MF_MSG_UNKNOWN] = "unknown page",
};
@@ -1277,7 +1283,7 @@ static int action_result(unsigned long p
{
trace_memory_failure_event(pfn, type, result);
- if (type != MF_MSG_ALREADY_POISONED) {
+ if (type != MF_MSG_ALREADY_POISONED && type != MF_MSG_PFN_MAP) {
num_poisoned_pages_inc(pfn);
update_per_node_mf_stats(pfn, result);
}
@@ -2147,6 +2153,135 @@ static void kill_procs_now(struct page *
kill_procs(&tokill, true, pfn, flags);
}
+int register_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+ guard(mutex)(&pfn_space_lock);
+
+ if (interval_tree_iter_first(&pfn_space_itree,
+ pfn_space->node.start,
+ pfn_space->node.last))
+ return -EBUSY;
+
+ interval_tree_insert(&pfn_space->node, &pfn_space_itree);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_pfn_address_space);
+
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+ guard(mutex)(&pfn_space_lock);
+
+ if (interval_tree_iter_first(&pfn_space_itree,
+ pfn_space->node.start,
+ pfn_space->node.last))
+ interval_tree_remove(&pfn_space->node, &pfn_space_itree);
+}
+EXPORT_SYMBOL_GPL(unregister_pfn_address_space);
+
+static void add_to_kill_pfn(struct task_struct *tsk,
+ struct vm_area_struct *vma,
+ struct list_head *to_kill,
+ unsigned long pfn)
+{
+ struct to_kill *tk;
+
+ tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
+ if (!tk) {
+ pr_info("Unable to kill proc %d\n", tsk->pid);
+ return;
+ }
+
+ /* Check for pgoff not backed by struct page */
+ tk->addr = vma_address(vma, pfn, 1);
+ tk->size_shift = PAGE_SHIFT;
+
+ if (tk->addr == -EFAULT)
+ pr_info("Unable to find address %lx in %s\n",
+ pfn, tsk->comm);
+
+ get_task_struct(tsk);
+ tk->tsk = tsk;
+ list_add_tail(&tk->nd, to_kill);
+}
+
+/*
+ * Collect processes when the error hit a PFN not backed by struct page.
+ */
+static void collect_procs_pfn(struct address_space *mapping,
+ unsigned long pfn, struct list_head *to_kill)
+{
+ struct vm_area_struct *vma;
+ struct task_struct *tsk;
+
+ i_mmap_lock_read(mapping);
+ rcu_read_lock();
+ for_each_process(tsk) {
+ struct task_struct *t = tsk;
+
+ t = task_early_kill(tsk, true);
+ if (!t)
+ continue;
+ vma_interval_tree_foreach(vma, &mapping->i_mmap, pfn, pfn) {
+ if (vma->vm_mm == t->mm)
+ add_to_kill_pfn(t, vma, to_kill, pfn);
+ }
+ }
+ rcu_read_unlock();
+ i_mmap_unlock_read(mapping);
+}
+
+/**
+ * memory_failure_pfn - Handle memory failure on a page not backed by
+ * struct page.
+ * @pfn: Page Number of the corrupted page
+ * @flags: fine tune action taken
+ *
+ * Return:
+ * 0 - success,
+ * -EBUSY - Page PFN does not belong to any address space mapping.
+ */
+static int memory_failure_pfn(unsigned long pfn, int flags)
+{
+ struct interval_tree_node *node;
+ LIST_HEAD(tokill);
+
+ scoped_guard(mutex, &pfn_space_lock) {
+ bool mf_handled = false;
+
+ /*
+ * Modules registers with MM the address space mapping to
+ * the device memory they manage. Iterate to identify
+ * exactly which address space has mapped to this failing
+ * PFN.
+ */
+ for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
+ node = interval_tree_iter_next(node, pfn, pfn)) {
+ struct pfn_address_space *pfn_space =
+ container_of(node, struct pfn_address_space, node);
+
+ collect_procs_pfn(pfn_space->mapping, pfn, &tokill);
+
+ mf_handled = true;
+ }
+
+ if (!mf_handled)
+ return action_result(pfn, MF_MSG_PFN_MAP, MF_IGNORED);
+ }
+
+ /*
+ * Unlike System-RAM there is no possibility to swap in a different
+ * physical page at a given virtual address, so all userspace
+ * consumption of direct PFN memory necessitates SIGBUS (i.e.
+ * MF_MUST_KILL)
+ */
+ flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+
+ kill_procs(&tokill, true, pfn, flags);
+
+ return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
+}
+
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -2196,6 +2331,14 @@ int memory_failure(unsigned long pfn, in
if (res == 0)
goto unlock_mutex;
+ if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {
+ /*
+ * The PFN is not backed by struct page.
+ */
+ res = memory_failure_pfn(pfn, flags);
+ goto unlock_mutex;
+ }
+
if (pfn_valid(pfn)) {
pgmap = get_dev_pagemap(pfn);
put_ref_page(pfn, flags);
_
Patches currently in -mm which might be from ankita@nvidia.com are
reply other threads:[~2025-11-17 1:34 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251117013448.2F691C4CEF5@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=aniketa@nvidia.com \
--cc=ankita@nvidia.com \
--cc=bp@alien8.de \
--cc=cjia@nvidia.com \
--cc=david@redhat.com \
--cc=guohanjun@huawei.com \
--cc=ira.weiny@intel.com \
--cc=jgg@nvidia.com \
--cc=kevin.tian@intel.com \
--cc=kwankhede@nvidia.com \
--cc=lenb@kernel.org \
--cc=liam.howlett@oracle.com \
--cc=linmiaohe@huawei.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mchehab@kernel.org \
--cc=mhocko@suse.com \
--cc=mm-commits@vger.kernel.org \
--cc=mochs@nvidia.com \
--cc=nao.horiguchi@gmail.com \
--cc=peterz@infradead.org \
--cc=rppt@kernel.org \
--cc=smita.koralahallichannabasappa@amd.com \
--cc=surenb@google.com \
--cc=targupta@nvidia.com \
--cc=tony.luck@intel.com \
--cc=u.kleine-koenig@baylibre.com \
--cc=vbabka@suse.cz \
--cc=vsethi@nvidia.com \
--cc=xueshuai@linux.alibaba.com \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.