From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,zhiw@nvidia.com,xueshuai@linux.alibaba.com,vsethi@nvidia.com,vbabka@suse.cz,u.kleine-koenig@baylibre.com,tony.luck@intel.com,targupta@nvidia.com,surenb@google.com,smita.koralahallichannabasappa@amd.com,rppt@kernel.org,peterz@infradead.org,nao.horiguchi@gmail.com,mochs@nvidia.com,mhocko@suse.com,mchehab@kernel.org,lorenzo.stoakes@oracle.com,linmiaohe@huawei.com,liam.howlett@oracle.com,lenb@kernel.org,kwankhede@nvidia.com,kevin.tian@intel.com,Jonathan.Cameron@huawei.com,jgg@nvidia.com,ira.weiny@intel.com,guohanjun@huawei.com,david@redhat.com,cjia@nvidia.com,bp@alien8.de,aniketa@nvidia.com,ankita@nvidia.com,akpm@linux-foundation.org
Subject: + mm-change-ghes-code-to-allow-poison-of-non-struct-pfn.patch added to mm-new branch
Date: Mon, 03 Nov 2025 18:48:17 -0800
Message-ID: <20251104024817.F16BDC4CEFD@smtp.kernel.org>
The patch titled
Subject: mm: change ghes code to allow poison of non-struct pfn
has been added to the -mm mm-new branch. Its filename is
mm-change-ghes-code-to-allow-poison-of-non-struct-pfn.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-change-ghes-code-to-allow-poison-of-non-struct-pfn.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ankit Agrawal <ankita@nvidia.com>
Subject: mm: change ghes code to allow poison of non-struct pfn
Date: Sun, 2 Nov 2025 18:44:32 +0000
Poison (or ECC) errors can be very common on large clusters. The kernel
MM currently handles ECC errors / poison only on memory pages backed by
struct page. Handling is missing for PFNMAP memory that does not have
struct pages. This series adds such support.
Implement new ECC handling for memory without struct pages. The kernel MM
exposes registration APIs that allow modules managing a device to register
their device memory regions. MM then tracks such regions using an
interval tree.
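The register-then-look-up idea can be illustrated with a small user-space
model. This is only a sketch: the names (pfn_registry, registry_add,
registry_find) are hypothetical and are not the API added by this series,
and a fixed-size array with a linear scan stands in for the interval tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; NOT the kernel API from this series. */
struct pfn_region {
	uint64_t first_pfn;	/* inclusive */
	uint64_t last_pfn;	/* inclusive */
};

#define MAX_REGIONS 16

struct pfn_registry {
	struct pfn_region regions[MAX_REGIONS];
	size_t count;
};

/* Register [first_pfn, last_pfn]; fail on overlap or a full table. */
static bool registry_add(struct pfn_registry *reg,
			 uint64_t first_pfn, uint64_t last_pfn)
{
	assert(first_pfn <= last_pfn);

	if (reg->count == MAX_REGIONS)
		return false;

	for (size_t i = 0; i < reg->count; i++) {
		const struct pfn_region *r = &reg->regions[i];

		/* Two closed intervals overlap iff each starts before the other ends. */
		if (first_pfn <= r->last_pfn && r->first_pfn <= last_pfn)
			return false;
	}

	reg->regions[reg->count].first_pfn = first_pfn;
	reg->regions[reg->count].last_pfn = last_pfn;
	reg->count++;
	return true;
}

/* Return the registered region containing pfn, or NULL if there is none. */
static const struct pfn_region *registry_find(const struct pfn_registry *reg,
					      uint64_t pfn)
{
	for (size_t i = 0; i < reg->count; i++) {
		const struct pfn_region *r = &reg->regions[i];

		if (pfn >= r->first_pfn && pfn <= r->last_pfn)
			return r;
	}
	return NULL;
}
```

An interval tree answers the same "which region contains this PFN" query
in logarithmic rather than linear time, which matters once many regions
are registered.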
The mechanism is largely similar to that of ECC on PFNs with struct pages.
If there is an ECC error on a PFN, all mappings to it are identified and
a SIGBUS is sent to the user space processes owning those mappings.
Note that there is one primary difference versus the handling of poison
on struct pages: the faulty PFN is not unmapped. This is done to
accommodate the huge PFNMAP support added recently [1], which enables
VM_PFNMAP vmas to map at PMD or PUD level. Poison on a PFN mapped in such
a way would require breaking the PMD/PUD mapping into PTEs that would get
mirrored into the stage-2 (S2) page tables. This can greatly increase the
cost of table walks and have a major performance impact.
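As a rough illustration of the scale involved, the following sketch counts
how many base-page entries replace one block mapping when it is split.
The helper name ptes_per_block is hypothetical, and the sizes used in the
note below (4 KiB base pages, 2 MiB PMD, 1 GiB PUD) are assumptions; other
page-size configurations give different numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of base-page PTEs needed to replace one block (PMD/PUD) mapping. */
static uint64_t ptes_per_block(uint64_t block_bytes, uint64_t page_bytes)
{
	/* The block must be a whole multiple of the base page size. */
	assert(page_bytes != 0 && block_bytes % page_bytes == 0);

	return block_bytes / page_bytes;
}
```

Under those assumptions, ptes_per_block(2ULL << 20, 4096) is 512 and
ptes_per_block(1ULL << 30, 4096) is 262144, so a single poisoned page
would turn one block entry into hundreds or hundreds of thousands of
entries.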
The nvgrace-gpu-vfio-pci module maps the device memory to user VA (QEMU)
using remap_pfn_range() without adding it to the kernel [2]. These device
memory PFNs are not backed by struct page. So make the
nvgrace-gpu-vfio-pci module use the mechanism to get poison handling
support on the device memory.
This patch (of 3):
The GHES code allows calling memory_failure() only on PFNs that pass the
pfn_valid() check. This contract breaks for remapped PFNs, which fail
the check, so ghes_do_memory_failure() returns without triggering
memory_failure().
Update the code to allow the memory_failure() call on PFNs failing pfn_valid().
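The effect of dropping the check can be modeled in user space as below.
This is a sketch only: struct pfn_info and both functions are hypothetical
stand-ins for the outcomes of pfn_valid() and arch_is_platform_page(), and
the model ignores the other early returns in ghes_do_memory_failure().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the two kernel checks would report. */
struct pfn_info {
	bool has_struct_page;	/* models pfn_valid(pfn) */
	bool is_platform_page;	/* models arch_is_platform_page(physical_addr) */
};

/* Before the patch: bail out unless one of the two checks passes. */
static bool reaches_memory_failure_before(struct pfn_info info)
{
	if (!info.has_struct_page && !info.is_platform_page)
		return false;	/* "Invalid address in generic error data" */
	return true;
}

/* After the patch: the check is gone, so every PFN is passed on. */
static bool reaches_memory_failure_after(struct pfn_info info)
{
	(void)info;
	return true;
}

/* Small self-check contrasting the two paths for a remapped device PFN. */
void check_model(void)
{
	struct pfn_info remapped = { false, false };
	struct pfn_info ordinary = { true, false };

	assert(!reaches_memory_failure_before(remapped));
	assert(reaches_memory_failure_after(remapped));
	assert(reaches_memory_failure_before(ordinary));
	assert(reaches_memory_failure_after(ordinary));
}
```

With the check removed, deciding what to do with a PFN that has no struct
page is left to memory_failure() itself rather than to the GHES code.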
Link: https://lkml.kernel.org/r/20251102184434.2406-1-ankita@nvidia.com
Link: https://lkml.kernel.org/r/20251102184434.2406-2-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Ankit Agrawal <ankita@nvidia.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/acpi/apei/ghes.c | 6 ------
1 file changed, 6 deletions(-)
--- a/drivers/acpi/apei/ghes.c~mm-change-ghes-code-to-allow-poison-of-non-struct-pfn
+++ a/drivers/acpi/apei/ghes.c
@@ -505,12 +505,6 @@ static bool ghes_do_memory_failure(u64 p
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
-		pr_warn_ratelimited(FW_WARN GHES_PFX
-		"Invalid address in generic error data: %#llx\n",
-		physical_addr);
-		return false;
-	}
 
 	if (flags == MF_ACTION_REQUIRED && current->mm) {
 		twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb));
_
Patches currently in -mm which might be from ankita@nvidia.com are
mm-change-ghes-code-to-allow-poison-of-non-struct-pfn.patch
mm-handle-poisoning-of-pfn-without-struct-pages.patch
vfio-nvgrace-gpu-register-device-memory-for-poison-handling.patch