[PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure

* [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure
@ 2026-03-19 23:30 Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

Here's a second revision to fix MF_DELAYED handling on memory failure.

This patch series addresses an issue in the memory failure handling path
where MF_DELAYED is incorrectly treated as an error. This issue was
discovered while testing memory failure handling for guest_memfd.

The proposed solution involves:
1. Clarifying the definition of MF_DELAYED to mean that memory failure
   handling is only partially completed, and that the metadata for the
   memory that failed (as in struct page/folio) is still referenced.
2. Updating shmem’s handling to align with the clarified definition.
3. Updating how the result of .error_remove_folio() is interpreted.

RFC v2 is a more complete solution that includes parts 1 and 2 above to
address David’s comment [1]. Selftests are included for all the above.

+ RFC v1: https://lore.kernel.org/all/cover.1760551864.git.wyihan@google.com/

[1]: https://lore.kernel.org/all/91dbea57-d5b0-49b7-8920-3a2d252c46b0@redhat.com/

Signed-off-by: Lisa Wang <wyihan@google.com>
---
Lisa Wang (7):
      mm: memory_failure: Clarify the MF_DELAYED definition
      mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
      mm: shmem: Update shmem handler to the MF_DELAYED definition
      mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases
      mm: selftests: Add shmem memory failure test
      KVM: selftests: Add memory failure tests in guest_memfd_test
      KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables

 mm/memory-failure.c                                |  17 +-
 mm/shmem.c                                         |   2 +-
 tools/testing/selftests/kvm/guest_memfd_test.c     | 233 +++++++++++++++++++++
 tools/testing/selftests/mm/Makefile                |   3 +
 tools/testing/selftests/mm/run_vmtests.sh          |   1 +
 .../selftests/mm/shmem_memory_failure_test.c       |  98 +++++++++
 6 files changed, 344 insertions(+), 10 deletions(-)
---
base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
change-id: 20260319-memory-failure-mf-delayed-fix-rfc-v2-5ee11d6a7260

Best regards,
-- 
Lisa Wang <wyihan@google.com>



* [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-22 21:34   ` Jiaqi Yan
  2026-03-19 23:30 ` [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

This patch clarifies the definition of MF_DELAYED to represent cases
where a folio's removal is initiated but not immediately completed
(e.g., due to remaining metadata references).

Signed-off-by: Lisa Wang <wyihan@google.com>
---
 mm/memory-failure.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..4f143334d5a1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -862,9 +862,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
  * by the m-f() handler immediately.
  *
  * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
- * The page is unmapped, and is removed from the LRU or file mapping.
- * An attempt to access the page again will trigger page fault and the
- * PF handler will kill the process.
+ * It means the page was partially isolated (e.g. removed from file mapping
+ * or the LRU) but full cleanup is deferred (e.g. the metadata for the
+ * memory, as in struct page/folio, is still referenced). Any further
+ * access to the page will result in the process being killed.
  *
  * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
  * The page has been completely isolated, that is, unmapped, taken out of

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-30  7:02   ` Miaohe Lin
  2026-03-19 23:30 ` [PATCH RFC v2 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition Lisa Wang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

The .error_remove_folio a_ops is used by different filesystems to handle
folio truncation upon discovery of a memory failure in the memory
associated with the given folio.

Currently, MF_DELAYED is treated as an error, causing "Failed to punch
page" to be written to the console. MF_DELAYED is then relayed to the
caller of truncate_error_folio() as MF_FAILED. This further causes
memory_failure() to return -EBUSY, which then always causes a SIGBUS.

This also implies that regardless of whether the thread's memory
corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
memory failure with MF_DELAYED will always cause a SIGBUS.

Update truncate_error_folio() to return MF_DELAYED to the caller if the
.error_remove_folio() callback reports MF_DELAYED.

Signed-off-by: Lisa Wang <wyihan@google.com>
---
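For illustration only (not part of this patch): a minimal userspace sketch
of the flow described above. The model_* names are hypothetical, the enum
order is assumed to match the kernel's enum mf_result, and only the
.error_remove_folio() branch of truncate_error_folio() is modeled.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror the order of the kernel's enum mf_result. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Hypothetical model of the ->error_remove_folio() branch of
 * truncate_error_folio(). 'err' is the callback's return value and
 * 'released' is what filemap_release_folio() would report. 'patched'
 * selects the behavior with or without the early return added here.
 */
enum mf_result model_truncate_error_folio(int err, bool released, bool patched)
{
	if (patched && err == MF_DELAYED)
		return MF_DELAYED;	/* relayed to the caller, nothing logged */
	if (err != 0)
		return MF_FAILED;	/* "Failed to punch page" path */
	if (!released)
		return MF_FAILED;
	return MF_RECOVERED;
}

/* Model of how the result becomes the return value of memory_failure(). */
int model_memory_failure_ret(enum mf_result res)
{
	return (res == MF_RECOVERED || res == MF_DELAYED) ? 0 : -EBUSY;
}
```

In this model, a callback returning MF_DELAYED ends in -EBUSY without the
early return and in 0 with it. The refcount check done later in
me_pagecache_clean() is deliberately left out.
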
 mm/memory-failure.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4f143334d5a1..57f7762e7418 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -941,6 +941,8 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
 	if (mapping->a_ops->error_remove_folio) {
 		int err = mapping->a_ops->error_remove_folio(mapping, folio);

+		if (err == MF_DELAYED)
+			return err;
 		if (err != 0)
 			pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
 		else if (!filemap_release_folio(folio, GFP_NOIO))

-- 
2.53.0.959.g497ff81fa9-goog


* [PATCH RFC v2 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases Lisa Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

To align with the definition of MF_DELAYED, update
shmem_error_remove_folio() to return MF_DELAYED.

shmem handles memory failures but defers the actual file truncation. The
function's return value should therefore be MF_DELAYED to accurately
reflect the state.

Currently, this logical error does not cause a bug, because:

- For shmem folios, folio->private is not set.
- As a result, filemap_release_folio() is a no-op and returns true.
- This, in turn, causes truncate_error_folio() to incorrectly return
  MF_RECOVERED.
- The caller then treats MF_RECOVERED as a success condition, masking the
  issue.

The previous patch relays MF_DELAYED to the caller of
truncate_error_folio() before any logging, so returning MF_DELAYED from
shmem_error_remove_folio() will retain the original behavior of not
adding any logs.

The return value of truncate_error_folio() is consumed in action_result(),
which treats MF_DELAYED the same way as MF_RECOVERED, hence action_result()
also returns the same thing after this change.

Signed-off-by: Lisa Wang <wyihan@google.com>
---
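For illustration only (not part of this patch): a minimal userspace sketch
of why the old return value of 0 went unnoticed for shmem. All model_*
names are hypothetical and the enum order is assumed to match the kernel's
enum mf_result.

```c
#include <stdbool.h>

/* Assumed to mirror the order of the kernel's enum mf_result. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical stand-in for the only folio state that matters here. */
struct model_folio {
	bool has_private;	/* folio->private set? Not the case for shmem. */
};

/*
 * Model: with no private data there is nothing to release, so report
 * success. A real release attempt may also succeed when private data is
 * present; that case is modeled as a failure for brevity.
 */
bool model_filemap_release_folio(const struct model_folio *folio)
{
	return !folio->has_private;
}

/* shmem's path through truncate_error_folio(), before and after the change. */
enum mf_result model_shmem_truncate(const struct model_folio *folio, bool patched)
{
	int err = patched ? MF_DELAYED : 0;	/* shmem_error_remove_folio() */

	if (err == MF_DELAYED)
		return MF_DELAYED;	/* relayed by the previous patch */
	if (err != 0)
		return MF_FAILED;
	return model_filemap_release_folio(folio) ? MF_RECOVERED : MF_FAILED;
}

/* Both results count as success for the caller. */
bool model_is_success(enum mf_result res)
{
	return res == MF_RECOVERED || res == MF_DELAYED;
}
```

With has_private false, the unpatched path yields MF_RECOVERED and the
patched path yields MF_DELAYED. Both are treated as success, which is why
the caller-visible result stays the same.
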
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b40f3cd48961..fd8f90540361 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5207,7 +5207,7 @@ static void __init shmem_destroy_inodecache(void)
 static int shmem_error_remove_folio(struct address_space *mapping,
 				   struct folio *folio)
 {
-	return 0;
+	return MF_DELAYED;
 }
 
 static const struct address_space_operations shmem_aops = {

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH RFC v2 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
                   ` (2 preceding siblings ...)
  2026-03-19 23:30 ` [PATCH RFC v2 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-19 23:30 ` [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test Lisa Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

Generalize extra_pins handling to all MF_DELAYED cases, not only shmem
mappings.

If MF_DELAYED is returned, the filemap continues to hold refcounts on the
folio. Hence, take that into account when checking for extra refcounts.

As clarified in an earlier patch, a return value of MF_DELAYED implies that
the page still has elevated refcounts. Hence, set extra_pins to true if the
return value is MF_DELAYED. This is aligned with the implementation in
me_swapcache_dirty(), where, if a folio is still in the swap cache, ret is
set to MF_DELAYED and extra_pins is set to true.

Signed-off-by: Lisa Wang <wyihan@google.com>
---
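For illustration only (not part of this patch): a minimal userspace sketch
of the refcount accounting described above. The model_* names are
hypothetical, and the arithmetic is a simplified reading of
has_extra_refcount(), not a copy of it.

```c
#include <stdbool.h>

/* Assumed to mirror the order of the kernel's enum mf_result. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Model of the extra-reference check for a folio of nr_pages pages. */
bool model_has_extra_refcount(int page_count, int nr_pages, bool extra_pins)
{
	int count = page_count - 1;	/* reference held by the handler */

	if (extra_pins)
		count -= nr_pages;	/* references the filemap still holds */
	return count > 0;
}

/* Model of the tail of me_pagecache_clean() after this patch. */
enum mf_result model_pagecache_clean_tail(enum mf_result truncate_ret,
					  int page_count, int nr_pages)
{
	bool extra_pins = truncate_ret == MF_DELAYED;

	if (model_has_extra_refcount(page_count, nr_pages, extra_pins))
		return MF_FAILED;
	return truncate_ret;
}
```

For a single-page folio referenced by the handler and the filemap
(page_count of 2), MF_DELAYED passes the check because the filemap
reference is expected. The same count with MF_RECOVERED is reported as
MF_FAILED.
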
 mm/memory-failure.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 57f7762e7418..86b6f7ba5d3a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1052,18 +1052,14 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
 		goto out;
 	}
 
-	/*
-	 * The shmem page is kept in page cache instead of truncating
-	 * so is expected to have an extra refcount after error-handling.
-	 */
-	extra_pins = shmem_mapping(mapping);
-
 	/*
 	 * Truncation is a bit tricky. Enable it per file system for now.
 	 *
 	 * Open: to take i_rwsem or not for this? Right now we don't.
 	 */
 	ret = truncate_error_folio(folio, page_to_pfn(p), mapping);
+
+	extra_pins = ret == MF_DELAYED;
 	if (has_extra_refcount(ps, p, extra_pins))
 		ret = MF_FAILED;
 

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
                   ` (3 preceding siblings ...)
  2026-03-19 23:30 ` [PATCH RFC v2 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-21  6:30   ` Baolin Wang
  2026-03-19 23:30 ` [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test Lisa Wang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

Add a shmem memory failure selftest to check that shmem memory failure
handling still behaves correctly after the change to shmem's return value.

Test that
+ the first madvise() call returns 0
+ a SIGBUS is triggered when the poisoned shmem page is faulted in again.

Signed-off-by: Lisa Wang <wyihan@google.com>
---
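For illustration only (not part of this patch): a minimal sketch of the
sigsetjmp()/siglongjmp() pattern the test relies on to survive a SIGBUS.
raise(SIGBUS) stands in for the access to a poisoned page, and the names
run_expect_sigbus(), raise_sigbus() and do_nothing() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf sigbus_env;

static void sigbus_handler(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/* Example callees: the first stands in for a poisoned-page access. */
void raise_sigbus(void) { raise(SIGBUS); }
void do_nothing(void) { }

/* Returns 1 if fn() raised SIGBUS, 0 if it returned, -1 on setup error. */
int run_expect_sigbus(void (*fn)(void))
{
	struct sigaction sa, old;
	int got;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) == -1)
		return -1;

	/* Saving the mask (second argument 1) unblocks SIGBUS after the jump. */
	if (sigsetjmp(sigbus_env, 1) == 0) {
		fn();
		got = 0;
	} else {
		got = 1;
	}

	sigaction(SIGBUS, &old, NULL);	/* restore the previous handler */
	return got;
}
```

Because the signal mask is saved and restored, the helper can be called
repeatedly in one process, which matters when several poisoned accesses
are checked in a row.
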
 tools/testing/selftests/mm/Makefile                |  3 +
 tools/testing/selftests/mm/run_vmtests.sh          |  1 +
 .../selftests/mm/shmem_memory_failure_test.c       | 98 ++++++++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7a5de4e9bf52..ac033851c9eb 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -72,6 +72,7 @@ TEST_GEN_FILES += madv_populate
 TEST_GEN_FILES += map_fixed_noreplace
 TEST_GEN_FILES += map_hugetlb
 TEST_GEN_FILES += map_populate
+TEST_GEN_FILES += shmem_memory_failure_test
 ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64 loongarch32 loongarch64))
 TEST_GEN_FILES += memfd_secret
 endif
@@ -259,6 +260,8 @@ $(OUTPUT)/migration: LDLIBS += -lnuma
 
 $(OUTPUT)/rmap: LDLIBS += -lnuma
 
+$(OUTPUT)/shmem_memory_failure_test: CFLAGS += -I$(top_srcdir)/tools/include
+
 local_config.mk local_config.h: check_config.sh
 	CC="$(CC)" CFLAGS="$(CFLAGS)" ./check_config.sh
 
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index afdcfd0d7cef..58fb959a7936 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -402,6 +402,7 @@ CATEGORY="hugetlb" run_test ./hugetlb-soft-offline
 echo "$nr_hugepages_tmp" > /proc/sys/vm/nr_hugepages
 echo "$enable_soft_offline" > /proc/sys/vm/enable_soft_offline
 CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
+CATEGORY="mmap" run_test ./shmem_memory_failure_test
 fi
 
 if [ $VADDR64 -ne 0 ]; then
diff --git a/tools/testing/selftests/mm/shmem_memory_failure_test.c b/tools/testing/selftests/mm/shmem_memory_failure_test.c
new file mode 100644
index 000000000000..44752024a7fc
--- /dev/null
+++ b/tools/testing/selftests/mm/shmem_memory_failure_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This test makes sure that when a memory failure happens, shmem handles it
+ * successfully.
+ */
+#include <linux/compiler.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <errno.h>
+#include "kselftest.h"
+#include "vm_util.h"
+
+static sigjmp_buf sigbuf;
+
+static void signal_handler(int sig, siginfo_t *info, void *ucontext)
+{
+	siglongjmp(sigbuf, 1);
+}
+
+static void set_signal_handler(int sig, void (*handler)(int, siginfo_t *, void *))
+{
+	struct sigaction sa = {};
+
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, NULL) == -1)
+		ksft_exit_fail_msg("Failed to set SIGBUS handler: %s\n", strerror(errno));
+}
+
+static unsigned long addr_to_pfn(char *addr)
+{
+	int pagemap_fd;
+	unsigned long pfn;
+
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		ksft_exit_fail_msg("Failed to open /proc/self/pagemap: %s\n", strerror(errno));
+	pfn = pagemap_get_pfn(pagemap_fd, addr);
+	close(pagemap_fd);
+
+	return pfn;
+}
+
+static void test_shmem_memory_failure(size_t total_size, size_t page_size)
+{
+	unsigned long memory_failure_pfn;
+	char *memory_failure_mem;
+	char *memory_failure_addr;
+	int fd;
+
+	fd = memfd_create("shmem_hwpoison_test", 0);
+	if (fd < 0)
+		ksft_exit_skip("memfd_create failed: %s\n", strerror(errno));
+
+	if (ftruncate(fd, total_size) < 0)
+		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
+
+	memory_failure_mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (memory_failure_mem == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed: %s\n", strerror(errno));
+	memory_failure_addr = memory_failure_mem + page_size;
+	READ_ONCE(memory_failure_addr[0]);
+	memory_failure_pfn = addr_to_pfn(memory_failure_addr);
+
+	if (madvise(memory_failure_addr, page_size, MADV_HWPOISON) != 0)
+		ksft_exit_fail_msg("MADV_HWPOISON failed: %s\n", strerror(errno));
+
+	if (sigsetjmp(sigbuf, 1) == 0) {
+		READ_ONCE(memory_failure_addr[0]);
+		ksft_test_result_fail("Read from poisoned page should have triggered SIGBUS\n");
+	} else {
+		ksft_test_result_pass("SIGBUS triggered as expected on poisoned page\n");
+	}
+
+	munmap(memory_failure_mem, total_size);
+	close(fd);
+	if (unpoison_memory(memory_failure_pfn) < 0)
+		ksft_exit_fail_msg("unpoison_memory failed: %s\n", strerror(errno));
+}
+
+int main(int argc, char *argv[])
+{
+	const size_t pagesize = getpagesize();
+
+	ksft_print_header();
+	ksft_set_plan(1);
+
+	set_signal_handler(SIGBUS, signal_handler);
+	test_shmem_memory_failure(pagesize * 4, pagesize);
+	ksft_finished();
+}

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
                   ` (4 preceding siblings ...)
  2026-03-19 23:30 ` [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-30  7:20   ` Miaohe Lin
  2026-03-19 23:30 ` [PATCH RFC v2 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables Lisa Wang
  2026-03-20  2:39 ` [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Andrew Morton
  7 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

After modifying truncate_error_folio(), we expect memory_failure() to
return 0 instead of -EBUSY. We also want to make sure that the signaling
behavior of memory_failure() is unchanged.

Test that memory_failure() returns 0 for guest_memfd, where
.error_remove_folio() is handled by not actually truncating, and returning
MF_DELAYED.

In addition, test that SIGBUS signaling behavior is not changed before
and after this modification.

There are two ways to inject a guest memory failure: madvise and
debugfs. When memory failure is injected using madvise, the
MF_ACTION_REQUIRED flag is set; if the page is mapped and dirty, the
process should get a SIGBUS. When memory failure is injected using
debugfs with the KILL_EARLY machine check memory corruption kill policy
set, and the page is mapped and dirty, the process should get a SIGBUS.

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Lisa Wang <wyihan@google.com>
---
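For illustration only (not part of this patch): a minimal sketch of the
expectation matrix exercised by test_memory_failure(), written as one
predicate. The enum and function names are hypothetical. The debugfs case
with the default policy is not covered by the test; treating it like
KILL_LATE is an assumption that early kill is not enabled system-wide.

```c
#include <stdbool.h>

enum inject_method { INJECT_DEBUGFS, INJECT_MADVISE };
enum kill_policy { KILL_EARLY, KILL_LATE, KILL_DEFAULT };

/* Is a SIGBUS expected at injection time for this combination? */
bool sigbus_at_injection(enum inject_method method, enum kill_policy policy,
			 bool mapped, bool dirty)
{
	/* Only a mapped, dirty page leads to a SIGBUS at injection time. */
	if (!mapped || !dirty)
		return false;

	/* madvise sets MF_ACTION_REQUIRED, so the kill policy is ignored. */
	if (method == INJECT_MADVISE)
		return true;

	/* debugfs: only the early-kill policy signals right away (assumed). */
	return policy == KILL_EARLY;
}
```

Independently of this predicate, the test also expects a SIGBUS on every
later read or write of the poisoned page.
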
 tools/testing/selftests/kvm/guest_memfd_test.c | 168 +++++++++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 618c937f3c90..445e8155ee1e 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,6 +10,8 @@
 #include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
+#include <linux/prctl.h>
+#include <sys/prctl.h>
 
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
@@ -193,6 +195,171 @@ static void test_fault_overflow(int fd, size_t total_size)
 	test_fault_sigbus(fd, total_size, total_size * 4);
 }
 
+static unsigned long addr_to_pfn(void *addr)
+{
+	const uint64_t pagemap_pfn_mask = BIT(55) - 1;
+	const uint64_t pagemap_page_present = BIT(63);
+	uint64_t page_info;
+	ssize_t n_bytes;
+	int pagemap_fd;
+
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	TEST_ASSERT(pagemap_fd >= 0, "Opening pagemap should succeed.");
+
+	n_bytes = pread(pagemap_fd, &page_info, 8, (uint64_t)addr / page_size * 8);
+	TEST_ASSERT(n_bytes == 8, "pread of pagemap failed. n_bytes=%ld", n_bytes);
+
+	close(pagemap_fd);
+
+	TEST_ASSERT(page_info & pagemap_page_present, "The page for addr should be present");
+	return page_info & pagemap_pfn_mask;
+}
+
+static void write_memory_failure(unsigned long pfn, bool mark, int return_code)
+{
+	char path[PATH_MAX];
+	char *filename;
+	char buf[20];
+	int ret;
+	int len;
+	int fd;
+
+	filename = mark ? "corrupt-pfn" : "unpoison-pfn";
+	snprintf(path, PATH_MAX, "/sys/kernel/debug/hwpoison/%s", filename);
+
+	fd = open(path, O_WRONLY);
+	TEST_ASSERT(fd >= 0, "Failed to open %s.", path);
+
+	len = snprintf(buf, sizeof(buf), "0x%lx\n", pfn);
+	if (len < 0 || (unsigned int)len >= sizeof(buf))
+		TEST_ASSERT(0, "snprintf failed or truncated.");
+
+	ret = write(fd, buf, len);
+	if (return_code == 0) {
+		/*
+		 * If memory_failure() returns 0, write() should succeed and
+		 * return the number of bytes written.
+		 */
+		TEST_ASSERT(ret > 0, "Writing memory failure (path: %s) failed: %s", path,
+			    strerror(errno));
+	} else {
+		TEST_ASSERT_EQ(ret, -1);
+		/* errno is memory_failure() return code. */
+		TEST_ASSERT_EQ(errno, return_code);
+	}
+
+	close(fd);
+}
+
+static void mark_memory_failure(unsigned long pfn, int return_code)
+{
+	write_memory_failure(pfn, true, return_code);
+}
+
+static void unmark_memory_failure(unsigned long pfn, int return_code)
+{
+	write_memory_failure(pfn, false, return_code);
+}
+
+enum memory_failure_injection_method {
+	MF_INJECT_DEBUGFS,
+	MF_INJECT_MADVISE,
+};
+
+static void do_test_memory_failure(int fd, size_t total_size,
+				   enum memory_failure_injection_method method, int kill_config,
+				   bool map_page, bool dirty_page, bool sigbus_expected,
+				   int return_code)
+{
+	unsigned long memory_failure_pfn;
+	char *memory_failure_addr;
+	char *mem;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	memory_failure_addr = mem + page_size;
+	if (dirty_page)
+		*memory_failure_addr = 'A';
+	else
+		READ_ONCE(*memory_failure_addr);
+
+	/* Fault in page to read pfn, then unmap page for testing if needed. */
+	memory_failure_pfn = addr_to_pfn(memory_failure_addr);
+	if (!map_page)
+		madvise(memory_failure_addr, page_size, MADV_DONTNEED);
+
+	ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, kill_config, 0, 0);
+	TEST_ASSERT_EQ(ret, 0);
+
+	ret = 0;
+	switch (method) {
+	case MF_INJECT_DEBUGFS: {
+		/* DEBUGFS injection handles return_code test inside the mark_memory_failure(). */
+		if (sigbus_expected)
+			TEST_EXPECT_SIGBUS(mark_memory_failure(memory_failure_pfn, return_code));
+		else
+			mark_memory_failure(memory_failure_pfn, return_code);
+		break;
+	}
+	case MF_INJECT_MADVISE: {
+		/*
+		 * MADV_HWPOISON uses get_user_pages() so the page will always
+		 * be faulted in at the point of memory_failure()
+		 */
+		if (sigbus_expected)
+			TEST_EXPECT_SIGBUS(ret = madvise(memory_failure_addr,
+							 page_size, MADV_HWPOISON));
+		else
+			ret = madvise(memory_failure_addr, page_size, MADV_HWPOISON);
+
+		if (return_code == 0)
+			TEST_ASSERT(ret == return_code, "Memory failure failed. Errno: %s",
+							strerror(errno));
+		else {
+			/* errno is memory_failure() return code. */
+			TEST_ASSERT_EQ(errno, return_code);
+		}
+		break;
+	}
+	default:
+		TEST_FAIL("Unhandled memory failure injection method %d.", method);
+	}
+
+	TEST_EXPECT_SIGBUS(READ_ONCE(*memory_failure_addr));
+	TEST_EXPECT_SIGBUS(*memory_failure_addr = 'A');
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap() should succeed.");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			total_size);
+	TEST_ASSERT(!ret, "Truncate the entire file (cleanup) should succeed.");
+
+	ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_DEFAULT, 0, 0);
+	TEST_ASSERT_EQ(ret, 0);
+
+	unmark_memory_failure(memory_failure_pfn, 0);
+}
+
+static void test_memory_failure(int fd, size_t total_size)
+{
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, true, true, true, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, true, false, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_EARLY, false, true, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, true, true, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, true, false, false, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_DEBUGFS, PR_MCE_KILL_LATE, false, true, false, 0);
+	/*
+	 * If madvise() is used to inject errors, memory_failure() handling is invoked with the
+	 * MF_ACTION_REQUIRED flag set, aligned with memory failure handling for a consumed memory
+	 * error, where the machine check memory corruption kill policy is ignored. Hence, testing with
+	 * PR_MCE_KILL_DEFAULT covers all cases.
+	 */
+	do_test_memory_failure(fd, total_size, MF_INJECT_MADVISE, PR_MCE_KILL_DEFAULT, true, true, true, 0);
+	do_test_memory_failure(fd, total_size, MF_INJECT_MADVISE, PR_MCE_KILL_DEFAULT, true, false, false, 0);
+}
+
 static void test_fault_private(int fd, size_t total_size)
 {
 	test_fault_sigbus(fd, 0, total_size);
@@ -370,6 +537,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 			gmem_test(mmap_supported, vm, flags);
 			gmem_test(fault_overflow, vm, flags);
 			gmem_test(numa_allocation, vm, flags);
+			gmem_test(memory_failure, vm, flags);
 		} else {
 			gmem_test(fault_private, vm, flags);
 		}

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH RFC v2 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
                   ` (5 preceding siblings ...)
  2026-03-19 23:30 ` [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test Lisa Wang
@ 2026-03-19 23:30 ` Lisa Wang
  2026-03-20  2:39 ` [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Andrew Morton
  7 siblings, 0 replies; 19+ messages in thread
From: Lisa Wang @ 2026-03-19 23:30 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Lisa Wang

Test that
+ memory failure handling results in unmapping of bad memory from stage
  2 page tables, hence requiring faulting on next guest access
+ when the guest tries to fault a poisoned page from guest_memfd, the
  userspace VMM is informed with EHWPOISON

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Lisa Wang <wyihan@google.com>
---
 tools/testing/selftests/kvm/guest_memfd_test.c | 65 ++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 445e8155ee1e..50907875dc43 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -637,6 +637,70 @@ static void test_guest_memfd_guest(void)
 	kvm_vm_free(vm);
 }
 
+static void __guest_code_read(uint8_t *mem)
+{
+	READ_ONCE(*mem);
+	GUEST_SYNC(0);
+	READ_ONCE(*mem);
+	GUEST_DONE();
+}
+
+static void guest_read(struct kvm_vcpu *vcpu, uint64_t gpa, int expected_errno)
+{
+	vcpu_args_set(vcpu, 1, gpa);
+
+	if (expected_errno) {
+		TEST_ASSERT_EQ(_vcpu_run(vcpu), -1);
+		TEST_ASSERT_EQ(errno, expected_errno);
+	} else {
+		vcpu_run(vcpu);
+		TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_SYNC);
+	}
+}
+
+static void test_memory_failure_guest(void)
+{
+	const uint64_t gpa = SZ_4G;
+	const int slot = 1;
+
+	unsigned long memory_failure_pfn;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	uint8_t *mem;
+	size_t size;
+	int fd;
+
+	if (!kvm_has_cap(KVM_CAP_GUEST_MEMFD_FLAGS))
+		return;
+
+	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, __guest_code_read);
+
+	size = vm->page_size;
+	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED);
+	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
+
+	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+	virt_pg_map(vm, gpa, gpa);
+
+	/* Fault in page to read pfn, then unmap page for testing. */
+	READ_ONCE(*mem);
+	memory_failure_pfn = addr_to_pfn(mem);
+	munmap(mem, size);
+
+	/* Fault page into stage2 page tables. */
+	guest_read(vcpu, gpa, 0);
+
+	mark_memory_failure(memory_failure_pfn, 0);
+
+	guest_read(vcpu, gpa, EHWPOISON);
+
+	close(fd);
+	kvm_vm_free(vm);
+
+	unmark_memory_failure(memory_failure_pfn, 0);
+}
+
 int main(int argc, char *argv[])
 {
 	unsigned long vm_types, vm_type;
@@ -657,4 +721,5 @@ int main(int argc, char *argv[])
 		test_guest_memfd(vm_type);
 
 	test_guest_memfd_guest();
+	test_memory_failure_guest();
 }

-- 
2.53.0.959.g497ff81fa9-goog



* Re: [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure
  2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
                   ` (6 preceding siblings ...)
  2026-03-19 23:30 ` [PATCH RFC v2 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables Lisa Wang
@ 2026-03-20  2:39 ` Andrew Morton
  7 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2026-03-20  2:39 UTC (permalink / raw)
  To: Lisa Wang
  Cc: Miaohe Lin, Naoya Horiguchi, Paolo Bonzini, Shuah Khan,
	Hugh Dickins, Baolin Wang, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, jiaqiyan, tabba, dave.hansen

On Thu, 19 Mar 2026 23:30:27 +0000 Lisa Wang <wyihan@google.com> wrote:

> Here's a second revision to fix MF_DELAYED handling on memory failure.
> 
> This patch series addresses an issue in the memory failure handling path
> where MF_DELAYED is incorrectly treated as an error. This issue was
> discovered while testing memory failure handling for guest_memfd.
> 
> The proposed solution involves -
> 1. Clarifying the definition of MF_DELAYED to mean that memory failure
>    handling is only partially completed, and that the metadata for the
>    memory that failed (as in struct page/folio) is still referenced.
> 2. Updating shmem’s handling to align with the clarified definition.
> 3. Updating how the result of .error_remove_folio() is interpreted.
> 
> RFC v2 is a more complete solution that includes parts 1 and 2 above to
> address David’s comment [1]. Selftests are included for all the above.

A few questions from Sashiko:
	https://sashiko.dev/#/patchset/20260319-memory-failure-mf-delayed-fix-rfc-v2-v2-0-92c596402a7a%40google.com

* Re: [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-19 23:30 ` [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test Lisa Wang
@ 2026-03-21  6:30   ` Baolin Wang
  2026-03-24  0:43     ` Lisa Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2026-03-21  6:30 UTC (permalink / raw)
  To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
	Paolo Bonzini, Shuah Khan, Hugh Dickins, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen



On 3/20/26 7:30 AM, Lisa Wang wrote:
> Add a shmem memory failure selftest to test the shmem memory failure is
> correct after modifying shmem return value.
> 
> Test that
> + madvise() call returns 0 at the first time
> + trigger a SIGBUS when the poisoned shmem page is fault-in again.
> 
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---

Why not move the shmem memory failure test into memory-failure.c?

* Re: [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition
  2026-03-19 23:30 ` [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
@ 2026-03-22 21:34   ` Jiaqi Yan
  2026-03-23 21:18     ` Lisa Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Jiaqi Yan @ 2026-03-22 21:34 UTC (permalink / raw)
  To: Lisa Wang
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, tabba, dave.hansen

On Thu, Mar 19, 2026 at 4:30 PM Lisa Wang <wyihan@google.com> wrote:
>
> This patch clarifies the definition of MF_DELAYED to represent cases
> where a folio's removal is initiated but not immediately completed
> (e.g., due to remaining metadata references).
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
>  mm/memory-failure.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ee42d4361309..4f143334d5a1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -862,9 +862,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
>   * by the m-f() handler immediately.
>   *
>   * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.

nit: would it be worth correcting PG_hwpoisoned to PG_hwpoison'ed? as
there is really no "PG_hwpoisoned" page flag.

> - * The page is unmapped, and is removed from the LRU or file mapping.
> - * An attempt to access the page again will trigger page fault and the
> - * PF handler will kill the process.
> + * It means the page was partially isolated (e.g. removed from file mapping

nit: what about "unmapped"?

> + * or the LRU) but full cleanup is deferred (e.g. the metadata for the
> + * memory, as in struct page/folio, is still referenced). Any further
> + * access to the page will result in the process being killed.
>   *
>   * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
>   * The page has been completely isolated, that is, unmapped, taken out of
>
> --
> 2.53.0.959.g497ff81fa9-goog
>

Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>

* Re: [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition
  2026-03-22 21:34   ` Jiaqi Yan
@ 2026-03-23 21:18     ` Lisa Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Lisa Wang @ 2026-03-23 21:18 UTC (permalink / raw)
  To: Jiaqi Yan
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, tabba, dave.hansen

On Sun, Mar 22, 2026 at 02:34:59PM -0700, Jiaqi Yan wrote:
> >   [...snip...]
> >   * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
> 
> nit: would it be worth correcting PG_hwpoisoned to PG_hwpoison'ed? as
> there is really no "PG_hwpoisoned" page flag.

I will change PG_hwpoisoned'ed to PG_hwpoison'ed in the next version.

> > - * The page is unmapped, and is removed from the LRU or file mapping.
> > - * An attempt to access the page again will trigger page fault and the
> > - * PF handler will kill the process.
> > + * It means the page was partially isolated (e.g. removed from file mapping
> 
> nit: what about "unmapped"?

Thanks for pointing this out.
I will change it to "It means the page was unmapped and partially isolated
(e.g. ..." in the next version.

> > + * or the LRU) but full cleanup is deferred (e.g. the metadata for the
> > + * memory, as in struct page/folio, is still referenced). Any further
> > + * access to the page will result in the process being killed.
> >   *
> >   * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
> >   * The page has been completely isolated, that is, unmapped, taken out of
> >
> > --
> > 2.53.0.959.g497ff81fa9-goog
> >
> 
> Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>

* Re: [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-21  6:30   ` Baolin Wang
@ 2026-03-24  0:43     ` Lisa Wang
  2026-03-24 12:36       ` Baolin Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-24  0:43 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, jiaqiyan, tabba, dave.hansen

On Sat, Mar 21, 2026 at 02:30:04PM +0800, Baolin Wang wrote:
> 
> 
> On 3/20/26 7:30 AM, Lisa Wang wrote:
> > Add a shmem memory failure selftest to test the shmem memory failure is
> > correct after modifying shmem return value.
> > 
> > Test that
> > + madvise() call returns 0 at the first time
> > + trigger a SIGBUS when the poisoned shmem page is fault-in again.
> > 
> > Signed-off-by: Lisa Wang <wyihan@google.com>
> > ---
> 
> Why not move the shmem memory failure test into memory-failure.c?

Do you mean having the kernel code in memory-failure.c do the check itself?
I wrote a selftest instead of adding checks to memory-failure.c because a
selftest
+ does not need extra checking code in the kernel
+ makes it easier to trace the entire execution flow, starting from
  madvise() down through shmem_error_remove_folio() and into the
  truncate_error_folio() logic.

Please let me know if I've missed something. Thanks!

* Re: [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-24  0:43     ` Lisa Wang
@ 2026-03-24 12:36       ` Baolin Wang
  2026-03-28  0:40         ` Lisa Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2026-03-24 12:36 UTC (permalink / raw)
  To: Lisa Wang
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, jiaqiyan, tabba, dave.hansen



On 3/24/26 8:43 AM, Lisa Wang wrote:
> On Sat, Mar 21, 2026 at 02:30:04PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/20/26 7:30 AM, Lisa Wang wrote:
>>> Add a shmem memory failure selftest to test the shmem memory failure is
>>> correct after modifying shmem return value.
>>>
>>> Test that
>>> + madvise() call returns 0 at the first time
>>> + trigger a SIGBUS when the poisoned shmem page is fault-in again.
>>>
>>> Signed-off-by: Lisa Wang <wyihan@google.com>
>>> ---
>>
>> Why not move the shmem memory failure test into memory-failure.c?
> 
> Do you mean let memory-failure.c kernel code check by itself?
> The reason I write the selftest instead of combining in memory-failure.c
> is because
> + do not need extra checking code in kernel code
> + make it easier to trace the entire execution flow, starting from the
>    madvise() down through shmem_error_remove_folio() and into the
>    truncate_error_folio() logic.
> 
> Please let me know if I've missed something. Thanks!

That's not quite what I meant. I mean, since there is already a 
memory-failure.c in mm selftests (see [1]), I think we should move the 
shmem memory failure test cases into that file.

[1] 
https://lore.kernel.org/all/20260206031639.2707102-1-linmiaohe@huawei.com/T/#m18e62ccb3e87316ec37dcde9389c1ba1c56d0951

* Re: [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-24 12:36       ` Baolin Wang
@ 2026-03-28  0:40         ` Lisa Wang
  2026-03-30  7:12           ` Miaohe Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Lisa Wang @ 2026-03-28  0:40 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
	michael.roth, jiaqiyan, tabba, dave.hansen

On Tue, Mar 24, 2026 at 08:36:36PM +0800, Baolin Wang wrote:
> 
> 
> On 3/24/26 8:43 AM, Lisa Wang wrote:
> > On Sat, Mar 21, 2026 at 02:30:04PM +0800, Baolin Wang wrote:
> > > 
> > > 
> > > On 3/20/26 7:30 AM, Lisa Wang wrote:
> > > > Add a shmem memory failure selftest to test the shmem memory failure is
> > > > correct after modifying shmem return value.
> > > > 
> > > > Test that
> > > > + madvise() call returns 0 at the first time
> > > > + trigger a SIGBUS when the poisoned shmem page is fault-in again.
> > > > 
> > > > Signed-off-by: Lisa Wang <wyihan@google.com>
> > > > ---
> > > 
> > > Why not move the shmem memory failure test into memory-failure.c?
> > 
> > Do you mean let memory-failure.c kernel code check by itself?
> > The reason I write the selftest instead of combining in memory-failure.c
> > is because
> > + do not need extra checking code in kernel code
> > + make it easier to trace the entire execution flow, starting from the
> >    madvise() down through shmem_error_remove_folio() and into the
> >    truncate_error_folio() logic.
> > 
> > > Please let me know if I've missed something. Thanks!
> 
> That's not quite what I meant. I mean, since there is already a
> memory-failure.c in mm selftests (see [1]), I think we should move the shmem
> memory failure test cases into that file.

Got it. Thank you for pointing that out.
Is anyone currently working on the shmem memory failure test? If not, I
will merge it into my next version.

I have a question regarding the current implementation:
```
ret = sigsetjmp(signal_jmp_buf, 1);
if (!self->triggered) {
	self->triggered = true;
	ASSERT_EQ(variant->inject(self, addr), 0);
	FORCE_READ(*addr);
}
```
Here it is difficult to distinguish whether the SIGBUS is triggered by the
injection or the read operation. I am considering splitting these into
two separate SIGBUS jump blocks. Is it reasonable for me to split them?
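
To make the idea concrete, here is a minimal user-space sketch of two
separate jump blocks. It is not the selftest code: enum phase, fake_inject(),
fake_read() and run_case() are hypothetical stand-ins that use raise(SIGBUS)
instead of a real poisoned page. Only the sigsetjmp()/siglongjmp() structure
is meant to carry over.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase names; they do not exist in the current selftest. */
enum phase { PHASE_NONE, PHASE_INJECT, PHASE_READ };

static sigjmp_buf signal_jmp_buf;
static volatile sig_atomic_t current_phase = PHASE_NONE;
static volatile sig_atomic_t sigbus_phase = PHASE_NONE;

/* Record which phase was running when SIGBUS arrived, then jump back. */
static void sigbus_handler(int sig)
{
	(void)sig;
	sigbus_phase = current_phase;
	siglongjmp(signal_jmp_buf, 1);
}

/* Stand-in for variant->inject(): optionally raises SIGBUS, returns 0. */
static int fake_inject(bool sigbus)
{
	if (sigbus)
		raise(SIGBUS);
	return 0;
}

/* Stand-in for FORCE_READ(*addr): optionally raises SIGBUS. */
static void fake_read(bool sigbus)
{
	if (sigbus)
		raise(SIGBUS);
}

/* Returns the phase in which SIGBUS was delivered, or PHASE_NONE. */
static enum phase run_case(bool bus_on_inject, bool bus_on_read)
{
	struct sigaction sa = { .sa_handler = sigbus_handler };

	sigemptyset(&sa.sa_mask);
	assert(sigaction(SIGBUS, &sa, NULL) == 0);
	sigbus_phase = PHASE_NONE;

	/* Block 1: only the injection can deliver SIGBUS here. */
	current_phase = PHASE_INJECT;
	if (sigsetjmp(signal_jmp_buf, 1) == 0)
		fake_inject(bus_on_inject);
	if (sigbus_phase != PHASE_NONE)
		return (enum phase)sigbus_phase;

	/* Block 2: only the read can deliver SIGBUS here. */
	current_phase = PHASE_READ;
	if (sigsetjmp(signal_jmp_buf, 1) == 0)
		fake_read(bus_on_read);
	return (enum phase)sigbus_phase;
}
```

With this shape, a test can assert on the returned phase directly, for
example that run_case(false, true) reports PHASE_READ, instead of relying on
a single "triggered" flag that covers both operations.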


> [1] https://lore.kernel.org/all/20260206031639.2707102-1-linmiaohe@huawei.com/T/#m18e62ccb3e87316ec37dcde9389c1ba1c56d0951

* Re: [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
  2026-03-19 23:30 ` [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
@ 2026-03-30  7:02   ` Miaohe Lin
  2026-04-03 22:31     ` Lisa Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Miaohe Lin @ 2026-03-30  7:02 UTC (permalink / raw)
  To: Lisa Wang
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest

On 2026/3/20 7:30, Lisa Wang wrote:
> The .error_remove_folio a_ops is used by different filesystems to handle
> folio truncation upon discovery of a memory failure in the memory
> associated with the given folio.
> 
> Currently, MF_DELAYED is treated as an error, causing "Failed to punch
> page" to be written to the console. MF_DELAYED is then relayed to the
> caller of truncate_error_folio() as MF_FAILED. This further causes
> memory_failure() to return -EBUSY, which then always causes a SIGBUS.
> 
> This also implies that regardless of whether the thread's memory
> corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
> memory failure with MF_DELAYED will always cause a SIGBUS.
> 
> Update truncate_error_folio() to return MF_DELAYED to the caller if the
> .error_remove_folio() callback reports MF_DELAYED.
> 
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
>  mm/memory-failure.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 4f143334d5a1..57f7762e7418 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -941,6 +941,8 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
>  	if (mapping->a_ops->error_remove_folio) {
>  		int err = mapping->a_ops->error_remove_folio(mapping, folio);
>  
> +		if (err == MF_DELAYED)
> +			return err;

Would it be better to add a pr_info() here to give users some information?

Thanks.
.

* Re: [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test
  2026-03-28  0:40         ` Lisa Wang
@ 2026-03-30  7:12           ` Miaohe Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Miaohe Lin @ 2026-03-30  7:12 UTC (permalink / raw)
  To: Lisa Wang
  Cc: Naoya Horiguchi, Andrew Morton, Paolo Bonzini, Shuah Khan,
	Hugh Dickins, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel, kvm, linux-kselftest, rientjes, seanjc,
	ackerleytng, vannapurve, michael.roth, jiaqiyan, tabba,
	dave.hansen, Baolin Wang

On 2026/3/28 8:40, Lisa Wang wrote:
> On Tue, Mar 24, 2026 at 08:36:36PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/24/26 8:43 AM, Lisa Wang wrote:
>>> On Sat, Mar 21, 2026 at 02:30:04PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 3/20/26 7:30 AM, Lisa Wang wrote:
>>>>> Add a shmem memory failure selftest to test the shmem memory failure is
>>>>> correct after modifying shmem return value.
>>>>>
>>>>> Test that
>>>>> + madvise() call returns 0 at the first time
>>>>> + trigger a SIGBUS when the poisoned shmem page is fault-in again.
>>>>>
>>>>> Signed-off-by: Lisa Wang <wyihan@google.com>
>>>>> ---
>>>>
>>>> Why not move the shmem memory failure test into memory-failure.c?
>>>
>>> Do you mean let memory-failure.c kernel code check by itself?
>>> The reason I write the selftest instead of combining in memory-failure.c
>>> is because
>>> + do not need extra checking code in kernel code
>>> + make it easier to trace the entire execution flow, starting from the
>>>    madvise() down through shmem_error_remove_folio() and into the
>>>    truncate_error_folio() logic.
>>>
>>> Please let me know if I've missed something. Thanks!
>>
>> That's not quite what I meant. I mean, since there is already a
>> memory-failure.c in mm selftests (see [1]), I think we should move the shmem
>> memory failure test cases into that file.
> 
> Got it. Thank you for pointing out.
> Is anyone currently working on the shmem memory failure test? If not, I
> will merge it into my next version.

I'm working on shmem testcases. But please feel free to add it. I could move to
work on other scenarios.

> 
> I have a question regarding the current implementation:
> ```
> ret = sigsetjmp(signal_jmp_buf, 1);
> if (!self->triggered) {
> 	self->triggered = true;
> 	ASSERT_EQ(variant->inject(self, addr), 0);
> 	FORCE_READ(*addr);
> }
> ```
> Here is difficult to distinguish whether the SIGBUS is triggered by the
> injection or the read operation. I am considering splitting these into
> two separate SIGBUS jump blocks. Is it reasonable for me to split them?

It might be better to add two separate testcases, i.e. one for SIGBUS triggered
by injection, another one for SIGBUS triggered by read operation if possible.

Thanks.
.

* Re: [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test
  2026-03-19 23:30 ` [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test Lisa Wang
@ 2026-03-30  7:20   ` Miaohe Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Miaohe Lin @ 2026-03-30  7:20 UTC (permalink / raw)
  To: Lisa Wang
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest

On 2026/3/20 7:30, Lisa Wang wrote:
> After modifying truncate_error_folio(), we expect memory_failure() will
> return 0 instead of MF_FAILED. Also, we want to make sure memory_failure()
> signaling function is same.
> 
> Test that memory_failure() returns 0 for guest_memfd, where
> .error_remove_folio() is handled by not actually truncating, and returning
> MF_DELAYED.
> 
> In addition, test that SIGBUS signaling behavior is not changed before
> and after this modification.
> 
> There are two kinds of guest memory failure injections - madvise or
> debugfs. When memory failure is injected using madvise, the
> MF_ACTION_REQUIRED flag is set, and the page is mapped and dirty, the
> process should get a SIGBUS. When memory failure is injected using
> debugfs, the KILL_EARLY machine check memory corruption kill policy is
> set, and the page is mapped and dirty, the process should get a SIGBUS.
> 
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Lisa Wang <wyihan@google.com>

Should we add a testcase for hugetlbfs? It seems hugetlbfs_error_remove_folio() behaves the same as shmem.

Thanks.
.

* Re: [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
  2026-03-30  7:02   ` Miaohe Lin
@ 2026-04-03 22:31     ` Lisa Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Lisa Wang @ 2026-04-03 22:31 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
	tabba, dave.hansen, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
	Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
	linux-kselftest

On Mon, Mar 30, 2026 at 03:02:01PM +0800, Miaohe Lin wrote:
[...snip...]
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -941,6 +941,8 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
> >  	if (mapping->a_ops->error_remove_folio) {
> >  		int err = mapping->a_ops->error_remove_folio(mapping, folio);
> >  
> > +		if (err == MF_DELAYED)
> > +			return err;
> 
> Will it be better to add a pr_info here to provide some information for users?
> 
> Thanks.
> .

I think we don't need to add pr_info here; truncate_error_folio() always
leads to action_result, which already logs the recovery status.
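
For reference, here is a small user-space model of the flow described in
the commit message. It is only a sketch under assumptions:
interpret_remove_result() and result_to_errno() are hypothetical names, the
enum ordering is reproduced from memory rather than from the tree, and the
real truncate_error_folio() also checks filemap_release_folio(), which is
left out here.

```c
#include <errno.h>

/* Assumed to mirror the kernel's enum mf_result; the ordering is from memory. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Hypothetical model of how truncate_error_folio() interprets the value
 * returned by .error_remove_folio(). "patched" selects the behaviour with
 * this series applied.
 */
static enum mf_result interpret_remove_result(int err, int patched)
{
	if (patched && err == MF_DELAYED)
		return MF_DELAYED;	/* new: relay MF_DELAYED to the caller */
	if (err != 0)
		return MF_FAILED;	/* "Failed to punch page" path */
	return MF_RECOVERED;		/* simplified: release step omitted */
}

/*
 * Hypothetical model of the final mapping to memory_failure()'s return
 * value: MF_RECOVERED and MF_DELAYED count as handled, anything else
 * becomes -EBUSY.
 */
static int result_to_errno(enum mf_result result)
{
	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
```

In this model an MF_DELAYED result from the callback ends up as -EBUSY
without the patch and as 0 with it, which is the behaviour change the
selftests check for.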

Lisa

Thread overview: 19+ messages
2026-03-19 23:30 [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
2026-03-19 23:30 ` [PATCH RFC v2 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
2026-03-22 21:34   ` Jiaqi Yan
2026-03-23 21:18     ` Lisa Wang
2026-03-19 23:30 ` [PATCH RFC v2 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
2026-03-30  7:02   ` Miaohe Lin
2026-04-03 22:31     ` Lisa Wang
2026-03-19 23:30 ` [PATCH RFC v2 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition Lisa Wang
2026-03-19 23:30 ` [PATCH RFC v2 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases Lisa Wang
2026-03-19 23:30 ` [PATCH RFC v2 5/7] mm: selftests: Add shmem memory failure test Lisa Wang
2026-03-21  6:30   ` Baolin Wang
2026-03-24  0:43     ` Lisa Wang
2026-03-24 12:36       ` Baolin Wang
2026-03-28  0:40         ` Lisa Wang
2026-03-30  7:12           ` Miaohe Lin
2026-03-19 23:30 ` [PATCH RFC v2 6/7] KVM: selftests: Add memory failure tests in guest_memfd_test Lisa Wang
2026-03-30  7:20   ` Miaohe Lin
2026-03-19 23:30 ` [PATCH RFC v2 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables Lisa Wang
2026-03-20  2:39 ` [PATCH RFC v2 0/7] mm: Fix MF_DELAYED handling on memory failure Andrew Morton