* [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure
@ 2026-06-02 21:55 Lisa Wang
2026-06-02 21:55 ` [PATCH v4 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
Here's a fourth revision to fix MF_DELAYED handling on memory failure.
This patch series addresses an issue in the memory failure handling path
where MF_DELAYED is incorrectly treated as an error. This issue was
discovered while testing memory failure handling for guest_memfd.
The proposed solution involves:
1. Clarifying the definition of MF_DELAYED to mean that memory failure
handling is only partially completed, and that the metadata for the
memory that failed (as in struct page/folio) is still referenced.
2. Updating shmem’s handling to align with the clarified definition.
3. Updating how the result of .error_remove_folio() is interpreted.
Changes from v3:
+ Split the guest_memfd tests out into an independent
guest_memfd_memory_failure_test, as suggested by Ackerley and Sean
+ Align error logging style in truncate_error_folio(), as suggested by
Miaohe and David
+ Verify a clean shmem page can be read successfully after soft-offline
memory failure, as suggested by Miaohe
Thanks!
+ RFC v3: https://lore.kernel.org/all/20260408-memory-failure-mf-delayed-fix-rfc-v3-v3-0-718f45eb7c75@google.com/
Signed-off-by: Lisa Wang <wyihan@google.com>
---
Lisa Wang (7):
mm: memory_failure: Clarify the MF_DELAYED definition
mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
mm: shmem: Update shmem handler to the MF_DELAYED definition
mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases
mm: selftests: Add shmem into memory failure test
KVM: selftests: Add the guest_memfd memory failure test
KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables
mm/memory-failure.c | 29 +-
mm/shmem.c | 2 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../kvm/guest_memfd_memory_failure_test.c | 402 +++++++++++++++++++++
tools/testing/selftests/mm/memory-failure.c | 111 +++++-
5 files changed, 527 insertions(+), 19 deletions(-)
---
base-commit: 38741a8e3bc1b809d64f8c8885ab15c3e40700ff
change-id: 20260527-memory-failure-mf-delayed-fix-7d5a8f4a8a8b
Best regards,
--
Lisa Wang <wyihan@google.com>
* [PATCH v4 1/7] mm: memory_failure: Clarify the MF_DELAYED definition
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-05 11:30 ` David Hildenbrand (Arm)
2026-06-02 21:55 ` [PATCH v4 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
` (6 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
This patch clarifies the definition of MF_DELAYED to represent cases
where a folio's removal is initiated but not immediately completed
(e.g., due to remaining metadata references).
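As an illustration only, the clarified outcome definitions can be read as a
small classifier. This is a userspace sketch, not kernel code: struct mf_state
and classify() are hypothetical names, the enum values are arbitrary, and
MF_IGNORED is left out because the model does not track the conditions that
lead to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Result names mirror mm/memory-failure.c; the numeric values are illustrative. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical summary of what the handler achieved for a poisoned folio. */
struct mf_state {
	bool unmapped;            /* no user page table maps the page any more */
	bool isolated;            /* removed from the file mapping or the LRU */
	bool metadata_referenced; /* struct page/folio is still referenced */
};

/* Classify an outcome using the clarified definitions (MF_IGNORED not modeled). */
static enum mf_result classify(const struct mf_state *s)
{
	if (!s->unmapped || !s->isolated)
		return MF_FAILED;	/* the page could not be isolated */
	if (s->metadata_referenced)
		return MF_DELAYED;	/* isolated, but full cleanup is deferred */
	return MF_RECOVERED;		/* completely isolated */
}
```

The key distinction is the last check: MF_DELAYED and MF_RECOVERED differ only
in whether the folio metadata is still referenced after isolation.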
Signed-off-by: Lisa Wang <wyihan@google.com>
---
mm/memory-failure.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..2e53b3024391 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -849,24 +849,25 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
}
/*
- * MF_IGNORED - The m-f() handler marks the page as PG_hwpoisoned'ed.
+ * MF_IGNORED - The m-f() handler marks the page as PG_hwpoison'ed.
* But it could not do more to isolate the page from being accessed again,
* nor does it kill the process. This is extremely rare and one of the
* potential causes is that the page state has been changed due to
* underlying race condition. This is the most severe outcomes.
*
- * MF_FAILED - The m-f() handler marks the page as PG_hwpoisoned'ed.
+ * MF_FAILED - The m-f() handler marks the page as PG_hwpoison'ed.
* It should have killed the process, but it can't isolate the page,
* due to conditions such as extra pin, unmap failure, etc. Accessing
* the page again may trigger another MCE and the process will be killed
* by the m-f() handler immediately.
*
- * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
- * The page is unmapped, and is removed from the LRU or file mapping.
- * An attempt to access the page again will trigger page fault and the
- * PF handler will kill the process.
+ * MF_DELAYED - The m-f() handler marks the page as PG_hwpoison'ed.
+ * It means the page was unmapped and partially isolated (e.g. removed from
+ * file mapping or the LRU) but full cleanup is deferred (e.g. the metadata
+ * for the memory, as in struct page/folio, is still referenced). Any
+ * further access to the page will result in the process being killed.
*
- * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
+ * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoison'ed.
* The page has been completely isolated, that is, unmapped, taken out of
* the buddy system, or hole-punched out of the file mapping.
*/
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
2026-06-02 21:55 ` [PATCH v4 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-05 11:32 ` David Hildenbrand (Arm)
2026-06-02 21:55 ` [PATCH v4 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition Lisa Wang
` (5 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
The .error_remove_folio() address_space operation is used by different
filesystems to handle folio truncation when a memory failure is discovered
in the memory backing the given folio.
Currently, MF_DELAYED is treated as an error, causing "Failed to punch
page" to be written to the console. MF_DELAYED is then relayed to the
caller of truncate_error_folio() as MF_FAILED. This further causes
memory_failure() to return -EBUSY, which then always causes a SIGBUS.
This also implies that regardless of whether the thread's memory
corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
memory failure with MF_DELAYED will always cause a SIGBUS.
Update truncate_error_folio() to return MF_DELAYED to the caller if the
.error_remove_folio() callback reports MF_DELAYED.
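The effect of this change can be sketched as a userspace model of
truncate_error_folio() before and after the patch. This is a simplified
illustration, not the kernel function: logging is omitted, "released" stands
in for the result of filemap_release_folio(), and mf_return() is a
hypothetical helper modeling how action_result() turns the result into
memory_failure()'s return value (MF_DELAYED and MF_RECOVERED map to 0,
anything else to -EBUSY).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Old behavior: any nonzero err, including MF_DELAYED, becomes MF_FAILED. */
static enum mf_result truncate_result_old(int err, bool released)
{
	if (err != 0)
		return MF_FAILED;	/* "Failed to punch page" was logged here */
	if (!released)
		return MF_FAILED;	/* "failed to release buffers" */
	return MF_RECOVERED;
}

/* New behavior: MF_DELAYED is relayed to the caller before any logging. */
static enum mf_result truncate_result_new(int err, bool released)
{
	if (err == MF_DELAYED)
		return MF_DELAYED;
	return truncate_result_old(err, released);
}

/* Simplified model of the value memory_failure() ends up returning. */
static int mf_return(enum mf_result r)
{
	return (r == MF_RECOVERED || r == MF_DELAYED) ? 0 : -EBUSY;
}
```

With the old mapping, a handler reporting MF_DELAYED always produces -EBUSY;
with the new mapping it produces 0, while genuine errors still fail.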
Signed-off-by: Lisa Wang <wyihan@google.com>
---
mm/memory-failure.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2e53b3024391..3aff0c981fcd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -941,10 +941,12 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
if (mapping->a_ops->error_remove_folio) {
int err = mapping->a_ops->error_remove_folio(mapping, folio);
- if (err != 0)
+ if (err == MF_DELAYED)
+ ret = err;
+ else if (err != 0)
pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
else if (!filemap_release_folio(folio, GFP_NOIO))
- pr_info("%#lx: failed to release buffers\n", pfn);
+ pr_info("%#lx: Failed to release buffers\n", pfn);
else
ret = MF_RECOVERED;
} else {
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
2026-06-02 21:55 ` [PATCH v4 1/7] mm: memory_failure: Clarify the MF_DELAYED definition Lisa Wang
2026-06-02 21:55 ` [PATCH v4 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-05 11:35 ` David Hildenbrand (Arm)
2026-06-02 21:55 ` [PATCH v4 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases Lisa Wang
` (4 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
To align with the definition of MF_DELAYED, update
shmem_error_remove_folio() to return MF_DELAYED.
shmem handles memory failures but defers the actual file truncation. The
function's return value should therefore be MF_DELAYED to accurately
reflect the state.
Currently, this logical error does not cause a bug, because:
- For shmem folios, folio->private is not set.
- As a result, filemap_release_folio() is a no-op and returns true.
- This, in turn, causes truncate_error_folio() to incorrectly return
MF_RECOVERED.
- The caller then treats MF_RECOVERED as a success condition, masking the
issue.
The previous patch relays MF_DELAYED to the caller of
truncate_error_folio() before any logging, so returning MF_DELAYED from
shmem_error_remove_folio() will retain the original behavior of not
adding any logs.
The return value of truncate_error_folio() is consumed in action_result(),
which treats MF_DELAYED the same way as MF_RECOVERED, so action_result()
returns the same value after this change.
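The masking argument above can be checked with a small userspace model. It is
a sketch under simplifying assumptions: shmem_remove_old()/shmem_remove_new()
are hypothetical stand-ins for shmem_error_remove_folio() before and after
this patch, release_folio() models filemap_release_folio() only with respect
to folio->private, and action_ret() models the action_result() mapping
described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical stand-ins for shmem_error_remove_folio() before/after. */
static int shmem_remove_old(void) { return 0; }
static int shmem_remove_new(void) { return MF_DELAYED; }

/*
 * Simplified filemap_release_folio(): with no folio->private there is
 * nothing to release, so it succeeds. fs_ok models a filesystem release
 * callback and only matters when private data is attached.
 */
static bool release_folio(bool has_private, bool fs_ok)
{
	return has_private ? fs_ok : true;
}

/* truncate_error_folio() after the previous patch, logging omitted. */
static enum mf_result truncate_result(int err, bool released)
{
	if (err == MF_DELAYED)
		return MF_DELAYED;
	if (err != 0 || !released)
		return MF_FAILED;
	return MF_RECOVERED;
}

/* action_result() return value: MF_DELAYED is treated like MF_RECOVERED. */
static int action_ret(enum mf_result r)
{
	return (r == MF_RECOVERED || r == MF_DELAYED) ? 0 : -EBUSY;
}
```

For a shmem folio (no private data), the old return value yields
MF_RECOVERED and the new one yields MF_DELAYED, but both map to the same
action_result() value.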
Signed-off-by: Lisa Wang <wyihan@google.com>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index b40f3cd48961..fd8f90540361 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5207,7 +5207,7 @@ static void __init shmem_destroy_inodecache(void)
static int shmem_error_remove_folio(struct address_space *mapping,
struct folio *folio)
{
- return 0;
+ return MF_DELAYED;
}
static const struct address_space_operations shmem_aops = {
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
` (2 preceding siblings ...)
2026-06-02 21:55 ` [PATCH v4 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-05 11:35 ` David Hildenbrand (Arm)
2026-06-02 21:55 ` [PATCH v4 5/7] mm: selftests: Add shmem into memory failure test Lisa Wang
` (3 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
Generalize extra_pins handling to all MF_DELAYED cases, not only shmem
mappings.
If MF_DELAYED is returned, the filemap continues to hold refcounts on the
folio. Hence, take that into account when checking for extra refcounts.
As clarified in an earlier patch, a return value of MF_DELAYED implies that
the page still has elevated refcounts. Hence, set extra_pins to true if the
return value is MF_DELAYED. This is aligned with the implementation in
me_swapcache_dirty(), where, if a folio is still in the swap cache, ret is
set to MF_DELAYED and extra_pins is set to true.
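A rough userspace model of the refcount check, for illustration only. It
assumes that the refcount seen by has_extra_refcount() includes the one
reference memory_failure() itself holds, and that a mapping which keeps the
folio accounts for folio_nr_pages() references; has_extra_refcount_model()
and pagecache_clean_tail() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Simplified has_extra_refcount(): are there references nobody accounts for? */
static bool has_extra_refcount_model(int page_count, int nr_pages, bool extra_pins)
{
	int count;

	assert(page_count >= 1);
	count = page_count - 1;		/* drop memory_failure()'s own reference */
	if (extra_pins)
		count -= nr_pages;	/* references the mapping is allowed to keep */
	return count > 0;
}

/* Tail of me_pagecache_clean() after this patch. */
static enum mf_result pagecache_clean_tail(enum mf_result ret, int page_count,
					   int nr_pages)
{
	bool extra_pins = ret == MF_DELAYED;

	if (has_extra_refcount_model(page_count, nr_pages, extra_pins))
		ret = MF_FAILED;
	return ret;
}
```

Without extra_pins, the reference still held by the mapping after MF_DELAYED
would be counted as unexpected and the result downgraded to MF_FAILED; a
genuinely extra pin (e.g. one more reference) still fails either way.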
Signed-off-by: Lisa Wang <wyihan@google.com>
---
mm/memory-failure.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3aff0c981fcd..dcc56bedcb28 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1052,18 +1052,14 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
goto out;
}
- /*
- * The shmem page is kept in page cache instead of truncating
- * so is expected to have an extra refcount after error-handling.
- */
- extra_pins = shmem_mapping(mapping);
-
/*
* Truncation is a bit tricky. Enable it per file system for now.
*
* Open: to take i_rwsem or not for this? Right now we don't.
*/
ret = truncate_error_folio(folio, page_to_pfn(p), mapping);
+
+ extra_pins = ret == MF_DELAYED;
if (has_extra_refcount(ps, p, extra_pins))
ret = MF_FAILED;
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 5/7] mm: selftests: Add shmem into memory failure test
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
` (3 preceding siblings ...)
2026-06-02 21:55 ` [PATCH v4 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-05 11:38 ` David Hildenbrand (Arm)
2026-06-02 21:55 ` [PATCH v4 6/7] KVM: selftests: Add the guest_memfd " Lisa Wang
` (2 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
Add shmem memory failure selftests to verify that shmem memory failure
handling is correct after changing the return value of
shmem_error_remove_folio().
Specifically, test the expected behavior under various scenarios
combining page dirtiness (dirty vs clean) and failure types (hard vs
soft):
+ Dirty + Hard: Trigger a SIGBUS on injection, and trigger another
SIGBUS when reading the page again.
+ Dirty + Soft: No SIGBUS is triggered, and the original value can be
read successfully.
+ Clean + Hard: No SIGBUS is triggered on injection, but trigger a
SIGBUS when trying to read the page again.
+ Clean + Soft: No SIGBUS is triggered, and the page can be read
successfully.
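The four scenarios above form a small expectation matrix. The sketch below
encodes it as a lookup table, purely as an illustration; the struct and
lookup() are hypothetical and are not part of the selftest, which expresses
the same expectations through check() and the result_type cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row per scenario from the list above. */
struct shmem_mf_expectation {
	bool dirty;
	bool hard;		/* hard vs soft memory failure */
	bool sigbus_on_inject;
	bool sigbus_on_read;	/* reading the page again after injection */
};

static const struct shmem_mf_expectation shmem_mf_matrix[] = {
	{ .dirty = true,  .hard = true,  .sigbus_on_inject = true,  .sigbus_on_read = true  },
	{ .dirty = true,  .hard = false, .sigbus_on_inject = false, .sigbus_on_read = false },
	{ .dirty = false, .hard = true,  .sigbus_on_inject = false, .sigbus_on_read = true  },
	{ .dirty = false, .hard = false, .sigbus_on_inject = false, .sigbus_on_read = false },
};

/* Return the expectation row for a scenario, or NULL if none matches. */
static const struct shmem_mf_expectation *lookup(bool dirty, bool hard)
{
	for (size_t i = 0; i < sizeof(shmem_mf_matrix) / sizeof(shmem_mf_matrix[0]); i++)
		if (shmem_mf_matrix[i].dirty == dirty && shmem_mf_matrix[i].hard == hard)
			return &shmem_mf_matrix[i];
	return NULL;
}
```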
Signed-off-by: Lisa Wang <wyihan@google.com>
---
tools/testing/selftests/mm/memory-failure.c | 111 +++++++++++++++++++++++++++-
1 file changed, 108 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/memory-failure.c b/tools/testing/selftests/mm/memory-failure.c
index 3d9e0b9ffb41..43949b3b3565 100644
--- a/tools/testing/selftests/mm/memory-failure.c
+++ b/tools/testing/selftests/mm/memory-failure.c
@@ -30,9 +30,14 @@ enum result_type {
MADV_HARD_ANON,
MADV_HARD_CLEAN_PAGECACHE,
MADV_HARD_DIRTY_PAGECACHE,
+ MADV_HARD_CLEAN_SHMEM,
+ MADV_HARD_DIRTY_SHMEM,
MADV_SOFT_ANON,
MADV_SOFT_CLEAN_PAGECACHE,
MADV_SOFT_DIRTY_PAGECACHE,
+ MADV_SOFT_CLEAN_SHMEM,
+ MADV_SOFT_DIRTY_SHMEM,
+ READ_ERROR,
};
static jmp_buf signal_jmp_buf;
@@ -165,17 +170,21 @@ static void check(struct __test_metadata *_metadata, FIXTURE_DATA(memory_failure
case MADV_HARD_CLEAN_PAGECACHE:
case MADV_SOFT_CLEAN_PAGECACHE:
case MADV_SOFT_DIRTY_PAGECACHE:
- /* It is not expected to receive a SIGBUS signal. */
- ASSERT_EQ(setjmp, 0);
-
+ case MADV_SOFT_DIRTY_SHMEM:
/* The page content should remain unchanged. */
ASSERT_TRUE(check_memory(vaddr, self->page_size));
+ case MADV_HARD_CLEAN_SHMEM:
+ case MADV_SOFT_CLEAN_SHMEM:
+ /* It is not expected to receive a SIGBUS signal. */
+ ASSERT_EQ(setjmp, 0);
/* The backing pfn of addr should have changed. */
ASSERT_NE(pagemap_get_pfn(self->pagemap_fd, vaddr), self->pfn);
break;
case MADV_HARD_ANON:
case MADV_HARD_DIRTY_PAGECACHE:
+ case MADV_HARD_DIRTY_SHMEM:
+ case READ_ERROR:
/* The SIGBUS signal should have been received. */
ASSERT_EQ(setjmp, 1);
@@ -260,6 +269,20 @@ static int prepare_file(const char *fname, unsigned long size)
return fd;
}
+static int prepare_shmem(const char *fname, unsigned long size)
+{
+ int fd;
+
+ fd = memfd_create(fname, 0);
+ if (fd < 0)
+ return -1;
+ if (ftruncate(fd, size) < 0) {
+ close(fd);
+ return -1;
+ }
+ return fd;
+}
+
/* Borrowed from mm/gup_longterm.c. */
static int get_fs_type(int fd)
{
@@ -356,4 +379,86 @@ TEST_F(memory_failure, dirty_pagecache)
ASSERT_EQ(close(fd), 0);
}
+TEST_F(memory_failure, dirty_shmem)
+{
+ int fd;
+ char *addr;
+ int ret;
+
+ fd = prepare_shmem("shmem-file", self->page_size);
+ if (fd < 0)
+ SKIP(return, "failed to open test shmem-file.\n");
+
+ addr = mmap(0, self->page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED) {
+ close(fd);
+ SKIP(return, "mmap failed, not enough memory.\n");
+ }
+ memset(addr, 0xce, self->page_size);
+
+ prepare(_metadata, self, addr);
+
+ ret = sigsetjmp(signal_jmp_buf, 1);
+ if (ret == 0)
+ ASSERT_EQ(variant->inject(self, addr), 0);
+
+ if (variant->type == MADV_HARD) {
+ check(_metadata, self, addr, MADV_HARD_DIRTY_SHMEM, ret);
+ ret = sigsetjmp(signal_jmp_buf, 1);
+ if (ret == 0)
+ FORCE_READ(*addr);
+ check(_metadata, self, addr, READ_ERROR, ret);
+ } else {
+ check(_metadata, self, addr, MADV_SOFT_DIRTY_SHMEM, ret);
+ }
+
+ ASSERT_EQ(munmap(addr, self->page_size), 0);
+
+ ASSERT_EQ(close(fd), 0);
+ cleanup(_metadata, self, addr);
+}
+
+TEST_F(memory_failure, clean_shmem)
+{
+ int fd;
+ char *addr;
+ int ret;
+
+ fd = prepare_shmem("shmem-file", self->page_size);
+ if (fd < 0)
+ SKIP(return, "failed to open test shmem-file.\n");
+
+ addr = mmap(0, self->page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED) {
+ close(fd);
+ SKIP(return, "mmap failed, not enough memory.\n");
+ }
+ FORCE_READ(*addr);
+
+ prepare(_metadata, self, addr);
+
+ ret = sigsetjmp(signal_jmp_buf, 1);
+ if (ret == 0)
+ ASSERT_EQ(variant->inject(self, addr), 0);
+
+ if (variant->type == MADV_HARD) {
+ check(_metadata, self, addr, MADV_HARD_CLEAN_SHMEM, ret);
+ ret = sigsetjmp(signal_jmp_buf, 1);
+ if (ret == 0)
+ FORCE_READ(*addr);
+ check(_metadata, self, addr, READ_ERROR, ret);
+ } else {
+ /* Test the address accessibility without check_memory(). */
+ FORCE_READ(*addr);
+ check(_metadata, self, addr, MADV_SOFT_CLEAN_SHMEM, ret);
+ }
+
+ ASSERT_EQ(munmap(addr, self->page_size), 0);
+
+ ASSERT_EQ(close(fd), 0);
+ cleanup(_metadata, self, addr);
+}
+
TEST_HARNESS_MAIN
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 6/7] KVM: selftests: Add the guest_memfd memory failure test
2026-06-02 21:55 [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
` (4 preceding siblings ...)
2026-06-02 21:55 ` [PATCH v4 5/7] mm: selftests: Add shmem into memory failure test Lisa Wang
@ 2026-06-02 21:55 ` Lisa Wang
2026-06-02 21:55 ` [PATCH v4 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables Lisa Wang
2026-06-03 20:48 ` [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure Andrew Morton
7 siblings, 0 replies; 14+ messages in thread
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
After the change to truncate_error_folio(), memory_failure() is expected to
return 0 for guest_memfd folios instead of failing with -EBUSY (the result
of MF_FAILED). The SIGBUS signaling behavior of memory_failure() should stay
the same.
Test that memory_failure() returns 0 for guest_memfd, where
.error_remove_folio() is handled by not actually truncating and
returning MF_DELAYED.
In addition, test that the SIGBUS signaling behavior is unchanged by this
modification.
There are two kinds of guest memory failure injection: madvise or
debugfs. When memory failure is injected using madvise, the
MF_ACTION_REQUIRED flag is set, so if the page is mapped and dirty, the
process should get a SIGBUS. When memory failure is injected using
debugfs, the process should get a SIGBUS if the PR_MCE_KILL_EARLY memory
corruption kill policy is set and the page is mapped and dirty.
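The rule above, as exercised by the test variants, can be summarized as a
predicate. This is a hypothetical sketch: the enums are stand-ins for the
injection methods and for PR_MCE_KILL_EARLY/LATE/DEFAULT, and it only encodes
the tested combinations. PR_MCE_KILL_DEFAULT with debugfs injection defers to
the system-wide early-kill setting, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

enum inject_method { INJECT_DEBUGFS, INJECT_MADVISE };
/* Hypothetical stand-ins for PR_MCE_KILL_EARLY, _LATE and _DEFAULT. */
enum kill_policy { KILL_EARLY, KILL_LATE, KILL_DEFAULT };

/* Is a SIGBUS expected at injection time for this variant? */
static bool sigbus_expected(enum inject_method method, enum kill_policy policy,
			    bool mapped, bool dirty)
{
	if (!mapped || !dirty)
		return false;
	if (method == INJECT_MADVISE)
		return true;		/* MF_ACTION_REQUIRED is set */
	return policy == KILL_EARLY;	/* debugfs: tested variants signal only when early */
}
```

Each FIXTURE_VARIANT_ADD() entry's sigbus_expected field corresponds to one
call of this predicate.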
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Lisa Wang <wyihan@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../kvm/guest_memfd_memory_failure_test.c | 336 +++++++++++++++++++++
2 files changed, 338 insertions(+)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index fdec90e85467..9409ded6cbce 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -146,6 +146,7 @@ TEST_GEN_PROGS_x86 += access_tracking_perf_test
TEST_GEN_PROGS_x86 += coalesced_io_test
TEST_GEN_PROGS_x86 += dirty_log_perf_test
TEST_GEN_PROGS_x86 += guest_memfd_test
+TEST_GEN_PROGS_x86 += guest_memfd_memory_failure_test
TEST_GEN_PROGS_x86 += hardware_disable_test
TEST_GEN_PROGS_x86 += memslot_modification_stress_test
TEST_GEN_PROGS_x86 += memslot_perf_test
@@ -186,6 +187,7 @@ TEST_GEN_PROGS_arm64 += coalesced_io_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
TEST_GEN_PROGS_arm64 += get-reg-list
TEST_GEN_PROGS_arm64 += guest_memfd_test
+TEST_GEN_PROGS_arm64 += guest_memfd_memory_failure_test
TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c b/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c
new file mode 100644
index 000000000000..6c8032d390ae
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright Intel Corporation, 2026
+ *
+ * Author: Ackerley Tng <ackerleytng@google.com>
+ * Author: Lisa Wang <wyihan@google.com>
+ */
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <linux/prctl.h>
+#include <sys/prctl.h>
+
+#include <linux/bitmap.h>
+#include <linux/falloc.h>
+#include <linux/sizes.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include "kvm_util.h"
+#include "test_util.h"
+#include "kselftest_harness.h"
+
+static size_t page_size, total_size;
+
+enum memory_failure_injection_method {
+ MF_INJECT_DEBUGFS,
+ MF_INJECT_MADVISE,
+};
+
+FIXTURE(guest_memfd_failure) {
+ struct kvm_vm *vm;
+ int fd;
+ unsigned long poisoned_pfn;
+};
+
+FIXTURE_VARIANT(guest_memfd_failure) {
+ enum memory_failure_injection_method method;
+ int kill_config;
+ bool map_page;
+ bool dirty_page;
+ bool sigbus_expected;
+ int return_code;
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_early_dirty) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_EARLY,
+ .map_page = true,
+ .dirty_page = true,
+ .sigbus_expected = true,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_early_clean) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_EARLY,
+ .map_page = true,
+ .dirty_page = false,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_early_unmapped) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_EARLY,
+ .map_page = false,
+ .dirty_page = true,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_late_dirty) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_LATE,
+ .map_page = true,
+ .dirty_page = true,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_late_clean) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_LATE,
+ .map_page = true,
+ .dirty_page = false,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, debugfs_late_unmapped) {
+ .method = MF_INJECT_DEBUGFS,
+ .kill_config = PR_MCE_KILL_LATE,
+ .map_page = false,
+ .dirty_page = true,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, madvise_dirty) {
+ .method = MF_INJECT_MADVISE,
+ .kill_config = PR_MCE_KILL_DEFAULT,
+ .map_page = true,
+ .dirty_page = true,
+ .sigbus_expected = true,
+ .return_code = 0,
+};
+
+FIXTURE_VARIANT_ADD(guest_memfd_failure, madvise_clean) {
+ .method = MF_INJECT_MADVISE,
+ .kill_config = PR_MCE_KILL_DEFAULT,
+ .map_page = true,
+ .dirty_page = false,
+ .sigbus_expected = false,
+ .return_code = 0,
+};
+
+FIXTURE_SETUP(guest_memfd_failure)
+{
+ self->vm = NULL;
+ self->fd = -1;
+ self->poisoned_pfn = 0;
+}
+
+static void write_memory_failure(unsigned long pfn, bool mark, int expected_return_code)
+{
+ char path[PATH_MAX];
+ char *filename;
+ char buf[20];
+ int ret;
+ int len;
+ int fd;
+
+ filename = mark ? "corrupt-pfn" : "unpoison-pfn";
+ snprintf(path, PATH_MAX, "/sys/kernel/debug/hwpoison/%s", filename);
+
+ fd = open(path, O_WRONLY);
+ TEST_ASSERT(fd >= 0, "Failed to open %s.", path);
+
+ len = snprintf(buf, sizeof(buf), "0x%lx\n", pfn);
+ if (len < 0 || (unsigned int)len >= sizeof(buf))
+ TEST_ASSERT(0, "snprintf failed or truncated.");
+
+ ret = write(fd, buf, len);
+ if (expected_return_code == 0) {
+ /*
+ * If the memory_failure() returns 0, write() should be successful,
+ * which returns how many bytes it writes.
+ */
+ TEST_ASSERT(ret > 0, "Writing memory failure (path: %s) failed: %s", path,
+ strerror(errno));
+ } else {
+ TEST_ASSERT_EQ(ret, -1);
+ /* errno is memory_failure() return code. */
+ TEST_ASSERT_EQ(errno, expected_return_code);
+ }
+
+ close(fd);
+}
+
+static void mark_memory_failure(unsigned long pfn, int expected_return_code)
+{
+ write_memory_failure(pfn, true, expected_return_code);
+}
+
+static void unmark_memory_failure(unsigned long pfn, int expected_return_code)
+{
+ write_memory_failure(pfn, false, expected_return_code);
+}
+
+FIXTURE_TEARDOWN(guest_memfd_failure)
+{
+ if (self->fd >= 0)
+ close(self->fd);
+ if (self->vm)
+ kvm_vm_free(self->vm);
+ if (self->poisoned_pfn)
+ unmark_memory_failure(self->poisoned_pfn, 0);
+}
+
+static unsigned long addr_to_pfn(void *addr)
+{
+ const uint64_t pagemap_pfn_mask = BIT(54) - 1;
+ const uint64_t pagemap_page_present = BIT(63);
+ uint64_t page_info;
+ ssize_t n_bytes;
+ int pagemap_fd;
+
+ pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+ TEST_ASSERT(pagemap_fd >= 0, "Opening pagemap should succeed.");
+
+ n_bytes = pread(pagemap_fd, &page_info, 8, (uint64_t)addr / page_size * 8);
+ TEST_ASSERT(n_bytes == 8, "pread of pagemap failed. n_bytes=%ld", n_bytes);
+
+ close(pagemap_fd);
+
+ TEST_ASSERT(page_info & pagemap_page_present, "The page for addr should be present");
+ return page_info & pagemap_pfn_mask;
+}
+
+static void do_test_memory_failure(FIXTURE_DATA(guest_memfd_failure) * self,
+ const FIXTURE_VARIANT(guest_memfd_failure) * variant)
+{
+ unsigned long memory_failure_pfn;
+ char *memory_failure_addr;
+ char *mem;
+ int ret;
+
+ mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, self->fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+ memory_failure_addr = mem + page_size;
+ if (variant->dirty_page)
+ *memory_failure_addr = 'A';
+ else
+ READ_ONCE(*memory_failure_addr);
+
+ /* Fault in page to read pfn, then unmap page for testing if needed. */
+ memory_failure_pfn = addr_to_pfn(memory_failure_addr);
+ if (!variant->map_page)
+ madvise(memory_failure_addr, page_size, MADV_DONTNEED);
+
+ ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, variant->kill_config, 0, 0);
+ TEST_ASSERT_EQ(ret, 0);
+
+ self->poisoned_pfn = memory_failure_pfn;
+
+ ret = 0;
+ switch (variant->method) {
+ case MF_INJECT_DEBUGFS: {
+ /* DEBUGFS injection handles return_code test inside the mark_memory_failure(). */
+ if (variant->sigbus_expected)
+ TEST_EXPECT_SIGBUS(mark_memory_failure(memory_failure_pfn,
+ variant->return_code));
+ else
+ mark_memory_failure(memory_failure_pfn, variant->return_code);
+ break;
+ }
+ case MF_INJECT_MADVISE: {
+ /*
+ * MADV_HWPOISON uses get_user_pages() so the page will always
+ * be faulted in at the point of memory_failure()
+ */
+ if (variant->sigbus_expected)
+ TEST_EXPECT_SIGBUS(ret = madvise(memory_failure_addr,
+ page_size, MADV_HWPOISON));
+ else
+ ret = madvise(memory_failure_addr, page_size, MADV_HWPOISON);
+
+ if (variant->return_code == 0)
+ TEST_ASSERT(ret == variant->return_code, "Memory failure failed. Errno: %s",
+ strerror(errno));
+ else {
+ /* errno is memory_failure() return code. */
+ TEST_ASSERT_EQ(errno, variant->return_code);
+ }
+ break;
+ }
+ default:
+ TEST_FAIL("Unhandled memory failure injection method %d.", variant->method);
+ }
+
+ TEST_EXPECT_SIGBUS(READ_ONCE(*memory_failure_addr));
+ TEST_EXPECT_SIGBUS(*memory_failure_addr = 'A');
+
+ ret = munmap(mem, total_size);
+ TEST_ASSERT(!ret, "munmap() should succeed.");
+
+ ret = fallocate(self->fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, total_size);
+ TEST_ASSERT(!ret, "Truncate the entire file (cleanup) should succeed.");
+
+ ret = prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_DEFAULT, 0, 0);
+ TEST_ASSERT_EQ(ret, 0);
+
+ unmark_memory_failure(memory_failure_pfn, 0);
+ self->poisoned_pfn = 0;
+}
+
+TEST_F(guest_memfd_failure, test_memory_failure)
+{
+ unsigned long vm_types, vm_type;
+
+ total_size = page_size * 4;
+
+ vm_types = kvm_check_cap(KVM_CAP_VM_TYPES);
+ if (!vm_types)
+ vm_types = BIT(VM_TYPE_DEFAULT);
+
+ for_each_set_bit(vm_type, &vm_types, BITS_PER_TYPE(vm_types)) {
+ uint64_t flags;
+
+ self->vm = vm_create_barebones_type(vm_type);
+ flags = vm_check_cap(self->vm, KVM_CAP_GUEST_MEMFD_FLAGS);
+ if (!(flags & GUEST_MEMFD_FLAG_INIT_SHARED)) {
+ kvm_vm_free(self->vm);
+ self->vm = NULL;
+ continue;
+ }
+
+ self->fd = vm_create_guest_memfd(self->vm, total_size,
+ GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED);
+ ASSERT_GE(self->fd, 0) TH_LOG("vm_create_guest_memfd failed");
+
+ do_test_memory_failure(self, variant);
+
+ close(self->fd);
+ self->fd = -1;
+ kvm_vm_free(self->vm);
+ self->vm = NULL;
+ }
+}
+
+static bool can_inject_memory_failure(void)
+{
+ int fd;
+
+ fd = open("/sys/kernel/debug/hwpoison/corrupt-pfn", O_WRONLY);
+ if (fd < 0)
+ return false;
+
+ close(fd);
+ return true;
+}
+
+int main(int argc, char **argv)
+{
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_INIT_SHARED);
+ __TEST_REQUIRE(can_inject_memory_failure(),
+ "Insufficient permissions to access hwpoison debugfs (requires CAP_SYS_ADMIN / root)");
+ page_size = getpagesize();
+
+ return test_harness_run(argc, argv);
+}
--
2.54.0.1013.g208068f2d8-goog
* [PATCH v4 7/7] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables
From: Lisa Wang @ 2026-06-02 21:55 UTC
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Paolo Bonzini,
Shuah Khan, Hugh Dickins, Baolin Wang, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen, Lisa Wang
Test that:
+ memory failure handling unmaps the bad memory from the stage 2 page
tables, so the next guest access has to fault the page in again
+ when the guest tries to fault in a poisoned page from guest_memfd, the
userspace VMM is informed with EHWPOISON
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Lisa Wang <wyihan@google.com>
---
.../kvm/guest_memfd_memory_failure_test.c | 66 ++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c b/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c
index 6c8032d390ae..e6f4c327bd5a 100644
--- a/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_memory_failure_test.c
@@ -24,6 +24,7 @@
#include "kvm_util.h"
#include "test_util.h"
#include "kselftest_harness.h"
+#include "ucall_common.h"
static size_t page_size, total_size;
@@ -313,6 +314,71 @@ TEST_F(guest_memfd_failure, test_memory_failure)
}
}
+static void __guest_code_read(uint64_t gpa)
+{
+ uint8_t *mem = (uint8_t *)gpa;
+
+ READ_ONCE(*mem);
+ GUEST_SYNC(0);
+ READ_ONCE(*mem);
+ GUEST_DONE();
+}
+
+static void guest_read(struct kvm_vcpu *vcpu, int expected_errno)
+{
+ if (expected_errno) {
+ TEST_ASSERT_EQ(_vcpu_run(vcpu), -1);
+ TEST_ASSERT_EQ(errno, expected_errno);
+ } else {
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_SYNC);
+ }
+}
+
+TEST_F(guest_memfd_failure, test_memory_failure_guest)
+{
+ const uint64_t gpa = SZ_4G;
+ const int slot = 1;
+
+ unsigned long memory_failure_pfn;
+ struct kvm_vcpu *vcpu;
+ uint8_t *mem;
+
+ /* Limit guest test execution to a single variant to avoid redundant runs. */
+ if (variant->method != MF_INJECT_DEBUGFS ||
+ variant->kill_config != PR_MCE_KILL_EARLY ||
+ !variant->map_page || !variant->dirty_page)
+ return;
+
+ self->vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, __guest_code_read);
+ vcpu_args_set(vcpu, 1, gpa);
+
+ self->fd = vm_create_guest_memfd(self->vm, self->vm->page_size,
+ GUEST_MEMFD_FLAG_MMAP |
+ GUEST_MEMFD_FLAG_INIT_SHARED);
+ vm_set_user_memory_region2(self->vm, slot, KVM_MEM_GUEST_MEMFD, gpa,
+ self->vm->page_size, NULL, self->fd, 0);
+
+ mem = mmap(NULL, self->vm->page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, self->fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "mmap() for guest_memfd should succeed.");
+ virt_pg_map(self->vm, gpa, gpa);
+
+ /* Fault in page to read pfn, then unmap page for testing. */
+ READ_ONCE(*mem);
+
+ memory_failure_pfn = addr_to_pfn(mem);
+ munmap(mem, self->vm->page_size);
+
+ /* Fault page into stage2 page tables. */
+ guest_read(vcpu, 0);
+
+ self->poisoned_pfn = memory_failure_pfn;
+ mark_memory_failure(memory_failure_pfn, 0);
+
+ guest_read(vcpu, EHWPOISON);
+}
+
static bool can_inject_memory_failure(void)
{
int fd;
--
2.54.0.1013.g208068f2d8-goog
* Re: [PATCH v4 0/7] mm: Fix MF_DELAYED handling on memory failure
From: Andrew Morton @ 2026-06-03 20:48 UTC
To: Lisa Wang
Cc: Miaohe Lin, Naoya Horiguchi, Paolo Bonzini, Shuah Khan,
Hugh Dickins, Baolin Wang, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest, rientjes, seanjc, ackerleytng, vannapurve,
michael.roth, jiaqiyan, tabba, dave.hansen
On Tue, 02 Jun 2026 21:55:40 +0000 Lisa Wang <wyihan@google.com> wrote:
> Here's a fourth revision to fix MF_DELAYED handling on memory failure.
>
> This patch series addresses an issue in the memory failure handling path
> where MF_DELAYED is incorrectly treated as an error. This issue was
> discovered while testing memory failure handling for guest_memfd.
Please include a description of the userspace-visible effects of the issue?
> The proposed solution involves -
> 1. Clarifying the definition of MF_DELAYED to mean that memory failure
> handling is only partially completed, and that the metadata for the
> memory that failed (as in struct page/folio) is still referenced.
> 2. Updating shmem’s handling to align with the clarified definition.
> 3. Updating how the result of .error_remove_folio() is interpreted.
Thanks. I'll take no action at this time - it's late in the cycle and we
lack review.
For some reason Sashiko wasn't able to apply this series to the various
branches which it attempts. Click on the "baseline" thingy at
https://sashiko.dev/#/patchset/20260602-memory-failure-mf-delayed-fix-v4-0-a5bc7db5a9b2@google.com
to see what it tried to apply this to.
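Andrew asks about userspace-visible effects. The userspace-facing pieces involved are the per-process memory-corruption kill policy (set with prctl(PR_MCE_KILL, ...), as the selftest above does) and the SIGBUS a process receives when memory failure affects its pages; patch 2's commit message notes that, before this series, an MF_DELAYED result led to SIGBUS regardless of that policy. The sketch below is an illustration under stated assumptions, not part of the series: it assumes Linux with glibc, and the helpers install_sigbus_handler(), set_mce_kill_policy() and describe_sigbus_code() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>

/* si_code of the most recent SIGBUS, recorded by the handler. */
static volatile sig_atomic_t last_sigbus_code;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_sigbus_code = info->si_code;
	/*
	 * A real program would siglongjmp() out (as the selftest does) or
	 * _exit() here: returning from a synchronous BUS_MCEERR_AR SIGBUS
	 * retries the faulting access.
	 */
}

/* Hypothetical helper: install a SA_SIGINFO handler so si_code is visible. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Hypothetical helper: policy is PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or
 * PR_MCE_KILL_DEFAULT. Variadic arguments are passed as unsigned long.
 */
static int set_mce_kill_policy(unsigned long policy)
{
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET, policy,
		     0UL, 0UL);
}

/* Hypothetical helper: describe the machine-check SIGBUS codes. */
static const char *describe_sigbus_code(int code)
{
	switch (code) {
	case BUS_MCEERR_AR:
		return "memory error, action required";
	case BUS_MCEERR_AO:
		return "memory error, action optional";
	default:
		return "other SIGBUS";
	}
}
```

A process would typically call set_mce_kill_policy() and install_sigbus_handler() once at startup. Per the prctl(2) description, early kill signals the process as soon as corruption is detected in its memory (BUS_MCEERR_AO), while late kill waits until the process accesses the corrupted page (BUS_MCEERR_AR).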
* Re: [PATCH v4 1/7] mm: memory_failure: Clarify the MF_DELAYED definition
From: David Hildenbrand (Arm) @ 2026-06-05 11:30 UTC
To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
Paolo Bonzini, Shuah Khan, Hugh Dickins, Baolin Wang,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen
On 6/2/26 23:55, Lisa Wang wrote:
> This patch clarifies the definition of MF_DELAYED to represent cases
> where a folio's removal is initiated but not immediately completed
> (e.g., due to remaining metadata references).
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH v4 2/7] mm: memory_failure: Allow truncate_error_folio to return MF_DELAYED
From: David Hildenbrand (Arm) @ 2026-06-05 11:32 UTC
To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
Paolo Bonzini, Shuah Khan, Hugh Dickins, Baolin Wang,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen
On 6/2/26 23:55, Lisa Wang wrote:
> The .error_remove_folio() address_space operation is used by different filesystems to handle
> folio truncation upon discovery of a memory failure in the memory
> associated with the given folio.
>
> Currently, MF_DELAYED is treated as an error, causing "Failed to punch
> page" to be written to the console. MF_DELAYED is then relayed to the
> caller of truncate_error_folio() as MF_FAILED. This further causes
> memory_failure() to return -EBUSY, which then always causes a SIGBUS.
>
> This also implies that regardless of whether the thread's memory
> corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
> memory failure with MF_DELAYED will always cause a SIGBUS.
>
> Update truncate_error_folio() to return MF_DELAYED to the caller if the
> .error_remove_folio() callback reports MF_DELAYED.
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
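The before and after behavior described in this patch can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel code: truncate_old() and truncate_new() are hypothetical, cover only the .error_remove_folio() path, and reduce filemap_release_folio() to a boolean. The enumerators are assumed to follow the order of the kernel's enum mf_result, with MF_DELAYED commented according to the definition clarified in patch 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to follow the order of the kernel's enum mf_result. */
enum mf_result {
	MF_IGNORED,	/* Error: cannot be handled */
	MF_FAILED,	/* Error: handling failed */
	MF_DELAYED,	/* Handling started; folio metadata still referenced */
	MF_RECOVERED,	/* Successfully recovered */
};

/*
 * Old behavior (model): any non-zero callback result, MF_DELAYED
 * included, is logged as "Failed to punch page" and becomes MF_FAILED.
 * release_ok stands in for filemap_release_folio() succeeding.
 */
static enum mf_result truncate_old(int cb_ret, bool release_ok)
{
	if (cb_ret != 0)
		return MF_FAILED;
	return release_ok ? MF_RECOVERED : MF_FAILED;
}

/* New behavior (model): relay MF_DELAYED before any error handling. */
static enum mf_result truncate_new(int cb_ret, bool release_ok)
{
	if (cb_ret == MF_DELAYED)
		return MF_DELAYED;
	return truncate_old(cb_ret, release_ok);
}
```

In the old model an MF_DELAYED callback result is indistinguishable from a genuine error, which is the mis-reporting the commit message describes.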
* Re: [PATCH v4 3/7] mm: shmem: Update shmem handler to the MF_DELAYED definition
From: David Hildenbrand (Arm) @ 2026-06-05 11:35 UTC
To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
Paolo Bonzini, Shuah Khan, Hugh Dickins, Baolin Wang,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen
On 6/2/26 23:55, Lisa Wang wrote:
> To align with the definition of MF_DELAYED, update
> shmem_error_remove_folio() to return MF_DELAYED.
>
> shmem handles memory failures but defers the actual file truncation. The
> function's return value should therefore be MF_DELAYED to accurately
> reflect the state.
>
> Currently, this logical error does not cause a bug, because:
>
> - For shmem folios, folio->private is not set.
> - As a result, filemap_release_folio() is a no-op and returns true.
> - This, in turn, causes truncate_error_folio() to incorrectly return
> MF_RECOVERED.
> - The caller then treats MF_RECOVERED as a success condition, masking the
> issue.
>
> The previous patch relays MF_DELAYED to the caller of
> truncate_error_folio() before any logging, so returning MF_DELAYED from
> shmem_error_remove_folio() will retain the original behavior of not
> adding any logs.
>
> The return value of truncate_error_folio() is consumed in action_result(),
> which treats MF_DELAYED the same way as MF_RECOVERED, hence action_result()
> also returns the same thing after this change.
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
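The argument that returning MF_DELAYED from shmem leaves the final outcome unchanged can be traced end to end. A minimal sketch under stated assumptions: shmem_cb_old(), shmem_cb_new(), truncate_model() and action_result_ret() are hypothetical stand-ins; truncate_model() assumes the previous patch is applied and that folio->private is unset, so releasing the folio trivially succeeds; and the -EBUSY value for failures is assumed from patch 2's description of memory_failure().

```c
#include <assert.h>
#include <errno.h>

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical stand-ins for shmem_error_remove_folio() before and after. */
static int shmem_cb_old(void)
{
	return 0;
}

static int shmem_cb_new(void)
{
	return MF_DELAYED;
}

/*
 * truncate_error_folio() with the previous patch applied, callback path
 * only, for a shmem folio: folio->private is unset, so releasing the
 * folio is a no-op that succeeds.
 */
static enum mf_result truncate_model(int cb_ret)
{
	if (cb_ret == MF_DELAYED)
		return MF_DELAYED;	/* relayed before any logging */
	if (cb_ret != 0)
		return MF_FAILED;
	return MF_RECOVERED;
}

/* action_result(): MF_DELAYED is treated the same as MF_RECOVERED. */
static int action_result_ret(enum mf_result r)
{
	return (r == MF_RECOVERED || r == MF_DELAYED) ? 0 : -EBUSY;
}
```

Both the old and new shmem return values end in a successful action result; only the intermediate result changes from an accidental MF_RECOVERED to an accurate MF_DELAYED.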
* Re: [PATCH v4 4/7] mm: memory_failure: Generalize extra_pins handling to all MF_DELAYED cases
From: David Hildenbrand (Arm) @ 2026-06-05 11:35 UTC
To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
Paolo Bonzini, Shuah Khan, Hugh Dickins, Baolin Wang,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen
On 6/2/26 23:55, Lisa Wang wrote:
> Generalize extra_pins handling to all MF_DELAYED cases, not only
> shmem_mapping.
>
> If MF_DELAYED is returned, the filemap continues to hold refcounts on the
> folio. Hence, take that into account when checking for extra refcounts.
>
> As clarified in an earlier patch, a return value of MF_DELAYED implies that
> the page still has elevated refcounts. Hence, set extra_pins to true if the
> return value is MF_DELAYED. This is aligned with the implementation in
> me_swapcache_dirty(), where, if a folio is still in the swap cache, ret is
> set to MF_DELAYED and extra_pins is set to true.
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
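A sketch of how the extra-refcount check is affected. It roughly mirrors has_extra_refcount() in mm/memory-failure.c as we understand it, but has_extra_refs() and extra_pins_for() are hypothetical simplifications using plain integers: one reference is assumed to be held by the memory failure handler itself, and extra_pins discounts one filemap reference per page of the folio.

```c
#include <assert.h>
#include <stdbool.h>

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Model of the extra-refcount check: one reference belongs to the memory
 * failure handler; with extra_pins, the filemap's references (one per
 * page in the folio) are expected as well.
 */
static bool has_extra_refs(int page_count, int folio_nr_pages, bool extra_pins)
{
	int count = page_count - 1;

	if (extra_pins)
		count -= folio_nr_pages;
	return count > 0;
}

/* After the patch, extra_pins follows the handler result for any mapping. */
static bool extra_pins_for(enum mf_result ret)
{
	return ret == MF_DELAYED;
}
```

With a single-page folio still held by the filemap (refcount 2), an MF_RECOVERED result would report an unexpected user, while MF_DELAYED accounts for the filemap's reference, in line with the me_swapcache_dirty() precedent the commit message cites.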
* Re: [PATCH v4 5/7] mm: selftests: Add shmem into memory failure test
From: David Hildenbrand (Arm) @ 2026-06-05 11:38 UTC
To: Lisa Wang, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
Paolo Bonzini, Shuah Khan, Hugh Dickins, Baolin Wang,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel, kvm,
linux-kselftest
Cc: rientjes, seanjc, ackerleytng, vannapurve, michael.roth, jiaqiyan,
tabba, dave.hansen
On 6/2/26 23:55, Lisa Wang wrote:
> Add a shmem memory failure selftest to verify that shmem memory failure
> handling is correct after changing the shmem return value.
>
> Specifically, test the expected behavior under various scenarios
> combining page dirtiness (dirty vs clean) and failure types (hard vs
> soft):
> + Dirty + Hard: Trigger a SIGBUS on injection, and trigger another
> SIGBUS when reading the page again.
> + Dirty + Soft: No SIGBUS is triggered, and the original value can be
> read successfully.
> + Clean + Hard: No SIGBUS is triggered on injection, but trigger a
> SIGBUS when trying to read the page again.
> + Clean + Soft: No SIGBUS is triggered, and the page can be read
> successfully.
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
> tools/testing/selftests/mm/memory-failure.c | 111 +++++++++++++++++++++++++++-
> 1 file changed, 108 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/memory-failure.c b/tools/testing/selftests/mm/memory-failure.c
> index 3d9e0b9ffb41..43949b3b3565 100644
> --- a/tools/testing/selftests/mm/memory-failure.c
> +++ b/tools/testing/selftests/mm/memory-failure.c
> @@ -30,9 +30,14 @@ enum result_type {
> MADV_HARD_ANON,
> MADV_HARD_CLEAN_PAGECACHE,
> MADV_HARD_DIRTY_PAGECACHE,
> + MADV_HARD_CLEAN_SHMEM,
> + MADV_HARD_DIRTY_SHMEM,
> MADV_SOFT_ANON,
> MADV_SOFT_CLEAN_PAGECACHE,
> MADV_SOFT_DIRTY_PAGECACHE,
> + MADV_SOFT_CLEAN_SHMEM,
> + MADV_SOFT_DIRTY_SHMEM,
> + READ_ERROR,
> };
>
> static jmp_buf signal_jmp_buf;
> @@ -165,17 +170,21 @@ static void check(struct __test_metadata *_metadata, FIXTURE_DATA(memory_failure
> case MADV_HARD_CLEAN_PAGECACHE:
> case MADV_SOFT_CLEAN_PAGECACHE:
> case MADV_SOFT_DIRTY_PAGECACHE:
> - /* It is not expected to receive a SIGBUS signal. */
> - ASSERT_EQ(setjmp, 0);
> -
> + case MADV_SOFT_DIRTY_SHMEM:
> /* The page content should remain unchanged. */
> ASSERT_TRUE(check_memory(vaddr, self->page_size));
You should likely use "fallthrough;" ... unless you are missing a break; here.
--
Cheers,
David
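The four shmem scenarios from the commit message map onto a switch in which one case adds an expectation and then shares the rest with another case, which is where an explicit fallthrough; annotation documents intent. This is a hypothetical sketch, not the test's check() function: the enum names, struct expectation and expected_outcome() are made up here, and fallthrough is defined locally because the kernel's macro is not available in a standalone build (the definition assumes GCC 7+ or a recent Clang).

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's fallthrough pseudo-keyword. */
#ifndef fallthrough
#define fallthrough __attribute__((__fallthrough__))
#endif

/* Hypothetical names for the four shmem scenarios. */
enum shmem_scenario {
	HARD_DIRTY_SHMEM,
	SOFT_DIRTY_SHMEM,
	HARD_CLEAN_SHMEM,
	SOFT_CLEAN_SHMEM,
};

struct expectation {
	bool sigbus_on_inject;		/* SIGBUS when the failure is injected */
	bool sigbus_on_reread;		/* SIGBUS when the page is read again */
	bool check_original_value;	/* original content must still be readable */
};

static struct expectation expected_outcome(enum shmem_scenario s)
{
	struct expectation e = { false, false, false };

	switch (s) {
	case HARD_DIRTY_SHMEM:
		e.sigbus_on_inject = true;
		fallthrough;
	case HARD_CLEAN_SHMEM:
		e.sigbus_on_reread = true;
		break;
	case SOFT_DIRTY_SHMEM:
		e.check_original_value = true;
		fallthrough;
	case SOFT_CLEAN_SHMEM:
		/* No SIGBUS; the page can be read successfully. */
		break;
	}
	return e;
}
```

Without the annotation, -Wimplicit-fallthrough cannot tell an intended shared body from a forgotten break;, which is the ambiguity raised in the review.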