* [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
@ 2026-06-09 10:12 Michael S. Tsirkin
2026-06-09 12:50 ` David Hildenbrand (Arm)
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2026-06-09 10:12 UTC (permalink / raw)
To: linux-kernel
Cc: Miaohe Lin, David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi
TestSetPageHWPoison() is called without zone->lock, so its atomic
update to page->flags can race with non-atomic flag operations
that run under zone->lock in the buddy allocator.
In particular, __free_pages_prepare() does:
page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
This non-atomic read-modify-write, while correctly excluding
__PG_HWPOISON from the mask, can still lose a concurrent
TestSetPageHWPoison if the read happens before the poison bit
is set and the write happens after. Will only get worse if/when
we add more non-atomic flag operations.
Fix by acquiring zone->lock around TestSetPageHWPoison and
around ClearPageHWPoison in the retry path. This
serializes with all buddy flag manipulation. The cost is
negligible: one lock/unlock in an extremely rare path
(hardware memory errors).
Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
in this file operate on pages already removed from the buddy
allocator or on non-buddy pages (DAX, hugetlb), so they do not
need zone->lock protection.
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
Sending separately as suggested by multiple people. I also added
a Fixes tag.
mm/memory-failure.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..3880486028a1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2348,6 +2348,8 @@ int memory_failure(unsigned long pfn, int flags)
unsigned long page_flags;
bool retry = true;
int hugetlb = 0;
+ struct zone *zone;
+ unsigned long mf_flags;
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -2390,7 +2392,11 @@ int memory_failure(unsigned long pfn, int flags)
if (hugetlb)
goto unlock_mutex;
+ /* Serialize with non-atomic buddy flag operations */
+ zone = page_zone(p);
+ spin_lock_irqsave(&zone->lock, mf_flags);
if (TestSetPageHWPoison(p)) {
+ spin_unlock_irqrestore(&zone->lock, mf_flags);
res = -EHWPOISON;
if (flags & MF_ACTION_REQUIRED)
res = kill_accessing_process(current, pfn, flags);
@@ -2399,6 +2405,7 @@ int memory_failure(unsigned long pfn, int flags)
action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
goto unlock_mutex;
}
+ spin_unlock_irqrestore(&zone->lock, mf_flags);
/*
* We need/can do nothing about count=0 pages.
@@ -2420,7 +2427,10 @@ int memory_failure(unsigned long pfn, int flags)
} else {
/* We lost the race, try again */
if (retry) {
+ /* Serialize with non-atomic buddy flag operations */
+ spin_lock_irqsave(&zone->lock, mf_flags);
ClearPageHWPoison(p);
+ spin_unlock_irqrestore(&zone->lock, mf_flags);
retry = false;
goto try_again;
}
--
MST
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 10:12 [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin @ 2026-06-09 12:50 ` David Hildenbrand (Arm) 2026-06-09 16:12 ` Zi Yan 2026-06-09 18:10 ` Andrew Morton 2 siblings, 0 replies; 19+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-09 12:50 UTC (permalink / raw) To: Michael S. Tsirkin, linux-kernel Cc: Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 6/9/26 12:12, Michael S. Tsirkin wrote: > TestSetPageHWPoison() is called without zone->lock, so its atomic > update to page->flags can race with non-atomic flag operations > that run under zone->lock in the buddy allocator. > > In particular, __free_pages_prepare() does: > > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > > This non-atomic read-modify-write, while correctly excluding > __PG_HWPOISON from the mask, can still lose a concurrent > TestSetPageHWPoison if the read happens before the poison bit > is set and the write happens after. Will only get worse if/when > we add more non-atomic flag operations. > > Fix by acquiring zone->lock around TestSetPageHWPoison and > around ClearPageHWPoison in the retry path. This > serializes with all buddy flag manipulation. The cost is > negligible: one lock/unlock in an extremely rare path > (hardware memory errors). > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > in this file operate on pages already removed from the buddy > allocator or on non-buddy pages (DAX, hugetlb), so they do not > need zone->lock protection. > > Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7") > Acked-by: Miaohe Lin <linmiaohe@huawei.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > Assisted-by: Claude:claude-opus-4-6 > --- memory_failure() is documented that it must not be called with the spinlock held, so this cannot deadlock. In particular, we would grab the lock already in take_page_off_buddy(). LGTM, thanks Acked-by: David Hildenbrand (Arm) <david@kernel.org> -- Cheers, David ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 10:12 [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin 2026-06-09 12:50 ` David Hildenbrand (Arm) @ 2026-06-09 16:12 ` Zi Yan 2026-06-09 18:10 ` Andrew Morton 2 siblings, 0 replies; 19+ messages in thread From: Zi Yan @ 2026-06-09 16:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 9 Jun 2026, at 6:12, Michael S. Tsirkin wrote: > TestSetPageHWPoison() is called without zone->lock, so its atomic > update to page->flags can race with non-atomic flag operations > that run under zone->lock in the buddy allocator. > > In particular, __free_pages_prepare() does: > > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > > This non-atomic read-modify-write, while correctly excluding > __PG_HWPOISON from the mask, can still lose a concurrent > TestSetPageHWPoison if the read happens before the poison bit > is set and the write happens after. Will only get worse if/when > we add more non-atomic flag operations. > > Fix by acquiring zone->lock around TestSetPageHWPoison and > around ClearPageHWPoison in the retry path. This > serializes with all buddy flag manipulation. The cost is > negligible: one lock/unlock in an extremely rare path > (hardware memory errors). > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > in this file operate on pages already removed from the buddy > allocator or on non-buddy pages (DAX, hugetlb), so they do not > need zone->lock protection. > > Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7") > Acked-by: Miaohe Lin <linmiaohe@huawei.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > Assisted-by: Claude:claude-opus-4-6 > --- > > Sending separately as suggested by multiple people. I also added > a Fixes tag. > > > mm/memory-failure.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > Makes sense to me. Thanks. Acked-by: Zi Yan <ziy@nvidia.com> Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 10:12 [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin 2026-06-09 12:50 ` David Hildenbrand (Arm) 2026-06-09 16:12 ` Zi Yan @ 2026-06-09 18:10 ` Andrew Morton 2026-06-09 18:38 ` David Hildenbrand (Arm) 2026-06-09 20:24 ` Michael S. Tsirkin 2 siblings, 2 replies; 19+ messages in thread From: Andrew Morton @ 2026-06-09 18:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > TestSetPageHWPoison() is called without zone->lock, so its atomic > update to page->flags can race with non-atomic flag operations > that run under zone->lock in the buddy allocator. > > In particular, __free_pages_prepare() does: > > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > > This non-atomic read-modify-write, while correctly excluding > __PG_HWPOISON from the mask, can still lose a concurrent > TestSetPageHWPoison if the read happens before the poison bit > is set and the write happens after. Will only get worse if/when > we add more non-atomic flag operations. > > Fix by acquiring zone->lock around TestSetPageHWPoison and > around ClearPageHWPoison in the retry path. This > serializes with all buddy flag manipulation. The cost is > negligible: one lock/unlock in an extremely rare path > (hardware memory errors). > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > in this file operate on pages already removed from the buddy > allocator or on non-buddy pages (DAX, hugetlb), so they do not > need zone->lock protection. Sashiko is saying this doesn't do anything "Because __free_pages_prepare() executes entirely locklessly". Did it goof? https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:10 ` Andrew Morton @ 2026-06-09 18:38 ` David Hildenbrand (Arm) 2026-06-09 18:39 ` Zi Yan 2026-06-11 6:33 ` Michael S. Tsirkin 2026-06-09 20:24 ` Michael S. Tsirkin 1 sibling, 2 replies; 19+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-09 18:38 UTC (permalink / raw) To: Andrew Morton, Michael S. Tsirkin Cc: linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 6/9/26 20:10, Andrew Morton wrote: > On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> TestSetPageHWPoison() is called without zone->lock, so its atomic >> update to page->flags can race with non-atomic flag operations >> that run under zone->lock in the buddy allocator. >> >> In particular, __free_pages_prepare() does: >> >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> >> This non-atomic read-modify-write, while correctly excluding >> __PG_HWPOISON from the mask, can still lose a concurrent >> TestSetPageHWPoison if the read happens before the poison bit >> is set and the write happens after. Will only get worse if/when >> we add more non-atomic flag operations. >> >> Fix by acquiring zone->lock around TestSetPageHWPoison and >> around ClearPageHWPoison in the retry path. This >> serializes with all buddy flag manipulation. The cost is >> negligible: one lock/unlock in an extremely rare path >> (hardware memory errors). >> >> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >> in this file operate on pages already removed from the buddy >> allocator or on non-buddy pages (DAX, hugetlb), so they do not >> need zone->lock protection. > > Sashiko is saying this doesn't do anything "Because > __free_pages_prepare() executes entirely locklessly". Did it goof? > > https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com Battle of the bots: it's right. -- Cheers, David ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:38 ` David Hildenbrand (Arm) @ 2026-06-09 18:39 ` Zi Yan 2026-06-09 18:52 ` Zi Yan 2026-06-11 6:33 ` Michael S. Tsirkin 1 sibling, 1 reply; 19+ messages in thread From: Zi Yan @ 2026-06-09 18:39 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Andrew Morton, Michael S. Tsirkin, linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > On 6/9/26 20:10, Andrew Morton wrote: >> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>> update to page->flags can race with non-atomic flag operations >>> that run under zone->lock in the buddy allocator. >>> >>> In particular, __free_pages_prepare() does: >>> >>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>> >>> This non-atomic read-modify-write, while correctly excluding >>> __PG_HWPOISON from the mask, can still lose a concurrent >>> TestSetPageHWPoison if the read happens before the poison bit >>> is set and the write happens after. Will only get worse if/when >>> we add more non-atomic flag operations. >>> >>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>> around ClearPageHWPoison in the retry path. This >>> serializes with all buddy flag manipulation. The cost is >>> negligible: one lock/unlock in an extremely rare path >>> (hardware memory errors). >>> >>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>> in this file operate on pages already removed from the buddy >>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>> need zone->lock protection. >> >> Sashiko is saying this doesn't do anything "Because >> __free_pages_prepare() executes entirely locklessly". Did it goof? >> >> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > > Battle of the bots: it's right. Yep, __free_pages_prepare() changes the page flag without holding zone->lock. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:39 ` Zi Yan @ 2026-06-09 18:52 ` Zi Yan 2026-06-09 20:34 ` Michael S. Tsirkin 0 siblings, 1 reply; 19+ messages in thread From: Zi Yan @ 2026-06-09 18:52 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Andrew Morton, Michael S. Tsirkin, linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 9 Jun 2026, at 14:39, Zi Yan wrote: > On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >> On 6/9/26 20:10, Andrew Morton wrote: >>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >>> >>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>> update to page->flags can race with non-atomic flag operations >>>> that run under zone->lock in the buddy allocator. >>>> >>>> In particular, __free_pages_prepare() does: >>>> >>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>> >>>> This non-atomic read-modify-write, while correctly excluding >>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>> TestSetPageHWPoison if the read happens before the poison bit >>>> is set and the write happens after. Will only get worse if/when >>>> we add more non-atomic flag operations. >>>> >>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>> around ClearPageHWPoison in the retry path. This >>>> serializes with all buddy flag manipulation. The cost is >>>> negligible: one lock/unlock in an extremely rare path >>>> (hardware memory errors). >>>> >>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>> in this file operate on pages already removed from the buddy >>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>> need zone->lock protection. >>> >>> Sashiko is saying this doesn't do anything "Because >>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>> >>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >> >> Battle of the bots: it's right. > > Yep, __free_pages_prepare() changes the page flag without holding > zone->lock. __free_pages_prepare() works on frozen pages and assumes no one else touches the input page. To avoid this race, memory_failure() might want to try_get_page() before TestClearPageHWPoison(), but I am not sure if that works along with memory failure flow. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:52 ` Zi Yan @ 2026-06-09 20:34 ` Michael S. Tsirkin 2026-06-09 20:54 ` Zi Yan 0 siblings, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-09 20:34 UTC (permalink / raw) To: Zi Yan Cc: David Hildenbrand (Arm), Andrew Morton, linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > On 9 Jun 2026, at 14:39, Zi Yan wrote: > > > On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > > > >> On 6/9/26 20:10, Andrew Morton wrote: > >>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>> > >>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>> update to page->flags can race with non-atomic flag operations > >>>> that run under zone->lock in the buddy allocator. > >>>> > >>>> In particular, __free_pages_prepare() does: > >>>> > >>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>> > >>>> This non-atomic read-modify-write, while correctly excluding > >>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>> TestSetPageHWPoison if the read happens before the poison bit > >>>> is set and the write happens after. Will only get worse if/when > >>>> we add more non-atomic flag operations. > >>>> > >>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>> around ClearPageHWPoison in the retry path. This > >>>> serializes with all buddy flag manipulation. The cost is > >>>> negligible: one lock/unlock in an extremely rare path > >>>> (hardware memory errors). > >>>> > >>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>> in this file operate on pages already removed from the buddy > >>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>> need zone->lock protection. > >>> > >>> Sashiko is saying this doesn't do anything "Because > >>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>> > >>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > >> > >> Battle of the bots: it's right. > > > > Yep, __free_pages_prepare() changes the page flag without holding > > zone->lock. > > __free_pages_prepare() works on frozen pages and assumes no one else > touches the input page. To avoid this race, memory_failure() might > want to try_get_page() before TestClearPageHWPoison(), but I am not > sure if that works along with memory failure flow. > > Best Regards, > Yan, Zi Actually memory failure already plays with this down the road no? So maybe it's enough to just SetPageHWPoison afterwards again? diff --git a/mm/memory-failure.c b/mm/memory-failure.c index ee42d4361309..4758fea94a96 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) if (!res) { if (is_free_buddy_page(p)) { if (take_page_off_buddy(p)) { + SetPageHWPoison(p); page_ref_inc(p); res = MF_RECOVERED; } else { and maybe in a bunch of other places in there? -- MST ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 20:34 ` Michael S. Tsirkin @ 2026-06-09 20:54 ` Zi Yan 2026-06-09 21:00 ` Michael S. Tsirkin 0 siblings, 1 reply; 19+ messages in thread From: Zi Yan @ 2026-06-09 20:54 UTC (permalink / raw) To: Michael S. Tsirkin, Miaohe Lin Cc: David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: >> On 9 Jun 2026, at 14:39, Zi Yan wrote: >> >>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: >>> >>>> On 6/9/26 20:10, Andrew Morton wrote: >>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >>>>> >>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>>>> update to page->flags can race with non-atomic flag operations >>>>>> that run under zone->lock in the buddy allocator. >>>>>> >>>>>> In particular, __free_pages_prepare() does: >>>>>> >>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>>>> >>>>>> This non-atomic read-modify-write, while correctly excluding >>>>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>>>> TestSetPageHWPoison if the read happens before the poison bit >>>>>> is set and the write happens after. Will only get worse if/when >>>>>> we add more non-atomic flag operations. >>>>>> >>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>>>> around ClearPageHWPoison in the retry path. This >>>>>> serializes with all buddy flag manipulation. The cost is >>>>>> negligible: one lock/unlock in an extremely rare path >>>>>> (hardware memory errors). >>>>>> >>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>>>> in this file operate on pages already removed from the buddy >>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>>>> need zone->lock protection. >>>>> >>>>> Sashiko is saying this doesn't do anything "Because >>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>>>> >>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >>>> >>>> Battle of the bots: it's right. >>> >>> Yep, __free_pages_prepare() changes the page flag without holding >>> zone->lock. >> >> __free_pages_prepare() works on frozen pages and assumes no one else >> touches the input page. To avoid this race, memory_failure() might >> want to try_get_page() before TestClearPageHWPoison(), but I am not >> sure if that works along with memory failure flow. >> >> Best Regards, >> Yan, Zi > > > > Actually memory failure already plays with this down the road no? > > So maybe it's enough to just SetPageHWPoison afterwards again? > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index ee42d4361309..4758fea94a96 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > if (!res) { > if (is_free_buddy_page(p)) { > if (take_page_off_buddy(p)) { > + SetPageHWPoison(p); > page_ref_inc(p); > res = MF_RECOVERED; > } else { > > > and maybe in a bunch of other places in there? You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), just set it again here? Why not do it after get_hwpoison_page(), since that is the expected page flag? Miaohe probably can give a better answer here. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 20:54 ` Zi Yan @ 2026-06-09 21:00 ` Michael S. Tsirkin 2026-06-10 7:24 ` Miaohe Lin 0 siblings, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-09 21:00 UTC (permalink / raw) To: Zi Yan Cc: Miaohe Lin, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: > On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > > > On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > >> On 9 Jun 2026, at 14:39, Zi Yan wrote: > >> > >>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >>> > >>>> On 6/9/26 20:10, Andrew Morton wrote: > >>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>>>> > >>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>>>> update to page->flags can race with non-atomic flag operations > >>>>>> that run under zone->lock in the buddy allocator. > >>>>>> > >>>>>> In particular, __free_pages_prepare() does: > >>>>>> > >>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>>>> > >>>>>> This non-atomic read-modify-write, while correctly excluding > >>>>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>>>> TestSetPageHWPoison if the read happens before the poison bit > >>>>>> is set and the write happens after. Will only get worse if/when > >>>>>> we add more non-atomic flag operations. > >>>>>> > >>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>>>> around ClearPageHWPoison in the retry path. This > >>>>>> serializes with all buddy flag manipulation. The cost is > >>>>>> negligible: one lock/unlock in an extremely rare path > >>>>>> (hardware memory errors). > >>>>>> > >>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>>>> in this file operate on pages already removed from the buddy > >>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>>>> need zone->lock protection. > >>>>> > >>>>> Sashiko is saying this doesn't do anything "Because > >>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>>>> > >>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > >>>> > >>>> Battle of the bots: it's right. > >>> > >>> Yep, __free_pages_prepare() changes the page flag without holding > >>> zone->lock. > >> > >> __free_pages_prepare() works on frozen pages and assumes no one else > >> touches the input page. To avoid this race, memory_failure() might > >> want to try_get_page() before TestClearPageHWPoison(), but I am not > >> sure if that works along with memory failure flow. > >> > >> Best Regards, > >> Yan, Zi > > > > > > > > Actually memory failure already plays with this down the road no? > > > > So maybe it's enough to just SetPageHWPoison afterwards again? > > > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index ee42d4361309..4758fea94a96 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > > if (!res) { > > if (is_free_buddy_page(p)) { > > if (take_page_off_buddy(p)) { > > + SetPageHWPoison(p); > > page_ref_inc(p); > > res = MF_RECOVERED; > > } else { > > > > > > and maybe in a bunch of other places in there? > > You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), > just set it again here? Yea. > Why not do it after get_hwpoison_page(), since that > is the expected page flag? It's still in the buddy at that point right? I'm worried buddy might poke at flags. > Miaohe probably can give a better answer here. > > > Best Regards, > Yan, Zi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 21:00 ` Michael S. Tsirkin @ 2026-06-10 7:24 ` Miaohe Lin 2026-06-10 7:35 ` Michael S. Tsirkin 2026-06-10 21:18 ` Michael S. Tsirkin 0 siblings, 2 replies; 19+ messages in thread From: Miaohe Lin @ 2026-06-10 7:24 UTC (permalink / raw) To: Michael S. Tsirkin, Zi Yan Cc: David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 2026/6/10 5:00, Michael S. Tsirkin wrote: > On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: >> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: >> >>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: >>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: >>>> >>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: >>>>> >>>>>> On 6/9/26 20:10, Andrew Morton wrote: >>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >>>>>>> >>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>>>>>> update to page->flags can race with non-atomic flag operations >>>>>>>> that run under zone->lock in the buddy allocator. >>>>>>>> >>>>>>>> In particular, __free_pages_prepare() does: >>>>>>>> >>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>>>>>> >>>>>>>> This non-atomic read-modify-write, while correctly excluding >>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>>>>>> TestSetPageHWPoison if the read happens before the poison bit >>>>>>>> is set and the write happens after. Will only get worse if/when >>>>>>>> we add more non-atomic flag operations. >>>>>>>> >>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>>>>>> around ClearPageHWPoison in the retry path. This >>>>>>>> serializes with all buddy flag manipulation. The cost is >>>>>>>> negligible: one lock/unlock in an extremely rare path >>>>>>>> (hardware memory errors). >>>>>>>> >>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>>>>>> in this file operate on pages already removed from the buddy >>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>>>>>> need zone->lock protection. >>>>>>> >>>>>>> Sashiko is saying this doesn't do anything "Because >>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>>>>>> >>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >>>>>> >>>>>> Battle of the bots: it's right. >>>>> >>>>> Yep, __free_pages_prepare() changes the page flag without holding >>>>> zone->lock. >>>> >>>> __free_pages_prepare() works on frozen pages and assumes no one else >>>> touches the input page. To avoid this race, memory_failure() might >>>> want to try_get_page() before TestClearPageHWPoison(), but I am not >>>> sure if that works along with memory failure flow. >>>> >>>> Best Regards, >>>> Yan, Zi >>> >>> >>> >>> Actually memory failure already plays with this down the road no? >>> >>> So maybe it's enough to just SetPageHWPoison afterwards again? >>> >>> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>> index ee42d4361309..4758fea94a96 100644 >>> --- a/mm/memory-failure.c >>> +++ b/mm/memory-failure.c >>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) >>> if (!res) { >>> if (is_free_buddy_page(p)) { >>> if (take_page_off_buddy(p)) { >>> + SetPageHWPoison(p); >>> page_ref_inc(p); >>> res = MF_RECOVERED; >>> } else { >>> >>> >>> and maybe in a bunch of other places in there? >> >> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), >> just set it again here? > > Yea. > >> Why not do it after get_hwpoison_page(), since that >> is the expected page flag? > > It's still in the buddy at that point right? I'm worried buddy might > poke at flags. Since __free_pages_prepare() executes entirely locklessly, the only way to ensure HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure pages are not on the way to buddy... Thanks. . ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-10 7:24 ` Miaohe Lin @ 2026-06-10 7:35 ` Michael S. Tsirkin 2026-06-10 21:18 ` Michael S. Tsirkin 1 sibling, 0 replies; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-10 7:35 UTC (permalink / raw) To: Miaohe Lin Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: > On 2026/6/10 5:00, Michael S. Tsirkin wrote: > > On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: > >> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > >> > >>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > >>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: > >>>> > >>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >>>>> > >>>>>> On 6/9/26 20:10, Andrew Morton wrote: > >>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>>>>>> > >>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>>>>>> update to page->flags can race with non-atomic flag operations > >>>>>>>> that run under zone->lock in the buddy allocator. > >>>>>>>> > >>>>>>>> In particular, __free_pages_prepare() does: > >>>>>>>> > >>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>>>>>> > >>>>>>>> This non-atomic read-modify-write, while correctly excluding > >>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>>>>>> TestSetPageHWPoison if the read happens before the poison bit > >>>>>>>> is set and the write happens after. Will only get worse if/when > >>>>>>>> we add more non-atomic flag operations. > >>>>>>>> > >>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>>>>>> around ClearPageHWPoison in the retry path. This > >>>>>>>> serializes with all buddy flag manipulation. The cost is > >>>>>>>> negligible: one lock/unlock in an extremely rare path > >>>>>>>> (hardware memory errors). > >>>>>>>> > >>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>>>>>> in this file operate on pages already removed from the buddy > >>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>>>>>> need zone->lock protection. > >>>>>>> > >>>>>>> Sashiko is saying this doesn't do anything "Because > >>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>>>>>> > >>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > >>>>>> > >>>>>> Battle of the bots: it's right. > >>>>> > >>>>> Yep, __free_pages_prepare() changes the page flag without holding > >>>>> zone->lock. > >>>> > >>>> __free_pages_prepare() works on frozen pages and assumes no one else > >>>> touches the input page. To avoid this race, memory_failure() might > >>>> want to try_get_page() before TestClearPageHWPoison(), but I am not > >>>> sure if that works along with memory failure flow. > >>>> > >>>> Best Regards, > >>>> Yan, Zi > >>> > >>> > >>> > >>> Actually memory failure already plays with this down the road no? > >>> > >>> So maybe it's enough to just SetPageHWPoison afterwards again? > >>> > >>> > >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>> index ee42d4361309..4758fea94a96 100644 > >>> --- a/mm/memory-failure.c > >>> +++ b/mm/memory-failure.c > >>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > >>> if (!res) { > >>> if (is_free_buddy_page(p)) { > >>> if (take_page_off_buddy(p)) { > >>> + SetPageHWPoison(p); > >>> page_ref_inc(p); > >>> res = MF_RECOVERED; > >>> } else { > >>> > >>> > >>> and maybe in a bunch of other places in there? > >> > >> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), > >> just set it again here? > > > > Yea. > > > >> Why not do it after get_hwpoison_page(), since that > >> is the expected page flag? > > > > It's still in the buddy at that point right? I'm worried buddy might > > poke at flags. > > Since __free_pages_prepare() executes entirely locklessly, the only way to ensure > HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure > pages are not on the way to buddy... > > Thanks. > . Right so here after take_page_off_buddy it's ok, for example. -- MST ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-10 7:24 ` Miaohe Lin 2026-06-10 7:35 ` Michael S. Tsirkin @ 2026-06-10 21:18 ` Michael S. Tsirkin 2026-06-11 3:35 ` Miaohe Lin 1 sibling, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-10 21:18 UTC (permalink / raw) To: Miaohe Lin Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: > On 2026/6/10 5:00, Michael S. Tsirkin wrote: > > On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: > >> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > >> > >>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > >>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: > >>>> > >>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >>>>> > >>>>>> On 6/9/26 20:10, Andrew Morton wrote: > >>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>>>>>> > >>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>>>>>> update to page->flags can race with non-atomic flag operations > >>>>>>>> that run under zone->lock in the buddy allocator. > >>>>>>>> > >>>>>>>> In particular, __free_pages_prepare() does: > >>>>>>>> > >>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>>>>>> > >>>>>>>> This non-atomic read-modify-write, while correctly excluding > >>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>>>>>> TestSetPageHWPoison if the read happens before the poison bit > >>>>>>>> is set and the write happens after. Will only get worse if/when > >>>>>>>> we add more non-atomic flag operations. > >>>>>>>> > >>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>>>>>> around ClearPageHWPoison in the retry path. This > >>>>>>>> serializes with all buddy flag manipulation. The cost is > >>>>>>>> negligible: one lock/unlock in an extremely rare path > >>>>>>>> (hardware memory errors). > >>>>>>>> > >>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>>>>>> in this file operate on pages already removed from the buddy > >>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>>>>>> need zone->lock protection. > >>>>>>> > >>>>>>> Sashiko is saying this doesn't do anything "Because > >>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>>>>>> > >>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > >>>>>> > >>>>>> Battle of the bots: it's right. > >>>>> > >>>>> Yep, __free_pages_prepare() changes the page flag without holding > >>>>> zone->lock. > >>>> > >>>> __free_pages_prepare() works on frozen pages and assumes no one else > >>>> touches the input page. To avoid this race, memory_failure() might > >>>> want to try_get_page() before TestClearPageHWPoison(), but I am not > >>>> sure if that works along with memory failure flow. > >>>> > >>>> Best Regards, > >>>> Yan, Zi > >>> > >>> > >>> > >>> Actually memory failure already plays with this down the road no? > >>> > >>> So maybe it's enough to just SetPageHWPoison afterwards again? > >>> > >>> > >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>> index ee42d4361309..4758fea94a96 100644 > >>> --- a/mm/memory-failure.c > >>> +++ b/mm/memory-failure.c > >>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > >>> if (!res) { > >>> if (is_free_buddy_page(p)) { > >>> if (take_page_off_buddy(p)) { > >>> + SetPageHWPoison(p); > >>> page_ref_inc(p); > >>> res = MF_RECOVERED; > >>> } else { > >>> > >>> > >>> and maybe in a bunch of other places in there? > >> > >> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), > >> just set it again here? > > > > Yea. > > > >> Why not do it after get_hwpoison_page(), since that > >> is the expected page flag? > > > > It's still in the buddy at that point right? I'm worried buddy might > > poke at flags. > > Since __free_pages_prepare() executes entirely locklessly, the only way to ensure > HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure > pages are not on the way to buddy... > > Thanks. > . To clarify do you not agree repeating SetPageHWPoison is enough for this? And if not, do you have suggestions on how to fix this race? Thanks a lot, -- MST ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-10 21:18 ` Michael S. Tsirkin @ 2026-06-11 3:35 ` Miaohe Lin 2026-06-11 5:43 ` Michael S. Tsirkin 0 siblings, 1 reply; 19+ messages in thread From: Miaohe Lin @ 2026-06-11 3:35 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 2026/6/11 5:18, Michael S. Tsirkin wrote: > On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: >> On 2026/6/10 5:00, Michael S. Tsirkin wrote: >>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: >>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: >>>> >>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: >>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: >>>>>> >>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: >>>>>>> >>>>>>>> On 6/9/26 20:10, Andrew Morton wrote: >>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>>>>>>>> update to page->flags can race with non-atomic flag operations >>>>>>>>>> that run under zone->lock in the buddy allocator. >>>>>>>>>> >>>>>>>>>> In particular, __free_pages_prepare() does: >>>>>>>>>> >>>>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>>>>>>>> >>>>>>>>>> This non-atomic read-modify-write, while correctly excluding >>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit >>>>>>>>>> is set and the write happens after. Will only get worse if/when >>>>>>>>>> we add more non-atomic flag operations. >>>>>>>>>> >>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>>>>>>>> around ClearPageHWPoison in the retry path. This >>>>>>>>>> serializes with all buddy flag manipulation. The cost is >>>>>>>>>> negligible: one lock/unlock in an extremely rare path >>>>>>>>>> (hardware memory errors). >>>>>>>>>> >>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>>>>>>>> in this file operate on pages already removed from the buddy >>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>>>>>>>> need zone->lock protection. >>>>>>>>> >>>>>>>>> Sashiko is saying this doesn't do anything "Because >>>>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>>>>>>>> >>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >>>>>>>> >>>>>>>> Battle of the bots: it's right. >>>>>>> >>>>>>> Yep, __free_pages_prepare() changes the page flag without holding >>>>>>> zone->lock. >>>>>> >>>>>> __free_pages_prepare() works on frozen pages and assumes no one else >>>>>> touches the input page. To avoid this race, memory_failure() might >>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not >>>>>> sure if that works along with memory failure flow. >>>>>> >>>>>> Best Regards, >>>>>> Yan, Zi >>>>> >>>>> >>>>> >>>>> Actually memory failure already plays with this down the road no? >>>>> >>>>> So maybe it's enough to just SetPageHWPoison afterwards again? >>>>> >>>>> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>> index ee42d4361309..4758fea94a96 100644 >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) >>>>> if (!res) { >>>>> if (is_free_buddy_page(p)) { >>>>> if (take_page_off_buddy(p)) { >>>>> + SetPageHWPoison(p); >>>>> page_ref_inc(p); >>>>> res = MF_RECOVERED; >>>>> } else { >>>>> >>>>> >>>>> and maybe in a bunch of other places in there? >>>> >>>> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), >>>> just set it again here? >>> >>> Yea. >>> >>>> Why not do it after get_hwpoison_page(), since that >>>> is the expected page flag? >>> >>> It's still in the buddy at that point right? I'm worried buddy might >>> poke at flags. >> >> Since __free_pages_prepare() executes entirely locklessly, the only way to ensure >> HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure >> pages are not on the way to buddy... >> >> Thanks. >> . > > > To clarify do you not agree repeating SetPageHWPoison is enough for > this? And if not, do you have suggestions on how to fix this race? Do you mean repeating SetPageHWPoison on every branch? Is it possible to make __free_pages_prepare changes page->flags atomically or this race is specified to memory_failure? Thanks. . ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-11 3:35 ` Miaohe Lin @ 2026-06-11 5:43 ` Michael S. Tsirkin 2026-06-11 7:36 ` Miaohe Lin 0 siblings, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-11 5:43 UTC (permalink / raw) To: Miaohe Lin Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Thu, Jun 11, 2026 at 11:35:36AM +0800, Miaohe Lin wrote: > On 2026/6/11 5:18, Michael S. Tsirkin wrote: > > On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: > >> On 2026/6/10 5:00, Michael S. Tsirkin wrote: > >>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: > >>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > >>>> > >>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > >>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: > >>>>>> > >>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >>>>>>> > >>>>>>>> On 6/9/26 20:10, Andrew Morton wrote: > >>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>>>>>>>> update to page->flags can race with non-atomic flag operations > >>>>>>>>>> that run under zone->lock in the buddy allocator. > >>>>>>>>>> > >>>>>>>>>> In particular, __free_pages_prepare() does: > >>>>>>>>>> > >>>>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>>>>>>>> > >>>>>>>>>> This non-atomic read-modify-write, while correctly excluding > >>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit > >>>>>>>>>> is set and the write happens after. Will only get worse if/when > >>>>>>>>>> we add more non-atomic flag operations. > >>>>>>>>>> > >>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>>>>>>>> around ClearPageHWPoison in the retry path. This > >>>>>>>>>> serializes with all buddy flag manipulation. The cost is > >>>>>>>>>> negligible: one lock/unlock in an extremely rare path > >>>>>>>>>> (hardware memory errors). > >>>>>>>>>> > >>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>>>>>>>> in this file operate on pages already removed from the buddy > >>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>>>>>>>> need zone->lock protection. > >>>>>>>>> > >>>>>>>>> Sashiko is saying this doesn't do anything "Because > >>>>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>>>>>>>> > >>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > >>>>>>>> > >>>>>>>> Battle of the bots: it's right. > >>>>>>> > >>>>>>> Yep, __free_pages_prepare() changes the page flag without holding > >>>>>>> zone->lock. > >>>>>> > >>>>>> __free_pages_prepare() works on frozen pages and assumes no one else > >>>>>> touches the input page. To avoid this race, memory_failure() might > >>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not > >>>>>> sure if that works along with memory failure flow. > >>>>>> > >>>>>> Best Regards, > >>>>>> Yan, Zi > >>>>> > >>>>> > >>>>> > >>>>> Actually memory failure already plays with this down the road no? > >>>>> > >>>>> So maybe it's enough to just SetPageHWPoison afterwards again? > >>>>> > >>>>> > >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>>> index ee42d4361309..4758fea94a96 100644 > >>>>> --- a/mm/memory-failure.c > >>>>> +++ b/mm/memory-failure.c > >>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > >>>>> if (!res) { > >>>>> if (is_free_buddy_page(p)) { > >>>>> if (take_page_off_buddy(p)) { > >>>>> + SetPageHWPoison(p); > >>>>> page_ref_inc(p); > >>>>> res = MF_RECOVERED; > >>>>> } else { > >>>>> > >>>>> > >>>>> and maybe in a bunch of other places in there? > >>>> > >>>> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), > >>>> just set it again here? > >>> > >>> Yea. > >>> > >>>> Why not do it after get_hwpoison_page(), since that > >>>> is the expected page flag? > >>> > >>> It's still in the buddy at that point right? I'm worried buddy might > >>> poke at flags. > >> > >> Since __free_pages_prepare() executes entirely locklessly, the only way to ensure > >> HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure > >> pages are not on the way to buddy... > >> > >> Thanks. > >> . > > > > > > To clarify do you not agree repeating SetPageHWPoison is enough for > > this? And if not, do you have suggestions on how to fix this race? > > Do you mean repeating SetPageHWPoison on every branch? Right. > Is it possible > to make __free_pages_prepare changes page->flags atomically or this race > is specified to memory_failure? > > Thanks. > . Adding an atomic op on every fast path page allocation is, I am guessing, going to slow down Linux measureably. Doing it for the benefit of memory_failure, which is the slowest of slow paths, seems unpalatable, to me. Neither am I sure it's the only racy place - grep for __SetPage and __ClearPage - all these have the same issue, I suspect. At the same time, I'm not an mm maintainer. If you disagree, try to upstream a change converting all non atomics in mm to atomics, and see what others say. -- MST ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-11 5:43 ` Michael S. Tsirkin @ 2026-06-11 7:36 ` Miaohe Lin 0 siblings, 0 replies; 19+ messages in thread From: Miaohe Lin @ 2026-06-11 7:36 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 2026/6/11 13:43, Michael S. Tsirkin wrote: > On Thu, Jun 11, 2026 at 11:35:36AM +0800, Miaohe Lin wrote: >> On 2026/6/11 5:18, Michael S. Tsirkin wrote: >>> On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: >>>> On 2026/6/10 5:00, Michael S. Tsirkin wrote: >>>>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: >>>>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: >>>>>> >>>>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: >>>>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: >>>>>>>> >>>>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: >>>>>>>>> >>>>>>>>>> On 6/9/26 20:10, Andrew Morton wrote: >>>>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>>>>>>>>>> update to page->flags can race with non-atomic flag operations >>>>>>>>>>>> that run under zone->lock in the buddy allocator. >>>>>>>>>>>> >>>>>>>>>>>> In particular, __free_pages_prepare() does: >>>>>>>>>>>> >>>>>>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>>>>>>>>>> >>>>>>>>>>>> This non-atomic read-modify-write, while correctly excluding >>>>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit >>>>>>>>>>>> is set and the write happens after. Will only get worse if/when >>>>>>>>>>>> we add more non-atomic flag operations. >>>>>>>>>>>> >>>>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>>>>>>>>>> around ClearPageHWPoison in the retry path. This >>>>>>>>>>>> serializes with all buddy flag manipulation. The cost is >>>>>>>>>>>> negligible: one lock/unlock in an extremely rare path >>>>>>>>>>>> (hardware memory errors). >>>>>>>>>>>> >>>>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>>>>>>>>>> in this file operate on pages already removed from the buddy >>>>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>>>>>>>>>> need zone->lock protection. >>>>>>>>>>> >>>>>>>>>>> Sashiko is saying this doesn't do anything "Because >>>>>>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>>>>>>>>>> >>>>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >>>>>>>>>> >>>>>>>>>> Battle of the bots: it's right. >>>>>>>>> >>>>>>>>> Yep, __free_pages_prepare() changes the page flag without holding >>>>>>>>> zone->lock. >>>>>>>> >>>>>>>> __free_pages_prepare() works on frozen pages and assumes no one else >>>>>>>> touches the input page. To avoid this race, memory_failure() might >>>>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not >>>>>>>> sure if that works along with memory failure flow. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Yan, Zi >>>>>>> >>>>>>> >>>>>>> >>>>>>> Actually memory failure already plays with this down the road no? >>>>>>> >>>>>>> So maybe it's enough to just SetPageHWPoison afterwards again? >>>>>>> >>>>>>> >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>>>> index ee42d4361309..4758fea94a96 100644 >>>>>>> --- a/mm/memory-failure.c >>>>>>> +++ b/mm/memory-failure.c >>>>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) >>>>>>> if (!res) { >>>>>>> if (is_free_buddy_page(p)) { >>>>>>> if (take_page_off_buddy(p)) { >>>>>>> + SetPageHWPoison(p); >>>>>>> page_ref_inc(p); >>>>>>> res = MF_RECOVERED; >>>>>>> } else { >>>>>>> >>>>>>> >>>>>>> and maybe in a bunch of other places in there? >>>>>> >>>>>> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(), >>>>>> just set it again here? >>>>> >>>>> Yea. >>>>> >>>>>> Why not do it after get_hwpoison_page(), since that >>>>>> is the expected page flag? >>>>> >>>>> It's still in the buddy at that point right? I'm worried buddy might >>>>> poke at flags. >>>> >>>> Since __free_pages_prepare() executes entirely locklessly, the only way to ensure >>>> HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure >>>> pages are not on the way to buddy... >>>> >>>> Thanks. >>>> . >>> >>> >>> To clarify do you not agree repeating SetPageHWPoison is enough for >>> this? And if not, do you have suggestions on how to fix this race? >> >> Do you mean repeating SetPageHWPoison on every branch? > > Right. > >> Is it possible >> to make __free_pages_prepare changes page->flags atomically or this race >> is specified to memory_failure? >> >> Thanks. >> . > > > Adding an atomic op on every fast path page allocation is, I am > guessing, going to slow down Linux measureably. > > Doing it for the benefit of memory_failure, which is the slowest of > slow paths, seems unpalatable, to me. Agree, it's not worth to do so. > > Neither am I sure it's the only racy place - > grep for __SetPage and __ClearPage - all these have the same issue, I > suspect. > > At the same time, I'm not an mm maintainer. If you disagree, try to > upstream a change converting all non atomics in mm to atomics, and see > what others say. Since memory_failure might be the only place, this change would be unacceptable. We should come up with a better solution. Maybe we can try repeating SetPageHWPoison and ClearPageHWPoison at a first attempt though it looks somewhat weird to me and makes code more complicated. But it's already complicated. :) Thanks. . ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:38 ` David Hildenbrand (Arm) 2026-06-09 18:39 ` Zi Yan @ 2026-06-11 6:33 ` Michael S. Tsirkin 2026-06-11 11:33 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-11 6:33 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Andrew Morton, linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Tue, Jun 09, 2026 at 08:38:09PM +0200, David Hildenbrand (Arm) wrote: > On 6/9/26 20:10, Andrew Morton wrote: > > On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > >> TestSetPageHWPoison() is called without zone->lock, so its atomic > >> update to page->flags can race with non-atomic flag operations > >> that run under zone->lock in the buddy allocator. > >> > >> In particular, __free_pages_prepare() does: > >> > >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >> > >> This non-atomic read-modify-write, while correctly excluding > >> __PG_HWPOISON from the mask, can still lose a concurrent > >> TestSetPageHWPoison if the read happens before the poison bit > >> is set and the write happens after. Will only get worse if/when > >> we add more non-atomic flag operations. > >> > >> Fix by acquiring zone->lock around TestSetPageHWPoison and > >> around ClearPageHWPoison in the retry path. This > >> serializes with all buddy flag manipulation. The cost is > >> negligible: one lock/unlock in an extremely rare path > >> (hardware memory errors). > >> > >> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >> in this file operate on pages already removed from the buddy > >> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >> need zone->lock protection. > > > > Sashiko is saying this doesn't do anything "Because > > __free_pages_prepare() executes entirely locklessly". Did it goof? > > > > https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > > Battle of the bots: it's right. Ugh it's bot against human - I remembered we have zone lock normally in alloc and thought it helps and didn't double check we don't have it here . The bot wins ( > -- > Cheers, > > Davidc ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-11 6:33 ` Michael S. Tsirkin @ 2026-06-11 11:33 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 19+ messages in thread From: David Hildenbrand (Arm) @ 2026-06-11 11:33 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Andrew Morton, linux-kernel, Miaohe Lin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On 6/11/26 08:33, Michael S. Tsirkin wrote: > On Tue, Jun 09, 2026 at 08:38:09PM +0200, David Hildenbrand (Arm) wrote: >> On 6/9/26 20:10, Andrew Morton wrote: >>> >>> >>> Sashiko is saying this doesn't do anything "Because >>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>> >>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com >> >> Battle of the bots: it's right. > > Ugh it's bot against human - I remembered we have zone lock > normally in alloc and thought it helps and didn't double check we don't > have it here . The bot wins ( Heh, I was primarily looking whether taking the lock would be harmful, not if it would be useful. :) -- Cheers, David ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock 2026-06-09 18:10 ` Andrew Morton 2026-06-09 18:38 ` David Hildenbrand (Arm) @ 2026-06-09 20:24 ` Michael S. Tsirkin 1 sibling, 0 replies; 19+ messages in thread From: Michael S. Tsirkin @ 2026-06-09 20:24 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi On Tue, Jun 09, 2026 at 11:10:20AM -0700, Andrew Morton wrote: > On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > TestSetPageHWPoison() is called without zone->lock, so its atomic > > update to page->flags can race with non-atomic flag operations > > that run under zone->lock in the buddy allocator. > > > > In particular, __free_pages_prepare() does: > > > > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > > > > This non-atomic read-modify-write, while correctly excluding > > __PG_HWPOISON from the mask, can still lose a concurrent > > TestSetPageHWPoison if the read happens before the poison bit > > is set and the write happens after. Will only get worse if/when > > we add more non-atomic flag operations. > > > > Fix by acquiring zone->lock around TestSetPageHWPoison and > > around ClearPageHWPoison in the retry path. This > > serializes with all buddy flag manipulation. The cost is > > negligible: one lock/unlock in an extremely rare path > > (hardware memory errors). > > > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > > in this file operate on pages already removed from the buddy > > allocator or on non-buddy pages (DAX, hugetlb), so they do not > > need zone->lock protection. > > Sashiko is saying this doesn't do anything "Because > __free_pages_prepare() executes entirely locklessly". Did it goof? > > https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com > Oh. So it only helps with the prezero patches. Maybe other places where flags are touched locklessly. Not __free_pages_prepare. I was too focused on that. Scrap this please. I'll try to think of something. -- MST ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-06-11 11:33 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-09 10:12 [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin 2026-06-09 12:50 ` David Hildenbrand (Arm) 2026-06-09 16:12 ` Zi Yan 2026-06-09 18:10 ` Andrew Morton 2026-06-09 18:38 ` David Hildenbrand (Arm) 2026-06-09 18:39 ` Zi Yan 2026-06-09 18:52 ` Zi Yan 2026-06-09 20:34 ` Michael S. Tsirkin 2026-06-09 20:54 ` Zi Yan 2026-06-09 21:00 ` Michael S. Tsirkin 2026-06-10 7:24 ` Miaohe Lin 2026-06-10 7:35 ` Michael S. Tsirkin 2026-06-10 21:18 ` Michael S. Tsirkin 2026-06-11 3:35 ` Miaohe Lin 2026-06-11 5:43 ` Michael S. Tsirkin 2026-06-11 7:36 ` Miaohe Lin 2026-06-11 6:33 ` Michael S. Tsirkin 2026-06-11 11:33 ` David Hildenbrand (Arm) 2026-06-09 20:24 ` Michael S. Tsirkin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.