From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, "Miaohe Lin" <linmiaohe@huawei.com>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Oscar Salvador" <osalvador@suse.de>,
"Lorenzo Stoakes" <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
"Vlastimil Babka" <vbabka@kernel.org>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Brendan Jackman" <jackmanb@google.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Baolin Wang" <baolin.wang@linux.alibaba.com>,
"Nico Pache" <npache@redhat.com>,
"Ryan Roberts" <ryan.roberts@arm.com>,
"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
"Lance Yang" <lance.yang@linux.dev>,
"Hugh Dickins" <hughd@google.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Joshua Hahn" <joshua.hahnjy@gmail.com>,
"Rakie Kim" <rakie.kim@sk.com>,
"Byungchul Park" <byungchul@sk.com>,
"Gregory Price" <gourry@gourry.net>,
"Ying Huang" <ying.huang@linux.alibaba.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Christoph Lameter" <cl@gentwo.org>,
"David Rientjes" <rientjes@google.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Harry Yoo" <harry.yoo@oracle.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
"Chris Li" <chrisl@kernel.org>,
"Kairui Song" <kasong@tencent.com>,
"Kemeng Shi" <shikemeng@huaweicloud.com>,
"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
virtualization@lists.linux.dev, linux-mm@kvack.org,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Naoya Horiguchi" <nao.horiguchi@gmail.com>
Subject: Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
Date: Tue, 9 Jun 2026 16:34:19 -0400
Message-ID: <20260609162437-mutt-send-email-mst@kernel.org>
In-Reply-To: <38C84F23-E881-4DB2-86BA-93F39D44AE1B@nvidia.com>
On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote:
> On 9 Jun 2026, at 14:39, Zi Yan wrote:
>
> > On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote:
> >
> >> On 6/9/26 20:10, Andrew Morton wrote:
> >>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>
> >>>> TestSetPageHWPoison() is called without zone->lock, so its atomic
> >>>> update to page->flags can race with non-atomic flag operations
> >>>> that run under zone->lock in the buddy allocator.
> >>>>
> >>>> In particular, __free_pages_prepare() does:
> >>>>
> >>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >>>>
> >>>> This non-atomic read-modify-write, while correctly excluding
> >>>> __PG_HWPOISON from the mask, can still lose a concurrent
> >>>> TestSetPageHWPoison if the read happens before the poison bit
> >>>> is set and the write happens after. Will only get worse if/when
> >>>> we add more non-atomic flag operations.
> >>>>
> >>>> Fix by acquiring zone->lock around TestSetPageHWPoison and
> >>>> around ClearPageHWPoison in the retry path. This
> >>>> serializes with all buddy flag manipulation. The cost is
> >>>> negligible: one lock/unlock in an extremely rare path
> >>>> (hardware memory errors).
> >>>>
> >>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> >>>> in this file operate on pages already removed from the buddy
> >>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not
> >>>> need zone->lock protection.
> >>>
> >>> Sashiko is saying this doesn't do anything "Because
> >>> __free_pages_prepare() executes entirely locklessly". Did it goof?
> >>>
> >>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com
> >>
> >> Battle of the bots: it's right.
> >
> > Yep, __free_pages_prepare() changes the page flag without holding
> > zone->lock.
>
> __free_pages_prepare() works on frozen pages and assumes no one else
> touches the input page. To avoid this race, memory_failure() might
> want to try_get_page() before TestClearPageHWPoison(), but I am not
> sure if that works along with memory failure flow.
>
> Best Regards,
> Yan, Zi
Actually, memory failure already plays with this down the road, no?
So maybe it's enough to just SetPageHWPoison() again afterwards?
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..4758fea94a96 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!res) {
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
+				SetPageHWPoison(p);
 				page_ref_inc(p);
 				res = MF_RECOVERED;
 			} else {
and maybe in a bunch of other places in there?
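To make the interleaving concrete, here is a minimal user-space sketch of
the lost update described in the quoted commit message, and of how setting
the bit again afterwards would restore it. It is not kernel code: the
interleaving is replayed by hand in a single thread, and struct demo_page,
the DEMO_PG_* bit values and demo_lost_update() are hypothetical names made
up for illustration. It also assumes that nothing else writes the flags
non-atomically once the page has been taken off the buddy list.

```c
#include <assert.h>

/* Hypothetical bit layout, for illustration only (not the kernel's values). */
#define DEMO_PG_HWPOISON   (1UL << 0)
#define DEMO_PG_CHECK_PREP (1UL << 1) /* stands in for PAGE_FLAGS_CHECK_AT_PREP */

struct demo_page {
	unsigned long flags;
};

/*
 * Replay the racy interleaving step by step and return the final flags.
 * If reset_after is non-zero, the poison bit is set again at the end,
 * mirroring the SetPageHWPoison() call proposed in the diff above.
 */
unsigned long demo_lost_update(int reset_after)
{
	struct demo_page p = { .flags = DEMO_PG_CHECK_PREP };
	unsigned long stale;

	/* Free path: read half of "flags &= ~mask". */
	stale = p.flags;

	/* Memory-failure path: the poison bit lands between read and write. */
	p.flags |= DEMO_PG_HWPOISON;

	/* Free path: write half, computed from the stale value. */
	p.flags = stale & ~DEMO_PG_CHECK_PREP;

	/* The mask never named the poison bit, yet the bit is gone. */
	assert(!(p.flags & DEMO_PG_HWPOISON));

	if (reset_after)
		p.flags |= DEMO_PG_HWPOISON;

	return p.flags;
}
```

Calling demo_lost_update(0) shows the poison bit being dropped, while
demo_lost_update(1) shows it present again after the second set. Whether
the real code can rely on this depends on no racing writer remaining at
that point, which the sketch simply assumes.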
--
MST
Thread overview: 13+ messages
2026-06-09 10:12 [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin
2026-06-09 12:50 ` David Hildenbrand (Arm)
2026-06-09 16:12 ` Zi Yan
2026-06-09 18:10 ` Andrew Morton
2026-06-09 18:38 ` David Hildenbrand (Arm)
2026-06-09 18:39 ` Zi Yan
2026-06-09 18:52 ` Zi Yan
2026-06-09 20:34 ` Michael S. Tsirkin [this message]
2026-06-09 20:54 ` Zi Yan
2026-06-09 21:00 ` Michael S. Tsirkin
2026-06-10 7:24 ` Miaohe Lin
2026-06-10 7:35 ` Michael S. Tsirkin
2026-06-09 20:24 ` Michael S. Tsirkin