From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Miaohe Lin <linmiaohe@huawei.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Zi Yan" <ziy@nvidia.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, "Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Oscar Salvador" <osalvador@suse.de>,
"Lorenzo Stoakes" <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
"Vlastimil Babka" <vbabka@kernel.org>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Brendan Jackman" <jackmanb@google.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Baolin Wang" <baolin.wang@linux.alibaba.com>,
"Nico Pache" <npache@redhat.com>,
"Ryan Roberts" <ryan.roberts@arm.com>,
"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
"Lance Yang" <lance.yang@linux.dev>,
"Hugh Dickins" <hughd@google.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Joshua Hahn" <joshua.hahnjy@gmail.com>,
"Rakie Kim" <rakie.kim@sk.com>,
"Byungchul Park" <byungchul@sk.com>,
"Gregory Price" <gourry@gourry.net>,
"Ying Huang" <ying.huang@linux.alibaba.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Christoph Lameter" <cl@gentwo.org>,
"David Rientjes" <rientjes@google.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Harry Yoo" <harry.yoo@oracle.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
"Chris Li" <chrisl@kernel.org>,
"Kairui Song" <kasong@tencent.com>,
"Kemeng Shi" <shikemeng@huaweicloud.com>,
"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
virtualization@lists.linux.dev, linux-mm@kvack.org,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Naoya Horiguchi" <nao.horiguchi@gmail.com>
Subject: Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
Date: Tue, 16 Jun 2026 14:18:57 +0200
Message-ID: <ab84b317-fecc-4197-be2f-4b4aeba3f4e3@kernel.org>
In-Reply-To: <438389f2-332d-2f70-cad4-784d7f54af9f@huawei.com>

On 6/16/26 13:40, Miaohe Lin wrote:
> On 2026/6/16 14:56, David Hildenbrand (Arm) wrote:
>>>
>>> These non-atomics are defined and used because they want to avoid atomic ops overhead?
>>> So I'm afraid using rcu read lock in these places would lead to unexpected overhead.
>>
>> It should be cheaper than atomics IIUC. Further, I assume that some pages could
>> batch over multiple such operations (esp. page freeing path when we process tail
>> pages).
>>
>> With !CONFIG_PREEMPT_RCU it's simply preempt_disable()/preempt_enable(), which
>> is either a NOP or just adjusting the preempt counter of the current thread. Cheap.
>>
>> With CONFIG_PREEMPT_RCU we mostly increment current->rcu_read_lock_nesting. But
>> there might be a function call involved (did not look into the details). So that
>> variant should be slightly more expensive.
>
> I scanned the code and found rcu_read_unlock_special might be called in some cases.
> Some expensive ops, e.g. irq_work_queue_on, might be called in some corner cases.
> So the overhead of rcu read lock might be fluctuating.

Right. Usually rcu_read_lock() + rcu_read_unlock() is supposed to be very
lightweight, but that might not be completely the case with
CONFIG_PREEMPT_RCU ...

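To make the cost difference concrete, here is a rough userspace model of the two variants discussed above. This is not kernel code: every model_* name is made up for illustration, and the slow path merely stands in for rcu_read_unlock_special(). It assumes, as far as I understand the code, that the slow path is only considered when the outermost unlock drops the nesting count to zero and some special state is pending.

```c
#include <stdio.h>

/* Userspace model only; all model_* names are hypothetical. */
struct model_task {
	int preempt_count;         /* stands in for the preempt counter */
	int rcu_read_lock_nesting; /* stands in for current->rcu_read_lock_nesting */
	int unlock_special;        /* nonzero: extra work pending at outermost unlock */
	int slow_path_calls;       /* counts how often the slow path ran */
};

/* !CONFIG_PREEMPT_RCU: modeled as a plain counter adjustment. */
static void model_lock_nonpreempt(struct model_task *t)
{
	t->preempt_count++;
}

static void model_unlock_nonpreempt(struct model_task *t)
{
	t->preempt_count--;
}

/* CONFIG_PREEMPT_RCU: the lock side only bumps the nesting count. */
static void model_lock_preempt(struct model_task *t)
{
	t->rcu_read_lock_nesting++;
}

/* Stand-in for the potentially expensive slow path. */
static void model_unlock_special(struct model_task *t)
{
	t->slow_path_calls++;
	t->unlock_special = 0;
}

/* The slow path is reached only by the outermost unlock with work pending. */
static void model_unlock_preempt(struct model_task *t)
{
	if (--t->rcu_read_lock_nesting == 0 && t->unlock_special)
		model_unlock_special(t);
}

/* Print the model state, e.g. after a sequence of lock/unlock calls. */
static void model_report(const struct model_task *t)
{
	printf("preempt=%d nesting=%d slow_path_calls=%d\n",
	       t->preempt_count, t->rcu_read_lock_nesting, t->slow_path_calls);
}
```

In this model the common case is a counter increment and decrement in both variants; only the outermost unlock with pending special state pays extra, which matches the "fluctuating" overhead described above.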
>
>>
>> We'd have to measure what an additional rcu read lock would cost in there. That
>> should be fairly easy to benchmark.
>
> Sure. We can do that if needed.
>
>>
>>>
>>> I think this is a good idea, although there are some remaining issues.
>>> But such race should be really rare, is it worth all this effort? Could we
>>> simply aim to resolve, not to be flawless? I.e. could we simply check
>>> and re-set the hwpoison flag at the end of memory_failure handling to
>>> simply avoid losing hwpoison flag as a best-effort attempt? Would it be
>>> acceptable?
>>
>> Hacky. Sufficient for the hypervisor to suspend the nonatomic-setting CPU at the
>> wrong time to still trigger the same behavior.
>
> Right. hypervisor could make the issue easier to trigger...
>
>>
>> I think, either we fix it properly, or we redesign hwpoison handling to deal
>> with setting/clearing becoming stale at some random point in the future.
>
> I think your proposal, although there are still some issues to be resolved, is
> nevertheless a good solution. We could also wait and see if anyone comes up with
> a better one.

I wouldn't call it "good" ... it's the only thing I was easily able to come up
with :)

The only alternative would be moving the hwpoison bit out of page->flags,
storing it in a sparse bitmap or something like that. It would be a bigger
rework and I am sure there are issues with that as well.
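
To sketch what such a sparse bitmap could look like, here is a minimal userspace illustration using C11 atomics. It is not a proposal for the actual kernel data structure: the chunk size, the fixed top-level array, and all sketch_* / poison_* names are assumptions made up for this example. The point is only that the bit would live outside page->flags, so non-atomic updates of page->flags could no longer overwrite it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sizing, chosen only for the example. */
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define CHUNK_BITS    (1UL << 15)   /* PFNs covered by one chunk */
#define CHUNK_WORDS   (CHUNK_BITS / BITS_PER_WORD)
#define NR_CHUNKS     1024UL        /* chunks in the top-level array */

struct poison_chunk {
	atomic_ulong words[CHUNK_WORDS];
};

/* Top level: chunks are allocated lazily, so unpoisoned ranges cost nothing. */
static _Atomic(struct poison_chunk *) poison_chunks[NR_CHUNKS];

static struct poison_chunk *chunk_get(unsigned long pfn, bool alloc)
{
	unsigned long idx = pfn / CHUNK_BITS;
	struct poison_chunk *c, *expected = NULL;

	if (idx >= NR_CHUNKS)
		return NULL;
	c = atomic_load(&poison_chunks[idx]);
	if (c || !alloc)
		return c;
	/* Zeroed memory is assumed to be a valid "all clear" atomic state. */
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	/* Publish the chunk; if another thread won the race, use theirs. */
	if (!atomic_compare_exchange_strong(&poison_chunks[idx], &expected, c)) {
		free(c);
		c = expected;
	}
	return c;
}

/* Returns 1 if the bit was already set, 0 if newly set, -1 on failure. */
static int sketch_test_set_poison(unsigned long pfn)
{
	struct poison_chunk *c = chunk_get(pfn, true);
	unsigned long bit = pfn % CHUNK_BITS;
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);
	unsigned long old;

	if (!c)
		return -1;
	old = atomic_fetch_or(&c->words[bit / BITS_PER_WORD], mask);
	return (old & mask) ? 1 : 0;
}

/* Lookup never allocates: a missing chunk means "not poisoned". */
static bool sketch_test_poison(unsigned long pfn)
{
	struct poison_chunk *c = chunk_get(pfn, false);
	unsigned long bit = pfn % CHUNK_BITS;

	if (!c)
		return false;
	return atomic_load(&c->words[bit / BITS_PER_WORD]) &
	       (1UL << (bit % BITS_PER_WORD));
}
```

Even in this toy form the open questions are visible: allocation can fail in the middle of handling a memory error, and every hwpoison check turns into an extra lookup instead of a flag test.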
--
Cheers,
David