* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
@ 2026-04-28 13:52 Muchun Song
2026-04-29 3:08 ` Miaohe Lin
0 siblings, 1 reply; 12+ messages in thread
From: Muchun Song @ 2026-04-28 13:52 UTC (permalink / raw)
To: Miaohe Lin
Cc: Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J Wysocki, Danilo Krummrich, Andrew Morton
> On Apr 28, 2026, at 20:34, Miaohe Lin <linmiaohe@huawei.com> wrote:
> On 2026/4/28 19:40, Muchun Song wrote:
>>
>>
>>> On Apr 28, 2026, at 19:37, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>> On 2026/4/28 16:52, Muchun Song wrote:
>>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>>>> find_memory_block_by_id(), which requires device_hotplug_lock to
>>>> serialize the xarray lookup against memory block removal.
>>>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>>>> the memory block cannot disappear between xa_load() and get_device().
>>>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> Thanks for update.
>>>> ---
>>>> drivers/base/memory.c | 10 ++++++++--
>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>>> index 6981b55d582a..f76aee29e9a5 100644
>>>> --- a/drivers/base/memory.c
>>>> +++ b/drivers/base/memory.c
>>>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>>>> void memblk_nr_poison_inc(unsigned long pfn)
>>>> {
>>>> const unsigned long block_id = pfn_to_block_id(pfn);
>>>> - struct memory_block *mem = find_memory_block_by_id(block_id);
>>>> + struct memory_block *mem;
>>>> + lock_device_hotplug();
>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
>>> I'm afraid if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
>>> deadlock. Or am I miss something?
>>
>> I am curious is there any place where memory_failure() is called with holding lock_device_hotplug?
>
> Sorry for dumb scenario, I was a bit too presumptuous. But there might be another possible deadlock:
>
> remove_memory
> lock_device_hotplug <-- first called here
> try_remove_memory
> remove_memory_block_devices
> num_poisoned_pages_sub
Passing pfn = -1 here.
> memblk_nr_poison_sub
> lock_device_hotplug <-- deadlock here
No. Can’t reach here. No deadlock.
Thanks.
>
> Hope I'm not mistaken again. :)
>
> Thank.
> .
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 13:52 [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup Muchun Song
@ 2026-04-29 3:08 ` Miaohe Lin
2026-04-29 3:32 ` Oscar Salvador
0 siblings, 1 reply; 12+ messages in thread
From: Miaohe Lin @ 2026-04-29 3:08 UTC (permalink / raw)
To: Muchun Song
Cc: Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J Wysocki, Danilo Krummrich, Andrew Morton
On 2026/4/28 21:52, Muchun Song wrote:
>
>
>
>> On Apr 28, 2026, at 20:34, Miaohe Lin <linmiaohe@huawei.com> wrote:
>> On 2026/4/28 19:40, Muchun Song wrote:
>>>
>>>
>>>> On Apr 28, 2026, at 19:37, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>> On 2026/4/28 16:52, Muchun Song wrote:
>>>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>>>>> find_memory_block_by_id(), which requires device_hotplug_lock to
>>>>> serialize the xarray lookup against memory block removal.
>>>>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>>>>> the memory block cannot disappear between xa_load() and get_device().
>>>>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> Thanks for update.
>>>>> ---
>>>>> drivers/base/memory.c | 10 ++++++++--
>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>>>> index 6981b55d582a..f76aee29e9a5 100644
>>>>> --- a/drivers/base/memory.c
>>>>> +++ b/drivers/base/memory.c
>>>>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>>>>> void memblk_nr_poison_inc(unsigned long pfn)
>>>>> {
>>>>> const unsigned long block_id = pfn_to_block_id(pfn);
>>>>> - struct memory_block *mem = find_memory_block_by_id(block_id);
>>>>> + struct memory_block *mem;
>>>>> + lock_device_hotplug();
>>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
>>>> I'm afraid if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
>>>> deadlock. Or am I miss something?
>>>
>>> I am curious is there any place where memory_failure() is called with holding lock_device_hotplug?
>>
>> Sorry for dumb scenario, I was a bit too presumptuous. But there might be another possible deadlock:
>>
>> remove_memory
>> lock_device_hotplug <-- first called here
>> try_remove_memory
>> remove_memory_block_devices
>> num_poisoned_pages_sub
>
> Passing pfn = -1 here.
>
>> memblk_nr_poison_sub
>> lock_device_hotplug <-- deadlock here
>
> No. Can’t reach here. No deadlock.
Right, I missed that. Thanks. But I'm still worried that there might be potential issues.
For example, this function could be called while lock_page is held. Acquiring lock_device_hotplug
while already holding lock_page might cause problems, though I haven't seen any specific issues yet.
Also there might be some other potential scenarios that haven't been considered. Hope I'm just
overthinking it. :)
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-29 3:08 ` Miaohe Lin
@ 2026-04-29 3:32 ` Oscar Salvador
2026-04-29 4:18 ` Muchun Song
0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2026-04-29 3:32 UTC (permalink / raw)
To: Miaohe Lin
Cc: Muchun Song, Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Greg Kroah-Hartman, Rafael J Wysocki,
Danilo Krummrich, Andrew Morton
On Wed, Apr 29, 2026 at 11:08:51AM +0800, Miaohe Lin wrote:
> Right, I missed that. Thanks. But I'm still worried that there might be potential issues.
> For example, this function could be called while lock_page is held. Acquiring lock_device_hotplug
> while already holding lock_page might cause problems, though I haven't seen any specific issues yet.
> Also there might be some other potential scenarios that haven't been considered. Hope I'm just
> overthinking it. :)
lock_device_hotplug is a mutex lock, and we already take other mutex locks while
holding lock_folio in other paths, so I am not sure I see what should be special
in this case.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-29 3:32 ` Oscar Salvador
@ 2026-04-29 4:18 ` Muchun Song
2026-04-29 10:11 ` Usama Arif
0 siblings, 1 reply; 12+ messages in thread
From: Muchun Song @ 2026-04-29 4:18 UTC (permalink / raw)
To: Oscar Salvador, Miaohe Lin
Cc: Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Greg Kroah-Hartman, Rafael J Wysocki,
Danilo Krummrich, Andrew Morton
> On Apr 29, 2026, at 11:32, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Apr 29, 2026 at 11:08:51AM +0800, Miaohe Lin wrote:
>> Right, I missed that. Thanks. But I'm still worried that there might be potential issues.
>> For example, this function could be called while lock_page is held. Acquiring lock_device_hotplug
>> while already holding lock_page might cause problems, though I haven't seen any specific issues yet.
>> Also there might be some other potential scenarios that haven't been considered. Hope I'm just
>> overthinking it. :)
>
> lock_device_hotplug is a mutex lock, and we already take other mutex locks while
> holding lock_folio in other paths, so I am not sure I see what should be special
> in this case.
Hi Oscar and Miaohe,
I saw sashiko's report [1] related to folio lock and lock_device_hotplug.
Seems it is possible. You can correct me if I am wrong.
[1] https://sashiko.dev/#/patchset/20260428085219.1316047-1-songmuchun%40bytedance.com
We could fix this by calling action_result() without holding folio lock.
What do you think?
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..bc76066547ce 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2534,8 +2534,9 @@ int memory_failure(unsigned long pfn, int flags)
* Abort on fail: __filemap_remove_folio() assumes unmapped page.
*/
if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
+ folio_unlock(folio);
res = action_result(pfn, MF_MSG_UNMAP_FAILED, MF_FAILED);
- goto unlock_page;
+ goto unlock_mutex;
}
/*
@@ -2543,16 +2544,15 @@ int memory_failure(unsigned long pfn, int flags)
*/
if (folio_test_lru(folio) && !folio_test_swapcache(folio) &&
folio->mapping == NULL) {
+ folio_unlock(folio);
res = action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
- goto unlock_page;
+ goto unlock_mutex;
}
identify_page_state:
res = identify_page_state(pfn, p, page_flags);
mutex_unlock(&mf_mutex);
return res;
-unlock_page:
- folio_unlock(folio);
unlock_mutex:
mutex_unlock(&mf_mutex);
return res;
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-29 4:18 ` Muchun Song
@ 2026-04-29 10:11 ` Usama Arif
2026-04-29 10:44 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: Usama Arif @ 2026-04-29 10:11 UTC (permalink / raw)
To: Muchun Song
Cc: Usama Arif, Oscar Salvador, Miaohe Lin, Muchun Song, Vishal Verma,
Ying Huang, Dan Williams, Naoya Horiguchi, linux-mm, linux-cxl,
driver-core, linux-kernel, stable, David Hildenbrand,
Greg Kroah-Hartman, Rafael J Wysocki, Danilo Krummrich,
Andrew Morton
On Wed, 29 Apr 2026 12:18:08 +0800 Muchun Song <muchun.song@linux.dev> wrote:
>
>
> > On Apr 29, 2026, at 11:32, Oscar Salvador <osalvador@suse.de> wrote:
> >
> > On Wed, Apr 29, 2026 at 11:08:51AM +0800, Miaohe Lin wrote:
> >> Right, I missed that. Thanks. But I'm still worried that there might be potential issues.
> >> For example, this function could be called while lock_page is held. Acquiring lock_device_hotplug
> >> while already holding lock_page might cause problems, though I haven't seen any specific issues yet.
> >> Also there might be some other potential scenarios that haven't been considered. Hope I'm just
> >> overthinking it. :)
> >
> > lock_device_hotplug is a mutex lock, and we already take other mutex locks while
> > holding lock_folio in other paths, so I am not sure I see what should be special
> > in this case.
>
> Hi Oscar and Miaohe,
>
> I saw sashiko's report [1] related to folio lock and lock_device_hotplug.
> Seems it is possible. You can correct me if I am wrong.
>
> [1] https://sashiko.dev/#/patchset/20260428085219.1316047-1-songmuchun%40bytedance.com
>
> We could fix this by calling action_result() without holding folio lock.
> What do you think?
>
Hello Muchun,
You could end up in memblk_nr_poison_sub() while holding hugetlb_lock spin lock
from get_huge_page_for_hwpoison(), right?
Lockdep would flag this as sleeping while atomic when acquiring mutex I think.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-29 10:11 ` Usama Arif
@ 2026-04-29 10:44 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-29 10:44 UTC (permalink / raw)
To: Usama Arif, Muchun Song
Cc: Oscar Salvador, Miaohe Lin, Muchun Song, Vishal Verma, Ying Huang,
Dan Williams, Naoya Horiguchi, linux-mm, linux-cxl, driver-core,
linux-kernel, stable, Greg Kroah-Hartman, Rafael J Wysocki,
Danilo Krummrich, Andrew Morton
On 4/29/26 12:11, Usama Arif wrote:
> On Wed, 29 Apr 2026 12:18:08 +0800 Muchun Song <muchun.song@linux.dev> wrote:
>
>>
>>
>>>
>>>
>>> lock_device_hotplug is a mutex lock, and we already take other mutex locks while
>>> holding lock_folio in other paths, so I am not sure I see what should be special
>>> in this case.
>>
>> Hi Oscar and Miaohe,
>>
>> I saw sashiko's report [1] related to folio lock and lock_device_hotplug.
>> Seems it is possible. You can correct me if I am wrong.
>>
>> [1] https://sashiko.dev/#/patchset/20260428085219.1316047-1-songmuchun%40bytedance.com
>>
>> We could fix this by calling action_result() without holding folio lock.
>> What do you think?
>>
>
> Hello Muchun,
>
> You could end up in memblk_nr_poison_sub() while holding hugetlb_lock spin lock
> from get_huge_page_for_hwpoison(), right?
>
> Lockdep would flag this as sleeping while atomic when acquiring mutex I think.
Another thought would be, that we always call the inc/sub from memory failure
code while we hold a folio reference and the page is not poisoned yet.
That way, memory offlining cannot continue and the memory block cannot go away.
So we'd let out page reference keep the memory block alive.
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <20260428085219.1316047-1-songmuchun@bytedance.com>]
* [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
[not found] <20260428085219.1316047-1-songmuchun@bytedance.com>
@ 2026-04-28 8:52 ` Muchun Song
2026-04-28 9:17 ` Oscar Salvador
2026-04-28 11:37 ` Miaohe Lin
0 siblings, 2 replies; 12+ messages in thread
From: Muchun Song @ 2026-04-28 8:52 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J . Wysocki, Danilo Krummrich, Andrew Morton
Cc: Vishal Verma, Ying Huang, Dan Williams, Miaohe Lin,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
Muchun Song, stable, muchun.song
memblk_nr_poison_inc() and memblk_nr_poison_sub() call
find_memory_block_by_id(), which requires device_hotplug_lock to
serialize the xarray lookup against memory block removal.
Take device_hotplug_lock around the lookup and nr_hwpoison update so
the memory block cannot disappear between xa_load() and get_device().
Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
Cc: stable@vger.kernel.org
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
drivers/base/memory.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6981b55d582a..f76aee29e9a5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
void memblk_nr_poison_inc(unsigned long pfn)
{
const unsigned long block_id = pfn_to_block_id(pfn);
- struct memory_block *mem = find_memory_block_by_id(block_id);
+ struct memory_block *mem;
+ lock_device_hotplug();
+ mem = find_memory_block_by_id(block_id);
if (mem) {
atomic_long_inc(&mem->nr_hwpoison);
put_device(&mem->dev);
}
+ unlock_device_hotplug();
}
void memblk_nr_poison_sub(unsigned long pfn, long i)
{
const unsigned long block_id = pfn_to_block_id(pfn);
- struct memory_block *mem = find_memory_block_by_id(block_id);
+ struct memory_block *mem;
+ lock_device_hotplug();
+ mem = find_memory_block_by_id(block_id);
if (mem) {
atomic_long_sub(i, &mem->nr_hwpoison);
put_device(&mem->dev);
}
+ unlock_device_hotplug();
}
static unsigned long memblk_nr_poison(struct memory_block *mem)
--
2.20.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 8:52 ` Muchun Song
@ 2026-04-28 9:17 ` Oscar Salvador
2026-04-28 9:21 ` Muchun Song
2026-04-28 11:37 ` Miaohe Lin
1 sibling, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2026-04-28 9:17 UTC (permalink / raw)
To: Muchun Song
Cc: David Hildenbrand, Greg Kroah-Hartman, Rafael J . Wysocki,
Danilo Krummrich, Andrew Morton, Vishal Verma, Ying Huang,
Dan Williams, Miaohe Lin, Naoya Horiguchi, linux-mm, linux-cxl,
driver-core, linux-kernel, stable, muchun.song
On Tue, Apr 28, 2026 at 04:52:19PM +0800, Muchun Song wrote:
> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
> find_memory_block_by_id(), which requires device_hotplug_lock to
> serialize the xarray lookup against memory block removal.
>
> Take device_hotplug_lock around the lookup and nr_hwpoison update so
> the memory block cannot disappear between xa_load() and get_device().
>
> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
> Cc: stable@vger.kernel.org
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
It might have made sense to join both patches? Anyway:
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 9:17 ` Oscar Salvador
@ 2026-04-28 9:21 ` Muchun Song
0 siblings, 0 replies; 12+ messages in thread
From: Muchun Song @ 2026-04-28 9:21 UTC (permalink / raw)
To: Oscar Salvador
Cc: Muchun Song, David Hildenbrand, Greg Kroah-Hartman,
Rafael J . Wysocki, Danilo Krummrich, Andrew Morton, Vishal Verma,
Ying Huang, Dan Williams, Miaohe Lin, Naoya Horiguchi, linux-mm,
linux-cxl, driver-core, linux-kernel, stable
> On Apr 28, 2026, at 17:17, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Apr 28, 2026 at 04:52:19PM +0800, Muchun Song wrote:
>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>> find_memory_block_by_id(), which requires device_hotplug_lock to
>> serialize the xarray lookup against memory block removal.
>>
>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>> the memory block cannot disappear between xa_load() and get_device().
>>
>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> It might have made sense to join both patches? Anyway:
Either way works for me. I’ve been following the 'one thing per
patch' principle. If I still need to update v3, I can merge them;
otherwise, I’d prefer to keep it as is. I'm a little lazy. :)
>
> Acked-by: Oscar Salvador <osalvador@suse.de>
Thanks.
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 8:52 ` Muchun Song
2026-04-28 9:17 ` Oscar Salvador
@ 2026-04-28 11:37 ` Miaohe Lin
2026-04-28 11:40 ` Muchun Song
1 sibling, 1 reply; 12+ messages in thread
From: Miaohe Lin @ 2026-04-28 11:37 UTC (permalink / raw)
To: Muchun Song
Cc: Vishal Verma, Ying Huang, Dan Williams, Naoya Horiguchi, linux-mm,
linux-cxl, driver-core, linux-kernel, stable, muchun.song,
David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J . Wysocki, Danilo Krummrich, Andrew Morton
On 2026/4/28 16:52, Muchun Song wrote:
> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
> find_memory_block_by_id(), which requires device_hotplug_lock to
> serialize the xarray lookup against memory block removal.
>
> Take device_hotplug_lock around the lookup and nr_hwpoison update so
> the memory block cannot disappear between xa_load() and get_device().
>
> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
> Cc: stable@vger.kernel.org
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Thanks for update.
> ---
> drivers/base/memory.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6981b55d582a..f76aee29e9a5 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> void memblk_nr_poison_inc(unsigned long pfn)
> {
> const unsigned long block_id = pfn_to_block_id(pfn);
> - struct memory_block *mem = find_memory_block_by_id(block_id);
> + struct memory_block *mem;
>
> + lock_device_hotplug();
memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
I'm afraid if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
deadlock. Or am I miss something?
Thanks.
.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 11:37 ` Miaohe Lin
@ 2026-04-28 11:40 ` Muchun Song
2026-04-28 12:34 ` Miaohe Lin
0 siblings, 1 reply; 12+ messages in thread
From: Muchun Song @ 2026-04-28 11:40 UTC (permalink / raw)
To: Miaohe Lin
Cc: Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J . Wysocki, Danilo Krummrich, Andrew Morton
> On Apr 28, 2026, at 19:37, Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2026/4/28 16:52, Muchun Song wrote:
>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>> find_memory_block_by_id(), which requires device_hotplug_lock to
>> serialize the xarray lookup against memory block removal.
>>
>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>> the memory block cannot disappear between xa_load() and get_device().
>>
>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> Thanks for update.
>
>> ---
>> drivers/base/memory.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6981b55d582a..f76aee29e9a5 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>> void memblk_nr_poison_inc(unsigned long pfn)
>> {
>> const unsigned long block_id = pfn_to_block_id(pfn);
>> - struct memory_block *mem = find_memory_block_by_id(block_id);
>> + struct memory_block *mem;
>>
>> + lock_device_hotplug();
>
> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
> I'm afraid if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
> deadlock. Or am I miss something?
I am curious is there any place where memory_failure() is called with holding lock_device_hotplug?
Thanks.
>
> Thanks.
> .
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
2026-04-28 11:40 ` Muchun Song
@ 2026-04-28 12:34 ` Miaohe Lin
0 siblings, 0 replies; 12+ messages in thread
From: Miaohe Lin @ 2026-04-28 12:34 UTC (permalink / raw)
To: Muchun Song
Cc: Muchun Song, Vishal Verma, Ying Huang, Dan Williams,
Naoya Horiguchi, linux-mm, linux-cxl, driver-core, linux-kernel,
stable, David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J . Wysocki, Danilo Krummrich, Andrew Morton
On 2026/4/28 19:40, Muchun Song wrote:
>
>
>> On Apr 28, 2026, at 19:37, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2026/4/28 16:52, Muchun Song wrote:
>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>>> find_memory_block_by_id(), which requires device_hotplug_lock to
>>> serialize the xarray lookup against memory block removal.
>>>
>>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>>> the memory block cannot disappear between xa_load() and get_device().
>>>
>>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>
>> Thanks for update.
>>
>>> ---
>>> drivers/base/memory.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 6981b55d582a..f76aee29e9a5 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>>> void memblk_nr_poison_inc(unsigned long pfn)
>>> {
>>> const unsigned long block_id = pfn_to_block_id(pfn);
>>> - struct memory_block *mem = find_memory_block_by_id(block_id);
>>> + struct memory_block *mem;
>>>
>>> + lock_device_hotplug();
>>
>> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
>> I'm afraid if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
>> deadlock. Or am I miss something?
>
> I am curious is there any place where memory_failure() is called with holding lock_device_hotplug?
Sorry for dumb scenario, I was a bit too presumptuous. But there might be another possible deadlock:
remove_memory
lock_device_hotplug <-- first called here
try_remove_memory
remove_memory_block_devices
num_poisoned_pages_sub
memblk_nr_poison_sub
lock_device_hotplug <-- deadlock here
Hope I'm not mistaken again. :)
Thank.
.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-29 10:44 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 13:52 [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup Muchun Song
2026-04-29 3:08 ` Miaohe Lin
2026-04-29 3:32 ` Oscar Salvador
2026-04-29 4:18 ` Muchun Song
2026-04-29 10:11 ` Usama Arif
2026-04-29 10:44 ` David Hildenbrand (Arm)
[not found] <20260428085219.1316047-1-songmuchun@bytedance.com>
2026-04-28 8:52 ` Muchun Song
2026-04-28 9:17 ` Oscar Salvador
2026-04-28 9:21 ` Muchun Song
2026-04-28 11:37 ` Miaohe Lin
2026-04-28 11:40 ` Muchun Song
2026-04-28 12:34 ` Miaohe Lin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox