* [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
@ 2025-03-24 13:17 Jinjiang Tu
2025-03-24 13:44 ` David Hildenbrand
2025-03-28 13:23 ` Oscar Salvador
0 siblings, 2 replies; 13+ messages in thread
From: Jinjiang Tu @ 2025-03-24 13:17 UTC (permalink / raw)
To: david, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong, tujinjiang
We triggered the below BUG:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
page_type: f4(hugetlb)
page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
------------[ cut here ]------------
kernel BUG at ./include/linux/page-flags.h:310!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : const_folio_flags+0x3c/0x58
lr : const_folio_flags+0x3c/0x58
Call trace:
const_folio_flags+0x3c/0x58 (P)
do_migrate_range+0x164/0x720
offline_pages+0x63c/0x6fc
memory_subsys_offline+0x190/0x1f4
device_offline+0xc0/0x13c
state_store+0x90/0xd8
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x120/0x1cc
vfs_write+0x240/0x378
ksys_write+0x70/0x108
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x10c
el0_svc_common.constprop.0+0x40/0xe0
When allocating a hugetlb folio, between the folio is taken from buddy
and prep_compound_page() is called, start_isolate_page_range() and
do_migrate_range() is called. When do_migrate_range() scans the head page
of the hugetlb folio, the compound_head field isn't set, so scans the
tail page next. And at this time, the compound_head field of tail page is
set, folio_test_large() is called by tail page, thus triggers VM_BUG_ON().
To fix it, get folio refcount before calling folio_test_large().
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
mm/memory_hotplug.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 16cf9e17077e..f600c26ce5de 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
page = pfn_to_page(pfn);
folio = page_folio(page);
- /*
- * No reference or lock is held on the folio, so it might
- * be modified concurrently (e.g. split). As such,
- * folio_nr_pages() may read garbage. This is fine as the outer
- * loop will revisit the split folio later.
- */
- if (folio_test_large(folio))
- pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
if (!folio_try_get(folio))
continue;
if (unlikely(page_folio(page) != folio))
goto put_folio;
+ if (folio_test_large(folio))
+ pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
+
if (folio_test_hwpoison(folio) ||
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-24 13:17 [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range Jinjiang Tu
@ 2025-03-24 13:44 ` David Hildenbrand
2025-03-25 3:02 ` Jinjiang Tu
2025-03-28 13:23 ` Oscar Salvador
1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2025-03-24 13:44 UTC (permalink / raw)
To: Jinjiang Tu, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
On 24.03.25 14:17, Jinjiang Tu wrote:
> We triggered the below BUG:
>
> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
> page_type: f4(hugetlb)
> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/page-flags.h:310!
> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : const_folio_flags+0x3c/0x58
> lr : const_folio_flags+0x3c/0x58
> Call trace:
> const_folio_flags+0x3c/0x58 (P)
> do_migrate_range+0x164/0x720
> offline_pages+0x63c/0x6fc
> memory_subsys_offline+0x190/0x1f4
> device_offline+0xc0/0x13c
> state_store+0x90/0xd8
> dev_attr_store+0x18/0x2c
> sysfs_kf_write+0x44/0x54
> kernfs_fop_write_iter+0x120/0x1cc
> vfs_write+0x240/0x378
> ksys_write+0x70/0x108
> __arm64_sys_write+0x1c/0x28
> invoke_syscall+0x48/0x10c
> el0_svc_common.constprop.0+0x40/0xe0
>
> When allocating a hugetlb folio, between the folio is taken from buddy
> and prep_compound_page() is called, start_isolate_page_range() and
> do_migrate_range() is called. When do_migrate_range() scans the head page
> of the hugetlb folio, the compound_head field isn't set, so scans the
> tail page next. And at this time, the compound_head field of tail page is
> set, folio_test_large() is called by tail page, thus triggers VM_BUG_ON().
>
> To fix it, get folio refcount before calling folio_test_large().
>
> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
> mm/memory_hotplug.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 16cf9e17077e..f600c26ce5de 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> page = pfn_to_page(pfn);
> folio = page_folio(page);
>
> - /*
> - * No reference or lock is held on the folio, so it might
> - * be modified concurrently (e.g. split). As such,
> - * folio_nr_pages() may read garbage. This is fine as the outer
> - * loop will revisit the split folio later.
> - */
> - if (folio_test_large(folio))
> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -
> if (!folio_try_get(folio))
> continue;
>
> if (unlikely(page_folio(page) != folio))
> goto put_folio;
>
> + if (folio_test_large(folio))
> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
Moving that will not make it able to skip the large frozen (refcount==0,
e.g., free hugetlb) folio in the continue/put_folio case above. Hmmmm ..
We could similarly to dumping folios, snapshot them, so we can read
stable data.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-24 13:44 ` David Hildenbrand
@ 2025-03-25 3:02 ` Jinjiang Tu
2025-03-25 13:18 ` Oscar Salvador
2025-03-25 19:05 ` David Hildenbrand
0 siblings, 2 replies; 13+ messages in thread
From: Jinjiang Tu @ 2025-03-25 3:02 UTC (permalink / raw)
To: David Hildenbrand, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
在 2025/3/24 21:44, David Hildenbrand 写道:
> On 24.03.25 14:17, Jinjiang Tu wrote:
>> We triggered the below BUG:
>>
>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2
>> pfn:0x240402
>> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
>> pincount:0
>> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>> page_type: f4(hugetlb)
>> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>> ------------[ cut here ]------------
>> kernel BUG at ./include/linux/page-flags.h:310!
>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : const_folio_flags+0x3c/0x58
>> lr : const_folio_flags+0x3c/0x58
>> Call trace:
>> const_folio_flags+0x3c/0x58 (P)
>> do_migrate_range+0x164/0x720
>> offline_pages+0x63c/0x6fc
>> memory_subsys_offline+0x190/0x1f4
>> device_offline+0xc0/0x13c
>> state_store+0x90/0xd8
>> dev_attr_store+0x18/0x2c
>> sysfs_kf_write+0x44/0x54
>> kernfs_fop_write_iter+0x120/0x1cc
>> vfs_write+0x240/0x378
>> ksys_write+0x70/0x108
>> __arm64_sys_write+0x1c/0x28
>> invoke_syscall+0x48/0x10c
>> el0_svc_common.constprop.0+0x40/0xe0
>>
>> When allocating a hugetlb folio, between the folio is taken from buddy
>> and prep_compound_page() is called, start_isolate_page_range() and
>> do_migrate_range() is called. When do_migrate_range() scans the head
>> page
>> of the hugetlb folio, the compound_head field isn't set, so scans the
>> tail page next. And at this time, the compound_head field of tail
>> page is
>> set, folio_test_large() is called by tail page, thus triggers
>> VM_BUG_ON().
>>
>> To fix it, get folio refcount before calling folio_test_large().
>>
>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports
>> thp migration")
>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>> ---
>> mm/memory_hotplug.c | 12 +++---------
>> 1 file changed, 3 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 16cf9e17077e..f600c26ce5de 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long
>> start_pfn, unsigned long end_pfn)
>> page = pfn_to_page(pfn);
>> folio = page_folio(page);
>> - /*
>> - * No reference or lock is held on the folio, so it might
>> - * be modified concurrently (e.g. split). As such,
>> - * folio_nr_pages() may read garbage. This is fine as the
>> outer
>> - * loop will revisit the split folio later.
>> - */
>> - if (folio_test_large(folio))
>> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>> -
>> if (!folio_try_get(folio))
>> continue;
>> if (unlikely(page_folio(page) != folio))
>> goto put_folio;
>> + if (folio_test_large(folio))
>> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>
> Moving that will not make it able to skip the large frozen
> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
> above. Hmmmm ..
For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
free hugetlb slower.
>
> We could similarly to dumping folios, snapshot them, so we can read
> stable data.
extract the code in __dump_page()? But snapshot may lead to
do_migrate_range() slower too.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-25 3:02 ` Jinjiang Tu
@ 2025-03-25 13:18 ` Oscar Salvador
2025-03-25 19:05 ` David Hildenbrand
1 sibling, 0 replies; 13+ messages in thread
From: Oscar Salvador @ 2025-03-25 13:18 UTC (permalink / raw)
To: Jinjiang Tu
Cc: David Hildenbrand, akpm, nao.horiguchi, zi.yan, linux-mm,
wangkefeng.wang, sunnanyong
On Tue, Mar 25, 2025 at 11:02:30AM +0800, Jinjiang Tu wrote:
> extract the code in __dump_page()? But snapshot may lead to
> do_migrate_range() slower too.
Yes, but offline is already a slow operationg anyway, so it might be
worth doing if we do not want to step into this "middle-operations".
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-25 3:02 ` Jinjiang Tu
2025-03-25 13:18 ` Oscar Salvador
@ 2025-03-25 19:05 ` David Hildenbrand
2025-03-26 2:40 ` Jinjiang Tu
2025-03-28 23:37 ` Andrew Morton
1 sibling, 2 replies; 13+ messages in thread
From: David Hildenbrand @ 2025-03-25 19:05 UTC (permalink / raw)
To: Jinjiang Tu, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
On 25.03.25 04:02, Jinjiang Tu wrote:
>
> 在 2025/3/24 21:44, David Hildenbrand 写道:
>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>> We triggered the below BUG:
>>>
>>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2
>>> pfn:0x240402
>>> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
>>> pincount:0
>>> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>> page_type: f4(hugetlb)
>>> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>> ------------[ cut here ]------------
>>> kernel BUG at ./include/linux/page-flags.h:310!
>>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : const_folio_flags+0x3c/0x58
>>> lr : const_folio_flags+0x3c/0x58
>>> Call trace:
>>> const_folio_flags+0x3c/0x58 (P)
>>> do_migrate_range+0x164/0x720
>>> offline_pages+0x63c/0x6fc
>>> memory_subsys_offline+0x190/0x1f4
>>> device_offline+0xc0/0x13c
>>> state_store+0x90/0xd8
>>> dev_attr_store+0x18/0x2c
>>> sysfs_kf_write+0x44/0x54
>>> kernfs_fop_write_iter+0x120/0x1cc
>>> vfs_write+0x240/0x378
>>> ksys_write+0x70/0x108
>>> __arm64_sys_write+0x1c/0x28
>>> invoke_syscall+0x48/0x10c
>>> el0_svc_common.constprop.0+0x40/0xe0
>>>
>>> When allocating a hugetlb folio, between the folio is taken from buddy
>>> and prep_compound_page() is called, start_isolate_page_range() and
>>> do_migrate_range() is called. When do_migrate_range() scans the head
>>> page
>>> of the hugetlb folio, the compound_head field isn't set, so scans the
>>> tail page next. And at this time, the compound_head field of tail
>>> page is
>>> set, folio_test_large() is called by tail page, thus triggers
>>> VM_BUG_ON().
>>>
>>> To fix it, get folio refcount before calling folio_test_large().
>>>
>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports
>>> thp migration")
>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>> ---
>>> mm/memory_hotplug.c | 12 +++---------
>>> 1 file changed, 3 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 16cf9e17077e..f600c26ce5de 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long
>>> start_pfn, unsigned long end_pfn)
>>> page = pfn_to_page(pfn);
>>> folio = page_folio(page);
>>> - /*
>>> - * No reference or lock is held on the folio, so it might
>>> - * be modified concurrently (e.g. split). As such,
>>> - * folio_nr_pages() may read garbage. This is fine as the
>>> outer
>>> - * loop will revisit the split folio later.
>>> - */
>>> - if (folio_test_large(folio))
>>> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>> -
>>> if (!folio_try_get(folio))
>>> continue;
>>> if (unlikely(page_folio(page) != folio))
>>> goto put_folio;
>>> + if (folio_test_large(folio))
>>> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>
>> Moving that will not make it able to skip the large frozen
>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>> above. Hmmmm ..
> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
> free hugetlb slower.
Yes. But now I realize that we have the same issue with free buddy pages
already (folio_try_get of each individual page :( ).
>>
>> We could similarly to dumping folios, snapshot them, so we can read
>> stable data.
> extract the code in __dump_page()? But snapshot may lead to
> do_migrate_range() slower too.
There is a patch series on the list to do that, but it might take a
while to clean that up. Ideally, we'd also jump over free buddy pages.
In the future we might have better ways to do that.
I don't consider this change here really important, but if all it
affects is free hugetlb folios, it's not really worth it to have this
code around.
Acked-by: David Hildenbrand <david@redhat.com>
But: I suspect 8135d8926c08 is not the introducing commit. Please re-verify.
We should not CC stable.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-25 19:05 ` David Hildenbrand
@ 2025-03-26 2:40 ` Jinjiang Tu
2025-03-26 12:53 ` David Hildenbrand
2025-03-28 23:37 ` Andrew Morton
1 sibling, 1 reply; 13+ messages in thread
From: Jinjiang Tu @ 2025-03-26 2:40 UTC (permalink / raw)
To: David Hildenbrand, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
在 2025/3/26 3:05, David Hildenbrand 写道:
> On 25.03.25 04:02, Jinjiang Tu wrote:
>>
>> 在 2025/3/24 21:44, David Hildenbrand 写道:
>>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>>> We triggered the below BUG:
>>>>
>>>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2
>>>> pfn:0x240402
>>>> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
>>>> pincount:0
>>>> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>>> page_type: f4(hugetlb)
>>>> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>>> ------------[ cut here ]------------
>>>> kernel BUG at ./include/linux/page-flags.h:310!
>>>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>> Modules linked in:
>>>> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>>>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : const_folio_flags+0x3c/0x58
>>>> lr : const_folio_flags+0x3c/0x58
>>>> Call trace:
>>>> const_folio_flags+0x3c/0x58 (P)
>>>> do_migrate_range+0x164/0x720
>>>> offline_pages+0x63c/0x6fc
>>>> memory_subsys_offline+0x190/0x1f4
>>>> device_offline+0xc0/0x13c
>>>> state_store+0x90/0xd8
>>>> dev_attr_store+0x18/0x2c
>>>> sysfs_kf_write+0x44/0x54
>>>> kernfs_fop_write_iter+0x120/0x1cc
>>>> vfs_write+0x240/0x378
>>>> ksys_write+0x70/0x108
>>>> __arm64_sys_write+0x1c/0x28
>>>> invoke_syscall+0x48/0x10c
>>>> el0_svc_common.constprop.0+0x40/0xe0
>>>>
>>>> When allocating a hugetlb folio, between the folio is taken from buddy
>>>> and prep_compound_page() is called, start_isolate_page_range() and
>>>> do_migrate_range() is called. When do_migrate_range() scans the head
>>>> page
>>>> of the hugetlb folio, the compound_head field isn't set, so scans the
>>>> tail page next. And at this time, the compound_head field of tail
>>>> page is
>>>> set, folio_test_large() is called by tail page, thus triggers
>>>> VM_BUG_ON().
>>>>
>>>> To fix it, get folio refcount before calling folio_test_large().
>>>>
>>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports
>>>> thp migration")
>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>> ---
>>>> mm/memory_hotplug.c | 12 +++---------
>>>> 1 file changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 16cf9e17077e..f600c26ce5de 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long
>>>> start_pfn, unsigned long end_pfn)
>>>> page = pfn_to_page(pfn);
>>>> folio = page_folio(page);
>>>> - /*
>>>> - * No reference or lock is held on the folio, so it might
>>>> - * be modified concurrently (e.g. split). As such,
>>>> - * folio_nr_pages() may read garbage. This is fine as the
>>>> outer
>>>> - * loop will revisit the split folio later.
>>>> - */
>>>> - if (folio_test_large(folio))
>>>> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>> -
>>>> if (!folio_try_get(folio))
>>>> continue;
>>>> if (unlikely(page_folio(page) != folio))
>>>> goto put_folio;
>>>> + if (folio_test_large(folio))
>>>> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>
>>> Moving that will not make it able to skip the large frozen
>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>> above. Hmmmm ..
>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
>> free hugetlb slower.
>
> Yes. But now I realize that we have the same issue with free buddy
> pages already (folio_try_get of each individual page :( ).
>
>>>
>>> We could similarly to dumping folios, snapshot them, so we can read
>>> stable data.
>> extract the code in __dump_page()? But snapshot may lead to
>> do_migrate_range() slower too.
>
> There is a patch series on the list to do that, but it might take a
> while to clean that up. Ideally, we'd also jump over free buddy pages.
> In the future we might have better ways to do that.
>
> I don't consider this change here really important, but if all it
> affects is free hugetlb folios, it's not really worth it to have this
> code around.
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
> But: I suspect 8135d8926c08 is not the introducing commit. Please
> re-verify.
commit 8135d8926c08 adds PageTransHuge() call, which may lead to VM_BUG_ON if the page is tail page.
Before it, for hugetlb, PageHuge()、compound_head() and compound_order() are called before getting
page ref, PageHuge()、compound_head() allows to pass a tail page, compound_order() will not trigger
trigger VM_BUG_ON for tail page, and only lead to reading garbage data, leading to fail offline, but
it will not trigger any VM_BUG_ON. So I think 8135d8926c08 is the introducing commit.
Thanks.
>
> We should not CC stable.
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-26 2:40 ` Jinjiang Tu
@ 2025-03-26 12:53 ` David Hildenbrand
2025-03-27 11:19 ` Jinjiang Tu
0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2025-03-26 12:53 UTC (permalink / raw)
To: Jinjiang Tu, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
On 26.03.25 03:40, Jinjiang Tu wrote:
>
> 在 2025/3/26 3:05, David Hildenbrand 写道:
>> On 25.03.25 04:02, Jinjiang Tu wrote:
>>>
>>> 在 2025/3/24 21:44, David Hildenbrand 写道:
>>>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>>>> We triggered the below BUG:
>>>>>
>>>>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2
>>>>> pfn:0x240402
>>>>> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
>>>>> pincount:0
>>>>> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>>>> page_type: f4(hugetlb)
>>>>> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at ./include/linux/page-flags.h:310!
>>>>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>>> Modules linked in:
>>>>> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
>>>>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> pc : const_folio_flags+0x3c/0x58
>>>>> lr : const_folio_flags+0x3c/0x58
>>>>> Call trace:
>>>>> const_folio_flags+0x3c/0x58 (P)
>>>>> do_migrate_range+0x164/0x720
>>>>> offline_pages+0x63c/0x6fc
>>>>> memory_subsys_offline+0x190/0x1f4
>>>>> device_offline+0xc0/0x13c
>>>>> state_store+0x90/0xd8
>>>>> dev_attr_store+0x18/0x2c
>>>>> sysfs_kf_write+0x44/0x54
>>>>> kernfs_fop_write_iter+0x120/0x1cc
>>>>> vfs_write+0x240/0x378
>>>>> ksys_write+0x70/0x108
>>>>> __arm64_sys_write+0x1c/0x28
>>>>> invoke_syscall+0x48/0x10c
>>>>> el0_svc_common.constprop.0+0x40/0xe0
>>>>>
>>>>> When allocating a hugetlb folio, between the folio is taken from buddy
>>>>> and prep_compound_page() is called, start_isolate_page_range() and
>>>>> do_migrate_range() is called. When do_migrate_range() scans the head
>>>>> page
>>>>> of the hugetlb folio, the compound_head field isn't set, so scans the
>>>>> tail page next. And at this time, the compound_head field of tail
>>>>> page is
>>>>> set, folio_test_large() is called by tail page, thus triggers
>>>>> VM_BUG_ON().
>>>>>
>>>>> To fix it, get folio refcount before calling folio_test_large().
>>>>>
>>>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports
>>>>> thp migration")
>>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>>> ---
>>>>> mm/memory_hotplug.c | 12 +++---------
>>>>> 1 file changed, 3 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 16cf9e17077e..f600c26ce5de 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long
>>>>> start_pfn, unsigned long end_pfn)
>>>>> page = pfn_to_page(pfn);
>>>>> folio = page_folio(page);
>>>>> - /*
>>>>> - * No reference or lock is held on the folio, so it might
>>>>> - * be modified concurrently (e.g. split). As such,
>>>>> - * folio_nr_pages() may read garbage. This is fine as the
>>>>> outer
>>>>> - * loop will revisit the split folio later.
>>>>> - */
>>>>> - if (folio_test_large(folio))
>>>>> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>> -
>>>>> if (!folio_try_get(folio))
>>>>> continue;
>>>>> if (unlikely(page_folio(page) != folio))
>>>>> goto put_folio;
>>>>> + if (folio_test_large(folio))
>>>>> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>
>>>> Moving that will not make it able to skip the large frozen
>>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>>> above. Hmmmm ..
>>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
>>> free hugetlb slower.
>>
>> Yes. But now I realize that we have the same issue with free buddy
>> pages already (folio_try_get of each individual page :( ).
>>
>>>>
>>>> We could similarly to dumping folios, snapshot them, so we can read
>>>> stable data.
>>> extract the code in __dump_page()? But snapshot may lead to
>>> do_migrate_range() slower too.
>>
>> There is a patch series on the list to do that, but it might take a
>> while to clean that up. Ideally, we'd also jump over free buddy pages.
>> In the future we might have better ways to do that.
>>
>> I don't consider this change here really important, but if all it
>> affects is free hugetlb folios, it's not really worth it to have this
>> code around.
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>>
>> But: I suspect 8135d8926c08 is not the introducing commit. Please
>> re-verify.
>
> commit 8135d8926c08 adds PageTransHuge() call, which may lead to VM_BUG_ON if the page is tail page.
Right; we check for PageHuge before that.
> Before it, for hugetlb, PageHuge()、compound_head() and compound_order() are called before getting
> page ref, PageHuge()、compound_head() allows to pass a tail page, compound_order() will not trigger
> trigger VM_BUG_ON for tail page, and only lead to reading garbage data, leading to fail offline, but
We will retry as documented, so offlining should not fail.
> it will not trigger any VM_BUG_ON. So I think 8135d8926c08 is the introducing commit.
Not for the hugetlb issue you describe I assume. That should be caused by
commit b62b51d2d1593e6590669e781768c3c5a7394eb5
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Tue Aug 27 19:47:24 2024 +0800
mm: memory_hotplug: remove head variable in do_migrate_range()
Patch series "mm: memory_hotplug: improve do_migrate_range()", v3.
So probably we should have both commits as Fixes (one for the hugetlb
path, one for the THP path)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-26 12:53 ` David Hildenbrand
@ 2025-03-27 11:19 ` Jinjiang Tu
0 siblings, 0 replies; 13+ messages in thread
From: Jinjiang Tu @ 2025-03-27 11:19 UTC (permalink / raw)
To: David Hildenbrand, osalvador, akpm, nao.horiguchi, zi.yan
Cc: linux-mm, wangkefeng.wang, sunnanyong
在 2025/3/26 20:53, David Hildenbrand 写道:
> On 26.03.25 03:40, Jinjiang Tu wrote:
>>
>> 在 2025/3/26 3:05, David Hildenbrand 写道:
>>> On 25.03.25 04:02, Jinjiang Tu wrote:
>>>>
>>>> 在 2025/3/24 21:44, David Hildenbrand 写道:
>>>>> On 24.03.25 14:17, Jinjiang Tu wrote:
>>>>>> We triggered the below BUG:
>>>>>>
>>>>>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2
>>>>>> pfn:0x240402
>>>>>> head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
>>>>>> pincount:0
>>>>>> flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
>>>>>> page_type: f4(hugetlb)
>>>>>> page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
>>>>>> ------------[ cut here ]------------
>>>>>> kernel BUG at ./include/linux/page-flags.h:310!
>>>>>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>>>> Modules linked in:
>>>>>> CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty
>>>>>> #374
>>>>>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>> pc : const_folio_flags+0x3c/0x58
>>>>>> lr : const_folio_flags+0x3c/0x58
>>>>>> Call trace:
>>>>>> const_folio_flags+0x3c/0x58 (P)
>>>>>> do_migrate_range+0x164/0x720
>>>>>> offline_pages+0x63c/0x6fc
>>>>>> memory_subsys_offline+0x190/0x1f4
>>>>>> device_offline+0xc0/0x13c
>>>>>> state_store+0x90/0xd8
>>>>>> dev_attr_store+0x18/0x2c
>>>>>> sysfs_kf_write+0x44/0x54
>>>>>> kernfs_fop_write_iter+0x120/0x1cc
>>>>>> vfs_write+0x240/0x378
>>>>>> ksys_write+0x70/0x108
>>>>>> __arm64_sys_write+0x1c/0x28
>>>>>> invoke_syscall+0x48/0x10c
>>>>>> el0_svc_common.constprop.0+0x40/0xe0
>>>>>>
>>>>>> When allocating a hugetlb folio, between the folio is taken from
>>>>>> buddy
>>>>>> and prep_compound_page() is called, start_isolate_page_range() and
>>>>>> do_migrate_range() is called. When do_migrate_range() scans the head
>>>>>> page
>>>>>> of the hugetlb folio, the compound_head field isn't set, so scans
>>>>>> the
>>>>>> tail page next. And at this time, the compound_head field of tail
>>>>>> page is
>>>>>> set, folio_test_large() is called by tail page, thus triggers
>>>>>> VM_BUG_ON().
>>>>>>
>>>>>> To fix it, get folio refcount before calling folio_test_large().
>>>>>>
>>>>>> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports
>>>>>> thp migration")
>>>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>>>> ---
>>>>>> mm/memory_hotplug.c | 12 +++---------
>>>>>> 1 file changed, 3 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>>> index 16cf9e17077e..f600c26ce5de 100644
>>>>>> --- a/mm/memory_hotplug.c
>>>>>> +++ b/mm/memory_hotplug.c
>>>>>> @@ -1813,21 +1813,15 @@ static void do_migrate_range(unsigned long
>>>>>> start_pfn, unsigned long end_pfn)
>>>>>> page = pfn_to_page(pfn);
>>>>>> folio = page_folio(page);
>>>>>> - /*
>>>>>> - * No reference or lock is held on the folio, so it might
>>>>>> - * be modified concurrently (e.g. split). As such,
>>>>>> - * folio_nr_pages() may read garbage. This is fine as the
>>>>>> outer
>>>>>> - * loop will revisit the split folio later.
>>>>>> - */
>>>>>> - if (folio_test_large(folio))
>>>>>> - pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>>> -
>>>>>> if (!folio_try_get(folio))
>>>>>> continue;
>>>>>> if (unlikely(page_folio(page) != folio))
>>>>>> goto put_folio;
>>>>>> + if (folio_test_large(folio))
>>>>>> + pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>>>
>>>>> Moving that will not make it able to skip the large frozen
>>>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio
>>>>> case
>>>>> above. Hmmmm ..
>>>> For free hugetlb, pfn is increased by 1 in each loop. This leads to
>>>> skip
>>>> free hugetlb slower.
>>>
>>> Yes. But now I realize that we have the same issue with free buddy
>>> pages already (folio_try_get of each individual page :( ).
>>>
>>>>>
>>>>> We could similarly to dumping folios, snapshot them, so we can read
>>>>> stable data.
>>>> extract the code in __dump_page()? But snapshot may lead to
>>>> do_migrate_range() slower too.
>>>
>>> There is a patch series on the list to do that, but it might take a
>>> while to clean that up. Ideally, we'd also jump over free buddy pages.
>>> In the future we might have better ways to do that.
>>>
>>> I don't consider this change here really important, but if all it
>>> affects is free hugetlb folios, it's not really worth it to have this
>>> code around.
>>>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>
>>> But: I suspect 8135d8926c08 is not the introducing commit. Please
>>> re-verify.
>>
>> commit 8135d8926c08 adds PageTransHuge() call, which may lead to
>> VM_BUG_ON if the page is tail page.
>
> Right; we check for PageHuge before that.
>
>> Before it, for hugetlb, PageHuge()、compound_head() and
>> compound_order() are called before getting
>> page ref, PageHuge()、compound_head() allows to pass a tail page,
>> compound_order() will not trigger
>> trigger VM_BUG_ON for tail page, and only lead to reading garbage
>> data, leading to fail offline, but
>
> We will retry as documented, so offlining should not fail.
>
>> it will not trigger any VM_BUG_ON. So I think 8135d8926c08 is the
>> introducing commit.
>
> Not for the hugetlb issue you describe I assume. That should be caused by
>
> commit b62b51d2d1593e6590669e781768c3c5a7394eb5
> Author: Kefeng Wang <wangkefeng.wang@huawei.com>
> Date: Tue Aug 27 19:47:24 2024 +0800
>
> mm: memory_hotplug: remove head variable in do_migrate_range()
>
> Patch series "mm: memory_hotplug: improve do_migrate_range()", v3.
>
> So probably we should have both commits as Fixes (one for the hugetlb
> path, one for the THP path)
>
Yes, b62b51d2d159 ("mm: memory_hotplug: remove head variable in do_migrate_range()") introduced this for hugetlb.
Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-24 13:17 [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range Jinjiang Tu
2025-03-24 13:44 ` David Hildenbrand
@ 2025-03-28 13:23 ` Oscar Salvador
1 sibling, 0 replies; 13+ messages in thread
From: Oscar Salvador @ 2025-03-28 13:23 UTC (permalink / raw)
To: Jinjiang Tu
Cc: david, akpm, nao.horiguchi, zi.yan, linux-mm, wangkefeng.wang,
sunnanyong
On Mon, Mar 24, 2025 at 09:17:50PM +0800, Jinjiang Tu wrote:
> We triggered the below BUG:
>
...
> When allocating a hugetlb folio, between the folio is taken from buddy
> and prep_compound_page() is called, start_isolate_page_range() and
> do_migrate_range() is called. When do_migrate_range() scans the head page
> of the hugetlb folio, the compound_head field isn't set, so scans the
> tail page next. And at this time, the compound_head field of tail page is
> set, folio_test_large() is called by tail page, thus triggers VM_BUG_ON().
>
> To fix it, get folio refcount before calling folio_test_large().
>
> Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
As David mentions, I think too that this was introduced in
commit: b62b51d2d1593e65 ("mm: memory_hotplug: remove head variable in
do_migrate_range()")
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-25 19:05 ` David Hildenbrand
2025-03-26 2:40 ` Jinjiang Tu
@ 2025-03-28 23:37 ` Andrew Morton
2025-04-01 16:44 ` David Hildenbrand
1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2025-03-28 23:37 UTC (permalink / raw)
To: David Hildenbrand
Cc: Jinjiang Tu, osalvador, nao.horiguchi, zi.yan, linux-mm,
wangkefeng.wang, sunnanyong
On Tue, 25 Mar 2025 20:05:50 +0100 David Hildenbrand <david@redhat.com> wrote:
> On 25.03.25 04:02, Jinjiang Tu wrote:
> >
> >> Moving that will not make it able to skip the large frozen
> >> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
> >> above. Hmmmm ..
> > For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
> > free hugetlb slower.
>
> Yes. But now I realize that we have the same issue with free buddy pages
> already (folio_try_get of each individual page :( ).
>
> >>
> >> We could similarly to dumping folios, snapshot them, so we can read
> >> stable data.
> > extract the code in __dump_page()? But snapshot may lead to
> > do_migrate_range() slower too.
>
> There is a patch series on the list to do that, but it might take a
> while to clean that up. Ideally, we'd also jump over free buddy pages.
> In the future we might have better ways to do that.
>
> I don't consider this change here really important, but if all it
> affects is free hugetlb folios, it's not really worth it to have this
> code around.
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
> But: I suspect 8135d8926c08 is not the introducing commit. Please re-verify.
>
> We should not CC stable.
Why shouldn't we cc:stable? A userspace-triggerable BUG?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-03-28 23:37 ` Andrew Morton
@ 2025-04-01 16:44 ` David Hildenbrand
2025-04-07 1:59 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2025-04-01 16:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Jinjiang Tu, osalvador, nao.horiguchi, zi.yan, linux-mm,
wangkefeng.wang, sunnanyong
On 29.03.25 00:37, Andrew Morton wrote:
> On Tue, 25 Mar 2025 20:05:50 +0100 David Hildenbrand <david@redhat.com> wrote:
>
>> On 25.03.25 04:02, Jinjiang Tu wrote:
>>>
>>>> Moving that will not make it able to skip the large frozen
>>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>>> above. Hmmmm ..
>>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
>>> free hugetlb slower.
>>
>> Yes. But now I realize that we have the same issue with free buddy pages
>> already (folio_try_get of each individual page :( ).
>>
>>>>
>>>> We could similarly to dumping folios, snapshot them, so we can read
>>>> stable data.
>>> extract the code in __dump_page()? But snapshot may lead to
>>> do_migrate_range() slower too.
>>
>> There is a patch series on the list to do that, but it might take a
>> while to clean that up. Ideally, we'd also jump over free buddy pages.
>> In the future we might have better ways to do that.
>>
>> I don't consider this change here really important, but if all it
>> affects is free hugetlb folios, it's not really worth it to have this
>> code around.
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>>
>> But: I suspect 8135d8926c08 is not the introducing commit. Please re-verify.
>>
>> We should not CC stable.
>
Sorry for the late reply.
> Why shouldn't we cc:stable? A userspace-triggerable BUG?
My reasoning was that it's a VM_BUG_ON_PAGE, and nothing is actually
broken on !CONFIG_DEBUG_VM systems ... combined with hugetlb allocation
and memory offlining usually being rare and usually privileged operations.
It would be a different story if the VM_BUG_ON_PAGE revealed an actual
issue; but this is a well documented race.
One might argue that unplugging a DIMM could automatically trigger it
(rare operation ..), and as we discovered in the meantime, THPs are also
affected. But again, CONFIG_VM_DEBUG ... which even Fedora stopped setting.
So I don't think CC stable is warranted here, but I wouldn't object if
there are good reasons to do it: like assuming that there are
CONFIG_VM_DEBUG stable users out there that rely on these features.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-04-01 16:44 ` David Hildenbrand
@ 2025-04-07 1:59 ` Andrew Morton
2025-04-07 7:07 ` David Hildenbrand
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2025-04-07 1:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: Jinjiang Tu, osalvador, nao.horiguchi, zi.yan, linux-mm,
wangkefeng.wang, sunnanyong
On Tue, 1 Apr 2025 18:44:06 +0200 David Hildenbrand <david@redhat.com> wrote:
> On 29.03.25 00:37, Andrew Morton wrote:
> > On Tue, 25 Mar 2025 20:05:50 +0100 David Hildenbrand <david@redhat.com> wrote:
> >
> >> On 25.03.25 04:02, Jinjiang Tu wrote:
> >>>
> >>>> Moving that will not make it able to skip the large frozen
> >>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
> >>>> above. Hmmmm ..
> >>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
> >>> free hugetlb slower.
> >>
> >> Yes. But now I realize that we have the same issue with free buddy pages
> >> already (folio_try_get of each individual page :( ).
> >>
> >>>>
> >>>> We could similarly to dumping folios, snapshot them, so we can read
> >>>> stable data.
> >>> extract the code in __dump_page()? But snapshot may lead to
> >>> do_migrate_range() slower too.
> >>
> >> There is a patch series on the list to do that, but it might take a
> >> while to clean that up. Ideally, we'd also jump over free buddy pages.
> >> In the future we might have better ways to do that.
> >>
> >> I don't consider this change here really important, but if all it
> >> affects is free hugetlb folios, it's not really worth it to have this
> >> code around.
> >>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >>
> >> But: I suspect 8135d8926c08 is not the introducing commit. Please re-verify.
> >>
> >> We should not CC stable.
> >
>
> Sorry for the late reply.
>
> > Why shouldn't we cc:stable? A userspace-triggerable BUG?
>
> My reasoning was that it's a VM_BUG_ON_PAGE, and nothing is actually
> broken on !CONFIG_DEBUG_VM systems ... combined with hugetlb allocation
> and memory offlining usually being rare and usually privileged operations.
>
> It would be a different story if the VM_BUG_ON_PAGE revealed an actual
> issue; but this is a well documented race.
>
> One might argue that unplugging a DIMM could automatically trigger it
> (rare operation ..), and as we discovered in the meantime, THPs are also
> affected. But again, CONFIG_VM_DEBUG ... which even Fedora stopped setting.
>
> So I don't think CC stable is warranted here, but I wouldn't object if
> there are good reasons to do it: like assuming that there are
> CONFIG_VM_DEBUG stable users out there that rely on these features.
Oh. I thought that quite a few distros use CONFIG_DEBUG_VM.
I think I shall add the cc:stable - the patch is pretty simple and will
be well tested by the time it hits -stable trees and probably someone
will benefit from it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
2025-04-07 1:59 ` Andrew Morton
@ 2025-04-07 7:07 ` David Hildenbrand
0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2025-04-07 7:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Jinjiang Tu, osalvador, nao.horiguchi, zi.yan, linux-mm,
wangkefeng.wang, sunnanyong
On 07.04.25 03:59, Andrew Morton wrote:
> On Tue, 1 Apr 2025 18:44:06 +0200 David Hildenbrand <david@redhat.com> wrote:
>
>> On 29.03.25 00:37, Andrew Morton wrote:
>>> On Tue, 25 Mar 2025 20:05:50 +0100 David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> On 25.03.25 04:02, Jinjiang Tu wrote:
>>>>>
>>>>>> Moving that will not make it able to skip the large frozen
>>>>>> (refcount==0, e.g., free hugetlb) folio in the continue/put_folio case
>>>>>> above. Hmmmm ..
>>>>> For free hugetlb, pfn is increased by 1 in each loop. This leads to skip
>>>>> free hugetlb slower.
>>>>
>>>> Yes. But now I realize that we have the same issue with free buddy pages
>>>> already (folio_try_get of each individual page :( ).
>>>>
>>>>>>
>>>>>> We could similarly to dumping folios, snapshot them, so we can read
>>>>>> stable data.
>>>>> extract the code in __dump_page()? But snapshot may lead to
>>>>> do_migrate_range() slower too.
>>>>
>>>> There is a patch series on the list to do that, but it might take a
>>>> while to clean that up. Ideally, we'd also jump over free buddy pages.
>>>> In the future we might have better ways to do that.
>>>>
>>>> I don't consider this change here really important, but if all it
>>>> affects is free hugetlb folios, it's not really worth it to have this
>>>> code around.
>>>>
>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>>
>>>> But: I suspect 8135d8926c08 is not the introducing commit. Please re-verify.
>>>>
>>>> We should not CC stable.
>>>
>>
>> Sorry for the late reply.
>>
>>> Why shouldn't we cc:stable? A userspace-triggerable BUG?
>>
>> My reasoning was that it's a VM_BUG_ON_PAGE, and nothing is actually
>> broken on !CONFIG_DEBUG_VM systems ... combined with hugetlb allocation
>> and memory offlining usually being rare and usually privileged operations.
>>
>> It would be a different story if the VM_BUG_ON_PAGE revealed an actual
>> issue; but this is a well documented race.
>>
>> One might argue that unplugging a DIMM could automatically trigger it
>> (rare operation ..), and as we discovered in the meantime, THPs are also
>> affected. But again, CONFIG_VM_DEBUG ... which even Fedora stopped setting.
>>
>> So I don't think CC stable is warranted here, but I wouldn't object if
>> there are good reasons to do it: like assuming that there are
>> CONFIG_VM_DEBUG stable users out there that rely on these features.
>
> Oh. I thought that quite a few distros use CONFIG_DEBUG_VM.
Fedora used to be the prime example people kept mentioning. But they
stopped using it for performance reasons IIRC.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-04-07 7:07 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-24 13:17 [PATCH] mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range Jinjiang Tu
2025-03-24 13:44 ` David Hildenbrand
2025-03-25 3:02 ` Jinjiang Tu
2025-03-25 13:18 ` Oscar Salvador
2025-03-25 19:05 ` David Hildenbrand
2025-03-26 2:40 ` Jinjiang Tu
2025-03-26 12:53 ` David Hildenbrand
2025-03-27 11:19 ` Jinjiang Tu
2025-03-28 23:37 ` Andrew Morton
2025-04-01 16:44 ` David Hildenbrand
2025-04-07 1:59 ` Andrew Morton
2025-04-07 7:07 ` David Hildenbrand
2025-03-28 13:23 ` Oscar Salvador
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).