* [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
@ 2026-04-30 4:11 Bibo Mao
2026-04-30 4:28 ` Lance Yang
0 siblings, 1 reply; 10+ messages in thread
From: Bibo Mao @ 2026-04-30 4:11 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang
Cc: linux-mm, linux-kernel
When running "make check" in the QEMU build tree, there are error
reports like these:
BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
The problem is that when the application exits, the RSS counter is
decremented for the huge zero PMD page, when it should instead be skipped.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
mm/huge_memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..3cbea344d4a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
{
const bool is_device_private = folio_is_device_private(folio);
+ if (is_huge_zero_pmd(pmdval))
+ return;
+
/* Present and device private folios are rmappable. */
if (is_present || is_device_private)
folio_remove_rmap_pmd(folio, &folio->page, vma);
base-commit: 3b3bea6d4b9c162f9e555905d96b8c1da67ecd5b
--
2.39.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 4:11 [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio() Bibo Mao
@ 2026-04-30 4:28 ` Lance Yang
2026-04-30 4:58 ` Lance Yang
2026-04-30 6:34 ` Bibo Mao
0 siblings, 2 replies; 10+ messages in thread
From: Lance Yang @ 2026-04-30 4:28 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel
On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>when executing command "make check" with qemu software, there is
>error report like this:
> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
Good catch!
>The problem is that when application exits, rss counter is calculated
>with huge_zero_pmd huge page, instead it should be skipped.
Looks like the same problem[1] we discussed recently.
[1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>---
> mm/huge_memory.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 970e077019b7..3cbea344d4a2 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
> {
> const bool is_device_private = folio_is_device_private(folio);
>
>+ if (is_huge_zero_pmd(pmdval))
>+ return;
>+
The huge zero PMD should not be returned by vm_normal_page_pmd() or
vm_normal_folio_pmd() as a normal folio. If it reaches
zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
decision earlier in the zap path ...
So I don't think we should special-case it in zap_huge_pmd_folio(); that
would only paper over this RSS decrement without fixing the root cause :)
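To make the normal-vs-special filtering above concrete, here is a
hypothetical userspace sketch. The mock_* names and the pmd_kind enum are
inventions for illustration, loosely mirroring vm_normal_folio_pmd() and
zap_huge_pmd_folio(); this is a simplified model, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: a PMD maps a normal folio, the shared huge zero
 * page, or a special (pfnmap) mapping. */
enum pmd_kind { PMD_NORMAL, PMD_HUGE_ZERO, PMD_SPECIAL };

struct mock_folio {
	long rss;	/* stands in for the per-mm RSS counter */
};

/* Mock of vm_normal_folio_pmd(): special mappings -- including the huge
 * zero PMD -- must yield NULL so callers never treat them as normal. */
static struct mock_folio *mock_vm_normal_folio_pmd(enum pmd_kind kind,
						   struct mock_folio *folio)
{
	if (kind == PMD_HUGE_ZERO || kind == PMD_SPECIAL)
		return NULL;
	return folio;
}

/* Mock of zap_huge_pmd_folio(): by design it only ever sees normal
 * folios, so it decrements RSS unconditionally. */
static void mock_zap_huge_pmd_folio(struct mock_folio *folio)
{
	folio->rss--;
}

/* Zap path: the normal-vs-special decision is made once, up front.
 * Returns the resulting RSS value. */
static long mock_zap_huge_pmd(enum pmd_kind kind, struct mock_folio *folio)
{
	struct mock_folio *f = mock_vm_normal_folio_pmd(kind, folio);

	if (f)
		mock_zap_huge_pmd_folio(f);
	return folio->rss;
}
```

In this model the huge zero case never reaches the RSS decrement, which
is why fixing the filter is preferable to special-casing the zap helper.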
Could you please check whether the fix[2] also fixes your QEMU test?
[2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
Thanks, Lance
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 4:28 ` Lance Yang
@ 2026-04-30 4:58 ` Lance Yang
2026-04-30 6:34 ` Bibo Mao
1 sibling, 0 replies; 10+ messages in thread
From: Lance Yang @ 2026-04-30 4:58 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 12:28, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>> when executing command "make check" with qemu software, there is
>> error report like this:
>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>
> Good catch!
>
>> The problem is that when application exits, rss counter is calculated
>> with huge_zero_pmd huge page, instead it should be skipped.
>
> Looks like the same problem[1] we discussed recently.
>
> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>> mm/huge_memory.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..3cbea344d4a2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>> {
>> const bool is_device_private = folio_is_device_private(folio);
>>
>> + if (is_huge_zero_pmd(pmdval))
>> + return;
>> +
>
> The huge zero PMD should not be returned by vm_normal_page_pmd() or
> vm_normal_folio_pmd() as a normal folio. If it reaches
> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
> decision ...
>
> So I don't think we should special-case it in zap_huge_pmd_folio(). That
> only avoids this RSS decrement :)
>
> Could you please check whether the fix[2] also fixes your QEMU test?
In addition, like x86-32, 64-bit LoongArch selects ARCH_HAS_PTE_SPECIAL,
but not ARCH_SUPPORTS_HUGE_PFNMAP. So CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
is not enabled, and pmd_special() falls back to the generic stub that
always returns false.
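A minimal sketch of that fallback, assuming the usual Kconfig-guarded
stub pattern (mock_pmd_special is a hypothetical stand-in for the real
pmd_special(), and the pmd_val argument is illustrative only):

```c
#include <stdbool.h>

/* Without CONFIG_ARCH_SUPPORTS_PMD_PFNMAP, the generic stub reports
 * "not special" for every PMD, regardless of the entry's bits. */
#ifndef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
static bool mock_pmd_special(unsigned long pmd_val)
{
	(void)pmd_val;	/* the entry's bits are never consulted */
	return false;
}
#endif
```

So on such architectures any special-bit information in the entry is
invisible here, and correctness must come from the callers instead.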
So I guess the fix should do the trick :)
> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>
> Thanks, Lance
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 4:28 ` Lance Yang
2026-04-30 4:58 ` Lance Yang
@ 2026-04-30 6:34 ` Bibo Mao
2026-04-30 7:02 ` Lance Yang
1 sibling, 1 reply; 10+ messages in thread
From: Bibo Mao @ 2026-04-30 6:34 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 12:28 PM, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>> when executing command "make check" with qemu software, there is
>> error report like this:
>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>
> Good catch!
>
>> The problem is that when application exits, rss counter is calculated
>> with huge_zero_pmd huge page, instead it should be skipped.
>
> Looks like the same problem[1] we discussed recently.
>
> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>> mm/huge_memory.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..3cbea344d4a2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>> {
>> const bool is_device_private = folio_is_device_private(folio);
>>
>> + if (is_huge_zero_pmd(pmdval))
>> + return;
>> +
>
> The huge zero PMD should not be returned by vm_normal_page_pmd() or
> vm_normal_folio_pmd() as a normal folio. If it reaches
> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
> decision ...
>
> So I don't think we should special-case it in zap_huge_pmd_folio(). That
> only avoids this RSS decrement :)
>
> Could you please check whether the fix[2] also fixes your QEMU test?
>
> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
Yes, I think it will solve this problem.
However, I think there should still be a TLB flush after
pmdp_huge_get_and_clear_full(), even for the huge zero PMD page, so
tlb_remove_page_size() should be called. Is that right?
Regards
Bibo Mao
>
> Thanks, Lance
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 6:34 ` Bibo Mao
@ 2026-04-30 7:02 ` Lance Yang
2026-04-30 7:05 ` Bibo Mao
2026-04-30 7:12 ` Lance Yang
0 siblings, 2 replies; 10+ messages in thread
From: Lance Yang @ 2026-04-30 7:02 UTC (permalink / raw)
To: maobibo
Cc: lance.yang, akpm, david, ljs, ziy, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>
>
>On 2026/4/30 12:28 PM, Lance Yang wrote:
>>
>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>> when executing command "make check" with qemu software, there is
>>> error report like this:
>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>
>> Good catch!
>>
>>> The problem is that when application exits, rss counter is calculated
>>> with huge_zero_pmd huge page, instead it should be skipped.
>>
>> Looks like the same problem[1] we discussed recently.
>>
>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>
>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>> ---
>>> mm/huge_memory.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 970e077019b7..3cbea344d4a2 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>> {
>>> const bool is_device_private = folio_is_device_private(folio);
>>>
>>> + if (is_huge_zero_pmd(pmdval))
>>> + return;
>>> +
>>
>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>> vm_normal_folio_pmd() as a normal folio. If it reaches
>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>> decision ...
>>
>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>> only avoids this RSS decrement :)
>>
>> Could you please check whether the fix[2] also fixes your QEMU test?
>>
>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>yes, I think it will solve this problem.
>
>Only that I think that there should be tlb flush operation after
>pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>tlb_remove_page_size() should be called. Is that right?
Calling tlb_remove_page_size() is not necessary there :)
zap_huge_pmd() already marks the PMD range for TLB invalidation right
after clearing the entry:
orig_pmd = pmdp_huge_get_and_clear_full(...);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
The later tlb_remove_page_size() call is guarded by "is_present && folio"
and handles the normal folio case, after normal_or_softleaf_folio_pmd()
returns one :)
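The two mmu_gather roles being contrasted here can be sketched as
follows. This is a hypothetical userspace model: the mock_gather struct,
the mock_* helpers, and the assumed 2 MiB PMD size only loosely mirror
the real mmu_gather machinery.

```c
#include <stddef.h>

/* tlb_remove_pmd_tlb_entry() only widens the range to be invalidated;
 * tlb_remove_page_size() additionally queues a page for freeing after
 * the flush. */
struct mock_gather {
	unsigned long start, end;	/* range to invalidate */
	int nr_queued;			/* pages queued for delayed freeing */
};

#define MOCK_PMD_SIZE (2UL << 20)	/* assume 2 MiB PMD mappings */

/* Mark the PMD range for TLB invalidation; nothing is queued, which is
 * exactly what the shared huge zero folio needs. */
static void mock_tlb_remove_pmd_tlb_entry(struct mock_gather *tlb,
					  unsigned long addr)
{
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + MOCK_PMD_SIZE > tlb->end)
		tlb->end = addr + MOCK_PMD_SIZE;
}

/* Normal folios also get queued so they are freed after the flush. */
static void mock_tlb_remove_page_size(struct mock_gather *tlb,
				      unsigned long addr)
{
	mock_tlb_remove_pmd_tlb_entry(tlb, addr);
	tlb->nr_queued++;
}
```

In this model, invalidation coverage comes from the range bookkeeping
alone, so skipping the queueing step for the huge zero PMD loses nothing.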
Please correct me if I missed something :D
Cheers, Lance
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 7:02 ` Lance Yang
@ 2026-04-30 7:05 ` Bibo Mao
2026-04-30 7:16 ` Lance Yang
2026-04-30 7:12 ` Lance Yang
1 sibling, 1 reply; 10+ messages in thread
From: Bibo Mao @ 2026-04-30 7:05 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 3:02 PM, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>
>>
>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>> when executing command "make check" with qemu software, there is
>>>> error report like this:
>>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>
>>> Good catch!
>>>
>>>> The problem is that when application exits, rss counter is calculated
>>>> with huge_zero_pmd huge page, instead it should be skipped.
>>>
>>> Looks like the same problem[1] we discussed recently.
>>>
>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>> mm/huge_memory.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..3cbea344d4a2 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>> {
>>>> const bool is_device_private = folio_is_device_private(folio);
>>>>
>>>> + if (is_huge_zero_pmd(pmdval))
>>>> + return;
>>>> +
>>>
>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>>> decision ...
>>>
>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>> only avoids this RSS decrement :)
>>>
>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>
>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>> yes, I think it will solve this problem.
>>
>> Only that I think that there should be tlb flush operation after
>> pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>> tlb_remove_page_size() should be called. Is that right?
>
> Calling tlb_remove_page_size() is not necessary there :)
>
> zap_huge_pmd() already marks the PMD range for TLB invalidation right
> after clearing the entry:
>
> orig_pmd = pmdp_huge_get_and_clear_full(...);
> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
Yes, you're right. I forgot about the tlb_flush_pmd_range() call in
tlb_remove_pmd_tlb_entry().
So the fix solves this problem. Thanks for your explanation.
Regards
Bibo Mao
>
> The later tlb_remove_page_size() is guarded by "is_present && folio",
> and is for the normal folio case after normal_or_softleaf_folio_pmd()
> return one :)
>
> Please correct me if I missed something :D
>
> Cheers, Lance
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 7:02 ` Lance Yang
2026-04-30 7:05 ` Bibo Mao
@ 2026-04-30 7:12 ` Lance Yang
1 sibling, 0 replies; 10+ messages in thread
From: Lance Yang @ 2026-04-30 7:12 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 15:02, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>
>>
>> On 2026/4/30 下午12:28, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>> when executing command "make check" with qemu software, there is
>>>> error report like this:
>>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>
>>> Good catch!
>>>
>>>> The problem is that when application exits, rss counter is calculated
>>>> with huge_zero_pmd huge page, instead it should be skipped.
>>>
>>> Looks like the same problem[1] we discussed recently.
>>>
>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>> mm/huge_memory.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..3cbea344d4a2 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>> {
>>>> const bool is_device_private = folio_is_device_private(folio);
>>>>
>>>> + if (is_huge_zero_pmd(pmdval))
>>>> + return;
>>>> +
>>>
>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>>> decision ...
>>>
>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>> only avoids this RSS decrement :)
>>>
>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>
>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>> yes, I think it will solve this problem.
>>
>> Only that I think that there should be tlb flush operation after
>> pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>> tlb_remove_page_size() should be called. Is that right?
>
> Calling tlb_remove_page_size() is not necessary there :)
>
> zap_huge_pmd() already marks the PMD range for TLB invalidation right
> after clearing the entry:
>
> orig_pmd = pmdp_huge_get_and_clear_full(...);
> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>
> The later tlb_remove_page_size() is guarded by "is_present && folio",
> and is for the normal folio case after normal_or_softleaf_folio_pmd()
> return one :)
Forgot to add:
tlb_remove_page_size() queues the folio for freeing via mmu_gather.
The shared huge zero folio only needs PMD TLB invalidation, not the
delayed freeing :)
>
> Please correct me if I missed something :D
>
> Cheers, Lance
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 7:05 ` Bibo Mao
@ 2026-04-30 7:16 ` Lance Yang
2026-04-30 8:09 ` Bibo Mao
0 siblings, 1 reply; 10+ messages in thread
From: Lance Yang @ 2026-04-30 7:16 UTC (permalink / raw)
To: Bibo Mao
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 15:05, Bibo Mao wrote:
>
>
> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>
>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>
>>>
>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>
>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>> when executing command "make check" with qemu software, there is
>>>>> error report like this:
>>>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES
>>>>> val:-4096 Comm:bios-tables-tes Pid:27802
>>>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES
>>>>> val:-2048 Comm:worker Pid:27815
>>>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES
>>>>> val:-2048 Comm:qom-test Pid:27825
>>>>
>>>> Good catch!
>>>>
>>>>> The problem is that when application exits, rss counter is calculated
>>>>> with huge_zero_pmd huge page, instead it should be skipped.
>>>>
>>>> Looks like the same problem[1] we discussed recently.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-
>>>> d5521f39df2a@google.com/
>>>>
>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>> ---
>>>>> mm/huge_memory.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct
>>>>> mm_struct *mm, struct vm_area_struct *vma,
>>>>> {
>>>>> const bool is_device_private = folio_is_device_private(folio);
>>>>>
>>>>> + if (is_huge_zero_pmd(pmdval))
>>>>> + return;
>>>>> +
>>>>
>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>>>> decision ...
>>>>
>>>> So I don't think we should special-case it in zap_huge_pmd_folio().
>>>> That
>>>> only avoids this RSS decrement :)
>>>>
>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>
>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-
>>>> ac7e-2758586393b2@kernel.org/
>>> yes, I think it will solve this problem.
>>>
>>> Only that I think that there should be tlb flush operation after
>>> pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>>> tlb_remove_page_size() should be called. Is that right?
>>
>> Calling tlb_remove_page_size() is not necessary there :)
>>
>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>> after clearing the entry:
>>
>> orig_pmd = pmdp_huge_get_and_clear_full(...);
>> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
> Yes, it is. I forget the tlb_flush_pmd_range() calling in
> tlb_remove_pmd_tlb_entry().
>
> So the fix solves this problem. And thanks for your explanation.
If possible, can you test the fix[1] with your QEMU workload and
provide a Tested-by? That would be very helpful :D
[1]
https://lore.kernel.org/linux-mm/4d950326-6944-409b-b108-a4e67256857f@kernel.org/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 7:16 ` Lance Yang
@ 2026-04-30 8:09 ` Bibo Mao
2026-04-30 8:15 ` Lance Yang
0 siblings, 1 reply; 10+ messages in thread
From: Bibo Mao @ 2026-04-30 8:09 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 3:16 PM, Lance Yang wrote:
>
>
> On 2026/4/30 15:05, Bibo Mao wrote:
>>
>>
>> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>>
>>>>
>>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>>
>>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>>> when executing command "make check" with qemu software, there is
>>>>>> error report like this:
>>>>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES
>>>>>> val:-4096 Comm:bios-tables-tes Pid:27802
>>>>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES
>>>>>> val:-2048 Comm:worker Pid:27815
>>>>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES
>>>>>> val:-2048 Comm:qom-test Pid:27825
>>>>>
>>>>> Good catch!
>>>>>
>>>>>> The problem is that when application exits, rss counter is calculated
>>>>>> with huge_zero_pmd huge page, instead it should be skipped.
>>>>>
>>>>> Looks like the same problem[1] we discussed recently.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-
>>>>> d5521f39df2a@google.com/
>>>>>
>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>> ---
>>>>>> mm/huge_memory.c | 3 +++
>>>>>> 1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct
>>>>>> mm_struct *mm, struct vm_area_struct *vma,
>>>>>> {
>>>>>> const bool is_device_private = folio_is_device_private(folio);
>>>>>>
>>>>>> + if (is_huge_zero_pmd(pmdval))
>>>>>> + return;
>>>>>> +
>>>>>
>>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>>>>> decision ...
>>>>>
>>>>> So I don't think we should special-case it in zap_huge_pmd_folio().
>>>>> That
>>>>> only avoids this RSS decrement :)
>>>>>
>>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>>
>>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-
>>>>> ac7e-2758586393b2@kernel.org/
>>>> yes, I think it will solve this problem.
>>>>
>>>> Only that I think that there should be tlb flush operation after
>>>> pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>>>> tlb_remove_page_size() should be called. Is that right?
>>>
>>> Calling tlb_remove_page_size() is not necessary there :)
>>>
>>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>>> after clearing the entry:
>>>
>>> orig_pmd = pmdp_huge_get_and_clear_full(...);
>>> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>> Yes, it is. I forget the tlb_flush_pmd_range() calling in
>> tlb_remove_pmd_tlb_entry().
>>
>> So the fix solves this problem. And thanks for your explanation.
>
> If possible, can you test the fix[1] with your QEMU workload and
> provide a Tested-by? That would be very helpful :D
Yes, this patch solves the problem. I am not subscribed to the
linux-mm@kvack.org mailing list, so please feel free to add:
Tested-by: Bibo Mao <maobibo@loongson.cn>
Regards
Bibo Mao
>
> [1]
> https://lore.kernel.org/linux-mm/4d950326-6944-409b-b108-a4e67256857f@kernel.org/
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
2026-04-30 8:09 ` Bibo Mao
@ 2026-04-30 8:15 ` Lance Yang
0 siblings, 0 replies; 10+ messages in thread
From: Lance Yang @ 2026-04-30 8:15 UTC (permalink / raw)
To: Bibo Mao
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel
On 2026/4/30 16:09, Bibo Mao wrote:
>
>
> On 2026/4/30 3:16 PM, Lance Yang wrote:
>>
>>
>> On 2026/4/30 15:05, Bibo Mao wrote:
>>>
>>>
>>> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>>>
>>>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>>>
>>>>>
>>>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>>>
>>>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>>>> when executing command "make check" with qemu software, there is
>>>>>>> error report like this:
>>>>>>> BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES
>>>>>>> val:-4096 Comm:bios-tables-tes Pid:27802
>>>>>>> BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES
>>>>>>> val:-2048 Comm:worker Pid:27815
>>>>>>> BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES
>>>>>>> val:-2048 Comm:qom-test Pid:27825
>>>>>>
>>>>>> Good catch!
>>>>>>
>>>>>>> The problem is that when application exits, rss counter is
>>>>>>> calculated
>>>>>>> with huge_zero_pmd huge page, instead it should be skipped.
>>>>>>
>>>>>> Looks like the same problem[1] we discussed recently.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-
>>>>>> d5521f39df2a@google.com/
>>>>>>
>>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>>> ---
>>>>>>> mm/huge_memory.c | 3 +++
>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>>>> --- a/mm/huge_memory.c
>>>>>>> +++ b/mm/huge_memory.c
>>>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct
>>>>>>> mm_struct *mm, struct vm_area_struct *vma,
>>>>>>> {
>>>>>>> const bool is_device_private = folio_is_device_private(folio);
>>>>>>>
>>>>>>> + if (is_huge_zero_pmd(pmdval))
>>>>>>> + return;
>>>>>>> +
>>>>>>
>>>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>>>> zap_huge_pmd_folio(), we already made the wrong normal-vs-special
>>>>>> decision ...
>>>>>>
>>>>>> So I don't think we should special-case it in
>>>>>> zap_huge_pmd_folio(). That
>>>>>> only avoids this RSS decrement :)
>>>>>>
>>>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>>>
>>>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-
>>>>>> ac7e-2758586393b2@kernel.org/
>>>>> yes, I think it will solve this problem.
>>>>>
>>>>> Only that I think that there should be tlb flush operation after
>>>>> pmdp_huge_get_and_clear_full() even with huge_zero_pmd page, so
>>>>> tlb_remove_page_size() should be called. Is that right?
>>>>
>>>> Calling tlb_remove_page_size() is not necessary there :)
>>>>
>>>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>>>> after clearing the entry:
>>>>
>>>> orig_pmd = pmdp_huge_get_and_clear_full(...);
>>>> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>>> Yes, it is. I forget the tlb_flush_pmd_range() calling in
>>> tlb_remove_pmd_tlb_entry().
>>>
>>> So the fix solves this problem. And thanks for your explanation.
>>
>> If possible, can you test the fix[1] with your QEMU workload and
>> provide a Tested-by? That would be very helpful :D
> yes, this patch solves the problem. I do not subscribe linux-
> mm@kvack.org mailing list, please feel free to add
>
> Tested-by: Bibo Mao <maobibo@loongson.cn>
Thanks for testing!
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-30 8:15 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-30 4:11 [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio() Bibo Mao
2026-04-30 4:28 ` Lance Yang
2026-04-30 4:58 ` Lance Yang
2026-04-30 6:34 ` Bibo Mao
2026-04-30 7:02 ` Lance Yang
2026-04-30 7:05 ` Bibo Mao
2026-04-30 7:16 ` Lance Yang
2026-04-30 8:09 ` Bibo Mao
2026-04-30 8:15 ` Lance Yang
2026-04-30 7:12 ` Lance Yang