* [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
@ 2026-04-30  4:11 Bibo Mao

From: Bibo Mao @ 2026-04-30 4:11 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
    Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang
Cc: linux-mm, linux-kernel

When running "make check" on QEMU, there are error reports like this:

  BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
  BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
  BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825

The problem is that when an application exits, the rss counter is
decremented for the huge zero PMD page; it should be skipped instead.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..3cbea344d4a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	const bool is_device_private = folio_is_device_private(folio);
 
+	if (is_huge_zero_pmd(pmdval))
+		return;
+
 	/* Present and device private folios are rmappable. */
 	if (is_present || is_device_private)
 		folio_remove_rmap_pmd(folio, &folio->page, vma);

base-commit: 3b3bea6d4b9c162f9e555905d96b8c1da67ecd5b
--
2.39.3
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  4:28 ` Lance Yang

From: Lance Yang @ 2026-04-30 4:28 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, lance.yang, linux-mm, linux-kernel

On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
> When running "make check" on QEMU, there are error reports like this:
>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825

Good catch!

> The problem is that when an application exits, the rss counter is
> decremented for the huge zero PMD page; it should be skipped instead.

Looks like the same problem[1] we discussed recently.

[1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/

> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  mm/huge_memory.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..3cbea344d4a2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>  {
>  	const bool is_device_private = folio_is_device_private(folio);
>
> +	if (is_huge_zero_pmd(pmdval))
> +		return;
> +

The huge zero PMD should not be returned by vm_normal_page_pmd() or
vm_normal_folio_pmd() as a normal folio. If it reaches
zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
decision ...

So I don't think we should special-case it in zap_huge_pmd_folio(). That
only avoids this RSS decrement :)

Could you please check whether the fix[2] also fixes your QEMU test?

[2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/

Thanks,
Lance
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  4:58 ` Lance Yang

From: Lance Yang @ 2026-04-30 4:58 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 12:28, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>> When running "make check" on QEMU, there are error reports like this:
>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>
> Good catch!
>
>> The problem is that when an application exits, the rss counter is
>> decremented for the huge zero PMD page; it should be skipped instead.
>
> Looks like the same problem[1] we discussed recently.
>
> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>  mm/huge_memory.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..3cbea344d4a2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>  {
>>  	const bool is_device_private = folio_is_device_private(folio);
>>
>> +	if (is_huge_zero_pmd(pmdval))
>> +		return;
>> +
>
> The huge zero PMD should not be returned by vm_normal_page_pmd() or
> vm_normal_folio_pmd() as a normal folio. If it reaches
> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
> decision ...
>
> So I don't think we should special-case it in zap_huge_pmd_folio(). That
> only avoids this RSS decrement :)
>
> Could you please check whether the fix[2] also fixes your QEMU test?

In addition, like x86-32, 64-bit LoongArch selects ARCH_HAS_PTE_SPECIAL
but not ARCH_SUPPORTS_HUGE_PFNMAP. So CONFIG_ARCH_SUPPORTS_PMD_PFNMAP is
not enabled, and pmd_special() falls back to the generic stub that
always returns false.

So I guess the fix should do the trick :)

> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>
> Thanks,
> Lance
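For reference, the generic fallback Lance mentions looks roughly like this. This is a paraphrase; the exact header file and guard in the tree under discussion are assumptions based on his description:

```c
/* Paraphrased generic stub: without CONFIG_ARCH_SUPPORTS_PMD_PFNMAP,
 * a PMD can never be reported as a special (non-refcounted) mapping
 * via a hardware/software special bit. */
#ifndef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
static inline bool pmd_special(pmd_t pmd)
{
	return false;
}
#endif
```

On such architectures the normal-vs-special decision for a PMD therefore cannot come from pmd_special() and has to be made by explicit checks such as is_huge_zero_pmd(), which is why the referenced fix matters here.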
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  6:34 ` Bibo Mao

From: Bibo Mao @ 2026-04-30 6:34 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 12:28 PM, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>> When running "make check" on QEMU, there are error reports like this:
>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>
> Good catch!
>
>> The problem is that when an application exits, the rss counter is
>> decremented for the huge zero PMD page; it should be skipped instead.
>
> Looks like the same problem[1] we discussed recently.
>
> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>  mm/huge_memory.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..3cbea344d4a2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>  {
>>  	const bool is_device_private = folio_is_device_private(folio);
>>
>> +	if (is_huge_zero_pmd(pmdval))
>> +		return;
>> +
>
> The huge zero PMD should not be returned by vm_normal_page_pmd() or
> vm_normal_folio_pmd() as a normal folio. If it reaches
> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
> decision ...
>
> So I don't think we should special-case it in zap_huge_pmd_folio(). That
> only avoids this RSS decrement :)
>
> Could you please check whether the fix[2] also fixes your QEMU test?
>
> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/

Yes, I think it will solve this problem.

Only, I think there should be a TLB flush operation after
pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
tlb_remove_page_size() should be called. Is that right?

Regards
Bibo Mao

>
> Thanks,
> Lance
>
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  7:02 ` Lance Yang

From: Lance Yang @ 2026-04-30 7:02 UTC (permalink / raw)
To: maobibo
Cc: lance.yang, akpm, david, ljs, ziy, baolin.wang, Liam.Howlett,
    npache, ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>
> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>
>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>> When running "make check" on QEMU, there are error reports like this:
>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>
>> Good catch!
>>
>>> The problem is that when an application exits, the rss counter is
>>> decremented for the huge zero PMD page; it should be skipped instead.
>>
>> Looks like the same problem[1] we discussed recently.
>>
>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>
>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>> ---
>>>  mm/huge_memory.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 970e077019b7..3cbea344d4a2 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>  {
>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>
>>> +	if (is_huge_zero_pmd(pmdval))
>>> +		return;
>>> +
>>
>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>> vm_normal_folio_pmd() as a normal folio. If it reaches
>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>> decision ...
>>
>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>> only avoids this RSS decrement :)
>>
>> Could you please check whether the fix[2] also fixes your QEMU test?
>>
>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>
> Yes, I think it will solve this problem.
>
> Only, I think there should be a TLB flush operation after
> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
> tlb_remove_page_size() should be called. Is that right?

Calling tlb_remove_page_size() is not necessary there :)

zap_huge_pmd() already marks the PMD range for TLB invalidation right
after clearing the entry:

	orig_pmd = pmdp_huge_get_and_clear_full(...);
	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);

The later tlb_remove_page_size() is guarded by "is_present && folio",
and is for the normal folio case after normal_or_softleaf_folio_pmd()
returns one :)

Please correct me if I missed something :D

Cheers,
Lance
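The ordering Lance describes can be put together as an annotated sketch of the zap path. This is pseudocode assembled from the snippets quoted in this thread, not the exact code of the tree under discussion; the guard names and normal_or_softleaf_folio_pmd() are taken from his description:

```c
/* Pseudocode sketch of zap_huge_pmd(), paraphrased from this thread. */
orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);   /* marks the PMD range for TLB
                                               invalidation, including for
                                               the huge zero PMD */

folio = normal_or_softleaf_folio_pmd(...);  /* special mappings such as the
                                               huge zero PMD yield no
                                               normal folio */
if (folio)
	zap_huge_pmd_folio(mm, vma, folio, ...); /* rmap teardown + rss */

if (is_present && folio)                    /* normal folios only */
	tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
```

So the TLB invalidation happens unconditionally at the top, while rss accounting and the deferred free apply only when a normal folio was found.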
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  7:05 ` Bibo Mao

From: Bibo Mao @ 2026-04-30 7:05 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 3:02 PM, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>
>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>> When running "make check" on QEMU, there are error reports like this:
>>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>
>>> Good catch!
>>>
>>>> The problem is that when an application exits, the rss counter is
>>>> decremented for the huge zero PMD page; it should be skipped instead.
>>>
>>> Looks like the same problem[1] we discussed recently.
>>>
>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>  mm/huge_memory.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..3cbea344d4a2 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>  {
>>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>>
>>>> +	if (is_huge_zero_pmd(pmdval))
>>>> +		return;
>>>> +
>>>
>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>>> decision ...
>>>
>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>> only avoids this RSS decrement :)
>>>
>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>
>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>>
>> Yes, I think it will solve this problem.
>>
>> Only, I think there should be a TLB flush operation after
>> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
>> tlb_remove_page_size() should be called. Is that right?
>
> Calling tlb_remove_page_size() is not necessary there :)
>
> zap_huge_pmd() already marks the PMD range for TLB invalidation right
> after clearing the entry:
>
> 	orig_pmd = pmdp_huge_get_and_clear_full(...);
> 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);

Yes, it is. I forgot the tlb_flush_pmd_range() call in
tlb_remove_pmd_tlb_entry().

So the fix solves this problem, and thanks for your explanation.

Regards
Bibo Mao

>
> The later tlb_remove_page_size() is guarded by "is_present && folio",
> and is for the normal folio case after normal_or_softleaf_folio_pmd()
> returns one :)
>
> Please correct me if I missed something :D
>
> Cheers,
> Lance
>
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  7:16 ` Lance Yang

From: Lance Yang @ 2026-04-30 7:16 UTC (permalink / raw)
To: Bibo Mao
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 15:05, Bibo Mao wrote:
>
> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>
>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>
>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>
>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>> When running "make check" on QEMU, there are error reports like this:
>>>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>>
>>>> Good catch!
>>>>
>>>>> The problem is that when an application exits, the rss counter is
>>>>> decremented for the huge zero PMD page; it should be skipped instead.
>>>>
>>>> Looks like the same problem[1] we discussed recently.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>>
>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>> ---
>>>>>  mm/huge_memory.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>  {
>>>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>>>
>>>>> +	if (is_huge_zero_pmd(pmdval))
>>>>> +		return;
>>>>> +
>>>>
>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>>>> decision ...
>>>>
>>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>>> only avoids this RSS decrement :)
>>>>
>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>
>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>>>
>>> Yes, I think it will solve this problem.
>>>
>>> Only, I think there should be a TLB flush operation after
>>> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
>>> tlb_remove_page_size() should be called. Is that right?
>>
>> Calling tlb_remove_page_size() is not necessary there :)
>>
>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>> after clearing the entry:
>>
>> 	orig_pmd = pmdp_huge_get_and_clear_full(...);
>> 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
> Yes, it is. I forgot the tlb_flush_pmd_range() call in
> tlb_remove_pmd_tlb_entry().
>
> So the fix solves this problem, and thanks for your explanation.

If possible, can you test the fix[1] with your QEMU workload and
provide a Tested-by? That would be very helpful :D

[1] https://lore.kernel.org/linux-mm/4d950326-6944-409b-b108-a4e67256857f@kernel.org/
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  8:09 ` Bibo Mao

From: Bibo Mao @ 2026-04-30 8:09 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 3:16 PM, Lance Yang wrote:
>
> On 2026/4/30 15:05, Bibo Mao wrote:
>>
>> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>>
>>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>>
>>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>>> When running "make check" on QEMU, there are error reports like this:
>>>>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>>>
>>>>> Good catch!
>>>>>
>>>>>> The problem is that when an application exits, the rss counter is
>>>>>> decremented for the huge zero PMD page; it should be skipped instead.
>>>>>
>>>>> Looks like the same problem[1] we discussed recently.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>>>
>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 3 +++
>>>>>>  1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>>  {
>>>>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>>>>
>>>>>> +	if (is_huge_zero_pmd(pmdval))
>>>>>> +		return;
>>>>>> +
>>>>>
>>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>>>>> decision ...
>>>>>
>>>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>>>> only avoids this RSS decrement :)
>>>>>
>>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>>
>>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>>>>
>>>> Yes, I think it will solve this problem.
>>>>
>>>> Only, I think there should be a TLB flush operation after
>>>> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
>>>> tlb_remove_page_size() should be called. Is that right?
>>>
>>> Calling tlb_remove_page_size() is not necessary there :)
>>>
>>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>>> after clearing the entry:
>>>
>>> 	orig_pmd = pmdp_huge_get_and_clear_full(...);
>>> 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>> Yes, it is. I forgot the tlb_flush_pmd_range() call in
>> tlb_remove_pmd_tlb_entry().
>>
>> So the fix solves this problem, and thanks for your explanation.
>
> If possible, can you test the fix[1] with your QEMU workload and
> provide a Tested-by? That would be very helpful :D

Yes, this patch solves the problem. I am not subscribed to the
linux-mm@kvack.org mailing list, so please feel free to add:

Tested-by: Bibo Mao <maobibo@loongson.cn>

Regards
Bibo Mao

>
> [1] https://lore.kernel.org/linux-mm/4d950326-6944-409b-b108-a4e67256857f@kernel.org/
>
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  8:15 ` Lance Yang

From: Lance Yang @ 2026-04-30 8:15 UTC (permalink / raw)
To: Bibo Mao
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 16:09, Bibo Mao wrote:
>
> On 2026/4/30 3:16 PM, Lance Yang wrote:
>>
>> On 2026/4/30 15:05, Bibo Mao wrote:
>>>
>>> On 2026/4/30 3:02 PM, Lance Yang wrote:
>>>>
>>>> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>>>>
>>>>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>>>>
>>>>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>>>>> When running "make check" on QEMU, there are error reports like this:
>>>>>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>>>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>>>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>>>>
>>>>>> Good catch!
>>>>>>
>>>>>>> The problem is that when an application exits, the rss counter is
>>>>>>> decremented for the huge zero PMD page; it should be skipped instead.
>>>>>>
>>>>>> Looks like the same problem[1] we discussed recently.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>>>>
>>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>>> ---
>>>>>>>  mm/huge_memory.c | 3 +++
>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>>> index 970e077019b7..3cbea344d4a2 100644
>>>>>>> --- a/mm/huge_memory.c
>>>>>>> +++ b/mm/huge_memory.c
>>>>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>>>  {
>>>>>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>>>>>
>>>>>>> +	if (is_huge_zero_pmd(pmdval))
>>>>>>> +		return;
>>>>>>> +
>>>>>>
>>>>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>>>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>>>>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>>>>>> decision ...
>>>>>>
>>>>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>>>>> only avoids this RSS decrement :)
>>>>>>
>>>>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>>>>
>>>>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>>>>>
>>>>> Yes, I think it will solve this problem.
>>>>>
>>>>> Only, I think there should be a TLB flush operation after
>>>>> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
>>>>> tlb_remove_page_size() should be called. Is that right?
>>>>
>>>> Calling tlb_remove_page_size() is not necessary there :)
>>>>
>>>> zap_huge_pmd() already marks the PMD range for TLB invalidation right
>>>> after clearing the entry:
>>>>
>>>> 	orig_pmd = pmdp_huge_get_and_clear_full(...);
>>>> 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>>> Yes, it is. I forgot the tlb_flush_pmd_range() call in
>>> tlb_remove_pmd_tlb_entry().
>>>
>>> So the fix solves this problem, and thanks for your explanation.
>>
>> If possible, can you test the fix[1] with your QEMU workload and
>> provide a Tested-by? That would be very helpful :D
>
> Yes, this patch solves the problem. I am not subscribed to the
> linux-mm@kvack.org mailing list, so please feel free to add:
>
> Tested-by: Bibo Mao <maobibo@loongson.cn>

Thanks for testing!
* Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()
  2026-04-30  7:12 ` Lance Yang

From: Lance Yang @ 2026-04-30 7:12 UTC (permalink / raw)
To: maobibo
Cc: akpm, david, ljs, ziy, baolin.wang, Liam.Howlett, npache,
    ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 2026/4/30 15:02, Lance Yang wrote:
>
> On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:
>>
>> On 2026/4/30 12:28 PM, Lance Yang wrote:
>>>
>>> On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:
>>>> When running "make check" on QEMU, there are error reports like this:
>>>>   BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
>>>>   BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
>>>>   BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825
>>>
>>> Good catch!
>>>
>>>> The problem is that when an application exits, the rss counter is
>>>> decremented for the huge zero PMD page; it should be skipped instead.
>>>
>>> Looks like the same problem[1] we discussed recently.
>>>
>>> [1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@google.com/
>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>  mm/huge_memory.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..3cbea344d4a2 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>  {
>>>>  	const bool is_device_private = folio_is_device_private(folio);
>>>>
>>>> +	if (is_huge_zero_pmd(pmdval))
>>>> +		return;
>>>> +
>>>
>>> The huge zero PMD should not be returned by vm_normal_page_pmd() or
>>> vm_normal_folio_pmd() as a normal folio. If it reaches
>>> zap_huge_pmd_folio(), we have already made the wrong normal-vs-special
>>> decision ...
>>>
>>> So I don't think we should special-case it in zap_huge_pmd_folio(). That
>>> only avoids this RSS decrement :)
>>>
>>> Could you please check whether the fix[2] also fixes your QEMU test?
>>>
>>> [2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org/
>>
>> Yes, I think it will solve this problem.
>>
>> Only, I think there should be a TLB flush operation after
>> pmdp_huge_get_and_clear_full() even for the huge_zero_pmd page, so
>> tlb_remove_page_size() should be called. Is that right?
>
> Calling tlb_remove_page_size() is not necessary there :)
>
> zap_huge_pmd() already marks the PMD range for TLB invalidation right
> after clearing the entry:
>
> 	orig_pmd = pmdp_huge_get_and_clear_full(...);
> 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>
> The later tlb_remove_page_size() is guarded by "is_present && folio",
> and is for the normal folio case after normal_or_softleaf_folio_pmd()
> returns one :)

Forgot to add: tlb_remove_page_size() queues the folio for freeing via
mmu_gather. The shared huge zero folio only needs PMD TLB invalidation,
not the delayed freeing :)

>
> Please correct me if I missed something :D
>
> Cheers,
> Lance
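The contrast Lance draws between the two mmu_gather helpers can be summarized in a short sketch (paraphrased from this thread; mmu_gather internals are simplified):

```c
/* Roles of the two mmu_gather helpers, paraphrased:
 *
 *   tlb_remove_pmd_tlb_entry(tlb, pmd, addr)
 *       Marks the PMD range for TLB invalidation (via
 *       tlb_flush_pmd_range()). Needed for every cleared PMD,
 *       including the shared huge zero PMD.
 *
 *   tlb_remove_page_size(tlb, page, size)
 *       Queues the page in the mmu_gather so its reference is dropped,
 *       and the page possibly freed, after the TLB flush. Only
 *       meaningful for normal refcounted folios; the global huge zero
 *       folio is never freed through this path.
 */
```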
End of thread (newest: 2026-04-30 8:15 UTC). Thread overview: 10+ messages:

2026-04-30  4:11 [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio() Bibo Mao
2026-04-30  4:28 ` Lance Yang
2026-04-30  4:58   ` Lance Yang
2026-04-30  6:34   ` Bibo Mao
2026-04-30  7:02     ` Lance Yang
2026-04-30  7:05       ` Bibo Mao
2026-04-30  7:16         ` Lance Yang
2026-04-30  8:09           ` Bibo Mao
2026-04-30  8:15             ` Lance Yang
2026-04-30  7:12       ` Lance Yang