From: Hugh Dickins <hughd@google.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>,
hughd@google.com, akpm@linux-foundation.org,
baolin.wang@linux.alibaba.com, baohua@kernel.org,
dev.jain@arm.com, liam.howlett@oracle.com, ljs@kernel.org,
mhocko@suse.com, rppt@kernel.org, npache@redhat.com,
zhengqi.arch@bytedance.com, ryan.roberts@arm.com,
surenb@google.com, ziy@nvidia.com, linux-mm@kvack.org
Subject: Re: [PATCH hotfix] mm: fix pmd_special() fallback to observe huge_zero
Date: Thu, 30 Apr 2026 01:48:05 -0700 (PDT)
Message-ID: <e92e45df-0821-1d67-e026-88c57960814f@google.com>
In-Reply-To: <4d950326-6944-409b-b108-a4e67256857f@kernel.org>
On Thu, 30 Apr 2026, David Hildenbrand (Arm) wrote:
>
> Okay, I'd say we do the following:
>
> From fd9ead548f102f7c257980ccc7b96cce7e42a570 Mon Sep 17 00:00:00 2001
> From: Hugh Dickins <hughd@google.com>
> Date: Thu, 30 Apr 2026 07:35:31 +0200
> Subject: [PATCH] mm: fix pmd_special() fallback to observe huge_zero
No to those lines: this patch of yours is quite different.
>
> On x86 32-bit with THP enabled, zap_huge_pmd() is seen to generate a
> "WARNING: mm/memory.c:735 at __vm_normal_page+0x6a/0x7d", from the
> VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn)); followed
> by "BUG: Bad rss-counter state"s, then later "BUG: Bad page state"s
> when reclaim gets to call shrink_huge_zero_folio_scan().
>
> It's as if the _PAGE_SPECIAL bit never got set in the huge_zero pmd:
> and indeed, whereas pte_special() and pte_mkspecial() are subject to a
> dedicated CONFIG_ARCH_HAS_PTE_SPECIAL, pmd_special() and pmd_mkspecial()
> are subject to CONFIG_ARCH_SUPPORTS_PMD_PFNMAP, which is never enabled
> on any 32-bit architecture.
>
> While the problem was exposed through d80a9cb1a64a ("mm/huge_memory: add
> and use normal_or_softleaf_folio_pmd()"), it was an oversight in
> af38538801c6 ("mm/memory: factor out common code from vm_normal_page_*()")
> and would result in other problems:
> * huge zero folio accounted in smaps, pagemap (PAGE_IS_FILE) and
> numamaps as file-backed THP
> * folio_walk_start() returning the folio even without FW_ZEROPAGE set.
> Callers seem to tolerate that, though.
Yes, I hadn't thought to check other uses when I posted yesterday;
but later took a look through, and was coming to exactly the conclusion
you reach in that paragraph (well, I gave up before following through
far enough on damon: perhaps it could also have been affected).
>
> ... and triggering the VM_WARN_ON_ONCE(), although never reported so far.
>
> To fix it, teach vm_normal_page_pmd()/vm_normal_page_pud() whether
> pmd_special/pud_special is actually implemented.
>
> Fixes: af38538801c6 ("mm/memory: factor out common code from vm_normal_page_*()")
Agreed. My Fixes tag (where my bisection arrived) was correct for the
common zap_huge_pmd() symptom I was seeing (Lorenzo's commit removed
an independent is_huge_zero_pmd() check from it, so it now relies on
vm_normal_folio_pmd() to give the right answer). But you've chased
up other usages, and realized it goes back further. You could just as
well blame the other commit mentioned in this thread, the d82d09e48219
("mm/huge_memory: mark PMD mappings of the huge zero folio special"),
because in some configs it is not doing what it expects to be doing.
But af38538801c6 is where the effects start appearing, so fine to blame
it (and both come from the same 6.18 series, so it doesn't matter).
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Co-developed-by: David Hildenbrand (Arm) <david@kernel.org>
That's generous, but the patch is not mine at all, and I'll
happily let you grab my two paragraphs above. Please, just
Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> mm/memory.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7322a40e73b9..a60bc07b48b2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -612,6 +612,21 @@ static void print_bad_page_map(struct vm_area_struct *vma,
> dump_stack();
> add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> }
> +
> +static inline bool pgtable_level_has_pxx_special(enum pgtable_level level)
> +{
> + switch (level) {
> + case PGTABLE_LEVEL_PTE:
> + return IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL);
> + case PGTABLE_LEVEL_PMD:
> + return IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP);
> + case PGTABLE_LEVEL_PUD:
> + return IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP);
> + default:
> + return false;
> + }
> +}
> +
> #define print_bad_pte(vma, addr, pte, page) \
> print_bad_page_map(vma, addr, pte_val(pte), page, PGTABLE_LEVEL_PTE)
>
> @@ -684,7 +699,7 @@ static inline struct page *__vm_normal_page(struct vm_area_struct *vma,
> unsigned long addr, unsigned long pfn, bool special,
> unsigned long long entry, enum pgtable_level level)
> {
> - if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
> + if (pgtable_level_has_pxx_special(level)) {
> if (unlikely(special)) {
> #ifdef CONFIG_FIND_NORMAL_PAGE
> if (vma->vm_ops && vma->vm_ops->find_normal_page)
That block ends with a comment on CONFIG_ARCH_HAS_PTE_SPECIAL:
perhaps better reworded now - but I don't know what to suggest!
This patch seems okay, but TBH I have no enthusiasm for it -
it forces me to think too hard, and I prefer my own one-liner
(which Lance found odd: odd if you're thinking pmd_special(pmd)
means pmd has the _PAGE_SPECIAL bit set, yes, but not odd if you
think it means that pmd is of a special folio).
But whatever, you and Lance prefer this one: thanks for the fix!
Hugh