All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>, hughd@google.com
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
	baohua@kernel.org, dev.jain@arm.com, liam.howlett@oracle.com,
	ljs@kernel.org, mhocko@suse.com, rppt@kernel.org,
	npache@redhat.com, zhengqi.arch@bytedance.com,
	ryan.roberts@arm.com, surenb@google.com, ziy@nvidia.com,
	linux-mm@kvack.org
Subject: Re: [PATCH hotfix] mm: fix pmd_special() fallback to observe huge_zero
Date: Wed, 29 Apr 2026 15:33:52 +0800	[thread overview]
Message-ID: <aa3937bb-7f06-4136-ba36-ce4a06acf80a@linux.dev> (raw)
In-Reply-To: <ea1453a6-14c9-4334-ac7e-2758586393b2@kernel.org>



On 2026/4/29 15:14, David Hildenbrand (Arm) wrote:
> On 4/29/26 08:57, Lance Yang wrote:
>>
>> On Wed, Apr 29, 2026 at 08:12:55AM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/29/26 07:54, Lance Yang wrote:
>>>>
>>>>
>>>> Good catch!
>>>>
>>>
>>> Likely it should be
>>>
>>> 	Fixes: d82d09e48219 ("mm/huge_memory: mark PMD mappings of the huge zero folio special")
>>>
>>> Because vm_normal_page_pmd() would return the wrong thing.
>>
>> Right.
>>
>>> But I am surprised that we didn't run into the
>>>
>>> 	VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
>>>
>>> earlier. 	
>>
>> The history seems to be:
>>
>> 	2025-09-13 d82d09e48219 ("mm/huge_memory: mark PMD mappings of the huge zero folio special")
>> 	2025-09-13 af38538801c6 ("mm/memory: factor out common code from vm_normal_page_*()")
>>
>> After d82d09e48219, vm_normal_page_pmd() still had the explicit huge
>> zero check before returning the page:
>>
>> --8<---
>> struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>> 				pmd_t pmd)
>> {
>> 	unsigned long pfn = pmd_pfn(pmd);
>>
>> 	if (unlikely(pmd_special(pmd)))
>> 		return NULL;
>>
>> 	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
>> 		if (vma->vm_flags & VM_MIXEDMAP) {
>> 			if (!pfn_valid(pfn))
>> 				return NULL;
>> 			goto out;
>> 		} else {
>> 			unsigned long off;
>> 			off = (addr - vma->vm_start) >> PAGE_SHIFT;
>> 			if (pfn == vma->vm_pgoff + off)
>> 				return NULL;
>> 			if (!is_cow_mapping(vma->vm_flags))
>> 				return NULL;
>> 		}
>> 	}
>>
>> 	if (is_huge_zero_pfn(pfn))
>> 		return NULL;
>> 	if (unlikely(pfn > highest_memmap_pfn))
>> 		return NULL;
>>
>> 	/*
>> 	 * NOTE! We still have PageReserved() pages in the page tables.
>> 	 * eg. VDSO mappings can cause them to exist.
>> 	 */
>> out:
>> 	return pfn_to_page(pfn);
>> }
>> ---
>>
>> So even if pmd_mkspecial() was a no-op and pmd_special() stayed false,
>> we would still return NULL there.
>>
>> Then af38538801c6 moved the PMD path into __vm_normal_page():
>>
>> ---8<---
>> struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>> 				pmd_t pmd)
>> {
>> 	return __vm_normal_page(vma, addr, pmd_pfn(pmd), pmd_special(pmd),
>> 				pmd_val(pmd), PGTABLE_LEVEL_PMD);
>> }
>> ---
>>
>> For CONFIG_ARCH_HAS_PTE_SPECIAL=y, __vm_normal_page() only returns NULL
>> for the huge zero PFN if special == true. On x86 32-bit, pmd_special()
>> stays false, so this can now fall through to VM_WARN_ON_ONCE():
>>
>> ---8<---
>> 	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
>> 		if (unlikely(special)) {
>> 			if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
>> 				return NULL;
>> ...
>> 		}
>> ...
>> 	} else {
>> ...
>> 		if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
>> 			return NULL;
>> 	}
>>
>> ...
>> 	VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
>> ...
>> ---
>>
>> So my guess is that the warning above became possible after
>> af38538801c6, IIUC.
> 
> Yes, I think you are right about af38538801c6.
> 
> What about the following then as a temporary solution:

Nice, that works for me :)

> diff --git a/mm/memory.c b/mm/memory.c
> index 199214f8de08..bf447c8b2f57 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -684,7 +684,9 @@ static inline struct page *__vm_normal_page(struct vm_area_struct *vma,
>                  unsigned long addr, unsigned long pfn, bool special,
>                  unsigned long long entry, enum pgtable_level level)
>   {
> -       if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
> +       if ((IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && level == PGTABLE_LEVEL_PTE) ||
> +           (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP) && level == PGTABLE_LEVEL_PMD) ||
> +           (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && level == PGTABLE_LEVEL_PUD)) {
>                  if (unlikely(special)) {
>   #ifdef CONFIG_FIND_NORMAL_PAGE
>                          if (vma->vm_ops && vma->vm_ops->find_normal_page)
> 
> 
> We could wrap the check in a fancy helper.


Cheers,
Lance



  reply	other threads:[~2026-04-29  7:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29  5:08 [PATCH hotfix] mm: fix pmd_special() fallback to observe huge_zero Hugh Dickins
2026-04-29  5:54 ` Lance Yang
2026-04-29  6:12   ` David Hildenbrand (Arm)
2026-04-29  6:57     ` Lance Yang
2026-04-29  7:14       ` David Hildenbrand (Arm)
2026-04-29  7:33         ` Lance Yang [this message]
2026-04-30  5:53           ` David Hildenbrand (Arm)
2026-04-30  6:46             ` Lance Yang
2026-04-30  8:30             ` Lance Yang
2026-04-30  8:48             ` Hugh Dickins
2026-04-30  8:54               ` David Hildenbrand (Arm)
2026-04-30  9:10                 ` Lance Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa3937bb-7f06-4136-ba36-ce4a06acf80a@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=hughd@google.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.