Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Gavin Guo <gavinguo@igalia.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@redhat.com>, Zi Yan <ziy@nvidia.com>,
	 Gavin Shan <gshan@redhat.com>,
	Florent Revest <revest@google.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	 stable@vger.kernel.org
Subject: Re: [PATCH 5.15.y] mm/huge_memory: fix dereferencing invalid pmd migration entry
Date: Thu, 19 Jun 2025 13:17:18 -0700 (PDT)	[thread overview]
Message-ID: <1993fdd9-656a-6c25-fb83-cb2993bc18eb@google.com> (raw)
In-Reply-To: <c6eed86d-9a79-4845-8289-e9b24c46b88a@igalia.com>

On Thu, 19 Jun 2025, Gavin Guo wrote:
> On 6/19/25 11:30, Hugh Dickins wrote:
> > On Mon, 16 Jun 2025, Gavin Guo wrote:
> > 
> >> [ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
> >>
> >> When migrating a THP, concurrent access to the PMD migration entry during
> >> a deferred split scan can lead to an invalid address access, as
> >> illustrated below.  To prevent this invalid access, it is necessary to
> >> check the PMD migration entry and return early.  In this context, there is
> >> no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
> >> equality of the target folio.  Since the PMD migration entry is locked, it
> >> cannot be served as the target.
> >>
> >> Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
> >> lookup points to a location which may contain the folio of interest, but
> >> might instead contain another folio: and weeding out those other folios is
> >> precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
> >> replacing the wrong folio" comment a few lines above it) is for."
> >>
> >> BUG: unable to handle page fault for address: ffffea60001db008
> >> CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
> >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> >> 1.16.3-debian-1.16.3-2 04/01/2014
> >> RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
> >> Call Trace:
> >> <TASK>
> >> try_to_migrate_one+0x28c/0x3730
> >> rmap_walk_anon+0x4f6/0x770
> >> unmap_folio+0x196/0x1f0
> >> split_huge_page_to_list_to_order+0x9f6/0x1560
> >> deferred_split_scan+0xac5/0x12a0
> >> shrinker_debugfs_scan_write+0x376/0x470
> >> full_proxy_write+0x15c/0x220
> >> vfs_write+0x2fc/0xcb0
> >> ksys_write+0x146/0x250
> >> do_syscall_64+0x6a/0x120
> >> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>
> >> The bug is found by syzkaller on an internal kernel, then confirmed on
> >> upstream.
> >>
> >> Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
> >> Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
> >> Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
> >> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
> >> Signed-off-by: Gavin Guo <gavinguo@igalia.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Hugh Dickins <hughd@google.com>
> >> Acked-by: Zi Yan <ziy@nvidia.com>
> >> Reviewed-by: Gavin Shan <gshan@redhat.com>
> >> Cc: Florent Revest <revest@google.com>
> >> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> Cc: Miaohe Lin <linmiaohe@huawei.com>
> >> Cc: <stable@vger.kernel.org>
> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> [gavin: backport the migration checking logic to __split_huge_pmd]
> >> Signed-off-by: Gavin Guo <gavinguo@igalia.com>
> >> ---
> >>   mm/huge_memory.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 9139da4baa39..bcefc17954d6 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma,
> >> pmd_t *pmd,
> >>    VM_BUG_ON(freeze && !page);
> >>    if (page) {
> >>   		VM_WARN_ON_ONCE(!PageLocked(page));
> >> -		if (page != pmd_page(*pmd))
> >> +		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
> >>    		goto out;
> >>    }
> >>   @@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma,
> >> pmd_t *pmd,
> >>     }
> >>     if (PageMlocked(page))
> >>   			clear_page_mlock(page);
> >> -	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> >> +	} else if (!pmd_devmap(*pmd))
> >>     goto out;
> > 
> > I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong
> > to me, because here you drop the is_pmd_migration_entry(*pmd) condition,
> > but if !page then that has not been checked earlier (this check here is
> > specifically allowing a pmd migration entry to proceed to the split).
> > 
> > Hugh
> 
> Hi Hugh,
> 
> Thank you again for the review.
> 
> Regarding the 5.4/5.10/5.15. How do you think about the following changes?

I think you are going way off track with the following changes.

The first hunk of your backport (the pmd_page line) was fine, it was the
second hunk (the pmd_devmap line) that I objected to: that second hunk
should just be deleted, to make no change on the pmd_devmap line.

Maybe you're misreading that pmd_devmap line, it is easy to get lost
in its ! and parentheses.

> 
> @@ -2327,6 +2327,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t
> *pmd,
>         mmu_notifier_invalidate_range_start(&range);
>         ptl = pmd_lock(vma->vm_mm, pmd);
> 
> +       if (is_pmd_migration_entry(*pmd))
> +               goto out;

No.  In general, __split_huge_pmd_locked() works on pmd migration entries;
the bug you are fixing is not with pmd migration entries as such, but with
applying pmd_page(*pmd) when *pmd is a migration entry.

I do not recall offhand how important it is that __split_huge_pmd_locked()
should apply to pmd migration entries (when page here is NULL), and I do
not wish to spend time researching that: maybe it's just an optimization,
or maybe it's essential on some path.  What is clear is that this bugfix
backport should not be making any change to that behaviour.

>         /*
>          * If caller asks to setup a migration entries, we need a page to
> check
>          * pmd against. Otherwise we can end up replacing wrong page.
> @@ -2369,7 +2371,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t
> *pmd,
>                 }
>                 if (PageMlocked(page))
>                         clear_page_mlock(page);
> -       } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> +       } else if (!pmd_devmap(*pmd) )
>                 goto out;
>         __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>  out:
> 
> There is still an access, page = pmd_page(*pmd), inside the if(!page). I'm not
> sure if pmd could be a migration entry when the page is NULL. To avoid this as
> well, maybe just goto out directly in the beginning?

No.  The other pmd_page(*pmd) is inside a pmd_trans_huge(*pmd) block,
so it's safe, *pmd cannot be a migration entry there.  (Though admittedly
I have to check rather carefully, because, at least in the x86 case,
pmd_trans_huge(*pmd) does not guarantee that the present bit is set.)

Hugh

> 
> > 
> >>   	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
> >>   out:
> >>
> >> base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
> >> -- 
> >> 2.43.0

  reply	other threads:[~2025-06-19 20:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12  9:20 FAILED: patch "[PATCH] mm/huge_memory: fix dereferencing invalid pmd migration entry" failed to apply to 5.15-stable tree gregkh
2025-06-16  2:42 ` [PATCH 5.15.y] mm/huge_memory: fix dereferencing invalid pmd migration entry Gavin Guo
2025-06-19  3:30   ` Hugh Dickins
2025-06-19  5:25     ` Gavin Guo
2025-06-19 20:17       ` Hugh Dickins [this message]
2025-06-20  3:00         ` Gavin Guo
2025-06-21  5:41 ` Gavin Guo
2025-06-21  7:48   ` Hugh Dickins
2025-06-21 10:15   ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1993fdd9-656a-6c25-fb83-cb2993bc18eb@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gavinguo@igalia.com \
    --cc=gshan@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=revest@google.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox