Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Gavin Guo <gavinguo@igalia.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>, Zi Yan <ziy@nvidia.com>,
	Gavin Shan <gshan@redhat.com>, Florent Revest <revest@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 5.15.y] mm/huge_memory: fix dereferencing invalid pmd migration entry
Date: Fri, 20 Jun 2025 11:00:31 +0800	[thread overview]
Message-ID: <aa32c9cd-9dac-4982-8484-ea11c1a003f1@igalia.com> (raw)
In-Reply-To: <1993fdd9-656a-6c25-fb83-cb2993bc18eb@google.com>

On 6/20/25 04:17, Hugh Dickins wrote:
> On Thu, 19 Jun 2025, Gavin Guo wrote:
>> On 6/19/25 11:30, Hugh Dickins wrote:
>>> On Mon, 16 Jun 2025, Gavin Guo wrote:
>>>
>>>> [ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
>>>>
>>>> When migrating a THP, concurrent access to the PMD migration entry during
>>>> a deferred split scan can lead to an invalid address access, as
>>>> illustrated below.  To prevent this invalid access, it is necessary to
>>>> check the PMD migration entry and return early.  In this context, there is
>>>> no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
>>>> equality of the target folio.  Since the PMD migration entry is locked, it
>>>> cannot be served as the target.
>>>>
>>>> Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
>>>> lookup points to a location which may contain the folio of interest, but
>>>> might instead contain another folio: and weeding out those other folios is
>>>> precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
>>>> replacing the wrong folio" comment a few lines above it) is for."
>>>>
>>>> BUG: unable to handle page fault for address: ffffea60001db008
>>>> CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>>> 1.16.3-debian-1.16.3-2 04/01/2014
>>>> RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
>>>> Call Trace:
>>>> <TASK>
>>>> try_to_migrate_one+0x28c/0x3730
>>>> rmap_walk_anon+0x4f6/0x770
>>>> unmap_folio+0x196/0x1f0
>>>> split_huge_page_to_list_to_order+0x9f6/0x1560
>>>> deferred_split_scan+0xac5/0x12a0
>>>> shrinker_debugfs_scan_write+0x376/0x470
>>>> full_proxy_write+0x15c/0x220
>>>> vfs_write+0x2fc/0xcb0
>>>> ksys_write+0x146/0x250
>>>> do_syscall_64+0x6a/0x120
>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> The bug is found by syzkaller on an internal kernel, then confirmed on
>>>> upstream.
>>>>
>>>> Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
>>>> Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
>>>> Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
>>>> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
>>>> Signed-off-by: Gavin Guo <gavinguo@igalia.com>
>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>> Acked-by: Hugh Dickins <hughd@google.com>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>>> Cc: Florent Revest <revest@google.com>
>>>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> Cc: Miaohe Lin <linmiaohe@huawei.com>
>>>> Cc: <stable@vger.kernel.org>
>>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>>> [gavin: backport the migration checking logic to __split_huge_pmd]
>>>> Signed-off-by: Gavin Guo <gavinguo@igalia.com>
>>>> ---
>>>>    mm/huge_memory.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 9139da4baa39..bcefc17954d6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma,
>>>> pmd_t *pmd,
>>>>     VM_BUG_ON(freeze && !page);
>>>>     if (page) {
>>>>    		VM_WARN_ON_ONCE(!PageLocked(page));
>>>> -		if (page != pmd_page(*pmd))
>>>> +		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
>>>>     		goto out;
>>>>     }
>>>>    @@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma,
>>>> pmd_t *pmd,
>>>>      }
>>>>      if (PageMlocked(page))
>>>>    			clear_page_mlock(page);
>>>> -	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>>>> +	} else if (!pmd_devmap(*pmd))
>>>>      goto out;
>>>
>>> I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong
>>> to me, because here you drop the is_pmd_migration_entry(*pmd) condition,
>>> but if !page then that has not been checked earlier (this check here is
>>> specifically allowing a pmd migration entry to proceed to the split).
>>>
>>> Hugh
>>
>> Hi Hugh,
>>
>> Thank you again for the review.
>>
>> Regarding the 5.4/5.10/5.15. How do you think about the following changes?
> 
> I think you are going way off track with the following changes.
> 
> The first hunk of your backport (the pmd_page line) was fine, it was the
> second hunk (the pmd_devmap line) that I objected to: that second hunk
> should just be deleted, to make no change on the pmd_devmap line.
> 
> Maybe you're misreading that pmd_devmap line, it is easy to get lost
> in its ! and parentheses.

Got it. I'll just go ahead removing the second hunk and submit the patch 
soon.

> 
>>
>> @@ -2327,6 +2327,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t
>> *pmd,
>>          mmu_notifier_invalidate_range_start(&range);
>>          ptl = pmd_lock(vma->vm_mm, pmd);
>>
>> +       if (is_pmd_migration_entry(*pmd))
>> +               goto out;
> 
> No.  In general, __split_huge_pmd_locked() works on pmd migration entries;
> the bug you are fixing is not with pmd migration entries as such, but with
> applying pmd_page(*pmd) when *pmd is a migration entry.

Yeah, you are correct. Maybe I was thinking too much. When I modified 
the code, I recalled that the folio lock theory you mentioned in the 
upstream discussion that when migration happens the folio lock is taken 
and in this split path, it's impossible to take the same lock and the 
folio must be the same. If it's not the same, it could be the symptom of 
the reverse mapping behavior and we just skip it.

However, I would stop the discussion here without sprawling the logic 
and focusing on fixing the pmd_page(*pmd) problem.

> 
> I do not recall offhand how important it is that __split_huge_pmd_locked()
> should apply to pmd migration entries (when page here is NULL), and I do
> not wish to spend time researching that: maybe it's just an optimization,
> or maybe it's essential on some path.  What is clear is that this bugfix
> backport should not be making any change to that behaviour.

Agreed.

> 
>>          /*
>>           * If caller asks to setup a migration entries, we need a page to
>> check
>>           * pmd against. Otherwise we can end up replacing wrong page.
>> @@ -2369,7 +2371,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t
>> *pmd,
>>                  }
>>                  if (PageMlocked(page))
>>                          clear_page_mlock(page);
>> -       } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>> +       } else if (!pmd_devmap(*pmd) )
>>                  goto out;
>>          __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>>   out:
>>
>> There is still an access, page = pmd_page(*pmd), inside the if(!page). I'm not
>> sure if pmd could be a migration entry when the page is NULL. To avoid this as
>> well, maybe just goto out directly in the beginning?
> 
> No.  The other pmd_page(*pmd) is inside a pmd_trans_huge(*pmd) block,
> so it's safe, *pmd cannot be a migration entry there.  (Though admittedly
> I have to check rather carefully, because, at least in the x86 case,
> pmd_trans_huge(*pmd) does not guarantee that the present bit is set.)

Looks good to me. Thank you for reviewing the logic.

> 
> Hugh
> 
>>
>>>
>>>>    	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
>>>>    out:
>>>>
>>>> base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
>>>> -- 
>>>> 2.43.0


  reply	other threads:[~2025-06-20  3:01 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12  9:20 FAILED: patch "[PATCH] mm/huge_memory: fix dereferencing invalid pmd migration entry" failed to apply to 5.15-stable tree gregkh
2025-06-16  2:42 ` [PATCH 5.15.y] mm/huge_memory: fix dereferencing invalid pmd migration entry Gavin Guo
2025-06-19  3:30   ` Hugh Dickins
2025-06-19  5:25     ` Gavin Guo
2025-06-19 20:17       ` Hugh Dickins
2025-06-20  3:00         ` Gavin Guo [this message]
2025-06-21  5:41 ` Gavin Guo
2025-06-21  7:48   ` Hugh Dickins
2025-06-21 10:15   ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa32c9cd-9dac-4982-8484-ea11c1a003f1@igalia.com \
    --to=gavinguo@igalia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=hughd@google.com \
    --cc=linmiaohe@huawei.com \
    --cc=revest@google.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox