From: Lorenzo Stoakes <ljs@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
liam@infradead.org, vbabka@kernel.org, harry@kernel.org,
jannh@google.com, balbirs@nvidia.com, ziy@nvidia.com,
sj@kernel.org, linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
Date: Fri, 19 Jun 2026 11:44:13 +0100 [thread overview]
Message-ID: <ajUXNjRMraKb6k2n@lucifer> (raw)
In-Reply-To: <20260616063436.20455-1-richard.weiyang@gmail.com>
-cc wrong email
On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote:
> For pmd_trans_huge() and pmd_is_migration_entry(), we does following
> before return the pmd entry:
>
> * re-validate pmd entry after PTL
> * check PVMW_MIGRATION
> * check_pmd()
> * handle on pte level if split under us
>
> But for device-private pmd, we just return after pmd_lock().
>
> This may return improper entry, e.g. if we are looking for a migration
> entry, device-private entry could still be returned, which leads to data
> corruption.
I don't thik this is quite clear?
How about:
If a softleaf entry is present, the existing code simply acquires the
PMD lock and returns success even if PVMW_MIGRATION is set (indicating a
migration entry is sought), meaning that the caller can incorrectly
interpret the entry as something it is not, causing data corruption.
>
> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
> support device-private entries") by following the same pattern as
> pmd_trans_huge() and pmd_is_migration_entry() for device private entry.
>
> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk().
>
> * Instead of handling trans huge/migration entry/device private entry
> in a mixed manner, we put each case into its own if condition and
> handle with the same pattern.
> * Also we grab PTL and make sure pmd is not changed under us after
> above check instead of do the check with PTL hold.
> * restart the process if pmd is changed under us
You're doing quite a bit for a fix and you're putting it all in one place.
How about do the fix as 1 patch, and then cleanups as other ones? It helps with
review too :)
It's a general rule of thumb that if you do more than one of moving, refactoring
or changing code, to do them as separate patches so a reviewer/somebody
bisecting can clearly separate each.
Also PLEASE do not add new functionality (this lock recheck) in a fixes
patch. We'll end up backporting new logic that way.
Make the fixes bit _minimal_.
I think in general Andrew prefers separate fixes patches so I'd just make the
_minimal_ change that fixes this for the backport, and the cleanup stuff as a
separate series.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Hmm seems the device private stuff has had a rocky road of late!
I wonder if we need some more test coverage on this?
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also
cc'd the incorrect one, please only send to ljs@kernel.org thanks :)
> Cc: <stable@vger.kernel.org>
Be better to just have this with the Fixes tag, Andrew adds the Cc's from the
actual cc- list anyway.
>
> ---
> v2:
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> ---
> mm/page_vma_mapped.c | 63 +++++++++++++++++++++-----------------------
> 1 file changed, 30 insertions(+), 33 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2ccbabfb2cc1..21635fab209c 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> */
> pmde = pmdp_get_lockless(pvmw->pmd);
>
> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - pmde = *pvmw->pmd;
> - if (!pmd_present(pmde)) {
> - softleaf_t entry;
> -
> - if (!thp_migration_supported() ||
> - !(pvmw->flags & PVMW_MIGRATION))
> - return not_found(pvmw);
> - entry = softleaf_from_pmd(pmde);
> -
> - if (!softleaf_is_migration(entry) ||
> - !check_pmd(softleaf_to_pfn(entry), pvmw))
> - return not_found(pvmw);
> - return true;
> - }
> - if (likely(pmd_trans_huge(pmde))) {
> - if (pvmw->flags & PVMW_MIGRATION)
> - return not_found(pvmw);
> - if (!check_pmd(pmd_pfn(pmde), pvmw))
> - return not_found(pvmw);
> - return true;
> - }
> - /* THP pmd was split under us: handle on pte level */
Don't drop critical comments like this, that's very bad.
> - spin_unlock(pvmw->ptl);
> - pvmw->ptl = NULL;
> - } else if (!pmd_present(pmde)) {
> - const softleaf_t entry = softleaf_from_pmd(pmde);
> -
> - if (softleaf_is_device_private(entry)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - return true;
> - }
> + if (pmd_present(pmde)) {
You're not checking pmd_trans_huge() at all now? Just assuming pmd_present() ==
pmd_trans_huge()?
> + if (!pmd_leaf(pmde))
> + goto pte_table;
OK now assuming pmd_leaf() == pmd_trans_huge()?
You didn't mention this in the commit msg? Justificaiton please?
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(pmd_pfn(pmde), pvmw))
> + return not_found(pvmw);
> + } else if (pmd_is_migration_entry(pmde)) {
> + softleaf_t entry = softleaf_from_pmd(pmde);
> +
Err you dropped the thp_migration_supported() check? Why?
> + if (!(pvmw->flags & PVMW_MIGRATION))
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
> + } else if (pmd_is_device_private_entry(pmde)) {
> + softleaf_t entry = softleaf_from_pmd(pmde);
>
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
I mean it's less awful than what was there before, but as refactoring goes,
putting a massive set of branches in the middle of a long function isn't really
the best.
Can we avoid this horrible goto by separating this out into a function?
> + } else {
Else means what exactly? A comment would be good.
I feel like in an effort to keep all this at one level of indentation you've
created a confusing else case.
Sane thing would be to:
a. have this as a separate function
b. to do:
if (pmd_none(pmde)) {
...
return ...;
}
if (pmd_present(pmde)) {
...
return ...;
}
softleaf = softleaf_from_pmd(pmde);
/* softleaf stuff */
return ...;
In this function.
That way it's _very clear_ you're doing softleaf stuff.
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
> PMD_ORDER) &&
Umm... previously this was done for all softleaf entries other than
device-private, now not, and you haven't, why?
> @@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> step_forward(pvmw, PMD_SIZE);
> continue;
> }
> +
> + /* Double-check under PTL that the PMD didn't change. */
> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + if (pmd_same(pmde, pmdp_get(pvmw->pmd)))
> + return true;
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> + goto restart;
I'm not sure I really get the justification for this? seems you just added this
arbitrarily?
Again, you shouldn't be making changes like this in a fix patch.
> +pte_table:
> if (!map_pte(pvmw, &pmde, &ptl)) {
> if (!pvmw->pte)
> goto restart;
> --
> 2.34.1
>
This is really hard to review like this, as above please split this up properly.
Thanks, Lorenzo
next prev parent reply other threads:[~2026-06-19 10:44 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260616063436.20455-1-richard.weiyang@gmail.com>
2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang
2026-06-16 23:50 ` Wei Yang
2026-06-17 2:32 ` Lance Yang
2026-06-17 8:18 ` Wei Yang
2026-06-19 2:30 ` Wei Yang
2026-06-19 12:19 ` Lance Yang
2026-06-20 2:02 ` Wei Yang
[not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>
2026-06-17 2:11 ` Balbir Singh
2026-06-17 3:14 ` Lance Yang
2026-06-19 10:44 ` Lorenzo Stoakes [this message]
2026-06-19 10:48 ` David Hildenbrand (Arm)
2026-06-19 11:04 ` Lorenzo Stoakes
2026-06-20 2:13 ` Wei Yang
2026-06-20 2:11 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajUXNjRMraKb6k2n@lucifer \
--to=ljs@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=balbirs@nvidia.com \
--cc=david@kernel.org \
--cc=harry@kernel.org \
--cc=jannh@google.com \
--cc=liam@infradead.org \
--cc=linux-mm@kvack.org \
--cc=richard.weiyang@gmail.com \
--cc=riel@surriel.com \
--cc=sj@kernel.org \
--cc=stable@vger.kernel.org \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox