From: Lorenzo Stoakes <ljs@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
liam@infradead.org, vbabka@kernel.org, harry@kernel.org,
jannh@google.com, balbirs@nvidia.com, ziy@nvidia.com,
sj@kernel.org, linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
Date: Fri, 19 Jun 2026 11:44:13 +0100 [thread overview]
Message-ID: <ajUXNjRMraKb6k2n@lucifer> (raw)
In-Reply-To: <20260616063436.20455-1-richard.weiyang@gmail.com>
-cc wrong email
On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote:
> For pmd_trans_huge() and pmd_is_migration_entry(), we does following
> before return the pmd entry:
>
> * re-validate pmd entry after PTL
> * check PVMW_MIGRATION
> * check_pmd()
> * handle on pte level if split under us
>
> But for device-private pmd, we just return after pmd_lock().
>
> This may return improper entry, e.g. if we are looking for a migration
> entry, device-private entry could still be returned, which leads to data
> corruption.
I don't thik this is quite clear?
How about:
If a softleaf entry is present, the existing code simply acquires the
PMD lock and returns success even if PVMW_MIGRATION is set (indicating a
migration entry is sought), meaning that the caller can incorrectly
interpret the entry as something it is not, causing data corruption.
>
> This patch fixes commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration
> support device-private entries") by following the same pattern as
> pmd_trans_huge() and pmd_is_migration_entry() for device private entry.
>
> While at it, it cleanups the pmd entry handling in page_vma_mapped_walk().
>
> * Instead of handling trans huge/migration entry/device private entry
> in a mixed manner, we put each case into its own if condition and
> handle with the same pattern.
> * Also we grab PTL and make sure pmd is not changed under us after
> above check instead of do the check with PTL hold.
> * restart the process if pmd is changed under us
You're doing quite a bit for a fix and you're putting it all in one place.
How about do the fix as 1 patch, and then cleanups as other ones? It helps with
review too :)
It's a general rule of thumb that if you do more than one of moving, refactoring
or changing code, to do them as separate patches so a reviewer/somebody
bisecting can clearly separate each.
Also PLEASE do not add new functionality (this lock recheck) in a fixes
patch. We'll end up backporting new logic that way.
Make the fixes bit _minimal_.
I think in general Andrew prefers separate fixes patches so I'd just make the
_minimal_ change that fixes this for the backport, and the cleanup stuff as a
separate series.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Hmm seems the device private stuff has had a rocky road of late!
I wonder if we need some more test coverage on this?
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Annoying nag: You sent to my correct email ljs@kernel.org (thanks!) but also
cc'd the incorrect one, please only send to ljs@kernel.org thanks :)
> Cc: <stable@vger.kernel.org>
Be better to just have this with the Fixes tag, Andrew adds the Cc's from the
actual cc- list anyway.
>
> ---
> v2:
> * specify the possible error case of current code and user visible effect
> * besides fix, cleanup the pmd entry handling based on David's suggestion
>
> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
> ---
> mm/page_vma_mapped.c | 63 +++++++++++++++++++++-----------------------
> 1 file changed, 30 insertions(+), 33 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2ccbabfb2cc1..21635fab209c 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> */
> pmde = pmdp_get_lockless(pvmw->pmd);
>
> - if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - pmde = *pvmw->pmd;
> - if (!pmd_present(pmde)) {
> - softleaf_t entry;
> -
> - if (!thp_migration_supported() ||
> - !(pvmw->flags & PVMW_MIGRATION))
> - return not_found(pvmw);
> - entry = softleaf_from_pmd(pmde);
> -
> - if (!softleaf_is_migration(entry) ||
> - !check_pmd(softleaf_to_pfn(entry), pvmw))
> - return not_found(pvmw);
> - return true;
> - }
> - if (likely(pmd_trans_huge(pmde))) {
> - if (pvmw->flags & PVMW_MIGRATION)
> - return not_found(pvmw);
> - if (!check_pmd(pmd_pfn(pmde), pvmw))
> - return not_found(pvmw);
> - return true;
> - }
> - /* THP pmd was split under us: handle on pte level */
Don't drop critical comments like this, that's very bad.
> - spin_unlock(pvmw->ptl);
> - pvmw->ptl = NULL;
> - } else if (!pmd_present(pmde)) {
> - const softleaf_t entry = softleaf_from_pmd(pmde);
> -
> - if (softleaf_is_device_private(entry)) {
> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> - return true;
> - }
> + if (pmd_present(pmde)) {
You're not checking pmd_trans_huge() at all now? Just assuming pmd_present() ==
pmd_trans_huge()?
> + if (!pmd_leaf(pmde))
> + goto pte_table;
OK now assuming pmd_leaf() == pmd_trans_huge()?
You didn't mention this in the commit msg? Justificaiton please?
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(pmd_pfn(pmde), pvmw))
> + return not_found(pvmw);
> + } else if (pmd_is_migration_entry(pmde)) {
> + softleaf_t entry = softleaf_from_pmd(pmde);
> +
Err you dropped the thp_migration_supported() check? Why?
> + if (!(pvmw->flags & PVMW_MIGRATION))
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
> + } else if (pmd_is_device_private_entry(pmde)) {
> + softleaf_t entry = softleaf_from_pmd(pmde);
>
> + if (pvmw->flags & PVMW_MIGRATION)
> + return not_found(pvmw);
> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> + return not_found(pvmw);
I mean it's less awful than what was there before, but as refactoring goes,
putting a massive set of branches in the middle of a long function isn't really
the best.
Can we avoid this horrible goto by separating this out into a function?
> + } else {
Else means what exactly? A comment would be good.
I feel like in an effort to keep all this at one level of indentation you've
created a confusing else case.
Sane thing would be to:
a. have this as a separate function
b. to do:
if (pmd_none(pmde)) {
...
return ...;
}
if (pmd_present(pmde)) {
...
return ...;
}
softleaf = softleaf_from_pmd(pmde);
/* softleaf stuff */
return ...;
In this function.
That way it's _very clear_ you're doing softleaf stuff.
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
> PMD_ORDER) &&
Umm... previously this was done for all softleaf entries other than
device-private, now not, and you haven't, why?
> @@ -286,6 +274,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> step_forward(pvmw, PMD_SIZE);
> continue;
> }
> +
> + /* Double-check under PTL that the PMD didn't change. */
> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> + if (pmd_same(pmde, pmdp_get(pvmw->pmd)))
> + return true;
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> + goto restart;
I'm not sure I really get the justification for this? seems you just added this
arbitrarily?
Again, you shouldn't be making changes like this in a fix patch.
> +pte_table:
> if (!map_pte(pvmw, &pmde, &ptl)) {
> if (!pvmw->pte)
> goto restart;
> --
> 2.34.1
>
This is really hard to review like this, as above please split this up properly.
Thanks, Lorenzo
next prev parent reply other threads:[~2026-06-19 10:44 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260616063436.20455-1-richard.weiyang@gmail.com>
2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang
2026-06-16 23:50 ` Wei Yang
2026-06-17 2:32 ` Lance Yang
2026-06-17 8:18 ` Wei Yang
2026-06-19 2:30 ` Wei Yang
2026-06-19 12:19 ` Lance Yang
2026-06-20 2:02 ` Wei Yang
[not found] ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>
2026-06-17 2:11 ` Balbir Singh
2026-06-17 3:14 ` Lance Yang
2026-06-19 10:44 ` Lorenzo Stoakes [this message]
2026-06-19 10:48 ` David Hildenbrand (Arm)
2026-06-19 11:04 ` Lorenzo Stoakes
2026-06-20 2:13 ` Wei Yang
2026-06-20 2:11 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajUXNjRMraKb6k2n@lucifer \
--to=ljs@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=balbirs@nvidia.com \
--cc=david@kernel.org \
--cc=harry@kernel.org \
--cc=jannh@google.com \
--cc=liam@infradead.org \
--cc=linux-mm@kvack.org \
--cc=richard.weiyang@gmail.com \
--cc=riel@surriel.com \
--cc=sj@kernel.org \
--cc=stable@vger.kernel.org \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.