All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
	 liam@infradead.org, vbabka@kernel.org, harry@kernel.org,
	jannh@google.com,  sj@kernel.org, ziy@nvidia.com,
	balbirs@nvidia.com, linux-mm@kvack.org,  stable@vger.kernel.org,
	Lance Yang <lance.yang@linux.dev>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
Date: Tue, 23 Jun 2026 18:02:56 +0100	[thread overview]
Message-ID: <ajq4_VndR8gwponS@lucifer> (raw)
In-Reply-To: <20260622234518.nnx3r7ckphlxn5vm@master>

On Mon, Jun 22, 2026 at 11:45:18PM +0000, Wei Yang wrote:
> On Mon, Jun 22, 2026 at 05:11:02PM +0100, Lorenzo Stoakes wrote:
> >On Mon, Jun 22, 2026 at 02:21:02PM +0000, Wei Yang wrote:
> >> On Mon, Jun 22, 2026 at 02:46:40PM +0100, Lorenzo Stoakes wrote:
> >> >+cc Lance, linux-kernel
> >> >
> >> >Your subject line is 83 characters long and is way too detailed how about 'fix
> >> >device-private PMD handling'?
> >> >
> >>
> >> Got it.
> >>
> >> >You forgot to include linux-kernel@vger.kernel.org on the mail, lore seems to be
> >> >a bit broken atm but in general it's helpful to include that.
> >>
> >> Got it.
> >>
> >> So usually we send a patch to both linux-mm and linux-kernel? If so, I
> >> remember is later actions.
> >
> >Yeah it's better for dealing with kvack going wrong etc. :)
> >
> >>
> >> >
> >> >Also is useful to make this [PATCH mm-hotfixes] to make it really clear it's
> >> >intended as a hotfix.
> >> >
> >>
> >> Got it.
> >>
> >> >Some commit msg language nits:
> >> >
> >> >On Mon, Jun 22, 2026 at 01:06:51PM +0000, Wei Yang wrote:
> >> >> For pmd_trans_huge() and pmd_is_migration_entry(), we does following
> >> >> before return the pmd entry:
> >> >
> >> >Sounds better as:
> >> >
> >> >	For PMD entries that satisfy pmd_trans_huge() or pmd_is_migration_entry(), we
> >> >	perform the following actions:
> >> >
> >>
> >> Sure.
> >>
> >> >>
> >> >>   * re-validate pmd entry after PTL
> >> >>   * check PVMW_MIGRATION
> >> >>   * check_pmd()
> >> >>   * handle on pte level if split under us
> >> >>
> >> >> But for device-private pmd, we just return after pmd_lock().
> >> >
> >> >->
> >> >
> >> >	However, for device-private PMD entries, we simply acquire the PMD lock
> >> >	and return.
> >> >
> >>
> >> Sure.
> >>
> >> >Also can you please give some justification here as to why all this also applies
> >> >to device-private PMD? Right now it sounds hand wavey.
> >> >
> >>
> >> I thought below paragraph explain it. Not sure what justification is preferred.
> >
> >Something about device private PMDs splitting the same way THP ones do, in the
> >pmd_is_device_private_entry() branch of __split_huge_pmd_locked().
> >
>
> Hi, Lorenzo
>
> Thanks for your detailed suggestions.
>
> I tried to add the justification here, and the following is the commit log
> after consolidate your suggestions.
>
>     For PMD entries that satisfy pmd_trans_huge() or
>     pmd_is_migration_entry(), we perform the following actions:
>
>       * re-validate pmd entry after PTL
>       * check PVMW_MIGRATION
>       * check_pmd()
>       * handle on pte level if split under us
>
>     However, for device-private PMD entries, we simply acquire the PMD lock
>     and return. This is not enough, as __split_huge_pmd_locked() would split
>     a pmd device-private PMD under us just as it does for THP PMD.
>
>     This is particularly problematic when PVMW_MIGRATION is set (meaning a
>     migration entry is sought), as it causes a device-private PMD entry to
>     be returned with a different data layout, causing memory corruption.
>
> Just feel this is not that smooth. Would you mind taking another look to see
> if I get your point correctly?

Honestly I'd just drop the whole pmd_trans_huge()/pmd_is_migration_entry() bit
and say:

	Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
	device-private entries") introduced the concept of device-private
	PMD entries, but did not correctly update the rmap walk code to
	account for them.

	As a result, when page_vma_mapped_walk() encounters device-private
	PMD entries, it takes no action other than to acquire the PMD lock
	and exit.

	However this is highly problematic for two reasons - firstly,
	device private entries possess a PFN so check_pmd() needs to be
	called to ensure an overlapping PFN range.

	Secondly, and more importantly, if PVMW_MIGRATION is set the
	caller assumes the returned entry is a migration entry, resulting
	in memory corruption when the caller tries to interpret the device
	private entry as such.

	In addition, commit 146287290023 ("mm/huge_memory: implement
	device-private THP splitting") allowed device private PMDs to be
	split like THP mappings, but again did not update this code path.

	As a result, we might race a PMD split prior to acquiring the PMD
	lock.

	This patch addresses all of these issues by invoking check_pmd(),
	ensuring PMVW_MIGRATION is not set and checks whether a split raced
	us we do for PMD THP and migration entries.

>
> --
> Wei Yang
> Help you, Help me

Cheers, Lorenzo


  reply	other threads:[~2026-06-23 17:03 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-22 13:06 [PATCH] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Wei Yang
2026-06-22 13:14 ` Wei Yang
2026-06-22 13:46 ` Lorenzo Stoakes
2026-06-22 14:21   ` Wei Yang
2026-06-22 14:59     ` Lance Yang
2026-06-22 16:11     ` Lorenzo Stoakes
2026-06-22 23:45       ` Wei Yang
2026-06-23 17:02         ` Lorenzo Stoakes [this message]
2026-06-22 14:44   ` Lance Yang
2026-06-23 16:18 ` kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2026-05-08  1:37 Wei Yang
2026-05-08 21:51 ` Andrew Morton
2026-05-10  1:22   ` Wei Yang
2026-05-08 22:48 ` Balbir Singh
2026-05-10  1:20   ` Wei Yang
2026-05-12 12:43   ` David Hildenbrand (Arm)
2026-05-12 14:35     ` Wei Yang
2026-05-12 18:55       ` David Hildenbrand (Arm)
2026-05-12 23:03         ` Balbir Singh
2026-05-12 23:14           ` Wei Yang
2026-05-12 23:19             ` Balbir Singh
2026-05-13  1:47             ` Balbir Singh
2026-06-12  2:48         ` Wei Yang
2026-06-15 11:58           ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajq4_VndR8gwponS@lucifer \
    --to=ljs@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=balbirs@nvidia.com \
    --cc=david@kernel.org \
    --cc=harry@kernel.org \
    --cc=jannh@google.com \
    --cc=lance.yang@linux.dev \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=richard.weiyang@gmail.com \
    --cc=riel@surriel.com \
    --cc=sj@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.