From: "Boone, Max" <mboone@akamai.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Max Boone via B4 Relay <devnull+mboone.akamai.com@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH v2] mm/pagewalk: fix race between concurrent split and refault
Date: Thu, 26 Mar 2026 09:38:52 +0000 [thread overview]
Message-ID: <55B8AF8A-2A22-4250-A9BF-85434BF28858@akamai.com> (raw)
In-Reply-To: <20260325175006.1c3cae2ee50dd491a153226e@linux-foundation.org>
[-- Attachment #1: Type: text/plain, Size: 2791 bytes --]
Morning,
> On Mar 26, 2026, at 1:50 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> […]
>
> AI review has a couple of questions:
>
> […]
>
> Is there a race condition between the validation of pudval and the
> dereference in pmd_offset()?
I don’t think so - as the comment states: If we spot a PMD table, it will not go away,
so we can continue to walk it.
> Since pmd_offset() dereferences the original pud pointer rather than
> using the validated pudval snapshot, could a concurrent thread collapse
> the PUD into a huge leaf right before pmd_offset() is called?
>
> If that happens, it looks like pmd_offset() might compute a page table
> pointer using the physical address of the huge leaf, which would then be
> dereferenced, leading to the same crash this is trying to prevent.
It shouldn’t change, and the check specifically confirms that the underlying
entry is a table which can not be changed into a huge leaf (given that we have
the mmap read lock).
> Should pmd_offset() be passed the address of the snapshot (&pudval) instead?
No, the snapshot just wraps it in READ_ONCE so that we have guaranteed
coherence when doing the check. Using the reference does not have any benefits
in my perspective - I also don’t see it doing much harm per se, but let’s not increase
the scope of the change unnecessarily.
> Can this loop infinitely on unsplittable PUD leaves?
>
> If walk_pmd_range() encounters a PUD leaf (such as a VFIO or DAX mapping)
> and returns ACTION_AGAIN, this code jumps back to the again label.
>
> During the retry, split_huge_pud() is called, but it only splits Transparent
> Huge Pages. For device memory mapped as large PUD leaves, split_huge_pud()
> does nothing and the entry remains a leaf.
>
> When walk_pmd_range() is called again, it will see the same leaf entry and
> return ACTION_AGAIN, creating a deterministic infinite loop while holding
> the mmap lock.
We shouldn’t hit an infinite loop, theoretically I guess that we can when the two threads
are concurrently splitting and refaulting in perfect lockstep, and it is something discussed
in another patch [1], but adding extra locking is quite expensive for something so
astronomically small.
With regards to “unsplittable” PUDs such as VFIO (i presume this refers to device PFNMAPs)
or DAX (special) mappings. As far as I’m aware, if something’s a vma it should be splittable like
this. Previously split_huge functions had an extra check for pud_devmap(), and the trans_huge
check should return true for VFIO mappings (i.e. on x86, it just checks whether the PSE bit
is set which happens for huge PFNMAPs).
[1] https://lore.kernel.org/all/45e50068-751c-4e8c-a6b0-62cf8d1e58e6@kernel.org/
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3061 bytes --]
prev parent reply other threads:[~2026-03-26 9:39 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-25 9:59 [PATCH v2] mm/pagewalk: fix race between concurrent split and refault Max Boone via B4 Relay
2026-03-25 10:06 ` David Hildenbrand (Arm)
2026-03-25 15:14 ` Lorenzo Stoakes (Oracle)
2026-03-26 0:50 ` Andrew Morton
2026-03-26 8:42 ` David Hildenbrand (Arm)
2026-03-26 9:38 ` Boone, Max [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55B8AF8A-2A22-4250-A9BF-85434BF28858@akamai.com \
--to=mboone@akamai.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=devnull+mboone.akamai.com@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox