public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Will Deacon <will@kernel.org>
Cc: Brian Ruley <brian.ruley@gehealthcare.com>,
	Steve Capper <steve.capper@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, catalin.marinas@arm.com
Subject: Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
Date: Mon, 13 Apr 2026 15:42:48 +0100	[thread overview]
Message-ID: <ad0A6Oe7XnVUQe-5@shell.armlinux.org.uk> (raw)
In-Reply-To: <adzMOdySgMIePcue@willie-the-truck>

On Mon, Apr 13, 2026 at 11:58:01AM +0100, Will Deacon wrote:
> On Fri, Apr 10, 2026 at 02:01:41PM +0300, Brian Ruley wrote:
> > On Apr 09, Russell King (Oracle) wrote:
> > > 
> > > On Thu, Apr 09, 2026 at 06:17:36PM +0300, Brian Ruley wrote:
> > > > However, in the case I describe, if VA_B is mapped immediately to pfn_q
> > > > after it been has unmapped and freed for VA_A, then it's quite possible
> > > > that the page is still indexed in the cache.
> > > 
> > > True.
> > > 
> > > > The hypothesis is that if
> > > > VA_A and VA_B land in the same I-cache set and VA_A old cache entry
> > > > still exists (tagged with pfn_q), then the CPU can fetch stale
> > > > instructions because the tag will match. That's one reason why we need
> > > > to invalidate the cache, but that will be skipped in the path:
> > > >
> > > >     migrate_pages
> > > >      migrate_pages_batch
> > > >       migrate_folio_move
> > > >        remove_migration_ptes
> > > >         remove_migration_pte
> > > >          set_pte_at
> > > >           set_ptes
> > > >            __sync_icache_dcache  (skipped if !young)
> > > >             set_pte_ext
> > > 
> > > In this case, if the old PTE was marked !young, then the new PTE will
> > > have:
> > >         pte = pte_mkold(pte);
> > > 
> > > on it, which marks it !young. As you say, __sync_icache_dcache() will
> > > be skipped. While a PTE entry will be set for the kernel, the code in
> > > set_pte_ext() will *not* establish a hardware PTE entry. For the
> > > 2-level pte code:
> > > 
> > >         tst     r1, #L_PTE_YOUNG        @ <- results in Z being set
> > >         tstne   r1, #L_PTE_VALID        @ <- not executed
> > >         eorne   r1, r1, #L_PTE_NONE     @ <- not executed
> > >         tstne   r1, #L_PTE_NONE         @ <- not executed
> > >         moveq   r3, #0                  @ <- hardware PTE value
> > >  ARM(   str     r3, [r0, #2048]! )      @ <- writes hardware PTE
> > > 
> > > So, for a !young PTE, the hardware PTE entry is written as zero,
> > > which means accesses should fault, which will then cause the PTE to
> > > be marked young.
> > > 
> > > For the 3-level case, the L_PTE_YOUNG bit corresponds with the AF bit
> > > in the PTE, and there aren't split Linux / hardware PTE entries. AF
> > > being clear should result in a page fault being generated for the
> > > kernel to handle making the PTE young.
> > > 
> > > In both of these cases, set_ptes() will need to be called with the
> > > updated PTE which will now be marked young, and that will result in
> > > the I-cache being flushed.
> > 
> > Hi Russell,
> > 
> > Thank you for the clarification, this is very educational for me.
> > I understand your scepticism, and I can't explain what's going on based
> > on what you replied. However, I do honestly believe there is a problem
> > here. I'll share the exact testing details and the instrumentation
> > we added that convinced us to reach out at the end. One idea we also
> > had was that could cache aliasing be happening here.
> 
> I thought a bit more about this over the weekend and started to wonder
> if there's a potential race where multiple CPUs try to write the same
> PTE and don't synchronise properly on the cache-maintenance.
> 
> In particular, PG_dcache_clean is manipulated with a test_and_set_bit()
> operation _before_ the cache maintenance is performed, so there's a
> small window where the flag is set but the page is _not_ clean. I don't
> think that matters with regards to invalid migration entries, but
> perhaps the migration just means that we end up putting down a bunch of
> 'old' entries and are then more likely to see concurrent faults trying
> to make the thing young again, potentially hitting this race.
> 
> Looking at arm64 this morning, I noticed that we split the flag
> manipulation so that it's set with a set_bit() after the maintenance has
> been performed. Git then points to 588a513d3425 ("arm64: Fix race
> condition on PG_dcache_clean in __sync_icache_dcache()") which seems to
> talk about the same race. In fact, the mailing list posting:
> 
>   https://lore.kernel.org/all/20210514095001.13236-1-catalin.marinas@arm.com/
> 
> points out that arch/arm/ is also affected but we forgot to CC Russell
> because I think this all came out of the MTE-enablement work [1] and it

If this is the problem, then I'll point out that this is the problem
with *not* sharing even a base level of code between arm64 and arm,
resulting in the same bugs being present in both, needing two separate
fixes, but only one fix happens.

Catalin was dead against any kind of code sharing though, so I suspect
we're going to be forever fixing the same bugs in two separate chunks
of code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

  parent reply	other threads:[~2026-04-13 14:42 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09 12:54 [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user Brian Ruley
2026-04-09 13:56 ` Will Deacon
2026-04-09 14:21   ` Russell King (Oracle)
2026-04-09 14:43   ` Russell King (Oracle)
2026-04-09 15:17   ` Brian Ruley
2026-04-09 16:00     ` Russell King (Oracle)
2026-04-10 11:01       ` Brian Ruley
2026-04-10 11:18         ` Russell King (Oracle)
2026-04-10 11:43           ` [RFC PATCH] test: " Brian Ruley
2026-04-13 10:58         ` [PATCH] " Will Deacon
2026-04-13 11:17           ` Brian Ruley
2026-04-13 14:42           ` Russell King (Oracle) [this message]
2026-04-13 15:24           ` Brian Ruley
2026-04-14  6:28             ` Brian Ruley
2026-04-14  7:44             ` Brian Ruley
2026-04-14 11:08               ` Will Deacon
2026-04-14 11:43                 ` Brian Ruley
2026-04-09 14:15 ` Russell King (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad0A6Oe7XnVUQe-5@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=brian.ruley@gehealthcare.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=steve.capper@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox