public inbox for linux-kernel@vger.kernel.org
From: Will Deacon <will@kernel.org>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	catalin.marinas@arm.com, cl@gentwo.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: show direct mapping use in /proc/meminfo
Date: Mon, 26 Jan 2026 18:58:32 +0000	[thread overview]
Message-ID: <aXe5WHBdHLEnh-Bp@willie-the-truck> (raw)
In-Reply-To: <04f816f9-5533-4fe1-99b0-cd405caac485@os.amperecomputing.com>

On Mon, Jan 26, 2026 at 09:55:06AM -0800, Yang Shi wrote:
> 
> 
> On 1/26/26 6:14 AM, Will Deacon wrote:
> > On Thu, Jan 22, 2026 at 01:59:54PM -0800, Yang Shi wrote:
> > > On 1/22/26 6:43 AM, Ryan Roberts wrote:
> > > > On 21/01/2026 22:44, Yang Shi wrote:
> > > > > On 1/21/26 9:23 AM, Ryan Roberts wrote:
> > > > But it looks like all the higher level users will only ever unplug at the same
> > > > granularity that was plugged in (I might be wrong but that's the sense I get).
> > > > 
> > > > arm64 adds the constraint that it won't unplug any memory that was present at
> > > > boot - see prevent_bootmem_remove_notifier().
> > > > 
> > > > So in practice this is probably safe, though perhaps brittle.
> > > > 
> > > > Some options:
> > > > 
> > > >    - leave it as is and worry about it if/when something shifts and hits the
> > > >      problem.
> > > Seems like the most simple way :-)
> > > 
> > > >    - Enhance prevent_bootmem_remove_notifier() to reject unplugging memory blocks
> > > >      whose boundaries are within leaf mappings.
> > > I don't quite get why we should enhance prevent_bootmem_remove_notifier().
> > > If I read the code correctly, it simply rejects offlining boot memory,
> > > while offlining a single memory block is fine. If you check the boundaries
> > > there, won't it prevent offlining a single memory block?
> > > 
> > > I think you need to enhance try_remove_memory() instead. But the kernel may
> > > unmap the linear mapping at memory block granularity if an altmap is used,
> > > so IIUC you would need an extra page table walk, with the start and size of
> > > the unplugged DIMM, before removing the memory, to tell whether the
> > > boundaries fall within leaf mappings. Can it be done in
> > > arch_remove_memory()? It seems not, because arch_remove_memory() may be
> > > called at memory block granularity if an altmap is used.
> > > 
> > > >    - For non-bbml2_noabort systems, map hotplug memory with a new flag to ensure
> > > >      that leaf mappings are always <= memory_block_size_bytes(). For
> > > >      bbml2_noabort, split at the block boundaries before doing the unmapping.
> > > The leaf mappings would then be at most 128M (the memory block size with
> > > 4K page size), which sounds sub-optimal IMHO.
> > > 
> > > > Given I don't think this can happen in practice, probably the middle option is
> > > > the best? There is no runtime impact and it will give us a warning if it ever
> > > > does happen in future.
> > > > 
> > > > What do you think?
> > > I agree it can't happen in practice, so why not just take option #1 given
> > > the complexity added by option #2?
> > It still looks broken in the case that a region that was mapped with the
> > contiguous bit is then unmapped. The sequence seems to iterate over
> > each contiguous PTE, zapping the entry and doing the TLBI while the
> > other entries in the contiguous range remain intact. I don't think
> > that's sufficient to guarantee that you don't have stale TLB entries
> > once you've finished processing the whole range.
> > 
> > For example, imagine you have an L1 TLB that only supports 4k entries
> > and an L2 TLB that supports 64k entries. Let's say that the contiguous
> > range is mapped by pte0 ... pte15 and we've zapped and invalidated
> > pte0 ... pte14. At that point, I think the hardware is permitted to use
> > the last remaining contiguous pte (pte15) to allocate a 64k entry in the
> > L2 TLB covering the whole range. A (speculative) walk via one of the
> > virtual addresses translated by pte0 ... pte14 could then hit that entry
> > and fill a 4k entry into the L1 TLB. So at the end of the sequence, you
> > could presumably still access the first 60k of the range thanks to stale
> > entries in the L1 TLB?
> 
> It is a little hard for me to understand how a (speculative) walk could
> happen by the time we reach here.
> 
> Before we reach here, IIUC the kernel has:
> 
>  * offlined all the page blocks. That means the pages are freed and
> isolated from the buddy allocator, so even a pfn walk (for example,
> compaction) should not reach them at all.
>  * torn down the vmemmap, so no struct pages are available.
> 
> From the kernel's point of view, the memory is unreachable now. Did I miss
> and/or misunderstand something?

I'm talking about hardware speculation. It's mapped as normal memory so
the CPU can speculate from it. We can't really reason about the bounds
of that, especially in a world with branch predictors and history-based
prefetchers.

Will


Thread overview: 25+ messages
2026-01-07  0:29 [v5 PATCH] arm64: mm: show direct mapping use in /proc/meminfo Yang Shi
2026-01-07 18:38 ` Christoph Lameter (Ampere)
2026-01-13 14:36 ` Will Deacon
2026-01-14  0:36   ` Yang Shi
2026-01-21  0:17     ` Yang Shi
2026-01-26 14:18     ` Will Deacon
2026-01-26 17:59       ` Yang Shi
2026-01-21 17:23   ` Ryan Roberts
2026-01-21 22:44     ` Yang Shi
2026-01-22 14:43       ` Ryan Roberts
2026-01-22 21:59         ` Yang Shi
2026-01-26 14:14           ` Will Deacon
2026-01-26 17:55             ` Yang Shi
2026-01-26 18:58               ` Will Deacon [this message]
2026-01-26 20:50                 ` Yang Shi
2026-01-27  8:57                   ` Ryan Roberts
2026-01-28  0:50                     ` Yang Shi
2026-01-22  5:09 ` Anshuman Khandual
2026-01-22 14:17   ` Will Deacon
2026-01-23  2:40     ` Anshuman Khandual
2026-01-23 20:08       ` Yang Shi
2026-01-26 12:44         ` Will Deacon
2026-01-22 19:48   ` Yang Shi
2026-01-22 21:41   ` Yang Shi
2026-01-23 18:42     ` Yang Shi
