From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com>,
anshuman.khandual@arm.com, will@kernel.org, ardb@kernel.org,
ryan.roberts@arm.com, mark.rutland@arm.com, joey.gouly@arm.com,
dave.hansen@linux.intel.com, akpm@linux-foundation.org,
chenfeiyang@loongson.cn, chenhuacai@kernel.org,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, quic_tingweiz@quicinc.com,
stable@vger.kernel.org
Subject: Re: [PATCH v6] arm64: mm: Populate vmemmap/linear at the page level for hotplugged sections
Date: Thu, 13 Feb 2025 17:56:53 +0000
Message-ID: <Z64yZRPpyR9A_BiR@arm.com>
In-Reply-To: <b2964ea1-a22c-4b66-89ef-3082b6d00d21@redhat.com>
On Thu, Feb 13, 2025 at 05:16:37PM +0100, David Hildenbrand wrote:
> On 13.02.25 16:49, Catalin Marinas wrote:
> > On Thu, Feb 13, 2025 at 01:59:25PM +0100, David Hildenbrand wrote:
> > > On 13.02.25 08:57, Zhenhua Huang wrote:
> > > > On the arm64 platform with a 4K base page configuration,
> > > > SECTION_SIZE_BITS is set to 27, making one section 128M. The
> > > > corresponding struct page array that vmemmap maps is then 2M.
> > > > Commit c1cc1552616d ("arm64: MMU initialisation") optimized the
> > > > vmemmap to be populated at the PMD section level, which was suitable
> > > > initially since the hotplug granule was always one section (128M).
> > > > However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > > > introduced a 2M (SUBSECTION_SIZE) hotplug granule, which broke the
> > > > existing arm64 assumptions.
> > > >
> > > > Consider the vmemmap_free() -> unmap_hotplug_pmd_range() path: when
> > > > pmd_sect() is true, the entire PMD section is cleared, even if other
> > > > subsections within it are still in use. For example, page_struct_map1
> > > > and page_struct_map2 are part of a single PMD entry and are hot-added
> > > > sequentially. If page_struct_map1 is then removed, vmemmap_free() will
> > > > clear the entire PMD entry, freeing the struct page map for the whole
> > > > section even though page_struct_map2 is still active. A similar problem
> > > > exists with the linear mapping: for 16K base pages (PMD size = 32M) or
> > > > 64K base pages (PMD size = 512M), the block mappings exceed
> > > > SUBSECTION_SIZE, so tearing down an entire PMD mapping also leaves the
> > > > other subsections unmapped in the linear mapping.
> > > >
> > > > To address the issue, we need to prevent PMD/PUD/CONT mappings for both
> > > > the linear map and vmemmap of non-boot sections if the corresponding
> > > > mapping size for the given base page size exceeds SUBSECTION_SIZE (2MB
> > > > now).
> > > >
> > > > Cc: <stable@vger.kernel.org> # v5.4+
> > > > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> > >
> > > Just so I understand correctly: for ordinary memory-section-sized hotplug
> > > (NVDIMM, virtio-mem), we still get a large mapping where possible?
> >
> > Up to 2MB blocks only since that's the SUBSECTION_SIZE value. The
> > vmemmap mapping is also limited to PAGE_SIZE mappings (we could use
> > contiguous mappings for vmemmap but it's not wired up; I don't think
> > it's worth the hassle).
>
> But that's messed up, no?
>
> If someone hotplugs a memory section, they have to hotunplug a memory
> section, not parts of it.
>
> That's why x86 does in vmemmap_populate():
>
> 	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> 		err = vmemmap_populate_basepages(start, end, node, NULL);
> 	else if (boot_cpu_has(X86_FEATURE_PSE))
> 		err = vmemmap_populate_hugepages(start, end, node, altmap);
> 	...
>
> Maybe I'm missing something. Most importantly, why should the weird
> subsection stuff degrade ordinary hotplug of DIMMs/virtio-mem etc.?

I think that's based on the discussion for a previous version assuming
that the hotplug/unplug sizes are not guaranteed to be symmetric:
https://lore.kernel.org/lkml/a720aaa5-a75e-481e-b396-a5f2b50ed362@quicinc.com/
If that's not the case, we can indeed ignore the SUBSECTION_SIZE
altogether and just rely on the start/end of the hotplugged region.
--
Catalin
Thread overview: 7+ messages
2025-02-13 7:57 [PATCH v6] arm64: mm: Populate vmemmap/linear at the page level for hotplugged sections Zhenhua Huang
2025-02-13 12:59 ` David Hildenbrand
2025-02-13 15:49 ` Catalin Marinas
2025-02-13 16:16 ` David Hildenbrand
2025-02-13 17:56 ` Catalin Marinas [this message]
2025-02-13 18:20 ` David Hildenbrand
2025-02-14 9:46 ` Zhenhua Huang