public inbox for linux-arm-kernel@lists.infradead.org
From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev,
	catalin.marinas@arm.com, akpm@linux-foundation.org,
	david@kernel.org, kevin.brodsky@arm.com,
	quic_zhenhuah@quicinc.com, dev.jain@arm.com,
	yang@os.amperecomputing.com, chaitanyas.prakash@arm.com,
	bigeasy@linutronix.de, clrkwllms@kernel.org, rostedt@goodmis.org,
	lorenzo.stoakes@oracle.com, ardb@kernel.org, jackmanb@google.com,
	vbabka@suse.cz, mhocko@suse.com
Subject: Re: [PATCH v5 2/3] arm64: mmu: avoid allocating pages while splitting the linear mapping
Date: Tue, 20 Jan 2026 10:54:55 +0000	[thread overview]
Message-ID: <aW9e/6np9uO39yXM@e129823.arm.com> (raw)
In-Reply-To: <baa46ea0-737b-4d9e-b4f4-2eee1476da95@arm.com>

> >>>> On Mon, Jan 05, 2026 at 08:23:27PM +0000, Yeoreum Yun wrote:
> >>>>> +static int __init linear_map_prealloc_split_pgtables(void)
> >>>>> +{
> >>>>> +	int ret, i;
> >>>>> +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
> >>>>> +	unsigned long lend = PAGE_END;
> >>>>> +	unsigned long kstart = (unsigned long)lm_alias(_stext);
> >>>>> +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
> >>>>> +
> >>>>> +	const struct mm_walk_ops collect_to_split_ops = {
> >>>>> +		.pud_entry	= collect_to_split_pud_entry,
> >>>>> +		.pmd_entry	= collect_to_split_pmd_entry
> >>>>> +	};
> >>>>
> >>>> Why do we need to rewalk the page-table here instead of collating the
> >>>> number of block mappings we put down when creating the linear map in
> >>>> the first place?
> >>
> >> That's a good point; perhaps we can reuse the counters that this series introduces?
> >>
> >> https://lore.kernel.org/all/20260107002944.2940963-1-yang@os.amperecomputing.com/
> >>
> >>>
> >>> First, the linear alias of [_text, __init_begin) is not a target for
> >>> the split, and it also seems strange to me to add code inside alloc_init_XXX()
> >>> that both checks an address range and counts block mappings.
> >>>
> >>> Second, for a future feature, I hope to add code that splits only a
> >>> "specific" area, e.g. to set a specific pkey for a specific region.
> >>
> >> Could you give more detail on this? My working assumption is that either the
> >> system supports BBML2 or it doesn't. If it doesn't, we need to split the whole
> >> linear map. If it does, we already have logic to split parts of the linear map
> >> when needed.
> >
> > This is not for the linear mapping case, but for the kernel text area.
> > As a draft, I want to mark some kernel code as executable by both the
> > kernel and eBPF programs.
> > (I'm trying to prevent eBPF programs from directly executing kernel
> > code, using the POE feature.)
> > For this "executable area" shared by the kernel and eBPF programs
> > -- a typical example is the exception entry -- we need to split that
> > specific range and mark it with a special POE index.
>
> Ahh yes, I recall you mentioning this a while back (although I confess all the
> details have fallen out of my head). You'd need to make sure you're definitely
> not splitting an area of text that the secondary CPUs are executing while they
> are being held in the pen, since at least one of those CPUs doesn't support BBML2.
>

Absolutely. Anyway, for that feature I hope to retain the current
approach -- collect, pre-allocate, and then use the pages for the
specific ranges.
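
For reference, the shape of that collect/pre-allocate/consume pattern, as a
minimal userspace C sketch (this is not the kernel code: collect_phase(),
prealloc_phase(), consume_table() and the region_blocks numbers are made-up
stand-ins for the walk callbacks, the pagetable_alloc() loop, and the split
path that consumes the tables):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for "number of block mappings found per range". */
#define NR_REGIONS 3
static const int region_blocks[NR_REGIONS] = { 4, 0, 2 };

static size_t prealloc_count;
static size_t prealloc_idx;
static void **prealloc_tables;

/* Phase 1: walk the ranges and count how many tables a split will need. */
static void collect_phase(void)
{
	for (int i = 0; i < NR_REGIONS; i++)
		prealloc_count += region_blocks[i];
}

/* Phase 2: allocate exactly that many tables up front, while sleeping
 * allocations are still allowed (i.e. before stop_machine()).
 */
static int prealloc_phase(void)
{
	prealloc_tables = calloc(prealloc_count, sizeof(*prealloc_tables));
	if (!prealloc_tables)
		return -1;
	for (size_t i = 0; i < prealloc_count; i++) {
		prealloc_tables[i] = malloc(64); /* stand-in for pagetable_alloc() */
		if (!prealloc_tables[i])
			return -1; /* real code would free what it got so far */
	}
	return 0;
}

/* Phase 3: consume tables while splitting; by construction it never runs dry
 * as long as the split walks the same ranges that were collected.
 */
static void *consume_table(void)
{
	assert(prealloc_idx < prealloc_count);
	return prealloc_tables[prealloc_idx++];
}
```

The key property is that phase 3 performs no allocation at all, which is what
makes it safe inside an atomic section on PREEMPT_RT.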

> >
> >>
> >>>
> >>> In this case, it's useful to rewalk the page tables over the specific
> >>> range to get the number of block mappings.
> >>>
> >>>>
> >>>>> +	split_pgtables_idx = 0;
> >>>>> +	split_pgtables_count = 0;
> >>>>> +
> >>>>> +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> >>>>> +						    &collect_to_split_ops,
> >>>>> +						    NULL, NULL);
> >>>>> +	if (!ret)
> >>>>> +		ret = walk_kernel_page_table_range_lockless(kend, lend,
> >>>>> +							    &collect_to_split_ops,
> >>>>> +							    NULL, NULL);
> >>>>> +	if (ret || !split_pgtables_count)
> >>>>> +		goto error;
> >>>>> +
> >>>>> +	ret = -ENOMEM;
> >>>>> +
> >>>>> +	split_pgtables = kvmalloc(split_pgtables_count * sizeof(struct ptdesc *),
> >>>>> +				  GFP_KERNEL | __GFP_ZERO);
> >>>>> +	if (!split_pgtables)
> >>>>> +		goto error;
> >>>>> +
> >>>>> +	for (i = 0; i < split_pgtables_count; i++) {
> >>>>> +		/* The page table will be filled during splitting, so zeroing it is unnecessary. */
> >>>>> +		split_pgtables[i] = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
> >>>>> +		if (!split_pgtables[i])
> >>>>> +			goto error;
> >>>>
> >>>> This looks potentially expensive on the boot path and only gets worse as
> >>>> the amount of memory grows. Maybe we should predicate this preallocation
> >>>> on preempt-rt?
> >>>
> >>> Agree. then I'll apply pre-allocation with PREEMPT_RT only.
> >>
> >> I guess I'm missing something obvious but I don't understand the problem here...
> >> We are only deferring the allocation of all these pgtables, so the cost is
> >> neutral surely? Had we correctly guessed that the system doesn't support BBML2
> >> earlier, we would have had to allocate all these pgtables earlier.
> >>
> >> Another way to look at it is that we are still allocating the same number of
> >> pgtables in the existing fallback path, it's just that we are doing it inside
> >> the stop_machine().
> >>
> >> My vote would be _not_ to have a separate path for PREEMPT_RT, which will end up
> >> with significantly less testing...
> >
> > IIUC, Will's point is the additional memory allocation for
> > "split_pgtables", where the pre-allocated page tables are saved.
> > As memory size increases, this array grows, and so does the cost.
>
> Err, so you're referring to the extra kvmalloc()? I don't think that's a big
> deal, is it? You get 512 pointers per page, so the amortized cost is 1/512 ≈ 0.2%.
>
> I suspect we have both misunderstood Will's point...

Might be.. sorry for my misunderstanding.
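
For what it's worth, the amortized-cost arithmetic above checks out. A
back-of-envelope check, assuming 4 KiB pages and 8-byte pointers as on a
typical arm64 configuration (the helper names here are illustrative, not
kernel APIs):

```c
#include <assert.h>

/* How many split_pgtables entries fit in one page of the array. */
static unsigned long ptrs_per_page(unsigned long page_size,
				   unsigned long ptr_size)
{
	return page_size / ptr_size;
}

/* Bookkeeping overhead of the array, as a percentage: one pointer of
 * metadata per pre-allocated page-table page.
 */
static double array_overhead_pct(unsigned long page_size,
				 unsigned long ptr_size)
{
	return (double)ptr_size / (double)page_size * 100.0;
}
```

With page_size = 4096 and ptr_size = 8 this gives 512 pointers per page and
roughly 0.2% overhead relative to the pre-allocated tables themselves, i.e.
the kvmalloc()'d array is noise next to the pagetable_alloc() cost.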

--
Sincerely,
Yeoreum Yun


Thread overview: 31+ messages
2026-01-05 20:23 [PATCH v5 0/3] fix wrong usage of memory allocation APIs under PREEMPT_RT in arm64 Yeoreum Yun
2026-01-05 20:23 ` [PATCH v5 1/3] arm64: mmu: introduce pgtable_alloc_t Yeoreum Yun
2026-01-05 20:23 ` [PATCH v5 2/3] arm64: mmu: avoid allocating pages while splitting the linear mapping Yeoreum Yun
2026-01-19 17:28   ` Will Deacon
2026-01-19 21:24     ` Yeoreum Yun
2026-01-20  8:56       ` Ryan Roberts
2026-01-20  9:29         ` Yeoreum Yun
2026-01-20 10:40           ` Ryan Roberts
2026-01-20 10:54             ` Yeoreum Yun [this message]
2026-01-20 15:53             ` Will Deacon
2026-01-20 16:16               ` Yeoreum Yun
2026-01-20 16:22               ` Ryan Roberts
2026-01-20 16:31                 ` Yeoreum Yun
2026-01-20 17:35                   ` Ryan Roberts
2026-01-20 17:49                     ` Yeoreum Yun
2026-01-21  0:12                     ` Yang Shi
2026-01-21  8:32                       ` Yeoreum Yun
2026-01-21 10:20                         ` Ryan Roberts
2026-01-21 11:30                           ` Yeoreum Yun
2026-01-21 22:57                           ` Yang Shi
2026-01-22  7:42                             ` Yeoreum Yun
2026-01-22 13:47                               ` Ryan Roberts
2026-01-20 22:24           ` Yang Shi
2026-01-20 23:01             ` Yeoreum Yun
2026-01-21  0:43               ` Yang Shi
2026-01-21  8:15                 ` Yeoreum Yun
2026-01-05 20:23 ` [PATCH v5 3/3] arm64: mmu: avoid allocating pages while installing ng-mapping for KPTI Yeoreum Yun
2026-01-19 17:31   ` Will Deacon
2026-01-19 21:30     ` Yeoreum Yun
2026-01-20 11:44       ` Will Deacon
2026-01-20 15:30         ` Yeoreum Yun
