From: Yang Shi <yang@os.amperecomputing.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
will@kernel.org, catalin.marinas@arm.com,
Miko.Lenczewski@arm.com, dev.jain@arm.com,
scott@os.amperecomputing.com, cl@gentwo.org
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] arm64: mm: support large block mapping when rodata=full
Date: Thu, 26 Jun 2025 15:39:14 -0700
Message-ID: <25ecbf39-e5dc-496c-be3c-8b25eeae2414@os.amperecomputing.com>
In-Reply-To: <ed942c01-58e8-4d91-8f86-3b3645af6940@arm.com>
On 6/23/25 6:26 AM, Ryan Roberts wrote:
> [...]
>
>>> +
>>> +int split_leaf_mapping(unsigned long addr)
>> Thanks for coming up with the code. It does help to understand your idea. Now I
>> see why you suggested the "split_mapping(start); split_mapping(end);" model. It
>> does make the implementation easier because we don't need a loop anymore. But
>> this may have a couple of problems:
>> 1. We need to walk the page table twice instead of once. It sounds expensive.
> Yes we need to walk twice. That may be more expensive or less expensive,
> depending on the size of the range that you are splitting. If the range is large
> then your approach loops through every leaf mapping between the start and end
> which will be more expensive than just doing 2 walks. If the range is small then
> your approach can avoid the second walk, but at the expense of all the extra
> loop overhead.
Thinking about this further: although there is some extra loop overhead,
there should be no extra loads. We can check whether start and end are
properly aligned; if they are, we just continue the loop without loading
the page table entry.

And we can optimize the loop by advancing multiple PUD/PMD/CONT sizes at
a time instead of one at a time. The pseudo code (for the PMD level, for
example) looks like:
do {
	nr = 1;
	next = pmd_addr_end(start, end);
	if (next < end)
		nr = ((end - next) / PMD_SIZE) + 1;
	if (((start | next) & ~PMD_MASK) == 0)
		continue;
	split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end);
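
For example, assuming 2M PMDs, with start = 1M and end = 9.5M: the first
iteration splits [1M, 2M) and computes nr = 4, so the loop jumps straight
to the entry containing end without loading the three fully covered
entries in between; the second iteration splits [8M, 9.5M) and the loop
exits.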
For the repainting case, we just need to do:
do {
	nr = 1;
	next = pmd_addr_end(start, end);
	if (next < end && !repainting)
		nr = ((end - next) / PMD_SIZE) + 1;
	if (((start | next) & ~PMD_MASK) == 0 && !repainting)
		continue;
	split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end);
This should reduce the loop overhead and avoid duplicating code for the
repainting case.
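
Putting the two together, a rough C sketch of what I have in mind (the
split_pmd_range() name, the repainting flag, and the split_pmd(pmdp,
start, next) signature are just for illustration, not a final interface;
untested):

static int split_pmd_range(pmd_t *pmdp, unsigned long start,
			   unsigned long end, bool repainting)
{
	unsigned long next, nr;
	int ret;

	do {
		nr = 1;
		next = pmd_addr_end(start, end);

		/*
		 * All entries between the first and the last are fully
		 * covered by the range, so when we only need to split the
		 * boundaries we can jump straight to the entry containing
		 * end. Repainting must visit every entry.
		 */
		if (next < end && !repainting)
			nr = ((end - next) / PMD_SIZE) + 1;

		/*
		 * A fully covered entry already sits on leaf boundaries;
		 * nothing to split unless we are repainting.
		 */
		if (((start | next) & ~PMD_MASK) == 0 && !repainting)
			continue;

		/* Assumed helper: splits the leaf covering [start, next). */
		ret = split_pmd(pmdp, start, next);
		if (ret)
			return ret;
	} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end);

	return 0;
}

With repainting == false this only ever loads and splits the two boundary
entries; with repainting == true it degenerates to the plain one-entry-at-
a-time loop.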
Thanks,
Yang
>
> My suggestion requires 5 loads (assuming the maximum of 5 levels of lookup).
> Personally I think this is probably acceptable? Perhaps we need some other
> voices here.
>
>
>> 2. How should we handle repainting? We need to split all the page tables all
>> the way down to PTEs for repainting between start and end rather than keeping
>> block mappings. This model doesn't work for that, right? For example, repaint a
>> 2G block. The first 1G is mapped by a PUD; the second 1G is mapped by 511 PMDs
>> and 512 PTEs. split_mapping(start) will split the first 1G, but
>> split_mapping(end) will do nothing, so the 511 PMDs are kept intact. In
>> addition, I think we would also prefer to reuse the split primitive for
>> repainting instead of inventing another one.
> I agree my approach doesn't work for the repainting case. But I think what I'm
> trying to say is that the 2 things are different operations;
> split_leaf_mapping() is just trying to ensure that the start and end of a region
> are on leaf boundaries. Repainting is trying to ensure that all leaf mappings
> within a range are PTE-sized. I've implemented the former and you've implemented
> the latter. Your implementation looks like it meets the former's requirements
> because you are only testing it for the case where the range is 1 page. But
> actually it is splitting everything in the range to PTEs.
>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>> +{
>>> +	pgd_t *pgdp, pgd;
>>> +	p4d_t *p4dp, p4d;
>>> +	pud_t *pudp, pud;
>>> +	pmd_t *pmdp, pmd;
>>> +	pte_t *ptep, pte;
>>> +	int ret = 0;
>>> +
>>> +	/*
>>> +	 * !BBML2_NOABORT systems should not be trying to change permissions on
>>> +	 * anything that is not pte-mapped in the first place. Just return early
>>> +	 * and let the permission change code raise a warning if not already
>>> +	 * pte-mapped.
>>> +	 */
>>> +	if (!system_supports_bbml2_noabort())
>>> +		return 0;
>>> +
>>> +	/*
>>> +	 * Ensure addr is at least page-aligned since this is the finest
>>> +	 * granularity we can split to.
>>> +	 */
>>> +	if (addr != PAGE_ALIGN(addr))
>>> +		return -EINVAL;
>>> +
>>> +	arch_enter_lazy_mmu_mode();
>>> +
>>> +	/*
>>> +	 * PGD: If addr is PGD aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>>> +		goto out;
>>> +	pgdp = pgd_offset_k(addr);
>>> +	pgd = pgdp_get(pgdp);
>>> +	if (!pgd_present(pgd))
>>> +		goto out;
>>> +
>>> +	/*
>>> +	 * P4D: If addr is P4D aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>>> +		goto out;
>>> +	p4dp = p4d_offset(pgdp, addr);
>>> +	p4d = p4dp_get(p4dp);
>>> +	if (!p4d_present(p4d))
>>> +		goto out;
>>> +
>>> +	/*
>>> +	 * PUD: If addr is PUD aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split. Otherwise,
>>> +	 * if we have a pud leaf, split to contpmd.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>>> +		goto out;
>>> +	pudp = pud_offset(p4dp, addr);
>>> +	pud = pudp_get(pudp);
>>> +	if (!pud_present(pud))
>>> +		goto out;
>>> +	if (pud_leaf(pud)) {
>>> +		ret = split_pud(pudp, pud);
>>> +		if (ret)
>>> +			goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * CONTPMD: If addr is CONTPMD aligned then addr already describes a
>>> +	 * leaf boundary. If not present then there is nothing to split.
>>> +	 * Otherwise, if we have a contpmd leaf, split to pmd.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>>> +		goto out;
>>> +	pmdp = pmd_offset(pudp, addr);
>>> +	pmd = pmdp_get(pmdp);
>>> +	if (!pmd_present(pmd))
>>> +		goto out;
>>> +	if (pmd_leaf(pmd)) {
>>> +		if (pmd_cont(pmd))
>>> +			split_contpmd(pmdp);
>>> +		/*
>>> +		 * PMD: If addr is PMD aligned then addr already describes a
>>> +		 * leaf boundary. Otherwise, split to contpte.
>>> +		 */
>>> +		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>> +			goto out;
>>> +		ret = split_pmd(pmdp, pmd);
>>> +		if (ret)
>>> +			goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * CONTPTE: If addr is CONTPTE aligned then addr already describes a
>>> +	 * leaf boundary. If not present then there is nothing to split.
>>> +	 * Otherwise, if we have a contpte leaf, split to pte.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>>> +		goto out;
>>> +	ptep = pte_offset_kernel(pmdp, addr);
>>> +	pte = __ptep_get(ptep);
>>> +	if (!pte_present(pte))
>>> +		goto out;
>>> +	if (pte_cont(pte))
>>> +		split_contpte(ptep);
>>> +
>>> +out:
>>> +	arch_leave_lazy_mmu_mode();
>>> +	return ret;
>>> +}
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>