From: Yang Shi <yang@os.amperecomputing.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
Mike Rapoport <rppt@kernel.org>, Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, david@redhat.com,
catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
vbabka@suse.cz, surenb@google.com, mhocko@suse.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
suzuki.poulose@arm.com, steven.price@arm.com, gshan@redhat.com,
linux-arm-kernel@lists.infradead.org, anshuman.khandual@arm.com
Subject: Re: [PATCH v3 1/2] arm64: pageattr: Use pagewalk API to change memory permissions
Date: Thu, 26 Jun 2025 14:08:40 -0700 [thread overview]
Message-ID: <970c5885-8a06-438e-b626-e6640f9322f5@os.amperecomputing.com> (raw)
In-Reply-To: <b0ef3756-2cd2-41d7-b757-0518332e1b54@arm.com>
On 6/26/25 1:47 AM, Ryan Roberts wrote:
> On 25/06/2025 21:40, Yang Shi wrote:
>>
>> On 6/25/25 4:04 AM, Ryan Roberts wrote:
>>> On 15/06/2025 08:32, Mike Rapoport wrote:
>>>> On Fri, Jun 13, 2025 at 07:13:51PM +0530, Dev Jain wrote:
>>>>> -/*
>>>>> - * This function assumes that the range is mapped with PAGE_SIZE pages.
>>>>> - */
>>>>> -static int __change_memory_common(unsigned long start, unsigned long size,
>>>>> +static int ___change_memory_common(unsigned long start, unsigned long size,
>>>>> pgprot_t set_mask, pgprot_t clear_mask)
>>>>> {
>>>>> struct page_change_data data;
>>>>> @@ -61,9 +140,28 @@ static int __change_memory_common(unsigned long start,
>>>>> unsigned long size,
>>>>> data.set_mask = set_mask;
>>>>> data.clear_mask = clear_mask;
>>>>> - ret = apply_to_page_range(&init_mm, start, size, change_page_range,
>>>>> - &data);
>>>>> + arch_enter_lazy_mmu_mode();
>>>>> +
>>>>> + /*
>>>>> + * The caller must ensure that the range we are operating on does not
>>>>> + * partially overlap a block mapping. Any such case should either not
>>>>> + * exist, or must be eliminated by splitting the mapping - which for
>>>>> + * kernel mappings can be done only on BBML2 systems.
>>>>> + *
>>>>> + */
>>>>> + ret = walk_kernel_page_table_range_lockless(start, start + size,
>>>>> + &pageattr_ops, NULL, &data);
>>>> x86 has a cpa_lock for set_memory/set_direct_map to ensure that there's on
>>>> concurrency in kernel page table updates. I think arm64 has to have such
>>>> lock as well.
>>> We don't have a lock today, using apply_to_page_range(); we are expecting that
>>> the caller has exclusive ownership of the portion of virtual memory - i.e. the
>>> vmalloc region or linear map. So I don't think this patch changes that
>>> requirement?
>>>
>>> Where it does get a bit more hairy is when we introduce the support for
>>> splitting. In that case, 2 non-overlapping areas of virtual memory may share a
>>> large leaf mapping that needs to be split. But I've been discussing that with
>>> Yang Shi at [1] and I think we can handle that locklessly too.
>> If the split is serialized by a lock, changing permission can be lockless. But
>> if split is lockless, changing permission may be a little bit tricky,
>> particularly for CONT mappings. The implementation in my split patch assumes the
>> whole range has cont bit cleared if the first PTE in the range has cont bit
>> cleared because the lock guarantees two concurrent splits are serialized.
>>
>> But lockless split may trigger the below race:
>>
>> CPU A is splitting the page table, CPU B is changing the permission for one PTE
>> entry in the same table. Clearing cont bit is RMW, changing permission is RMW
>> too, but neither of them is atomic.
>>
>> CPU A CPU B
>> read the PTE read the PTE
>> clear the cont bit for the PTE
>> change the PTE permission from RW to RO
>> store the new PTE
>>
>> store the new PTE <- it will overwrite the PTE value stored by CPU B and result
>> in misprogrammed cont PTEs
> Ahh yes, good point! I missed that. When I was thinking about this, I had
> assumed that *both* CPUs racing to split would (non-atomically) RMW to remove
> the cont bit on the whole block. That is safe as long as nothing else in the PTE
> changes. But of course you're right that the first one to complete that may then
> go on to modify the permissions in their portion of the now-split VA space. So
> there is definitely a problem.
>
>>
>> We should need do one the of the follows to avoid the race off the top of my head:
>> 1. Serialize the split with a lock
> I guess this is certainly the simplest as per your original proposal.
Yeah
>
>> 2. Make page table RMW atomic in both split and permission change
> I don't think we would need atomic RMW for the permission change - we would only
> need it for removing the cont bit? My reasoning is that by the time a thread is
> doing the permission change it must have already finished splitting the cont
> block. The permission change will only be for PTEs that we know we have
> exclusive access too. The other CPU may still be "splitting" the cont block, but
> since we already won, it will just be reading the PTEs and noticing that cont is
> already clear? I guess split_contpte()/split_contpmd() becomes a loop doing
> READ_ONCE() to test if the bit is set, followed by atomic bit clear if it was
> set (avoid the atomic where we can)?
>
>> 3. Check whether PTE is cont or not for every PTEs in the range instead of the
>> first PTE, before clearing cont bit if they are
> Ahh perhaps this is what I'm actually describing above?
Yes
>
>> 4. Retry if cont bit is not cleared in permission change, but we need
>> distinguish this from changing permission for the whole CONT PTE range because
>> we keep cont bit for this case
> I'd prefer to keep the splitting decoupled from the permission change if we can.
I agree.
>
>
> Personally, I'd prefer to take the lockless approach. I think it has the least
> chance of contention issues. But if you prefer to use a lock, then I'm ok with
> that as a starting point. I'd prefer to use a new separate lock though (like x86
> does) rather than risking extra contention with the init_mm PTL.
A separate lock is fine to me. I think it will make our life easier to
use a lock. We can always optimize it if the lock contention turns out
to be a problem.
Thanks,
Yang
>
> Thanks,
> Ryan
>
>
>> Thanks,
>> Yang
>>
>>> Perhaps I'm misunderstanding something?
>>>
>>> [1] https://lore.kernel.org/all/f036acea-1bd1-48a7-8600-75ddd504b8db@arm.com/
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>> + arch_leave_lazy_mmu_mode();
>>>>> +
>>>>> + return ret;
>>>>> +}
next prev parent reply other threads:[~2025-06-26 21:08 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 13:43 [PATCH v3 0/2] Enable permission change on arm64 kernel block mappings Dev Jain
2025-06-13 13:43 ` [PATCH v3 1/2] arm64: pageattr: Use pagewalk API to change memory permissions Dev Jain
2025-06-13 16:27 ` Lorenzo Stoakes
2025-06-14 14:50 ` Karim Manaouil
2025-06-19 4:03 ` Dev Jain
2025-06-25 10:57 ` Ryan Roberts
2025-06-15 7:25 ` Mike Rapoport
2025-06-25 10:57 ` Ryan Roberts
2025-06-15 7:32 ` Mike Rapoport
2025-06-19 4:10 ` Dev Jain
2025-06-25 11:04 ` Ryan Roberts
2025-06-25 20:40 ` Yang Shi
2025-06-26 8:47 ` Ryan Roberts
2025-06-26 21:08 ` Yang Shi [this message]
2025-06-25 11:20 ` Ryan Roberts
2025-06-26 5:47 ` Dev Jain
2025-06-26 8:15 ` Ryan Roberts
2025-06-13 13:43 ` [PATCH v3 2/2] arm64: pageattr: Enable huge-vmalloc permission change Dev Jain
2025-06-25 11:08 ` Ryan Roberts
2025-06-25 11:16 ` Dev Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=970c5885-8a06-438e-b626-e6640f9322f5@os.amperecomputing.com \
--to=yang@os.amperecomputing.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=gshan@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=steven.price@arm.com \
--cc=surenb@google.com \
--cc=suzuki.poulose@arm.com \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).