linux-kernel.vger.kernel.org archive mirror
From: Yang Shi <yang@os.amperecomputing.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Mike Rapoport <rppt@kernel.org>, Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, david@redhat.com,
	catalin.marinas@arm.com, will@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, surenb@google.com, mhocko@suse.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	suzuki.poulose@arm.com, steven.price@arm.com, gshan@redhat.com,
	linux-arm-kernel@lists.infradead.org, anshuman.khandual@arm.com
Subject: Re: [PATCH v3 1/2] arm64: pageattr: Use pagewalk API to change memory permissions
Date: Thu, 26 Jun 2025 14:08:40 -0700
Message-ID: <970c5885-8a06-438e-b626-e6640f9322f5@os.amperecomputing.com>
In-Reply-To: <b0ef3756-2cd2-41d7-b757-0518332e1b54@arm.com>



On 6/26/25 1:47 AM, Ryan Roberts wrote:
> On 25/06/2025 21:40, Yang Shi wrote:
>>
>> On 6/25/25 4:04 AM, Ryan Roberts wrote:
>>> On 15/06/2025 08:32, Mike Rapoport wrote:
>>>> On Fri, Jun 13, 2025 at 07:13:51PM +0530, Dev Jain wrote:
>>>>> -/*
>>>>> - * This function assumes that the range is mapped with PAGE_SIZE pages.
>>>>> - */
>>>>> -static int __change_memory_common(unsigned long start, unsigned long size,
>>>>> +static int ___change_memory_common(unsigned long start, unsigned long size,
>>>>>                    pgprot_t set_mask, pgprot_t clear_mask)
>>>>>    {
>>>>>        struct page_change_data data;
>>>>> @@ -61,9 +140,28 @@ static int __change_memory_common(unsigned long start, unsigned long size,
>>>>>        data.set_mask = set_mask;
>>>>>        data.clear_mask = clear_mask;
>>>>>    -    ret = apply_to_page_range(&init_mm, start, size, change_page_range,
>>>>> -                    &data);
>>>>> +    arch_enter_lazy_mmu_mode();
>>>>> +
>>>>> +    /*
>>>>> +     * The caller must ensure that the range we are operating on does not
>>>>> +     * partially overlap a block mapping. Any such case should either not
>>>>> +     * exist, or must be eliminated by splitting the mapping - which for
>>>>> +     * kernel mappings can be done only on BBML2 systems.
>>>>> +     *
>>>>> +     */
>>>>> +    ret = walk_kernel_page_table_range_lockless(start, start + size,
>>>>> +                            &pageattr_ops, NULL, &data);
>>>> x86 has a cpa_lock for set_memory/set_direct_map to ensure that there's no
>>>> concurrency in kernel page table updates. I think arm64 has to have such a
>>>> lock as well.
>>> We don't have a lock today, using apply_to_page_range(); we are expecting that
>>> the caller has exclusive ownership of the portion of virtual memory - i.e. the
>>> vmalloc region or linear map. So I don't think this patch changes that
>>> requirement?
>>>
>>> Where it does get a bit more hairy is when we introduce the support for
>>> splitting. In that case, 2 non-overlapping areas of virtual memory may share a
>>> large leaf mapping that needs to be split. But I've been discussing that with
>>> Yang Shi at [1] and I think we can handle that locklessly too.
>> If the split is serialized by a lock, changing permission can be lockless. But
>> if the split is lockless, changing permission may be a little bit tricky,
>> particularly for CONT mappings. The implementation in my split patch assumes that
>> the whole range has the cont bit cleared if the first PTE in the range has it
>> cleared, because the lock guarantees that two concurrent splits are serialized.
>>
>> But lockless split may trigger the below race:
>>
>> CPU A is splitting the page table while CPU B is changing the permission of one
>> PTE in the same table. Clearing the cont bit is a read-modify-write (RMW), and so
>> is changing the permission, but neither of them is atomic.
>>
>>                 CPU A                                CPU B
>> read the PTE                             read the PTE
>> clear the cont bit for the PTE
>>                                          change the PTE permission from RW to RO
>>                                          store the new PTE
>> store the new PTE <- overwrites the PTE value stored by CPU B and results
>>                      in misprogrammed cont PTEs
> Ahh yes, good point! I missed that. When I was thinking about this, I had
> assumed that *both* CPUs racing to split would (non-atomically) RMW to remove
> the cont bit on the whole block. That is safe as long as nothing else in the PTE
> changes. But of course you're right that the first one to complete that may then
> go on to modify the permissions in their portion of the now-split VA space. So
> there is definitely a problem.
>
>>
>> Off the top of my head, we need to do one of the following to avoid the race:
>> 1. Serialize the split with a lock
> I guess this is certainly the simplest as per your original proposal.

Yeah

>
>> 2. Make page table RMW atomic in both split and permission change
> I don't think we would need atomic RMW for the permission change - we would only
> need it for removing the cont bit? My reasoning is that by the time a thread is
> doing the permission change it must have already finished splitting the cont
> block. The permission change will only be for PTEs that we know we have
> exclusive access to. The other CPU may still be "splitting" the cont block, but
> since we already won, it will just be reading the PTEs and noticing that cont is
> already clear? I guess split_contpte()/split_contpmd() becomes a loop doing
> READ_ONCE() to test if the bit is set, followed by atomic bit clear if it was
> set (avoid the atomic where we can)?
>
>> 3. Check whether each PTE in the range is cont, instead of only the first PTE,
>> before clearing the cont bit on those that are
> Ahh perhaps this is what I'm actually describing above?

Yes

>
>> 4. Retry if the cont bit is not cleared in the permission change, but we need to
>> distinguish this from changing the permission for the whole CONT PTE range,
>> because we keep the cont bit in that case
> I'd prefer to keep the splitting decoupled from the permission change if we can.

I agree.

>
>
> Personally, I'd prefer to take the lockless approach. I think it has the least
> chance of contention issues. But if you prefer to use a lock, then I'm ok with
> that as a starting point. I'd prefer to use a new separate lock though (like x86
> does) rather than risking extra contention with the init_mm PTL.

A separate lock is fine with me. I think using a lock will make our life
easier. We can always optimize it later if lock contention turns out
to be a problem.

Thanks,
Yang

>
> Thanks,
> Ryan
>
>
>> Thanks,
>> Yang
>>
>>> Perhaps I'm misunderstanding something?
>>>
>>> [1] https://lore.kernel.org/all/f036acea-1bd1-48a7-8600-75ddd504b8db@arm.com/
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>> +    arch_leave_lazy_mmu_mode();
>>>>> +
>>>>> +    return ret;
>>>>> +}



Thread overview: 20+ messages
2025-06-13 13:43 [PATCH v3 0/2] Enable permission change on arm64 kernel block mappings Dev Jain
2025-06-13 13:43 ` [PATCH v3 1/2] arm64: pageattr: Use pagewalk API to change memory permissions Dev Jain
2025-06-13 16:27   ` Lorenzo Stoakes
2025-06-14 14:50     ` Karim Manaouil
2025-06-19  4:03       ` Dev Jain
2025-06-25 10:57         ` Ryan Roberts
2025-06-15  7:25     ` Mike Rapoport
2025-06-25 10:57       ` Ryan Roberts
2025-06-15  7:32   ` Mike Rapoport
2025-06-19  4:10     ` Dev Jain
2025-06-25 11:04     ` Ryan Roberts
2025-06-25 20:40       ` Yang Shi
2025-06-26  8:47         ` Ryan Roberts
2025-06-26 21:08           ` Yang Shi [this message]
2025-06-25 11:20   ` Ryan Roberts
2025-06-26  5:47   ` Dev Jain
2025-06-26  8:15     ` Ryan Roberts
2025-06-13 13:43 ` [PATCH v3 2/2] arm64: pageattr: Enable huge-vmalloc permission change Dev Jain
2025-06-25 11:08   ` Ryan Roberts
2025-06-25 11:16     ` Dev Jain
