linux-arm-kernel.lists.infradead.org archive mirror
From: Dev Jain <dev.jain@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	akpm@linux-foundation.org, david@redhat.com,
	catalin.marinas@arm.com, will@kernel.org
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, suzuki.poulose@arm.com,
	steven.price@arm.com, gshan@redhat.com,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/3] arm64: pageattr: Use walk_page_range_novma() to change memory permissions
Date: Mon, 2 Jun 2025 10:05:56 +0530	[thread overview]
Message-ID: <4fa4b022-ad2c-46db-9edc-ea4396723964@arm.com> (raw)
In-Reply-To: <d195c7bc-0c04-4514-b536-b503d4827914@arm.com>


On 30/05/25 6:23 pm, Ryan Roberts wrote:
> On 30/05/2025 10:04, Dev Jain wrote:
>> Move away from apply_to_page_range(), which does not honour leaf mappings,
>> to walk_page_range_novma(). The callbacks emit a warning and return EINVAL
>> if a partial range is detected.
> Perhaps:
>
> """
> apply_to_page_range(), which was previously used to change permissions for
> kernel mapped memory, can only operate on page mappings. In future we want to
> support block mappings for more efficient TLB usage. Reimplement pageattr.c to
> instead use walk_page_range_novma() to visit and modify leaf mappings of all sizes.
>
> We only require that the start and end of a given range lie on leaf mapping
> boundaries. If this is not the case, emit a warning and return -EINVAL.
> """

Thanks.

>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>   arch/arm64/mm/pageattr.c | 69 +++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 64 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>> index 39fd1f7ff02a..a5c829c64969 100644
>> --- a/arch/arm64/mm/pageattr.c
>> +++ b/arch/arm64/mm/pageattr.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/mem_encrypt.h>
>>   #include <linux/sched.h>
>>   #include <linux/vmalloc.h>
>> +#include <linux/pagewalk.h>
>>   
>>   #include <asm/cacheflush.h>
>>   #include <asm/pgtable-prot.h>
>> @@ -20,6 +21,67 @@ struct page_change_data {
>>   	pgprot_t clear_mask;
>>   };
>>   
>> +static pteval_t set_pageattr_masks(unsigned long val, struct mm_walk *walk)
> Please don't use unsigned long for raw ptes; this will break with D128 pgtables.
>
> Anshuman had a patch in flight to introduce ptdesc_t as a generic/any-level raw
> value. It would be preferable to incorporate that patch and use it. pteval_t
> isn't really correct here, because this helper operates on any level while the
> name implies pte level only.

Okay.

>
>> +{
>> +	struct page_change_data *masks = walk->private;
>> +	unsigned long new_val = val;
> why do you need new_val? Why not just update and return val?

Yes, shameless copying from riscv and loongarch :)

>
>> +
>> +	new_val &= ~(pgprot_val(masks->clear_mask));
>> +	new_val |= (pgprot_val(masks->set_mask));
>> +
>> +	return new_val;
>> +}
> One potential pitfall of having a generic function that can change the
> permissions for ptes at all levels is that bit 1 is defined differently for
> level 3 vs the other levels. Having had a quick look at all the masks that
> users currently pass in, though, I don't think there should be any issues
> here in practice.
>
>> +
>> +static int pageattr_pud_entry(pud_t *pud, unsigned long addr,
>> +			      unsigned long next, struct mm_walk *walk)
>> +{
>> +	pud_t val = pudp_get(pud);
>> +
>> +	if (pud_leaf(val)) {
>> +		if (WARN_ON_ONCE((next - addr) != PUD_SIZE))
>> +			return -EINVAL;
>> +		val = __pud(set_pageattr_masks(pud_val(val), walk));
>> +		set_pud(pud, val);
>> +		walk->action = ACTION_CONTINUE;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int pageattr_pmd_entry(pmd_t *pmd, unsigned long addr,
>> +			      unsigned long next, struct mm_walk *walk)
>> +{
>> +	pmd_t val = pmdp_get(pmd);
>> +
>> +	if (pmd_leaf(val)) {
>> +		if (WARN_ON_ONCE((next - addr) != PMD_SIZE))
>> +			return -EINVAL;
>> +		val = __pmd(set_pageattr_masks(pmd_val(val), walk));
>> +		set_pmd(pmd, val);
>> +		walk->action = ACTION_CONTINUE;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int pageattr_pte_entry(pte_t *pte, unsigned long addr,
>> +			      unsigned long next, struct mm_walk *walk)
>> +{
>> +	pte_t val = ptep_get(pte);
> BUG: Use __ptep_get(), which is "below" the contpte management layer. ptep_get()
> will look at the contiguous bit and potentially decide to accumulate all the a/d
> bits in the block which is not relevant for kernel mappings.

Thanks.

>
>> +
>> +	val = __pte(set_pageattr_masks(pte_val(val), walk));
>> +	set_pte(pte, val);
> BUG: Use __set_pte(). Same reasoning as above. But this is more harmful because
> set_pte() will try to detect contpte blocks and may zero/flush the entries,
> which would be very bad for kernel mappings.

Thanks.

>
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct mm_walk_ops pageattr_ops = {
>> +	.pud_entry	= pageattr_pud_entry,
>> +	.pmd_entry	= pageattr_pmd_entry,
>> +	.pte_entry	= pageattr_pte_entry,
> Is there a reason why you don't have pgd and p4d entries? I think there are
> configs where the pgd may contain leaf mappings. Possibly 64K/42-bit, which will
> have 2 levels and I think they will be pgd and pte. So I think you'd better
> implement all levels to be correct.

Okay.

>
>> +	.walk_lock	= PGWALK_NOLOCK,
>> +};
>> +
>>   bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);
>>   
>>   bool can_set_direct_map(void)
>> @@ -49,9 +111,6 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
>>   	return 0;
>>   }
>>   
>> -/*
>> - * This function assumes that the range is mapped with PAGE_SIZE pages.
>> - */
>>   static int __change_memory_common(unsigned long start, unsigned long size,
>>   				pgprot_t set_mask, pgprot_t clear_mask)
>>   {
>> @@ -61,8 +120,8 @@ static int __change_memory_common(unsigned long start, unsigned long size,
>>   	data.set_mask = set_mask;
>>   	data.clear_mask = clear_mask;
>>   
>> -	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
>> -					&data);
>> +	ret = walk_page_range_novma(&init_mm, start, start + size,
>> +				    &pageattr_ops, NULL, &data);
>>   
>>   	/*
>>   	 * If the memory is being made valid without changing any other bits
> I notice that set_direct_map_invalid_noflush() and
> set_direct_map_default_noflush() don't use __change_memory_common() but instead
> call apply_to_page_range() directly (presumably because they don't want the
> associated TLB flush). Is there a reason not to update these callers too?
>
> Perhaps it would be cleaner to wrap this in ___change_memory_common() (3 leading
> underscores), which does everything except the flush.

Makes sense. I think Yang's series will need this to handle block mappings
for the linear map.

>
> Thanks,
> Ryan
>


Thread overview: 29+ messages
2025-05-30  9:04 [PATCH 0/3] Enable huge-vmalloc permission change Dev Jain
2025-05-30  9:04 ` [PATCH 1/3] mm: Allow pagewalk without locks Dev Jain
2025-05-30 10:33   ` Ryan Roberts
2025-05-30 21:33     ` Yang Shi
2025-05-30 10:57   ` Lorenzo Stoakes
2025-06-06  9:21     ` Dev Jain
2025-06-06  9:33       ` Dev Jain
2025-06-06 10:02       ` Lorenzo Stoakes
2025-05-30  9:04 ` [PATCH 2/3] arm64: pageattr: Use walk_page_range_novma() to change memory permissions Dev Jain
2025-05-30 12:53   ` Ryan Roberts
2025-06-02  4:35     ` Dev Jain [this message]
2025-06-06  9:49   ` Lorenzo Stoakes
2025-06-06 10:39     ` Dev Jain
2025-06-06 10:56       ` Lorenzo Stoakes
2025-06-06 11:08         ` Dev Jain
2025-06-09  9:41     ` Dev Jain
2025-06-09 11:00       ` Lorenzo Stoakes
2025-06-09 11:31         ` Dev Jain
2025-05-30  9:04 ` [PATCH 3/3] mm/pagewalk: Add pre/post_pte_table callback for lazy MMU on arm64 Dev Jain
2025-05-30 11:14   ` Lorenzo Stoakes
2025-05-30 12:12     ` Ryan Roberts
2025-05-30 12:18       ` Lorenzo Stoakes
2025-05-30  9:24 ` [PATCH 0/3] Enable huge-vmalloc permission change Dev Jain
2025-05-30 10:03 ` Ryan Roberts
2025-05-30 10:10   ` Dev Jain
2025-05-30 10:37     ` Ryan Roberts
2025-05-30 10:42       ` Dev Jain
2025-05-30 10:51         ` Ryan Roberts
2025-05-30 11:11           ` Dev Jain
