From: Catalin Marinas <catalin.marinas@arm.com>
To: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Gavin Shan <gshan@redhat.com>, Ard Biesheuvel <ardb@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Yicong Yang <yangyicong@hisilicon.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] arm64, mm: avoid always making PTE dirty in pte_mkwrite()
Date: Fri, 17 Oct 2025 19:06:16 +0100
Message-ID: <aPKFmHg-FrkGJxWd@arm.com>
In-Reply-To: <20251015023712.46598-1-ying.huang@linux.alibaba.com>

On Wed, Oct 15, 2025 at 10:37:12AM +0800, Huang Ying wrote:
> The current pte_mkwrite_novma() makes the PTE dirty unconditionally.
> This may wrongly mark pages that are never written as dirty. For example,
> do_swap_page() may map exclusive pages with writable and clean PTEs
> if the VMA is writable and the page fault is for read access.
> However, the current pte_mkwrite_novma() implementation always dirties
> the PTE. This may cause unnecessary disk writes if the pages are
> never written before being reclaimed.
>
> So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the
> PTE_DIRTY bit is set to make it possible to make the PTE writable and
> clean.
>
> The current behavior was introduced in commit 73e86cb03cf2 ("arm64:
> Move PTE_RDONLY bit handling out of set_pte_at()"). Before that,
> pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only
> clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits
> are set.
>
> To test the performance impact of the patch, on an arm64 server
> machine, run 16 redis-server processes on socket 1 and 16
> memtier_benchmark processes on socket 0 with mostly get
> transactions (that is, redis-server will mostly read memory only).
> The memory footprint of redis-server is larger than the available
> memory, so swap out/in will be triggered. Test results show that the
> patch can avoid most swapping out because the pages are mostly clean.
> And the benchmark throughput improves ~23.9% in the test.
>
> Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
> Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Gavin Shan <gshan@redhat.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Yicong Yang <yangyicong@hisilicon.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
> arch/arm64/include/asm/pgtable.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index aa89c2e67ebc..0944e296dd4a 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -293,7 +293,8 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
>  static inline pte_t pte_mkwrite_novma(pte_t pte)
>  {
>  	pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
> -	pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
> +	if (pte_sw_dirty(pte))
> +		pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>  	return pte;
>  }

This seems to be the right thing. I recall years ago I grep'ed
(obviously not hard enough) and most pte_mkwrite() places had a
pte_mkdirty(). But I missed do_swap_page() and possibly others.

For this patch:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

I wonder whether we should also add (as a separate patch):

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 830107b6dd08..df1c552ef11c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -101,6 +101,7 @@ static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx)
 	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
 	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte, args->vma))));
 	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));
+	WARN_ON(pte_dirty(pte_mkwrite_novma(pte_mkclean(pte))));
 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
 }

For completeness, also (and maybe other combinations):

	WARN_ON(!pte_write(pte_mkdirty(pte_mkwrite_novma(pte))));

I cc'ed linux-mm in case we missed anything. If nothing is raised, I'll
queue it next week.

Thanks.

--
Catalin