linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Xavier  <xavier_qy@163.com>
To: "Gavin Shan" <gshan@redhat.com>
Cc: dev.jain@arm.com, akpm@linux-foundation.org, baohua@kernel.org,
	ryan.roberts@arm.com, catalin.marinas@arm.com,
	ioworker0@gmail.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, will@kernel.org
Subject: Re:Re: [PATCH v2 1/1] mm/contpte: Optimize loop to reduce redundant operations
Date: Wed, 9 Apr 2025 23:10:17 +0800 (CST)	[thread overview]
Message-ID: <711cf430.b25d.1961b1a1b0e.Coremail.xavier_qy@163.com> (raw)
In-Reply-To: <34a4bf01-6324-482b-a011-32974d68e02f@redhat.com>



Hi Gavin,

Thank you for carefully reviewing this patch and raising your questions. 
I'll try to explain and answer them below.


At 2025-04-09 12:09:48, "Gavin Shan" <gshan@redhat.com> wrote:
>Hi Xavier,
>
>On 4/8/25 6:58 PM, Xavier wrote:
>> This commit optimizes the contpte_ptep_get function by adding early
>>   termination logic. It checks if the dirty and young bits of orig_pte
>>   are already set and skips redundant bit-setting operations during
>>   the loop. This reduces unnecessary iterations and improves performance.
>> 
>> Signed-off-by: Xavier <xavier_qy@163.com>
>> ---
>>   arch/arm64/mm/contpte.c | 22 ++++++++++++++++++++--
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
>> index bcac4f55f9c1..034d153d7d19 100644
>> --- a/arch/arm64/mm/contpte.c
>> +++ b/arch/arm64/mm/contpte.c
>> @@ -152,6 +152,18 @@ void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
>>   }
>>   EXPORT_SYMBOL_GPL(__contpte_try_unfold);
>>   
>
>I'm wandering how it can work. More details are given below.
>
>> +#define CHECK_CONTPTE_FLAG(start, ptep, orig_pte, flag) \
>> +	do { \
>> +		int _start = start; \
>> +		pte_t *_ptep = ptep; \
>> +		while (_start++ < CONT_PTES) { \
>> +			if (pte_##flag(__ptep_get(_ptep++))) { \
>> +				orig_pte = pte_mk##flag(orig_pte); \
>> +				break; \
>> +			} \
>> +		} \
>> +	} while (0)
>> +
>
>CONT_PTES is 16 with the assumption of 4KB base page size. CHECK_CONTPTE_FLAG()
>collects the flag for ptep[start, CONT_PTES].
>
>>   pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
>>   {
>>   	/*
>> @@ -169,11 +181,17 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
>>   	for (i = 0; i < CONT_PTES; i++, ptep++) {
>>   		pte = __ptep_get(ptep);
>>   
>> -		if (pte_dirty(pte))
>> +		if (pte_dirty(pte)) {
>>   			orig_pte = pte_mkdirty(orig_pte);
>> +			CHECK_CONTPTE_FLAG(i, ptep, orig_pte, young);
>> +			break;
>> +		}
>>   
>> -		if (pte_young(pte))
>> +		if (pte_young(pte)) {
>>   			orig_pte = pte_mkyoung(orig_pte);
>> +			CHECK_CONTPTE_FLAG(i, ptep, orig_pte, dirty);
>> +			break;
>> +		}
>>   	}
>>   
>>   	return orig_pte;
>
>There are two issues as I can see: (1) The loop stops on any dirty or young flag. Another
>flag can be missed when one flag is seen. For example, when ptep[0] has both dirty/young
>flag, only the dirty flag is collected. (2) CHECK_CONTPTE_FLAG() iterates ptep[i, CONT_PTES],
>conflicting to the outer loop, which iterates ptep[0, CONT_PTES].

No flags will be missed. The outer loop is used to check for the first flag,
which could be either the dirty or young flag.
Once this flag (let's assume it's the dirty flag) is found in the i-th PTE,
the dirty flag of orig_pte will be set, and the code will immediately enter
the inner loop, namely CHECK_CONTPTE_FLAG. This inner loop will continue
to check only for the young flag starting from the i-th position, and we needn't
concern about the dirty flag anymore.
If CHECK_CONTPTE_FLAG finds the young flag in the j-th PTE, the young flag
of orig_pte will be set. At this point, both the young and dirty flags of
orig_pte have been set, and there's no need for further loop judgments, so
the both the inner and outer loops can be exited directly. This approach
reduces unnecessary repeated traversals and judgments.

>
>Besides, I also doubt how much performance can be gained by bailing early on (dirty | young).
>However, it's avoided to cross the L1D cache line boundary if possible. With 4KB base page
>size, 128 bytes are needed for ptep[CONT_PTES], equal to two cache lines. If we can bail
>early with luck, we don't have to step into another cache line. Note that extra checks needs
>more CPU cycles.

Compared to the previous function, this code doesn't add any extra checks.
Even in the worst-case scenario, where neither a dirty nor a young flag is
found among the 16 PTEs, the number of checks is the same as in the original
function. If any flag is found earlier, the optimized patch will reduce the
number of subsequent checks for that flag compared to the original code.

I'm not sure if my explanation is clear. If you have any further questions,
feel free to communicate with me again.

--
Thanks,
Xavier

  reply	other threads:[~2025-04-09 15:22 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-07  9:22 [PATCH v1] mm/contpte: Optimize loop to reduce redundant operations Xavier
2025-04-07 11:29 ` Lance Yang
2025-04-07 12:56   ` Xavier
2025-04-07 13:31     ` Lance Yang
2025-04-07 16:19       ` Dev Jain
2025-04-08  8:58         ` [PATCH v2 0/1] " Xavier
2025-04-08  8:58           ` [PATCH v2 1/1] " Xavier
2025-04-09  4:09             ` Gavin Shan
2025-04-09 15:10               ` Xavier [this message]
2025-04-10  0:58                 ` Gavin Shan
2025-04-10  2:53                   ` Xavier
2025-04-10  3:06                     ` Gavin Shan
2025-04-08  9:17         ` [PATCH v1] " Lance Yang
2025-04-09 15:15           ` Xavier
2025-04-10 21:25 ` Barry Song
2025-04-11 12:03   ` David Laight
2025-04-12  7:18     ` Barry Song
2025-04-11 17:30   ` Dev Jain
2025-04-12  5:05     ` Lance Yang
2025-04-12  5:27       ` Xavier
2025-04-14  8:06       ` Ryan Roberts
2025-04-14  8:51         ` Dev Jain
2025-04-14 12:11           ` Ryan Roberts
2025-04-15  8:22             ` [mm/contpte v3 0/1] " Xavier
2025-04-15  8:22               ` [mm/contpte v3 1/1] " Xavier
2025-04-15  9:01                 ` [mm/contpte v3] " Markus Elfring
2025-04-16  8:57                 ` [mm/contpte v3 1/1] " David Laight
2025-04-16 16:15                   ` Xavier
2025-04-16 12:54                 ` Ryan Roberts
2025-04-16 16:09                   ` Xavier
2025-04-22  9:33                   ` Xavier
2025-04-30 23:17                     ` Barry Song
2025-05-01 12:39                       ` Xavier
2025-05-01 21:19                         ` Barry Song
2025-05-01 21:32                           ` Barry Song
2025-05-04  2:39                             ` Xavier
2025-05-08  1:29                               ` Barry Song
2025-05-08  7:03                             ` [PATCH v4] arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get Xavier Xia
2025-05-08  8:30                               ` David Hildenbrand
2025-05-09  9:17                                 ` Xavier
2025-05-09  9:25                                   ` David Hildenbrand
2025-05-09  2:09                               ` Barry Song
2025-05-09  9:20                                 ` Xavier
2025-04-16  2:10               ` [mm/contpte v3 0/1] mm/contpte: Optimize loop to reduce redundant operations Andrew Morton
2025-04-16  3:25                 ` Xavier
2025-04-16 12:47               ` Catalin Marinas
2025-04-16 15:08                 ` Xavier
2025-04-16 12:48               ` Ryan Roberts
2025-04-16 15:22                 ` Xavier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=711cf430.b25d.1961b1a1b0e.Coremail.xavier_qy@163.com \
    --to=xavier_qy@163.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=dev.jain@arm.com \
    --cc=gshan@redhat.com \
    --cc=ioworker0@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).