* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
@ 2023-01-11 8:34 ` Anshuman Khandual
2023-01-11 13:31 ` Gabriel Krisman Bertazi
2023-01-12 3:05 ` Anshuman Khandual
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-11 8:34 UTC (permalink / raw)
To: Gabriel Krisman Bertazi, catalin.marinas, will; +Cc: linux-arm-kernel, broonie
On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor. In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
I am just curious, is this the only system register access (AA64MMFR1_EL1)
causing such performance problems ?
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But, even when the feature of setting faultaround
Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
did add this on regular page fault path via do_set_pte().
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
Right, accessing a system register on each page fault is not optimal.
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
>
> ---
> Changes since v1:
> - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
> arch/arm64/include/asm/cpufeature.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> return false;
>
> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> + /*
> + * Use cached version to avoid emulated msr operation on KVM
> + * guests.
> + */
> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> return cpuid_feature_extract_unsigned_field(mmfr1,
> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> }
LGTM but as mentioned earlier, are there not other similar instances or this
is just more problematic being on direct page fault path ?
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-11 8:34 ` Anshuman Khandual
@ 2023-01-11 13:31 ` Gabriel Krisman Bertazi
2023-01-12 2:57 ` Anshuman Khandual
0 siblings, 1 reply; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-11 13:31 UTC (permalink / raw)
To: Anshuman Khandual; +Cc: catalin.marinas, will, linux-arm-kernel, broonie
Anshuman Khandual <anshuman.khandual@arm.com> writes:
> On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
>> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
>> in the hypervisor. In fact, ARM documentation mentions some feature
>> registers are not supposed to be accessed frequently by the OS, and
>> therefore should be emulated for guests [1].
>
> I am just curious, is this the only system register access (AA64MMFR1_EL1)
> causing such performance problems ?
I haven't audited all the system registers. For AA64MMFR1_EL1 this is
the only instance where the frequency of access affects performance in a
meaningful way for my workload.
I have a real-world bug report about it, and by profiling vm exit
events, I can also argue this is the only instance of any emulated msr
read/write that happens frequently enough to change the order of
magnitude of exit events measured by perf for my workload between, at
least, 5.4 (it was introduced in v5.12, but I have data back to 5.4) and
mainline.
>> Commit 0388f9c74330 ("arm64: mm: Implement
>> arch_wants_old_prefaulted_pte()") introduced a read of this register in
>> the page fault path. But, even when the feature of setting faultaround
>
> Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
> ("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
> did add this on regular page fault path via do_set_pte().
Indeed. The only other usage of this function is in wp_page_copy, and,
from what I can tell, it is in an unlikely() branch when COW is being
performed on a page that was recently unmapped. It is not something
frequent enough that I saw in profiling.
>> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>> }
>
> LGTM but as mentioned earlier, are there not other similar instances or this
> is just more problematic being on direct page fault path ?
I think a full audit of the emulated system registers in kvm will be
required to definitely answer it. But this instance is, by far, the hottest
case in the codebase.
--
Gabriel Krisman Bertazi
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-11 13:31 ` Gabriel Krisman Bertazi
@ 2023-01-12 2:57 ` Anshuman Khandual
0 siblings, 0 replies; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-12 2:57 UTC (permalink / raw)
To: Gabriel Krisman Bertazi, Anshuman Khandual
Cc: catalin.marinas, will, linux-arm-kernel, broonie
On 1/11/23 19:01, Gabriel Krisman Bertazi wrote:
> Anshuman Khandual <anshuman.khandual@arm.com> writes:
>
>> On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
>>> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
>>> in the hypervisor. In fact, ARM documentation mentions some feature
>>> registers are not supposed to be accessed frequently by the OS, and
>>> therefore should be emulated for guests [1].
>>
>> I am just curious, is this the only system register access (AA64MMFR1_EL1)
>> causing such performance problems ?
>
> I haven't audited all the system registers. For AA64MMFR1_EL1 this is
> the only instance where the frequency of access affects performance in a
> meaningful way for my workload.
>
> I have a real-world bug report about it, and by profiling vm exit
> events, I can also argue this is the only instance of any emulated msr
> read/write that happens frequently enough to change the order of
> magnitude of exit events measured by perf for my workload between, at
> least, 5.4 (it was introduced in v5.12, but I have data back to 5.4) and
> mainline.
>
>>> Commit 0388f9c74330 ("arm64: mm: Implement
>>> arch_wants_old_prefaulted_pte()") introduced a read of this register in
>>> the page fault path. But, even when the feature of setting faultaround
>>
>> Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
>> ("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
>> did add this on regular page fault path via do_set_pte().
>
> Indeed. The only other usage of this function is in wp_page_copy, and,
> from what I can tell, it is in an unlikely() branch when COW is being
> performed on a page that was recently unmapped. It is not something
> frequent enough that I saw in profiling.
>
>>> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>>> }
>>
>> LGTM but as mentioned earlier, are there not other similar instances or this
>> is just more problematic being on direct page fault path ?
>
> I think a full audit of the emulated system registers in kvm will be
> required to definitely answer it. But this instance is, by far, the hottest
> case in the codebase.
>
Thanks for the additional details.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
2023-01-11 8:34 ` Anshuman Khandual
@ 2023-01-12 3:05 ` Anshuman Khandual
2023-01-16 20:41 ` Gabriel Krisman Bertazi
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-12 3:05 UTC (permalink / raw)
To: Gabriel Krisman Bertazi, catalin.marinas, will; +Cc: linux-arm-kernel, broonie
On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor. In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
A small nit. This URL length causes a checkpatch.pl warning.
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
Just wondering, if this URL can be replaced with a document identification
number or something similar for a more cleaner commit message, not sure if
such a thing is even feasible.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
>
> ---
> Changes since v1:
> - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
> arch/arm64/include/asm/cpufeature.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> return false;
>
> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> + /*
> + * Use cached version to avoid emulated msr operation on KVM
> + * guests.
> + */
> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> return cpuid_feature_extract_unsigned_field(mmfr1,
> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> }
FWIW
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
2023-01-11 8:34 ` Anshuman Khandual
2023-01-12 3:05 ` Anshuman Khandual
@ 2023-01-16 20:41 ` Gabriel Krisman Bertazi
2023-01-17 16:14 ` Will Deacon
2023-01-18 15:44 ` Catalin Marinas
` (2 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-16 20:41 UTC (permalink / raw)
To: catalin.marinas; +Cc: will, linux-arm-kernel, broonie
Gabriel Krisman Bertazi <krisman@suse.de> writes:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor. In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Hi,
Considering the performance impact on kvm guests, unless someone
opposes, can we get this queued already for -rc5?
--
Gabriel Krisman Bertazi
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-16 20:41 ` Gabriel Krisman Bertazi
@ 2023-01-17 16:14 ` Will Deacon
2023-01-18 16:08 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2023-01-17 16:14 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: catalin.marinas, linux-arm-kernel, broonie
On Mon, Jan 16, 2023 at 05:41:35PM -0300, Gabriel Krisman Bertazi wrote:
> Gabriel Krisman Bertazi <krisman@suse.de> writes:
>
> > Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> > in the hypervisor. In fact, ARM documentation mentions some feature
> > registers are not supposed to be accessed frequently by the OS, and
> > therefore should be emulated for guests [1].
> >
> > Commit 0388f9c74330 ("arm64: mm: Implement
> > arch_wants_old_prefaulted_pte()") introduced a read of this register in
> > the page fault path. But, even when the feature of setting faultaround
> > pages with the old flag is disabled for a given cpu, we are still paying
> > the cost of checking the register on every pagefault. This results in an
> > explosion of vmexit events in KVM guests, which directly impacts the
> > performance of virtualized workloads. For instance, running kernbench
> > yields a 15% increase in system time solely due to the increased vmexit
> > cycles.
> >
> > This patch avoids the extra cost by using the sanitized cached value.
> > It should be safe to do so, since this register mustn't change for a
> > given cpu.
> >
> > [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
> >
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
>
> Hi,
>
> Considering the performance impact on kvm guests, unless someone
> opposes, can we get this queued already for -rc5?
Given this has been the case since v5.12 afaict and it's not a correctness
issue, I was thinking we could queue this for 6.3?
Will
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-17 16:14 ` Will Deacon
@ 2023-01-18 16:08 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-18 16:08 UTC (permalink / raw)
To: Will Deacon; +Cc: catalin.marinas, linux-arm-kernel, broonie
Will Deacon <will@kernel.org> writes:
> On Mon, Jan 16, 2023 at 05:41:35PM -0300, Gabriel Krisman Bertazi wrote:
>>
>> Considering the performance impact on kvm guests, unless someone
>> opposes, can we get this queued already for -rc5?
>
> Given this has been the case since v5.12 afaict and it's not a correctness
> issue, I was thinking we could queue this for 6.3?
hey Will,
On a second thought, I'm fine with having it only in 6.3. :)
--
Gabriel Krisman Bertazi
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
` (2 preceding siblings ...)
2023-01-16 20:41 ` Gabriel Krisman Bertazi
@ 2023-01-18 15:44 ` Catalin Marinas
2023-01-18 16:06 ` Gabriel Krisman Bertazi
2023-01-18 16:38 ` Will Deacon
2023-01-20 16:59 ` Catalin Marinas
5 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2023-01-18 15:44 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: will, linux-arm-kernel, broonie
On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> return false;
>
> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> + /*
> + * Use cached version to avoid emulated msr operation on KVM
> + * guests.
> + */
> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> return cpuid_feature_extract_unsigned_field(mmfr1,
> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> }
You can probably make it slightly faster if you add a cpucap feature. We
have ARM64_HW_DBM but not ARM64_HW_AF. Not sure you'd notice a
difference in practice though.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-18 15:44 ` Catalin Marinas
@ 2023-01-18 16:06 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-18 16:06 UTC (permalink / raw)
To: Catalin Marinas; +Cc: will, linux-arm-kernel, broonie
Catalin Marinas <catalin.marinas@arm.com> writes:
> On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 03d1c9d7af82..3d1a5677321e 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>> return false;
>>
>> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
>> + /*
>> + * Use cached version to avoid emulated msr operation on KVM
>> + * guests.
>> + */
>> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>> return cpuid_feature_extract_unsigned_field(mmfr1,
>> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>> }
>
> You can probably make it slightly faster if you add a cpucap feature. We
> have ARM64_HW_DBM but not ARM64_HW_AF. Not sure you'd notice a
> difference in practice though.
Hi Catalin,
I had a patch for that, but when I noticed the existence of
ARM64_HW_DBM, I dropped it. it is mentioned quickly in the v1.
From my measurements, when comparing this version to a static variable
caching the value (patch v1), the bsearch performance impact was not
significant. I think it would be fine to keep this version.
but, of course, I can add the cpucap bit if it is preferred. Please,
let me know.
Thanks,
--
Gabriel Krisman Bertazi
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
` (3 preceding siblings ...)
2023-01-18 15:44 ` Catalin Marinas
@ 2023-01-18 16:38 ` Will Deacon
2023-01-20 16:59 ` Catalin Marinas
5 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2023-01-18 16:38 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: catalin.marinas, linux-arm-kernel, broonie
On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor. In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
>
> ---
> Changes since v1:
> - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
> arch/arm64/include/asm/cpufeature.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> return false;
>
> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> + /*
> + * Use cached version to avoid emulated msr operation on KVM
> + * guests.
> + */
> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> return cpuid_feature_extract_unsigned_field(mmfr1,
> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> }
Acked-by: Will Deacon <will@kernel.org>
Will
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
` (4 preceding siblings ...)
2023-01-18 16:38 ` Will Deacon
@ 2023-01-20 16:59 ` Catalin Marinas
5 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2023-01-20 16:59 UTC (permalink / raw)
To: will, Gabriel Krisman Bertazi; +Cc: linux-arm-kernel, broonie
On Mon, 09 Jan 2023 12:19:55 -0300, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor. In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> [...]
Applied to arm64 (for-next/misc), thanks!
[1/1] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
https://git.kernel.org/arm64/c/a89c6bcdac22
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread