linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT
@ 2024-08-02  9:34 Yicong Yang
  2024-08-02  9:34 ` [PATCH 1/2] arm64: Add support for FEAT_HAFT Yicong Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Yicong Yang @ 2024-08-02  9:34 UTC (permalink / raw)
  To: catalin.marinas, will, maz, mark.rutland, linux-arm-kernel
  Cc: oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang,
	yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

This series adds basic support for FEAT_HAFT, introduced in Armv8.9/v9.4,
and enables ARCH_HAS_NONLEAF_PMD_YOUNG. The latter is used by lru-gen
aging. Tested with lru-gen using the steps below:
1. Generate a 1GiB working set with `stress-ng --vm 1`, then stop the task
   so that it no longer accesses the memory (the AF bit won't be updated).
2. Try to age the memory through /sys/kernel/debug/lru_gen.

Run the above steps with and without LRU_GEN_NONLEAF_YOUNG (0x4),
switching through /sys/kernel/mm/lru_gen/enabled. With
LRU_GEN_NONLEAF_YOUNG, aging tests and clears the PMD AF bit during the
page walk; without it, aging tests and clears the AF bit of each PTE. In
this test LRU_GEN_NONLEAF_YOUNG makes page scanning more efficient:
the pages are not accessed, so there is no need to scan each PTE.
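
To illustrate why the non-leaf check helps in this test, here is a rough
userspace model, not kernel code. The struct pmd_model type and the
age_range() helper are hypothetical names made up for this sketch; it
assumes 4KiB pages (512 PTEs per PMD table) and the Access Flag at bit 10
of a descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 512                    /* assumed: 4KiB pages */
#define DESC_AF      (UINT64_C(1) << 10)    /* Access Flag bit */

/* Hypothetical model of one PMD table entry and the PTE table below it. */
struct pmd_model {
	uint64_t table_desc;            /* table descriptor; AF set by HW with HAFT */
	uint64_t pte[PTRS_PER_PTE];     /* leaf descriptors */
};

/*
 * Age n PMD entries and return how many PTEs had to be inspected.
 * With nonleaf_young, a PMD whose AF is clear is skipped entirely.
 */
static size_t age_range(struct pmd_model *pmds, size_t n, bool nonleaf_young)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (nonleaf_young) {
			bool young = (pmds[i].table_desc & DESC_AF) != 0;

			pmds[i].table_desc &= ~DESC_AF; /* test and clear the PMD AF */
			if (!young)
				continue;               /* nothing below was accessed */
		}
		for (size_t j = 0; j < PTRS_PER_PTE; j++) {
			pmds[i].pte[j] &= ~DESC_AF;     /* clear the AF of each PTE */
			visited++;
		}
	}
	return visited;
}
```

In this model an idle range costs one check per PMD entry instead of 512
PTE visits, which is the effect the test above tries to expose.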

For lru-gen aging:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/mm/multigen_lru.rst?h=v6.11-rc1#n94

Yicong Yang (2):
  arm64: Add support for FEAT_HAFT
  arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG

 arch/arm64/Kconfig                     | 21 ++++++++++++++
 arch/arm64/include/asm/pgtable-hwdef.h |  5 ++++
 arch/arm64/include/asm/pgtable.h       | 14 ++++++++--
 arch/arm64/kernel/cpufeature.c         | 38 ++++++++++++++++++++++++++
 arch/arm64/tools/cpucaps               |  1 +
 arch/arm64/tools/sysreg                |  1 +
 6 files changed, 78 insertions(+), 2 deletions(-)

-- 
2.24.0




* [PATCH 1/2] arm64: Add support for FEAT_HAFT
  2024-08-02  9:34 [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Yicong Yang
@ 2024-08-02  9:34 ` Yicong Yang
  2024-08-02 10:37   ` Marc Zyngier
  2024-08-02  9:34 ` [PATCH 2/2] arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG Yicong Yang
  2024-08-02 10:40 ` [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Marc Zyngier
  2 siblings, 1 reply; 11+ messages in thread
From: Yicong Yang @ 2024-08-02  9:34 UTC (permalink / raw)
  To: catalin.marinas, will, maz, mark.rutland, linux-arm-kernel
  Cc: oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang,
	yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
for Table descriptors (FEAT_HAFT). The feature is indicated by
ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.

This patch adds the Kconfig option for FEAT_HAFT and support for
detecting and enabling the feature.
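
As a rough illustration of the detection described above (a userspace
sketch, not the kernel's has_cpuid_feature()), the check can be thought of
as extracting the HAFDBS field from bits [3:0] of ID_AA64MMFR1_EL1 and
comparing it with 0b0011. The helper names are hypothetical, and treating
larger field values as also implying HAFT is an assumption based on the
field being an unsigned enum.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64MMFR1_EL1.HAFDBS occupies bits [3:0]. */
#define HAFDBS_SHIFT 0
#define HAFDBS_MASK  UINT64_C(0xf)
#define HAFDBS_HAFT  UINT64_C(0x3)  /* 0b0011 */

/* Extract the HAFDBS field from a raw ID_AA64MMFR1_EL1 value. */
static unsigned int hafdbs_field(uint64_t mmfr1)
{
	return (unsigned int)((mmfr1 >> HAFDBS_SHIFT) & HAFDBS_MASK);
}

/* Assumption: the field is unsigned, so any value >= 0b0011 implies HAFT. */
static bool reg_has_haft(uint64_t mmfr1)
{
	return hafdbs_field(mmfr1) >= HAFDBS_HAFT;
}
```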

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 arch/arm64/Kconfig                     | 20 ++++++++++++++
 arch/arm64/include/asm/pgtable-hwdef.h |  5 ++++
 arch/arm64/kernel/cpufeature.c         | 38 ++++++++++++++++++++++++++
 arch/arm64/tools/cpucaps               |  1 +
 arch/arm64/tools/sysreg                |  1 +
 5 files changed, 65 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b3fc891f1544..f263ae4139a5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2127,6 +2127,26 @@ config ARM64_EPAN
 	  if the cpu does not implement the feature.
 endmenu # "ARMv8.7 architectural features"
 
+menu "ARMv8.9 architectural features"
+
+config ARM64_HAFT
+	bool "Support for Hardware managed Access Flag for Table Descriptor"
+	depends on ARM64_HW_AFDBM
+	default y
+	help
+	  ARMv8.9/ARMv9.4 introduces the Hardware managed Access Flag for
+	  Table descriptors feature. When enabled in TCR2_EL1 (HAFT bit) on
+	  capable processors, an architecturally executed memory access will
+	  update the Access Flag in each Table descriptor which is accessed
+	  during the translation table walk and for which the Access Flag is
+	  0. The Access Flag of the Table descriptor uses the same bit as
+	  PTE_AF.
+
+	  The feature will only be enabled on supported CPUs. If unsure,
+	  say Y.
+
+endmenu # "ARMv8.9 architectural features"
+
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 1f60aa1bc750..47bd29874e62 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -308,6 +308,11 @@
 #define TCR_TCMA1		(UL(1) << 58)
 #define TCR_DS			(UL(1) << 59)
 
+/*
+ * TCR2 Flags
+ */
+#define TCR2_HAFT		(UL(1) << 11)
+
 /*
  * TTBR.
  */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 646ecd3069fd..99402fd00f16 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2044,6 +2044,29 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
 
 #endif
 
+#ifdef CONFIG_ARM64_HAFT
+
+static void cpu_enable_haft(struct arm64_cpu_capabilities const *cap)
+{
+	u64 reg = read_sysreg_s(SYS_TCR2_EL1);
+
+	reg |= TCR2_HAFT;
+	write_sysreg_s(reg, SYS_TCR2_EL1);
+	isb();
+	local_flush_tlb_all();
+}
+
+static bool has_haft(const struct arm64_cpu_capabilities *cap, int scope)
+{
+	/* FEAT_HAFT relies on FEAT_TCR2 */
+	if (!this_cpu_has_cap(ARM64_HAS_TCR2))
+		return false;
+
+	return has_cpuid_feature(cap, scope);
+}
+
+#endif
+
 #ifdef CONFIG_ARM64_AMU_EXTN
 
 /*
@@ -2580,6 +2603,21 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.cpus = &dbm_cpus,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, DBM)
 	},
+#endif
+#ifdef CONFIG_ARM64_HAFT
+	{
+		.desc = "Hardware managed Access Flag for Table Descriptor",
+		/*
+		 * Per Spec, software management of Access Flag for Table
+		 * descriptor is not supported, so make this feature system
+		 * wide.
+		 */
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.capability = ARM64_HAFT,
+		.matches = has_haft,
+		.cpu_enable = cpu_enable_haft,
+		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, HAFT)
+	},
 #endif
 	{
 		.desc = "CRC32 instructions",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index ac3429d892b9..0b7a3a237e5d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -55,6 +55,7 @@ HAS_TLB_RANGE
 HAS_VA52
 HAS_VIRT_HOST_EXTN
 HAS_WFXT
+HAFT
 HW_DBM
 KVM_HVHE
 KVM_PROTECTED_MODE
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 7ceaa1e0b4bc..9b3d15ea8a63 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1688,6 +1688,7 @@ UnsignedEnum	3:0	HAFDBS
 	0b0000	NI
 	0b0001	AF
 	0b0010	DBM
+	0b0011	HAFT
 EndEnum
 EndSysreg
 
-- 
2.24.0




* [PATCH 2/2] arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
  2024-08-02  9:34 [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Yicong Yang
  2024-08-02  9:34 ` [PATCH 1/2] arm64: Add support for FEAT_HAFT Yicong Yang
@ 2024-08-02  9:34 ` Yicong Yang
  2024-08-02 10:40 ` [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Marc Zyngier
  2 siblings, 0 replies; 11+ messages in thread
From: Yicong Yang @ 2024-08-02  9:34 UTC (permalink / raw)
  To: catalin.marinas, will, maz, mark.rutland, linux-arm-kernel
  Cc: oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang,
	yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

With FEAT_HAFT supported, NONLEAF_PMD_YOUNG can be enabled on arm64,
since the hardware is capable of updating the AF bit of a PMD table
descriptor. Since the AF bit of a table descriptor occupies the same
bit position as in block descriptors, we only need to implement
arch_has_hw_nonleaf_pmd_young() and select the related configs. The
pmd_young test/update operations stay the same and are already
implemented for transparent huge page support.

Currently ARCH_HAS_NONLEAF_PMD_YOUNG is used to improve the
efficiency of lru-gen aging.
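
Because the AF bit sits at the same position in table and block
descriptors, a single test-and-clear helper can serve both. The following
is a minimal userspace sketch using C11 atomics; it only models the idea,
the kernel's helper uses its own primitives. The name
desc_test_and_clear_young() is hypothetical, and the AF position (bit 10)
is taken to match PTE_AF.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_AF (UINT64_C(1) << 10) /* same bit for table, block and page */

/*
 * Atomically clear the Access Flag of a descriptor and report whether it
 * was set before. Works the same for a PMD table or block descriptor.
 */
static bool desc_test_and_clear_young(_Atomic uint64_t *desc)
{
	uint64_t old = atomic_fetch_and(desc, ~DESC_AF);

	return (old & DESC_AF) != 0;
}
```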

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f263ae4139a5..d1103a3f4a8d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -36,6 +36,7 @@ config ARM64
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
 	select ARCH_HAS_PTE_DEVMAP
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_HW_PTE_YOUNG
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7a4f5604be3f..077bea37867e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1205,7 +1205,7 @@ static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
 	return young;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,
@@ -1213,7 +1213,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 {
 	return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
 }
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */
 
 static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address, pte_t *ptep)
@@ -1448,6 +1448,16 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
  */
 #define arch_has_hw_pte_young		cpu_has_hw_af
 
+#ifdef CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
+
+#define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
+static inline bool arch_has_hw_nonleaf_pmd_young(void)
+{
+	return cpus_have_final_cap(ARM64_HAFT);
+}
+
+#endif
+
 /*
  * Experimentally, it's cheap to set the access flag in hardware and we
  * benefit from prefaulting mappings as 'old' to start with.
-- 
2.24.0




* Re: [PATCH 1/2] arm64: Add support for FEAT_HAFT
  2024-08-02  9:34 ` [PATCH 1/2] arm64: Add support for FEAT_HAFT Yicong Yang
@ 2024-08-02 10:37   ` Marc Zyngier
  2024-08-06  3:09     ` Yicong Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-08-02 10:37 UTC (permalink / raw)
  To: Yicong Yang
  Cc: catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang,
	yangyicong

On Fri, 02 Aug 2024 10:34:57 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
> 
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
> for Table descriptors (FEAT_HAFT). The feature is indicated by
> ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
> TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.
> 
> This patch adds the Kconfig for FEAT_HAFT and support detecting
> and enabling the feature.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  arch/arm64/Kconfig                     | 20 ++++++++++++++
>  arch/arm64/include/asm/pgtable-hwdef.h |  5 ++++
>  arch/arm64/kernel/cpufeature.c         | 38 ++++++++++++++++++++++++++
>  arch/arm64/tools/cpucaps               |  1 +
>  arch/arm64/tools/sysreg                |  1 +
>  5 files changed, 65 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b3fc891f1544..f263ae4139a5 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2127,6 +2127,26 @@ config ARM64_EPAN
>  	  if the cpu does not implement the feature.
>  endmenu # "ARMv8.7 architectural features"
>  
> +menu "ARMv8.9 architectural features"
> +
> +config ARM64_HAFT
> +	bool "Support for Hardware managed Access Flag for Table Descriptor"
> +	depends on ARM64_HW_AFDBM
> +	default y
> +	help
> +	  The ARMv8.9/ARMv9.5 introduces the feature Hardware managed Access
> +	  Flag for Table descriptors. When enabled in TCR_EL1 (HAFT bit) on

TCR2_EL{1,2}. But I don't think we need to details registers and bit
layout in the help section.

> +	  capable processors, an architectural executed memory access will
> +	  update the Access Flag in each Table descriptor which is accessed
> +	  during the translation table walk and for which the Access Flag is
> +	  0. The Access Flag of the Table descriptor use the same bit of
> +	  PTE_AF.
> +
> +	  The feature will only be enabled on supported CPUs. If unsure,
> +	  say Y.
> +
> +endmenu # "ARMv8.9 architectural features"
> +
>  config ARM64_SVE
>  	bool "ARM Scalable Vector Extension support"
>  	default y
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 1f60aa1bc750..47bd29874e62 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -308,6 +308,11 @@
>  #define TCR_TCMA1		(UL(1) << 58)
>  #define TCR_DS			(UL(1) << 59)
>  
> +/*
> + * TCR2 Flags
> + */
> +#define TCR2_HAFT		(UL(1) << 11)
> +

TCR2_ELx is already fully described in arch/arm64/tools/sysreg.

>  /*
>   * TTBR.
>   */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 646ecd3069fd..99402fd00f16 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2044,6 +2044,29 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>  
>  #endif
>  
> +#if CONFIG_ARM64_HAFT
> +
> +static void cpu_enable_haft(struct arm64_cpu_capabilities const *cap)
> +{
> +	u64 reg = read_sysreg_s(SYS_TCR2_EL1);
> +
> +	reg |= TCR2_HAFT;
> +	write_sysreg_s(reg, SYS_TCR2_EL1);

Probably more elegantly written as

	sysreg_clear_set_s(SYS_TCR2_EL1, 0, TCR2_EL1x_HAFT);
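
For readers unfamiliar with the helper, the semantics are roughly a
read-modify-write that skips the write when nothing changes. Below is a
userspace model only: fake_tcr2, fake_writes and fake_clear_set() are
hypothetical stand-ins for the register and the macro, and the HAFT bit
position (11) is taken from the patch above.

```c
#include <stdint.h>

#define TCR2_HAFT_BIT (UINT64_C(1) << 11)

static uint64_t fake_tcr2;       /* stand-in for TCR2_EL1 */
static unsigned int fake_writes; /* counts modelled register writes */

/* Model of a clear/set update: write back only if the value changes. */
static void fake_clear_set(uint64_t clear, uint64_t set)
{
	uint64_t old_val = fake_tcr2;
	uint64_t new_val = (old_val & ~clear) | set;

	if (new_val != old_val) {
		fake_tcr2 = new_val;
		fake_writes++;
	}
}
```

Calling fake_clear_set(0, TCR2_HAFT_BIT) twice results in a single
modelled write, since the second call finds the bit already set.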

> +	isb();
> +	local_flush_tlb_all();
> +}
> +
> +static bool has_haft(const struct arm64_cpu_capabilities *cap, int scope)
> +{
> +	/* FEAT_HAFT relies on FEAT_TCR2 */
> +	if (!this_cpu_has_cap(ARM64_HAS_TCR2))
> +		return false;

Why do we need this? If FEAT_TCR2 isn't implemented, this is a HW bug.

> +
> +	return has_cpuid_feature(cap, scope);
> +}
> +
> +#endif
> +
>  #ifdef CONFIG_ARM64_AMU_EXTN
>  
>  /*
> @@ -2580,6 +2603,21 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>  		.cpus = &dbm_cpus,
>  		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, DBM)
>  	},
> +#endif
> +#ifdef CONFIG_ARM64_HAFT
> +	{
> +		.desc = "Hardware managed Access Flag for Table Descriptor",
> +		/*
> +		 * Per Spec, software management of Access Flag for Table
> +		 * descriptor is not supported, so make this feature system
> +		 * wide.
> +		 */

I don't understand what you mean by this. Can you please clarify?

> +		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
> +		.capability = ARM64_HAFT,
> +		.matches = has_haft,
> +		.cpu_enable = cpu_enable_haft,
> +		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, HAFT)
> +	},
>  #endif
>  	{
>  		.desc = "CRC32 instructions",
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index ac3429d892b9..0b7a3a237e5d 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -55,6 +55,7 @@ HAS_TLB_RANGE
>  HAS_VA52
>  HAS_VIRT_HOST_EXTN
>  HAS_WFXT
> +HAFT
>  HW_DBM
>  KVM_HVHE
>  KVM_PROTECTED_MODE
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 7ceaa1e0b4bc..9b3d15ea8a63 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -1688,6 +1688,7 @@ UnsignedEnum	3:0	HAFDBS
>  	0b0000	NI
>  	0b0001	AF
>  	0b0010	DBM
> +	0b0011	HAFT
>  EndEnum
>  EndSysreg
>  

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT
  2024-08-02  9:34 [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Yicong Yang
  2024-08-02  9:34 ` [PATCH 1/2] arm64: Add support for FEAT_HAFT Yicong Yang
  2024-08-02  9:34 ` [PATCH 2/2] arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG Yicong Yang
@ 2024-08-02 10:40 ` Marc Zyngier
  2024-08-06  3:43   ` Yicong Yang
  2 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-08-02 10:40 UTC (permalink / raw)
  To: Yicong Yang
  Cc: catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang,
	yangyicong

On Fri, 02 Aug 2024 10:34:56 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
> 
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
> and enable ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
> lru-gen aging. Tested with lru-gen in below steps:
> 1. Generate a 1GiB workingset by `stress-ng --vm 1`. Then hang the task to
>    stop accessing the memory. (AF bit won't be updated)
> 2. try to age the memory by /sys/kernel/debug/lru_gen
> 
> Run above steps with LRU_GEN_NONLEAF_YOUNG(0x4) and not respectively
> (switching by /sys/kernel/mm/lru_gen/enabled). LRU_GEN_NONLEAF_YOUNG
> will clear and test the PMD AF bit on page walking for aging,
> otherwise will clear and test the PTE AF bit for aging. In this case
> LRU_GEN_NONLEAF_YOUNG will improve the efficiency of page scanning
> since pages won't be accessed and we don't need to scan each PTE.

Improve by how much? Can you please publish numbers that demonstrate
the effect of this feature?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH 1/2] arm64: Add support for FEAT_HAFT
  2024-08-02 10:37   ` Marc Zyngier
@ 2024-08-06  3:09     ` Yicong Yang
  2024-08-06  7:57       ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Yicong Yang @ 2024-08-06  3:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

Hi Marc,

Thanks for the comments.

On 2024/8/2 18:37, Marc Zyngier wrote:
> On Fri, 02 Aug 2024 10:34:57 +0100,
> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
>> for Table descriptors (FEAT_HAFT). The feature is indicated by
>> ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
>> TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.
>>
>> This patch adds the Kconfig for FEAT_HAFT and support detecting
>> and enabling the feature.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>  arch/arm64/Kconfig                     | 20 ++++++++++++++
>>  arch/arm64/include/asm/pgtable-hwdef.h |  5 ++++
>>  arch/arm64/kernel/cpufeature.c         | 38 ++++++++++++++++++++++++++
>>  arch/arm64/tools/cpucaps               |  1 +
>>  arch/arm64/tools/sysreg                |  1 +
>>  5 files changed, 65 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index b3fc891f1544..f263ae4139a5 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2127,6 +2127,26 @@ config ARM64_EPAN
>>  	  if the cpu does not implement the feature.
>>  endmenu # "ARMv8.7 architectural features"
>>  
>> +menu "ARMv8.9 architectural features"
>> +
>> +config ARM64_HAFT
>> +	bool "Support for Hardware managed Access Flag for Table Descriptor"
>> +	depends on ARM64_HW_AFDBM
>> +	default y
>> +	help
>> +	  The ARMv8.9/ARMv9.5 introduces the feature Hardware managed Access
>> +	  Flag for Table descriptors. When enabled in TCR_EL1 (HAFT bit) on
> 
> TCR2_EL{1,2}. But I don't think we need to details registers and bit
> layout in the help section.
> 

ok. will drop this information.

>> +	  capable processors, an architectural executed memory access will
>> +	  update the Access Flag in each Table descriptor which is accessed
>> +	  during the translation table walk and for which the Access Flag is
>> +	  0. The Access Flag of the Table descriptor use the same bit of
>> +	  PTE_AF.
>> +
>> +	  The feature will only be enabled on supported CPUs. If unsure,
>> +	  say Y.
>> +
>> +endmenu # "ARMv8.9 architectural features"
>> +
>>  config ARM64_SVE
>>  	bool "ARM Scalable Vector Extension support"
>>  	default y
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 1f60aa1bc750..47bd29874e62 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -308,6 +308,11 @@
>>  #define TCR_TCMA1		(UL(1) << 58)
>>  #define TCR_DS			(UL(1) << 59)
>>  
>> +/*
>> + * TCR2 Flags
>> + */
>> +#define TCR2_HAFT		(UL(1) << 11)
>> +
> 
> TCR2_ELx is already fully described in arch/arm64/tools/sysreg.
> 

ok. will use the definition generated by sysreg.

>>  /*
>>   * TTBR.
>>   */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 646ecd3069fd..99402fd00f16 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -2044,6 +2044,29 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>>  
>>  #endif
>>  
>> +#if CONFIG_ARM64_HAFT
>> +
>> +static void cpu_enable_haft(struct arm64_cpu_capabilities const *cap)
>> +{
>> +	u64 reg = read_sysreg_s(SYS_TCR2_EL1);
>> +
>> +	reg |= TCR2_HAFT;
>> +	write_sysreg_s(reg, SYS_TCR2_EL1);
> 
> Probably more elegantly written as
> 
> 	sysreg_clear_set_s(SYS_TCR2_EL1, 0, TCR2_EL1x_HAFT);
> 

this is simpler. will use sysreg_clear_set_s().

>> +	isb();
>> +	local_flush_tlb_all();
>> +}
>> +
>> +static bool has_haft(const struct arm64_cpu_capabilities *cap, int scope)
>> +{
>> +	/* FEAT_HAFT relies on FEAT_TCR2 */
>> +	if (!this_cpu_has_cap(ARM64_HAS_TCR2))
>> +		return false;
> 
> Why do we need this? If FEAT_TCR2 isn't implemented, this is a HW bug.
> 

Yes, you're right. As the spec states:
If FEAT_HAFT is implemented, then FEAT_TCR2 is implemented.

So this check is redundant. We can simply use has_cpuid_feature()
without checking FEAT_TCR2 here.

>> +
>> +	return has_cpuid_feature(cap, scope);
>> +}
>> +
>> +#endif
>> +
>>  #ifdef CONFIG_ARM64_AMU_EXTN
>>  
>>  /*
>> @@ -2580,6 +2603,21 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>>  		.cpus = &dbm_cpus,
>>  		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, DBM)
>>  	},
>> +#endif
>> +#ifdef CONFIG_ARM64_HAFT
>> +	{
>> +		.desc = "Hardware managed Access Flag for Table Descriptor",
>> +		/*
>> +		 * Per Spec, software management of Access Flag for Table
>> +		 * descriptor is not supported, so make this feature system
>> +		 * wide.
>> +		 */
> 
> I don't understand what you mean by this. Can you please clarify?
> 

Since the table access flag cannot be managed by software, we should require
all CPUs in the system to have and enable this feature, which is what
ARM64_CPUCAP_BOOT_CPU_FEATURE indicates. It's not possible for some of the CPUs
to lack this feature and have the flag managed manually instead.

I added this comment because the handling differs from ARM64_HW_DBM (which
is ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE). Maybe it's redundant and can be dropped.

Thanks.

>> +		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
>> +		.capability = ARM64_HAFT,
>> +		.matches = has_haft,
>> +		.cpu_enable = cpu_enable_haft,
>> +		ARM64_CPUID_FIELDS(ID_AA64MMFR1_EL1, HAFDBS, HAFT)
>> +	},
>>  #endif
>>  	{
>>  		.desc = "CRC32 instructions",
>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>> index ac3429d892b9..0b7a3a237e5d 100644
>> --- a/arch/arm64/tools/cpucaps
>> +++ b/arch/arm64/tools/cpucaps
>> @@ -55,6 +55,7 @@ HAS_TLB_RANGE
>>  HAS_VA52
>>  HAS_VIRT_HOST_EXTN
>>  HAS_WFXT
>> +HAFT
>>  HW_DBM
>>  KVM_HVHE
>>  KVM_PROTECTED_MODE
>> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
>> index 7ceaa1e0b4bc..9b3d15ea8a63 100644
>> --- a/arch/arm64/tools/sysreg
>> +++ b/arch/arm64/tools/sysreg
>> @@ -1688,6 +1688,7 @@ UnsignedEnum	3:0	HAFDBS
>>  	0b0000	NI
>>  	0b0001	AF
>>  	0b0010	DBM
>> +	0b0011	HAFT
>>  EndEnum
>>  EndSysreg
>>  
> 
> Thanks,
> 
> 	M.
> 



* Re: [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT
  2024-08-02 10:40 ` [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT Marc Zyngier
@ 2024-08-06  3:43   ` Yicong Yang
  2024-08-06  8:06     ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Yicong Yang @ 2024-08-06  3:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

On 2024/8/2 18:40, Marc Zyngier wrote:
> On Fri, 02 Aug 2024 10:34:56 +0100,
> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
>> and enable ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
>> lru-gen aging. Tested with lru-gen in below steps:
>> 1. Generate a 1GiB workingset by `stress-ng --vm 1`. Then hang the task to
>>    stop accessing the memory. (AF bit won't be updated)
>> 2. try to age the memory by /sys/kernel/debug/lru_gen
>>
>> Run above steps with LRU_GEN_NONLEAF_YOUNG(0x4) and not respectively
>> (switching by /sys/kernel/mm/lru_gen/enabled). LRU_GEN_NONLEAF_YOUNG
>> will clear and test the PMD AF bit on page walking for aging,
>> otherwise will clear and test the PTE AF bit for aging. In this case
>> LRU_GEN_NONLEAF_YOUNG will improve the efficiency of page scanning
>> since pages won't be accessed and we don't need to scan each PTE.
> 
> Improve by how much? Can you please publish numbers that demonstrate
> the effect of this feature?
> 

With LRU_GEN_NONLEAF_YOUNG, we observed ~40% of the aging time saved for 1GiB
of memory on our emulated platform.

Thanks.




* Re: [PATCH 1/2] arm64: Add support for FEAT_HAFT
  2024-08-06  3:09     ` Yicong Yang
@ 2024-08-06  7:57       ` Marc Zyngier
  2024-08-06 13:11         ` Yicong Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-08-06  7:57 UTC (permalink / raw)
  To: Yicong Yang
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

On Tue, 06 Aug 2024 04:09:09 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
> 
> >> +#ifdef CONFIG_ARM64_HAFT
> >> +	{
> >> +		.desc = "Hardware managed Access Flag for Table Descriptor",
> >> +		/*
> >> +		 * Per Spec, software management of Access Flag for Table
> >> +		 * descriptor is not supported, so make this feature system
> >> +		 * wide.
> >> +		 */
> > 
> > I don't understand what you mean by this. Can you please clarify?
> > 
> 
> Since this cannot be managed by the software, we should restrict all the CPUs
> in the system to have and enable this feature which is indicated by
> ARM64_CPUCAP_BOOT_CPU_FEATURE. It's not possible for part of the CPUs don't have
> this feature and managed manually.
> 
> I make this comment here since it's handled different from what ARM64_HW_DBM does (which
> is ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE). Maybe it's redundant and can be dropped.

Ah, I see what you mean. I think this is still important to capture,
but maybe in a clearer manner. Something like:

	Contrary to the page/block access flag, the table access flag
	cannot be emulated in software (no access fault will occur).
	Therefore mandate that all CPUs have FEAT_HAFT.
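
A simplified model of the policy this implies, not the kernel's cpufeature
code: the boot CPU decides whether the system uses HAFT, and a CPU brought
up later must not lack it. The function name secondary_cpu_allowed() is
hypothetical, and allowing a late CPU that has HAFT on a system not using
it is an assumption about how a boot-CPU-scoped feature behaves.

```c
#include <stdbool.h>

/*
 * Return true if a secondary CPU may be brought online.
 * system_has_haft: the decision taken from the boot CPU.
 * cpu_has_haft:    whether the incoming CPU implements FEAT_HAFT.
 */
static bool secondary_cpu_allowed(bool system_has_haft, bool cpu_has_haft)
{
	/*
	 * The table access flag cannot be emulated in software, so a CPU
	 * without HAFT cannot join a system that relies on it.
	 */
	if (system_has_haft && !cpu_has_haft)
		return false;

	/* Assumed harmless: the CPU has HAFT but the system does not use it. */
	return true;
}
```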

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT
  2024-08-06  3:43   ` Yicong Yang
@ 2024-08-06  8:06     ` Marc Zyngier
  2024-08-06 13:35       ` Yicong Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-08-06  8:06 UTC (permalink / raw)
  To: Yicong Yang
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

On Tue, 06 Aug 2024 04:43:52 +0100,
Yicong Yang <yangyicong@huawei.com> wrote:
> 
> On 2024/8/2 18:40, Marc Zyngier wrote:
> > On Fri, 02 Aug 2024 10:34:56 +0100,
> > Yicong Yang <yangyicong@huawei.com> wrote:
> >>
> >> From: Yicong Yang <yangyicong@hisilicon.com>
> >>
> >> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
> >> and enable ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
> >> lru-gen aging. Tested with lru-gen in below steps:
> >> 1. Generate a 1GiB workingset by `stress-ng --vm 1`. Then hang the task to
> >>    stop accessing the memory. (AF bit won't be updated)
> >> 2. try to age the memory by /sys/kernel/debug/lru_gen
> >>
> >> Run above steps with LRU_GEN_NONLEAF_YOUNG(0x4) and not respectively
> >> (switching by /sys/kernel/mm/lru_gen/enabled). LRU_GEN_NONLEAF_YOUNG
> >> will clear and test the PMD AF bit on page walking for aging,
> >> otherwise will clear and test the PTE AF bit for aging. In this case
> >> LRU_GEN_NONLEAF_YOUNG will improve the efficiency of page scanning
> >> since pages won't be accessed and we don't need to scan each PTE.
> > 
> > Improve by how much? Can you please publish numbers that demonstrate
> > the effect of this feature?
> > 
> 
> With LRU_GEN_NONLEAF_YOUNG ~40% time saved for 1GiB memory observed on our
> emulated platform.

This certainly looks impressive, but it is a very ad-hoc benchmark,
and emulation numbers don't necessarily result in similar improvement
on actual HW.

How does this translate for a more realistic/useful workload? Even
numbers obtained on another architecture would be useful.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH 1/2] arm64: Add support for FEAT_HAFT
  2024-08-06  7:57       ` Marc Zyngier
@ 2024-08-06 13:11         ` Yicong Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Yicong Yang @ 2024-08-06 13:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

On 2024/8/6 15:57, Marc Zyngier wrote:
> On Tue, 06 Aug 2024 04:09:09 +0100,
> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>>>> +#ifdef CONFIG_ARM64_HAFT
>>>> +	{
>>>> +		.desc = "Hardware managed Access Flag for Table Descriptor",
>>>> +		/*
>>>> +		 * Per Spec, software management of Access Flag for Table
>>>> +		 * descriptor is not supported, so make this feature system
>>>> +		 * wide.
>>>> +		 */
>>>
>>> I don't understand what you mean by this. Can you please clarify?
>>>
>>
>> Since this cannot be managed by the software, we should restrict all the CPUs
>> in the system to have and enable this feature which is indicated by
>> ARM64_CPUCAP_BOOT_CPU_FEATURE. It's not possible for part of the CPUs don't have
>> this feature and managed manually.
>>
>> I make this comment here since it's handled different from what ARM64_HW_DBM does (which
>> is ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE). Maybe it's redundant and can be dropped.
> 
> Ah, I see what you mean. I think this is still important to capture,
> but maybe in a clearer manner. Something like:
> 
> 	Contrary to the page/block access flag, the table access flag
> 	cannot be emulated in software (no access fault will occur).
> 	Therefore mandate that all CPUs have FEAT_HAFT.
> 

Will refine the comment here in v2. Thanks for the suggestion.

Thanks.





* Re: [PATCH 0/2] Support Armv8.9/v9.4 FEAT_HAFT
  2024-08-06  8:06     ` Marc Zyngier
@ 2024-08-06 13:35       ` Yicong Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Yicong Yang @ 2024-08-06 13:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: yangyicong, catalin.marinas, will, mark.rutland, linux-arm-kernel,
	oliver.upton, broonie, ryan.roberts, linuxarm, jonathan.cameron,
	shameerali.kolothum.thodi, prime.zeng, xuwei5, wangkefeng.wang

On 2024/8/6 16:06, Marc Zyngier wrote:
> On Tue, 06 Aug 2024 04:43:52 +0100,
> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> On 2024/8/2 18:40, Marc Zyngier wrote:
>>> On Fri, 02 Aug 2024 10:34:56 +0100,
>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>
>>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>>
>>>> This series adds basic support for FEAT_HAFT introduced in Armv8.9/v9.4
>>>> and enable ARCH_HAS_NONLEAF_PMD_YOUNG. The latter will be used in
>>>> lru-gen aging. Tested with lru-gen in below steps:
>>>> 1. Generate a 1GiB workingset by `stress-ng --vm 1`. Then hang the task to
>>>>    stop accessing the memory. (AF bit won't be updated)
>>>> 2. try to age the memory by /sys/kernel/debug/lru_gen
>>>>
>>>> Run above steps with LRU_GEN_NONLEAF_YOUNG(0x4) and not respectively
>>>> (switching by /sys/kernel/mm/lru_gen/enabled). LRU_GEN_NONLEAF_YOUNG
>>>> will clear and test the PMD AF bit on page walking for aging,
>>>> otherwise will clear and test the PTE AF bit for aging. In this case
>>>> LRU_GEN_NONLEAF_YOUNG will improve the efficiency of page scanning
>>>> since pages won't be accessed and we don't need to scan each PTE.
>>>
>>> Improve by how much? Can you please publish numbers that demonstrate
>>> the effect of this feature?
>>>
>>
>> With LRU_GEN_NONLEAF_YOUNG ~40% time saved for 1GiB memory observed on our
>> emulated platform.
> 
> This certainly looks impressive, but it is a very ad-hoc benchmark,
> and emulation numbers don't necessarily result in similar improvement
> on actual HW.
> 

Yes indeed. I designed this case just to test that it works. Real cases may be
more complex and less ideal, and may also involve other things like THP
(with THP we may already use PMD block mappings, so HAFT may bring no
additional advantage there).

> How does this translate for a more realistic/useful workload? Even
> numbers obtained on another architecture would be useful.
> 

I have no numbers for real workloads yet. That may be the next step
once a platform is available (an x86 or arm64 one which can run real
workloads).

Thanks.



