linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
@ 2023-09-12  3:52 guojinhui.liam
  2023-09-12  8:31 ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: guojinhui.liam @ 2023-09-12  3:52 UTC (permalink / raw)
  To: catalin.marinas, will
  Cc: lizefan.x, linux-arm-kernel, linux-kernel, guojinhui.liam

In EL0, it can get the register midr's value to distinguish vendor.
But it won't return real value of the register mpidr by using mrs
in EL0. The register mpidr's value is useful to obtain the cpu
topology information.

In some scenarios, the task scheduling in userspace can be
optimized with CPU Die information.

Signed-off-by: guojinhui.liam <guojinhui.liam@bytedance.com>
---
 arch/arm64/include/asm/sysreg.h | 3 ---
 arch/arm64/kernel/cpufeature.c  | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 38296579a4fd..1885857c8a22 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -901,9 +901,6 @@
 #define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
 #define SYS_TFSR_EL1_TF1	(UL(1) << SYS_TFSR_EL1_TF1_SHIFT)
 
-/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
-#define SYS_MPIDR_SAFE_VAL	(BIT(31))
-
 #define TRFCR_ELx_TS_SHIFT		5
 #define TRFCR_ELx_TS_MASK		((0x3UL) << TRFCR_ELx_TS_SHIFT)
 #define TRFCR_ELx_TS_VIRTUAL		((0x1UL) << TRFCR_ELx_TS_SHIFT)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b018ae12ff5f..6e18597fdcc3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -3450,7 +3450,7 @@ static inline int emulate_id_reg(u32 id, u64 *valp)
 		*valp = read_cpuid_id();
 		break;
 	case SYS_MPIDR_EL1:
-		*valp = SYS_MPIDR_SAFE_VAL;
+		*valp = read_cpuid_mpidr();
 		break;
 	case SYS_REVIDR_EL1:
 		/* IMPLEMENTATION DEFINED values are emulated with 0 */
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-12  3:52 [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0 guojinhui.liam
@ 2023-09-12  8:31 ` Robin Murphy
  2023-09-12 10:51   ` Jonathan Cameron
  2023-09-13  9:44   ` guojinhui
  0 siblings, 2 replies; 9+ messages in thread
From: Robin Murphy @ 2023-09-12  8:31 UTC (permalink / raw)
  To: guojinhui.liam, catalin.marinas, will
  Cc: lizefan.x, linux-arm-kernel, linux-kernel

On 2023-09-12 04:52, guojinhui.liam wrote:
> In EL0, it can get the register midr's value to distinguish vendor.
> But it won't return real value of the register mpidr by using mrs
> in EL0. The register mpidr's value is useful to obtain the cpu
> topology information.

...except there's no guarantee that the MPIDR value is anything other 
than a unique identifier. Proper topology information is already exposed 
to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace 
should be using that.

Not to mention that userspace fundamentally can't guarantee it won't be 
migrated at just the wrong point and read the MPIDR of a different CPU 
anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs, 
such that userspace has a stable and reliable source of information in 
case it needs to consider potential errata.)

Thanks,
Robin.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt

> In some scenarios, the task scheduling in userspace can be
> optimized with CPU Die information.
> 
> Signed-off-by: guojinhui.liam <guojinhui.liam@bytedance.com>
> ---
>   arch/arm64/include/asm/sysreg.h | 3 ---
>   arch/arm64/kernel/cpufeature.c  | 2 +-
>   2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 38296579a4fd..1885857c8a22 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -901,9 +901,6 @@
>   #define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
>   #define SYS_TFSR_EL1_TF1	(UL(1) << SYS_TFSR_EL1_TF1_SHIFT)
>   
> -/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
> -#define SYS_MPIDR_SAFE_VAL	(BIT(31))
> -
>   #define TRFCR_ELx_TS_SHIFT		5
>   #define TRFCR_ELx_TS_MASK		((0x3UL) << TRFCR_ELx_TS_SHIFT)
>   #define TRFCR_ELx_TS_VIRTUAL		((0x1UL) << TRFCR_ELx_TS_SHIFT)
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index b018ae12ff5f..6e18597fdcc3 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -3450,7 +3450,7 @@ static inline int emulate_id_reg(u32 id, u64 *valp)
>   		*valp = read_cpuid_id();
>   		break;
>   	case SYS_MPIDR_EL1:
> -		*valp = SYS_MPIDR_SAFE_VAL;
> +		*valp = read_cpuid_mpidr();
>   		break;
>   	case SYS_REVIDR_EL1:
>   		/* IMPLEMENTATION DEFINED values are emulated with 0 */

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-12  8:31 ` Robin Murphy
@ 2023-09-12 10:51   ` Jonathan Cameron
  2023-09-13 10:51     ` Jinhui Guo
  2023-09-13  9:44   ` guojinhui
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2023-09-12 10:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: guojinhui.liam, catalin.marinas, will, lizefan.x,
	linux-arm-kernel, linux-kernel

On Tue, 12 Sep 2023 09:31:43 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 2023-09-12 04:52, guojinhui.liam wrote:
> > In EL0, it can get the register midr's value to distinguish vendor.
> > But it won't return real value of the register mpidr by using mrs
> > in EL0. The register mpidr's value is useful to obtain the cpu
> > topology information.  
> 
> ...except there's no guarantee that the MPIDR value is anything other 
> than a unique identifier. Proper topology information is already exposed 
> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace 
> should be using that.
> 
> Not to mention that userspace fundamentally can't guarantee it won't be 
> migrated at just the wrong point and read the MPIDR of a different CPU 
> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs, 
> such that userspace has a stable and reliable source of information in 
> case it needs to consider potential errata.)
> 
> Thanks,
> Robin.
> 
> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt

As a follow up question, is there some information that is missing from
current topology description?  (there is lots missing but I'm curious
as to what might matter for your use case!)

Jonathan

> 
> > In some scenarios, the task scheduling in userspace can be
> > optimized with CPU Die information.
> > 
> > Signed-off-by: guojinhui.liam <guojinhui.liam@bytedance.com>
> > ---
> >   arch/arm64/include/asm/sysreg.h | 3 ---
> >   arch/arm64/kernel/cpufeature.c  | 2 +-
> >   2 files changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 38296579a4fd..1885857c8a22 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -901,9 +901,6 @@
> >   #define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
> >   #define SYS_TFSR_EL1_TF1	(UL(1) << SYS_TFSR_EL1_TF1_SHIFT)
> >   
> > -/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
> > -#define SYS_MPIDR_SAFE_VAL	(BIT(31))
> > -
> >   #define TRFCR_ELx_TS_SHIFT		5
> >   #define TRFCR_ELx_TS_MASK		((0x3UL) << TRFCR_ELx_TS_SHIFT)
> >   #define TRFCR_ELx_TS_VIRTUAL		((0x1UL) << TRFCR_ELx_TS_SHIFT)
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index b018ae12ff5f..6e18597fdcc3 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -3450,7 +3450,7 @@ static inline int emulate_id_reg(u32 id, u64 *valp)
> >   		*valp = read_cpuid_id();
> >   		break;
> >   	case SYS_MPIDR_EL1:
> > -		*valp = SYS_MPIDR_SAFE_VAL;
> > +		*valp = read_cpuid_mpidr();
> >   		break;
> >   	case SYS_REVIDR_EL1:
> >   		/* IMPLEMENTATION DEFINED values are emulated with 0 */  
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-12  8:31 ` Robin Murphy
  2023-09-12 10:51   ` Jonathan Cameron
@ 2023-09-13  9:44   ` guojinhui
  2023-09-13 11:23     ` Robin Murphy
  1 sibling, 1 reply; 9+ messages in thread
From: guojinhui @ 2023-09-13  9:44 UTC (permalink / raw)
  To: robin.murphy
  Cc: catalin.marinas, guojinhui.liam, linux-arm-kernel, linux-kernel,
	lizefan.x, will

> > In EL0, it can get the register midr's value to distinguish vendor.
> > But it won't return real value of the register mpidr by using mrs
> > in EL0. The register mpidr's value is useful to obtain the cpu
> > topology information.
> 
> ...except there's no guarantee that the MPIDR value is anything other 
> than a unique identifier. Proper topology information is already exposed 
> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace 
> should be using that.
> 
> Not to mention that userspace fundamentally can't guarantee it won't be 
> migrated at just the wrong point and read the MPIDR of a different CPU 
> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs, 
> such that userspace has a stable and reliable source of information in 
> case it needs to consider potential errata.)
> 
> Thanks,
> Robin.
> 
> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt

1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
the die infomation from the MPIDR value. Such as the kunpeng-920, 
4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:

```
<DIE>.<CLUSTER>.<CORE>.<HT>

cpu = 0, 81080000
cpu = 1, 81080100
...
cpu = 3, 81080300
cpu = 4, 81090000
...
cpu = 7, 81090300
cpu = 8, 810a0000
...
cpu = 11, 810a0300
cpu = 12, 810b0000
...
cpu = 15, 810b0300
cpu = 16, 810c0000
...
cpu = 19, 810c0300
cpu = 20, 810d0000
...
cpu = 31, 810f0300
cpu = 32, 81180000
...
cpu = 63, 811f0300
```

we can get the die infomation by 0x810, 0x811.

2. we can bind the task to the specific cpu to obtain the MPIDR value.

3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
but it doesn't provide the information of die.

```
# ls /sys/devices/system/cpu/cpu0/topology/
cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
# cat /sys/devices/system/cpu/cpu0/topology/*
00000000,00000000,00000000,00000003
0-1
616
00000000,00000000,00000000,00000001
0
6656
00000000,00000000,ffffffff,ffffffff
0-63
00000000,00000000,ffffffff,ffffffff
0-63
0
00000000,00000000,00000000,00000001
0

# uname -r
6.6.0-rc1
```

Then I check the code which parses the cpu topology infomation from PPTT:

```
int __init parse_acpi_topology(void)
{
        int cpu, topology_id;

        if (acpi_disabled)
                return 0;

        for_each_possible_cpu(cpu) {
                topology_id = find_acpi_cpu_topology(cpu, 0);
                if (topology_id < 0)
                        return topology_id;

                if (acpi_cpu_is_threaded(cpu)) {
                        cpu_topology[cpu].thread_id = topology_id;
                        topology_id = find_acpi_cpu_topology(cpu, 1);
                        cpu_topology[cpu].core_id   = topology_id;
                } else {
                        cpu_topology[cpu].thread_id  = -1;
                        cpu_topology[cpu].core_id    = topology_id;
                }
                topology_id = find_acpi_cpu_topology_cluster(cpu);
                cpu_topology[cpu].cluster_id = topology_id;
                topology_id = find_acpi_cpu_topology_package(cpu);
                cpu_topology[cpu].package_id = topology_id;
        }

        return 0;
}
```

Actually, it just gives the infomation of thread, cluster and package
though the PPTT provides the dies infomation.

May be we can implement some code to obtain die information from PPTT?

thanks,

guojinhui

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-12 10:51   ` Jonathan Cameron
@ 2023-09-13 10:51     ` Jinhui Guo
  2023-09-13 13:11       ` Jonathan Cameron
  0 siblings, 1 reply; 9+ messages in thread
From: Jinhui Guo @ 2023-09-13 10:51 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: catalin.marinas, guojinhui.liam, linux-arm-kernel, linux-kernel,
	lizefan.x, robin.murphy, will

> As a follow up question, is there some information that is missing from
> current topology description?  (there is lots missing but I'm curious
> as to what might matter for your use case!)

We want to know the infomation about dies to advoid memroy accessing
across dies (some settings like 2 numa per die).

thanks,

Jinhui Guo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-13  9:44   ` guojinhui
@ 2023-09-13 11:23     ` Robin Murphy
  2023-09-13 13:06       ` Jonathan Cameron
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2023-09-13 11:23 UTC (permalink / raw)
  To: guojinhui
  Cc: catalin.marinas, linux-arm-kernel, linux-kernel, lizefan.x, will,
	Jonathan Cameron

On 2023-09-13 10:44, guojinhui wrote:
>>> In EL0, it can get the register midr's value to distinguish vendor.
>>> But it won't return real value of the register mpidr by using mrs
>>> in EL0. The register mpidr's value is useful to obtain the cpu
>>> topology information.
>>
>> ...except there's no guarantee that the MPIDR value is anything other
>> than a unique identifier. Proper topology information is already exposed
>> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
>> should be using that.
>>
>> Not to mention that userspace fundamentally can't guarantee it won't be
>> migrated at just the wrong point and read the MPIDR of a different CPU
>> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
>> such that userspace has a stable and reliable source of information in
>> case it needs to consider potential errata.)
>>
>> Thanks,
>> Robin.
>>
>> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> 
> 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
> the die infomation from the MPIDR value. Such as the kunpeng-920,
> 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
> 
> ```
> <DIE>.<CLUSTER>.<CORE>.<HT>
> 
> cpu = 0, 81080000
> cpu = 1, 81080100
> ...
> cpu = 3, 81080300
> cpu = 4, 81090000
> ...
> cpu = 7, 81090300
> cpu = 8, 810a0000
> ...
> cpu = 11, 810a0300
> cpu = 12, 810b0000
> ...
> cpu = 15, 810b0300
> cpu = 16, 810c0000
> ...
> cpu = 19, 810c0300
> cpu = 20, 810d0000
> ...
> cpu = 31, 810f0300
> cpu = 32, 81180000
> ...
> cpu = 63, 811f0300
> ```
> 
> we can get the die infomation by 0x810, 0x811.

This is very much a platform-specific assumption, though, and once 
you're assuming enough to be able to derive anything meaningful from a 
raw MPIDR, you could equally derive the same thing from existing sources 
like NUMA topology (if you know the SoC, then for sure you can know how 
nodes relate to dies).

> 2. we can bind the task to the specific cpu to obtain the MPIDR value.

...unless that CPU then gets offlined, the task is forcibly migrated 
elsewhere, and ends up obtaining the *wrong* MPIDR value :(

> 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
> in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
> but it doesn't provide the information of die.
> 
> ```
> # ls /sys/devices/system/cpu/cpu0/topology/
> cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
> # cat /sys/devices/system/cpu/cpu0/topology/*
> 00000000,00000000,00000000,00000003
> 0-1
> 616
> 00000000,00000000,00000000,00000001
> 0
> 6656
> 00000000,00000000,ffffffff,ffffffff
> 0-63
> 00000000,00000000,ffffffff,ffffffff
> 0-63
> 0
> 00000000,00000000,00000000,00000001
> 0
> 
> # uname -r
> 6.6.0-rc1
> ```
> 
> Then I check the code which parses the cpu topology infomation from PPTT:
> 
> ```
> int __init parse_acpi_topology(void)
> {
>          int cpu, topology_id;
> 
>          if (acpi_disabled)
>                  return 0;
> 
>          for_each_possible_cpu(cpu) {
>                  topology_id = find_acpi_cpu_topology(cpu, 0);
>                  if (topology_id < 0)
>                          return topology_id;
> 
>                  if (acpi_cpu_is_threaded(cpu)) {
>                          cpu_topology[cpu].thread_id = topology_id;
>                          topology_id = find_acpi_cpu_topology(cpu, 1);
>                          cpu_topology[cpu].core_id   = topology_id;
>                  } else {
>                          cpu_topology[cpu].thread_id  = -1;
>                          cpu_topology[cpu].core_id    = topology_id;
>                  }
>                  topology_id = find_acpi_cpu_topology_cluster(cpu);
>                  cpu_topology[cpu].cluster_id = topology_id;
>                  topology_id = find_acpi_cpu_topology_package(cpu);
>                  cpu_topology[cpu].package_id = topology_id;
>          }
> 
>          return 0;
> }
> ```
> 
> Actually, it just gives the infomation of thread, cluster and package
> though the PPTT provides the dies infomation.
> 
> May be we can implement some code to obtain die information from PPTT?

I guess if any additional levels of hierarchy exist between the root 
"package" level and what we infer to be the "cluster" level, then it 
seems reasonable to me to infer the next level above "package" to be 
"die". Then it looks like pretty much just a case of wiring up 
topology_die_id() through the generic topology code.

Thanks,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-13 11:23     ` Robin Murphy
@ 2023-09-13 13:06       ` Jonathan Cameron
  2023-09-13 18:57         ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2023-09-13 13:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: guojinhui, catalin.marinas, linux-arm-kernel, linux-kernel,
	lizefan.x, will

On Wed, 13 Sep 2023 12:23:37 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 2023-09-13 10:44, guojinhui wrote:
> >>> In EL0, it can get the register midr's value to distinguish vendor.
> >>> But it won't return real value of the register mpidr by using mrs
> >>> in EL0. The register mpidr's value is useful to obtain the cpu
> >>> topology information.  
> >>
> >> ...except there's no guarantee that the MPIDR value is anything other
> >> than a unique identifier. Proper topology information is already exposed
> >> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
> >> should be using that.
> >>
> >> Not to mention that userspace fundamentally can't guarantee it won't be
> >> migrated at just the wrong point and read the MPIDR of a different CPU
> >> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
> >> such that userspace has a stable and reliable source of information in
> >> case it needs to consider potential errata.)
> >>
> >> Thanks,
> >> Robin.
> >>
> >> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> >> [2]
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt  
> > 
> > 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
> > the die infomation from the MPIDR value. Such as the kunpeng-920,
> > 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
> > 
> > ```
> > <DIE>.<CLUSTER>.<CORE>.<HT>
> > 
> > cpu = 0, 81080000
> > cpu = 1, 81080100
> > ...
> > cpu = 3, 81080300
> > cpu = 4, 81090000
> > ...
> > cpu = 7, 81090300
> > cpu = 8, 810a0000
> > ...
> > cpu = 11, 810a0300
> > cpu = 12, 810b0000
> > ...
> > cpu = 15, 810b0300
> > cpu = 16, 810c0000
> > ...
> > cpu = 19, 810c0300
> > cpu = 20, 810d0000
> > ...
> > cpu = 31, 810f0300
> > cpu = 32, 81180000
> > ...
> > cpu = 63, 811f0300
> > ```
> > 
> > we can get the die infomation by 0x810, 0x811.  
> 
> This is very much a platform-specific assumption, though, and once 
> you're assuming enough to be able to derive anything meaningful from a 
> raw MPIDR, you could equally derive the same thing from existing sources 
> like NUMA topology (if you know the SoC, then for sure you can know how 
> nodes relate to dies).
> 
> > 2. we can bind the task to the specific cpu to obtain the MPIDR value.  
> 
> ...unless that CPU then gets offlined, the task is forcibly migrated 
> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
> 
> > 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
> > in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
> > but it doesn't provide the information of die.
> > 
> > ```
> > # ls /sys/devices/system/cpu/cpu0/topology/
> > cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
> > # cat /sys/devices/system/cpu/cpu0/topology/*
> > 00000000,00000000,00000000,00000003
> > 0-1
> > 616
> > 00000000,00000000,00000000,00000001
> > 0
> > 6656
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 0
> > 00000000,00000000,00000000,00000001
> > 0
> > 
> > # uname -r
> > 6.6.0-rc1
> > ```
> > 
> > Then I check the code which parses the cpu topology infomation from PPTT:
> > 
> > ```
> > int __init parse_acpi_topology(void)
> > {
> >          int cpu, topology_id;
> > 
> >          if (acpi_disabled)
> >                  return 0;
> > 
> >          for_each_possible_cpu(cpu) {
> >                  topology_id = find_acpi_cpu_topology(cpu, 0);
> >                  if (topology_id < 0)
> >                          return topology_id;
> > 
> >                  if (acpi_cpu_is_threaded(cpu)) {
> >                          cpu_topology[cpu].thread_id = topology_id;
> >                          topology_id = find_acpi_cpu_topology(cpu, 1);
> >                          cpu_topology[cpu].core_id   = topology_id;
> >                  } else {
> >                          cpu_topology[cpu].thread_id  = -1;
> >                          cpu_topology[cpu].core_id    = topology_id;
> >                  }
> >                  topology_id = find_acpi_cpu_topology_cluster(cpu);
> >                  cpu_topology[cpu].cluster_id = topology_id;
> >                  topology_id = find_acpi_cpu_topology_package(cpu);
> >                  cpu_topology[cpu].package_id = topology_id;
> >          }
> > 
> >          return 0;
> > }
> > ```
> > 
> > Actually, it just gives the infomation of thread, cluster and package
> > though the PPTT provides the dies infomation.
> > 
> > May be we can implement some code to obtain die information from PPTT?  
> 
> I guess if any additional levels of hierarchy exist between the root 
> "package" level and what we infer to be the "cluster" level, then it 
> seems reasonable to me to infer the next level above "package" to be 
> "die". Then it looks like pretty much just a case of wiring up 
> topology_die_id() through the generic topology code.

Cluster was a vague enough concept that it was safe to define
it as the layer above a core.  Die is a lot less obvious because it
has more direct physical meaning.  There are interconnect and cache
topologies where something is shared above the cluster where it
doesn't correspond to a die (in the sense of a chiplet / one
piece of silicon).  For those you'd have another level in PPTT before
you reach the one that can be thought of as a die.

What about the die is of relevance for a userspace scheduler?
Sharing of L3, NUMA proximity domain / DDR controller locations?

All that is visible either via the cache topology or the NUMA node
topology.

If there is a strong reason (beyond 'x86 has it in a system register')
for having die explicitly provided in PPTT, then file a code first ACPI
proposal to add a flags, similar to the existing one that indicates
Physical Package.  Note that a strong justification would be needed.

Jonathan



> 
> Thanks,
> Robin.
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-13 10:51     ` Jinhui Guo
@ 2023-09-13 13:11       ` Jonathan Cameron
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2023-09-13 13:11 UTC (permalink / raw)
  To: Jinhui Guo, linux-kernel
  Cc: catalin.marinas, linux-arm-kernel, lizefan.x, robin.murphy, will

On Wed, 13 Sep 2023 18:51:33 +0800
Jinhui Guo <guojinhui.liam@bytedance.com> wrote:

> > As a follow up question, is there some information that is missing from
> > current topology description?  (there is lots missing but I'm curious
> > as to what might matter for your use case!)  
> 
> We want to know the infomation about dies to advoid memroy accessing
> across dies (some settings like 2 numa per die).

The NUMA access characteristics should give you the info you want - it's
the variation in latency and bandwidth between dies that matters, not that
they are dies.  If you got really bad access characteristics across a die
that info would be equally useful.

I think it is not that this is a die that matters, but rather that there
are groups of Numa nodes with relatively small differences in access
characteristics, then others with much larger variation (and I assume
a layer above that which is inter socket which is even worse).

The info is in HMAT, but the kernel presentation of HMAT is rather limited
currently - so you may want to look at extending what is visible in sysfs
from that table.

Thanks,

Jonathan

> 
> thanks,
> 
> Jinhui Guo
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
  2023-09-13 13:06       ` Jonathan Cameron
@ 2023-09-13 18:57         ` Robin Murphy
  0 siblings, 0 replies; 9+ messages in thread
From: Robin Murphy @ 2023-09-13 18:57 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: guojinhui, catalin.marinas, linux-arm-kernel, linux-kernel,
	lizefan.x, will

On 2023-09-13 14:06, Jonathan Cameron wrote:
> On Wed, 13 Sep 2023 12:23:37 +0100
> Robin Murphy <robin.murphy@arm.com> wrote:
> 
>> On 2023-09-13 10:44, guojinhui wrote:
>>>>> In EL0, it can get the register midr's value to distinguish vendor.
>>>>> But it won't return real value of the register mpidr by using mrs
>>>>> in EL0. The register mpidr's value is useful to obtain the cpu
>>>>> topology information.
>>>>
>>>> ...except there's no guarantee that the MPIDR value is anything other
>>>> than a unique identifier. Proper topology information is already exposed
>>>> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
>>>> should be using that.
>>>>
>>>> Not to mention that userspace fundamentally can't guarantee it won't be
>>>> migrated at just the wrong point and read the MPIDR of a different CPU
>>>> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
>>>> such that userspace has a stable and reliable source of information in
>>>> case it needs to consider potential errata.)
>>>>
>>>> Thanks,
>>>> Robin.
>>>>
>>>> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
>>>> [2]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt
>>>
>>> 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
>>> the die infomation from the MPIDR value. Such as the kunpeng-920,
>>> 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
>>>
>>> ```
>>> <DIE>.<CLUSTER>.<CORE>.<HT>
>>>
>>> cpu = 0, 81080000
>>> cpu = 1, 81080100
>>> ...
>>> cpu = 3, 81080300
>>> cpu = 4, 81090000
>>> ...
>>> cpu = 7, 81090300
>>> cpu = 8, 810a0000
>>> ...
>>> cpu = 11, 810a0300
>>> cpu = 12, 810b0000
>>> ...
>>> cpu = 15, 810b0300
>>> cpu = 16, 810c0000
>>> ...
>>> cpu = 19, 810c0300
>>> cpu = 20, 810d0000
>>> ...
>>> cpu = 31, 810f0300
>>> cpu = 32, 81180000
>>> ...
>>> cpu = 63, 811f0300
>>> ```
>>>
>>> we can get the die infomation by 0x810, 0x811.
>>
>> This is very much a platform-specific assumption, though, and once
>> you're assuming enough to be able to derive anything meaningful from a
>> raw MPIDR, you could equally derive the same thing from existing sources
>> like NUMA topology (if you know the SoC, then for sure you can know how
>> nodes relate to dies).
>>
>>> 2. we can bind the task to the specific cpu to obtain the MPIDR value.
>>
>> ...unless that CPU then gets offlined, the task is forcibly migrated
>> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
>>
>>> 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
>>> in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
>>> but it doesn't provide the information of die.
>>>
>>> ```
>>> # ls /sys/devices/system/cpu/cpu0/topology/
>>> cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
>>> # cat /sys/devices/system/cpu/cpu0/topology/*
>>> 00000000,00000000,00000000,00000003
>>> 0-1
>>> 616
>>> 00000000,00000000,00000000,00000001
>>> 0
>>> 6656
>>> 00000000,00000000,ffffffff,ffffffff
>>> 0-63
>>> 00000000,00000000,ffffffff,ffffffff
>>> 0-63
>>> 0
>>> 00000000,00000000,00000000,00000001
>>> 0
>>>
>>> # uname -r
>>> 6.6.0-rc1
>>> ```
>>>
>>> Then I check the code which parses the cpu topology infomation from PPTT:
>>>
>>> ```
>>> int __init parse_acpi_topology(void)
>>> {
>>>           int cpu, topology_id;
>>>
>>>           if (acpi_disabled)
>>>                   return 0;
>>>
>>>           for_each_possible_cpu(cpu) {
>>>                   topology_id = find_acpi_cpu_topology(cpu, 0);
>>>                   if (topology_id < 0)
>>>                           return topology_id;
>>>
>>>                   if (acpi_cpu_is_threaded(cpu)) {
>>>                           cpu_topology[cpu].thread_id = topology_id;
>>>                           topology_id = find_acpi_cpu_topology(cpu, 1);
>>>                           cpu_topology[cpu].core_id   = topology_id;
>>>                   } else {
>>>                           cpu_topology[cpu].thread_id  = -1;
>>>                           cpu_topology[cpu].core_id    = topology_id;
>>>                   }
>>>                   topology_id = find_acpi_cpu_topology_cluster(cpu);
>>>                   cpu_topology[cpu].cluster_id = topology_id;
>>>                   topology_id = find_acpi_cpu_topology_package(cpu);
>>>                   cpu_topology[cpu].package_id = topology_id;
>>>           }
>>>
>>>           return 0;
>>> }
>>> ```
>>>
>>> Actually, it just gives the infomation of thread, cluster and package
>>> though the PPTT provides the dies infomation.
>>>
>>> May be we can implement some code to obtain die information from PPTT?
>>
>> I guess if any additional levels of hierarchy exist between the root
>> "package" level and what we infer to be the "cluster" level, then it
>> seems reasonable to me to infer the next level above "package" to be
>> "die". Then it looks like pretty much just a case of wiring up
>> topology_die_id() through the generic topology code.
> 
> Cluster was a vague enough concept that it was safe to define
> it as the layer above a core.  Die is a lot less obvious because it
> has more direct physical meaning.  There are interconnect and cache
> topologies where something is shared above the cluster where it
> doesn't correspond to a die (in the sense of a chiplet / one
> piece of silicon).  For those you'd have another level in PPTT before
> you reach the one that can be thought of as a die.

Oh, silly me; I was already assuming that there may be intermediate 
levels between cluster and die, but of course you could have that while 
still *not* having multiple dies, and thus no distinct "die" level in 
PPTT at all when that's already sufficiently implied by "package", so 
indeed it couldn't work in general.

> What about the die is of relevance for a userspace scheduler?
> Sharing of L3, NUMA proximity domain / DDR controller locations?
> 
> All that is visible either via the cache topology or the NUMA node
> topology.
> 
> If there is a strong reason (beyond 'x86 has it in a system register')
> for having die explicitly provided in PPTT, then file a code first ACPI
> proposal to add a flags, similar to the existing one that indicates
> Physical Package.  Note that a strong justification would be needed.

I'd agree with that. Thanks for the clarifications!

Cheers,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-09-13 18:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-12  3:52 [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0 guojinhui.liam
2023-09-12  8:31 ` Robin Murphy
2023-09-12 10:51   ` Jonathan Cameron
2023-09-13 10:51     ` Jinhui Guo
2023-09-13 13:11       ` Jonathan Cameron
2023-09-13  9:44   ` guojinhui
2023-09-13 11:23     ` Robin Murphy
2023-09-13 13:06       ` Jonathan Cameron
2023-09-13 18:57         ` Robin Murphy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).