From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: guojinhui <guojinhui.liam@bytedance.com>,
<catalin.marinas@arm.com>, <linux-arm-kernel@lists.infradead.org>,
<linux-kernel@vger.kernel.org>, <lizefan.x@bytedance.com>,
<will@kernel.org>
Subject: Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
Date: Wed, 13 Sep 2023 14:06:29 +0100 [thread overview]
Message-ID: <20230913140629.000052ad@Huawei.com> (raw)
In-Reply-To: <381a2abc-1597-c179-99f2-477d7f41b91b@arm.com>
On Wed, 13 Sep 2023 12:23:37 +0100
Robin Murphy <robin.murphy@arm.com> wrote:
> On 2023-09-13 10:44, guojinhui wrote:
> >>> In EL0, it can get the register midr's value to distinguish vendor.
> >>> But it won't return real value of the register mpidr by using mrs
> >>> in EL0. The register mpidr's value is useful to obtain the cpu
> >>> topology information.
> >>
> >> ...except there's no guarantee that the MPIDR value is anything other
> >> than a unique identifier. Proper topology information is already exposed
> >> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
> >> should be using that.
> >>
> >> Not to mention that userspace fundamentally can't guarantee it won't be
> >> migrated at just the wrong point and read the MPIDR of a different CPU
> >> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
> >> such that userspace has a stable and reliable source of information in
> >> case it needs to consider potential errata.)
> >>
> >> Thanks,
> >> Robin.
> >>
> >> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> >> [2]
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> >
> > 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
> > the die infomation from the MPIDR value. Such as the kunpeng-920,
> > 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
> >
> > ```
> > <DIE>.<CLUSTER>.<CORE>.<HT>
> >
> > cpu = 0, 81080000
> > cpu = 1, 81080100
> > ...
> > cpu = 3, 81080300
> > cpu = 4, 81090000
> > ...
> > cpu = 7, 81090300
> > cpu = 8, 810a0000
> > ...
> > cpu = 11, 810a0300
> > cpu = 12, 810b0000
> > ...
> > cpu = 15, 810b0300
> > cpu = 16, 810c0000
> > ...
> > cpu = 19, 810c0300
> > cpu = 20, 810d0000
> > ...
> > cpu = 31, 810f0300
> > cpu = 32, 81180000
> > ...
> > cpu = 63, 811f0300
> > ```
> >
> > we can get the die infomation by 0x810, 0x811.
>
> This is very much a platform-specific assumption, though, and once
> you're assuming enough to be able to derive anything meaningful from a
> raw MPIDR, you could equally derive the same thing from existing sources
> like NUMA topology (if you know the SoC, then for sure you can know how
> nodes relate to dies).
>
> > 2. we can bind the task to the specific cpu to obtain the MPIDR value.
>
> ...unless that CPU then gets offlined, the task is forcibly migrated
> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
>
> > 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
> > in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
> > but it doesn't provide the information of die.
> >
> > ```
> > # ls /sys/devices/system/cpu/cpu0/topology/
> > cluster_cpus cluster_cpus_list cluster_id core_cpus core_cpus_list core_id core_siblings core_siblings_list package_cpus package_cpus_list physical_package_id thread_siblings thread_siblings_list
> > # cat /sys/devices/system/cpu/cpu0/topology/*
> > 00000000,00000000,00000000,00000003
> > 0-1
> > 616
> > 00000000,00000000,00000000,00000001
> > 0
> > 6656
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 0
> > 00000000,00000000,00000000,00000001
> > 0
> >
> > # uname -r
> > 6.6.0-rc1
> > ```
> >
> > Then I check the code which parses the cpu topology infomation from PPTT:
> >
> > ```
> > int __init parse_acpi_topology(void)
> > {
> > int cpu, topology_id;
> >
> > if (acpi_disabled)
> > return 0;
> >
> > for_each_possible_cpu(cpu) {
> > topology_id = find_acpi_cpu_topology(cpu, 0);
> > if (topology_id < 0)
> > return topology_id;
> >
> > if (acpi_cpu_is_threaded(cpu)) {
> > cpu_topology[cpu].thread_id = topology_id;
> > topology_id = find_acpi_cpu_topology(cpu, 1);
> > cpu_topology[cpu].core_id = topology_id;
> > } else {
> > cpu_topology[cpu].thread_id = -1;
> > cpu_topology[cpu].core_id = topology_id;
> > }
> > topology_id = find_acpi_cpu_topology_cluster(cpu);
> > cpu_topology[cpu].cluster_id = topology_id;
> > topology_id = find_acpi_cpu_topology_package(cpu);
> > cpu_topology[cpu].package_id = topology_id;
> > }
> >
> > return 0;
> > }
> > ```
> >
> > Actually, it just gives the infomation of thread, cluster and package
> > though the PPTT provides the dies infomation.
> >
> > May be we can implement some code to obtain die information from PPTT?
>
> I guess if any additional levels of hierarchy exist between the root
> "package" level and what we infer to be the "cluster" level, then it
> seems reasonable to me to infer the next level above "package" to be
> "die". Then it looks like pretty much just a case of wiring up
> topology_die_id() through the generic topology code.
Cluster was a vague enough concept that it was safe to define
it as the layer above a core. Die is a lot less obvious because it
has more direct physical meaning. There are interconnect and cache
topologies where something is shared above the cluster where it
doesn't correspond to a die (in the sense of a chiplet / one
piece of silicon). For those you'd have another level in PPTT before
you reach the one that can be thought of as a die.
What about the die is of relevance for a userspace scheduler?
Sharing of L3, NUMA proximity domain / DDR controller locations?
All that is visible either via the cache topology or the NUMA node
topology.
If there is a strong reason (beyond 'x86 has it in a system register')
for having die explicitly provided in PPTT, then file a code first ACPI
proposal to add a flags, similar to the existing one that indicates
Physical Package. Note that a strong justification would be needed.
Jonathan
>
> Thanks,
> Robin.
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: guojinhui <guojinhui.liam@bytedance.com>,
<catalin.marinas@arm.com>, <linux-arm-kernel@lists.infradead.org>,
<linux-kernel@vger.kernel.org>, <lizefan.x@bytedance.com>,
<will@kernel.org>
Subject: Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0
Date: Wed, 13 Sep 2023 14:06:29 +0100 [thread overview]
Message-ID: <20230913140629.000052ad@Huawei.com> (raw)
In-Reply-To: <381a2abc-1597-c179-99f2-477d7f41b91b@arm.com>
On Wed, 13 Sep 2023 12:23:37 +0100
Robin Murphy <robin.murphy@arm.com> wrote:
> On 2023-09-13 10:44, guojinhui wrote:
> >>> In EL0, it can get the register midr's value to distinguish vendor.
> >>> But it won't return real value of the register mpidr by using mrs
> >>> in EL0. The register mpidr's value is useful to obtain the cpu
> >>> topology information.
> >>
> >> ...except there's no guarantee that the MPIDR value is anything other
> >> than a unique identifier. Proper topology information is already exposed
> >> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
> >> should be using that.
> >>
> >> Not to mention that userspace fundamentally can't guarantee it won't be
> >> migrated at just the wrong point and read the MPIDR of a different CPU
> >> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
> >> such that userspace has a stable and reliable source of information in
> >> case it needs to consider potential errata.)
> >>
> >> Thanks,
> >> Robin.
> >>
> >> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> >> [2]
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> >
> > 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
> > the die infomation from the MPIDR value. Such as the kunpeng-920,
> > 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
> >
> > ```
> > <DIE>.<CLUSTER>.<CORE>.<HT>
> >
> > cpu = 0, 81080000
> > cpu = 1, 81080100
> > ...
> > cpu = 3, 81080300
> > cpu = 4, 81090000
> > ...
> > cpu = 7, 81090300
> > cpu = 8, 810a0000
> > ...
> > cpu = 11, 810a0300
> > cpu = 12, 810b0000
> > ...
> > cpu = 15, 810b0300
> > cpu = 16, 810c0000
> > ...
> > cpu = 19, 810c0300
> > cpu = 20, 810d0000
> > ...
> > cpu = 31, 810f0300
> > cpu = 32, 81180000
> > ...
> > cpu = 63, 811f0300
> > ```
> >
> > we can get the die infomation by 0x810, 0x811.
>
> This is very much a platform-specific assumption, though, and once
> you're assuming enough to be able to derive anything meaningful from a
> raw MPIDR, you could equally derive the same thing from existing sources
> like NUMA topology (if you know the SoC, then for sure you can know how
> nodes relate to dies).
>
> > 2. we can bind the task to the specific cpu to obtain the MPIDR value.
>
> ...unless that CPU then gets offlined, the task is forcibly migrated
> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
>
> > 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
> > in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
> > but it doesn't provide the information of die.
> >
> > ```
> > # ls /sys/devices/system/cpu/cpu0/topology/
> > cluster_cpus cluster_cpus_list cluster_id core_cpus core_cpus_list core_id core_siblings core_siblings_list package_cpus package_cpus_list physical_package_id thread_siblings thread_siblings_list
> > # cat /sys/devices/system/cpu/cpu0/topology/*
> > 00000000,00000000,00000000,00000003
> > 0-1
> > 616
> > 00000000,00000000,00000000,00000001
> > 0
> > 6656
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 0
> > 00000000,00000000,00000000,00000001
> > 0
> >
> > # uname -r
> > 6.6.0-rc1
> > ```
> >
> > Then I check the code which parses the cpu topology infomation from PPTT:
> >
> > ```
> > int __init parse_acpi_topology(void)
> > {
> > int cpu, topology_id;
> >
> > if (acpi_disabled)
> > return 0;
> >
> > for_each_possible_cpu(cpu) {
> > topology_id = find_acpi_cpu_topology(cpu, 0);
> > if (topology_id < 0)
> > return topology_id;
> >
> > if (acpi_cpu_is_threaded(cpu)) {
> > cpu_topology[cpu].thread_id = topology_id;
> > topology_id = find_acpi_cpu_topology(cpu, 1);
> > cpu_topology[cpu].core_id = topology_id;
> > } else {
> > cpu_topology[cpu].thread_id = -1;
> > cpu_topology[cpu].core_id = topology_id;
> > }
> > topology_id = find_acpi_cpu_topology_cluster(cpu);
> > cpu_topology[cpu].cluster_id = topology_id;
> > topology_id = find_acpi_cpu_topology_package(cpu);
> > cpu_topology[cpu].package_id = topology_id;
> > }
> >
> > return 0;
> > }
> > ```
> >
> > Actually, it just gives the infomation of thread, cluster and package
> > though the PPTT provides the dies infomation.
> >
> > May be we can implement some code to obtain die information from PPTT?
>
> I guess if any additional levels of hierarchy exist between the root
> "package" level and what we infer to be the "cluster" level, then it
> seems reasonable to me to infer the next level above "package" to be
> "die". Then it looks like pretty much just a case of wiring up
> topology_die_id() through the generic topology code.
Cluster was a vague enough concept that it was safe to define
it as the layer above a core. Die is a lot less obvious because it
has more direct physical meaning. There are interconnect and cache
topologies where something is shared above the cluster where it
doesn't correspond to a die (in the sense of a chiplet / one
piece of silicon). For those you'd have another level in PPTT before
you reach the one that can be thought of as a die.
What about the die is of relevance for a userspace scheduler?
Sharing of L3, NUMA proximity domain / DDR controller locations?
All that is visible either via the cache topology or the NUMA node
topology.
If there is a strong reason (beyond 'x86 has it in a system register')
for having die explicitly provided in PPTT, then file a code first ACPI
proposal to add a flags, similar to the existing one that indicates
Physical Package. Note that a strong justification would be needed.
Jonathan
>
> Thanks,
> Robin.
>
next prev parent reply other threads:[~2023-09-13 13:07 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-12 3:52 [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0 guojinhui.liam
2023-09-12 3:52 ` guojinhui.liam
2023-09-12 8:31 ` Robin Murphy
2023-09-12 8:31 ` Robin Murphy
2023-09-12 10:51 ` Jonathan Cameron
2023-09-13 10:51 ` Jinhui Guo
2023-09-13 10:51 ` Jinhui Guo
2023-09-13 13:11 ` Jonathan Cameron
2023-09-13 13:11 ` Jonathan Cameron
2023-09-13 9:44 ` guojinhui
2023-09-13 9:44 ` guojinhui
2023-09-13 11:23 ` Robin Murphy
2023-09-13 11:23 ` Robin Murphy
2023-09-13 13:06 ` Jonathan Cameron [this message]
2023-09-13 13:06 ` Jonathan Cameron
2023-09-13 18:57 ` Robin Murphy
2023-09-13 18:57 ` Robin Murphy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230913140629.000052ad@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=catalin.marinas@arm.com \
--cc=guojinhui.liam@bytedance.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan.x@bytedance.com \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.