* [PATCH] lscpu: fix incorrect number of sockets during hotplug
@ 2024-10-18 10:43 Anjali K
2024-10-20 10:30 ` Anushree Mathur
2024-10-30 10:55 ` Karel Zak
0 siblings, 2 replies; 4+ messages in thread
From: Anjali K @ 2024-10-18 10:43 UTC (permalink / raw)
To: util-linux; +Cc: anushree.mathur, anjalik
lscpu sometimes shows an incorrect 'Socket(s)' value if a hotplug operation
is running.
On a 32 CPU 2-socket system, the expected output is as shown below:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 2
Socket(s): 2
On the same system, if a hotplug operation runs at the same time as lscpu,
lscpu sometimes incorrectly shows "Socket(s):" as 3 or 4:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-11,16-31
Off-line CPU(s) list: 12-15
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 3
The number of sockets is taken to be the number of unique core_siblings
CPU groups. Two issues can cause the number of sockets to be too high
during hotplug:
1. The core_siblings masks of CPUs on the same socket differ because a CPU
on that socket was onlined or offlined between the two reads. In the example
below, nr sockets was wrongly incremented for CPU 5 even though CPUs 4 and 5
are on the same socket: their core_siblings masks differ because CPU 12 was
onlined in between.
CPU: 4
core_siblings: ff f0 0 0 0 0 0 0
CPU: 5
core_siblings: ff f8 0 0 0 0 0 0
2. The core_siblings file of a CPU is created when the CPU is onlined. It
may hold an invalid value until the online operation is fully complete. In
the example below, nr sockets was wrongly incremented because the
core_siblings mask of CPU 14 was 0, as the CPU had just been onlined.
CPU: 14
core_siblings: 0 0 0 0 0 0 0 0
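The first case can be reproduced without hotplug hardware. Below is a
minimal sketch, not taken from lscpu: count_groups() is a hypothetical
helper that counts distinct groups among sibling masks, either by exact
equality (by_overlap = 0) or by treating masks that share at least one CPU
as the same group (by_overlap = 1). It assumes every mask was allocated
with CPU_ALLOC(maxcpus).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only (not part of lscpu).
 * Counts distinct groups among @n masks. Returns -1 on allocation failure.
 */
static int count_groups(cpu_set_t **masks, size_t n, int maxcpus, int by_overlap)
{
	size_t setsize = CPU_ALLOC_SIZE(maxcpus);
	cpu_set_t *common, **kept;
	size_t nkept = 0, i, j;

	assert(masks != NULL && maxcpus > 0);

	common = CPU_ALLOC(maxcpus);             /* scratch for intersections */
	kept = calloc(n ? n : 1, sizeof(*kept)); /* one representative per group */
	if (!common || !kept) {
		if (common)
			CPU_FREE(common);
		free(kept);
		return -1;
	}

	for (i = 0; i < n; i++) {
		for (j = 0; j < nkept; j++) {
			if (by_overlap) {
				/* same group if at least one CPU is shared */
				CPU_AND_S(setsize, common, masks[i], kept[j]);
				if (CPU_COUNT_S(setsize, common))
					break;
			} else if (CPU_EQUAL_S(setsize, masks[i], kept[j])) {
				/* same group only if the masks are identical */
				break;
			}
		}
		if (j == nkept)
			kept[nkept++] = masks[i];
	}

	CPU_FREE(common);
	free(kept);
	return (int)nkept;
}
```

With three masks covering CPUs 0-11, 0-12 and 16-31 (two reads of one
socket around the onlining of CPU 12, plus a second socket), the equality
rule counts three groups and the overlap rule counts two. A mask with no
bits set shares no CPU with any other mask, so the overlap rule alone does
not cover the second case.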
To fix this, make the below changes:
1. Instead of considering CPUs to be on different sockets if their
core_siblings masks are unequal, consider them to be on different sockets
only if their core_siblings masks have no CPU in common. CPUs on the same
socket are then identified correctly even if offline/online operations
happen between the reads of their masks, provided at least one CPU in the
socket is online during both reads.
2. Check that a CPU's hotplug operation has completed before using its
core_siblings file.
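The lookup behind the second change can be sketched on its own. This is a
minimal sketch, not the patch's code: parse_online_state() is a
hypothetical helper that takes the text of the hotplug/states file and
returns the number printed in front of ": online", or -1 if there is none.
It assumes the file lists one "<number>: <name>" entry per line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the state number preceding ": online" in @buf, or -1.
 */
static int parse_online_state(const char *buf)
{
	const char *end, *p;

	assert(buf != NULL);

	end = strstr(buf, ": online");
	if (!end)
		return -1;

	/* walk back over the digits that precede ": online" */
	p = end;
	while (p > buf && isdigit((unsigned char)p[-1]))
		p--;
	if (p == end)
		return -1;	/* no digits in front of the marker */

	return atoi(p);		/* atoi() stops at the ':' */
}
```

A CPU would then be treated as fully online only when the value read from
its cpuN/hotplug/state file equals this number.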
Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
---
sys-utils/lscpu-topology.c | 70 ++++++++++++++++++++++++++++++++++----
1 file changed, 63 insertions(+), 7 deletions(-)
diff --git a/sys-utils/lscpu-topology.c b/sys-utils/lscpu-topology.c
index e3742e319..a24d93c03 100644
--- a/sys-utils/lscpu-topology.c
+++ b/sys-utils/lscpu-topology.c
@@ -17,21 +17,33 @@
#include <unistd.h>
#include <string.h>
#include <stdio.h>
+#include <ctype.h>
#include "lscpu.h"
/* add @set to the @ary, unnecessary set is deallocated. */
-static int add_cpuset_to_array(cpu_set_t **ary, size_t *items, cpu_set_t *set, size_t setsize)
+static int add_cpuset_to_array(cpu_set_t **ary, size_t *items, cpu_set_t *set, size_t setsize, int maxcpus)
{
+ cpu_set_t *common_cpus_set;
size_t i;
if (!ary)
return -EINVAL;
+ common_cpus_set = CPU_ALLOC(maxcpus);
+ if (!common_cpus_set)
+ return -EINVAL;
+
+ /*
+ * Check if @set has no cpu in common with the cpusets
+ * saved in @ary and if so append @set to @ary.
+ */
for (i = 0; i < *items; i++) {
- if (CPU_EQUAL_S(setsize, set, ary[i]))
+ CPU_AND_S(setsize, common_cpus_set, set, ary[i]);
+ if (CPU_COUNT_S(setsize, common_cpus_set))
break;
}
+ CPU_FREE(common_cpus_set);
if (i == *items) {
ary[*items] = set;
++*items;
@@ -98,13 +110,47 @@ void lscpu_sort_caches(struct lscpu_cache *caches, size_t n)
qsort(caches, n, sizeof(struct lscpu_cache), cmp_cache);
}
+/*
+ * Get the hotplug state number representing a completely online
+ * cpu from /sys/devices/system/cpu/hotplug/states
+ */
+static int get_online_state(struct path_cxt *sys)
+{
+ int hp_online_state_val, page_size, rc;
+ char *buf, *strp;
+
+ hp_online_state_val = -1;
+
+ /* sysfs text files have size = page size */
+ page_size = getpagesize();
+
+ buf = (char *)malloc(page_size);
+ if (!buf)
+ goto done;
+ rc = ul_path_readf_buffer(sys, buf, page_size, "hotplug/states");
+ if (rc <= 0)
+ goto done;
+
+ strp = strstr(buf, ": online");
+ if (!strp)
+ goto done;
+
+ strp--; /* get digits before ': online' */
+ while (strp >= buf && isdigit(*strp))
+ strp--;
+ hp_online_state_val = atoi(strp + 1);
+
+done:
+ free(buf);
+ return hp_online_state_val;
+}
/* Read topology for specified type */
static int cputype_read_topology(struct lscpu_cxt *cxt, struct lscpu_cputype *ct)
{
size_t i, npos;
struct path_cxt *sys;
- int nthreads = 0, sw_topo = 0;
+ int nthreads = 0, sw_topo = 0, rc, hp_state, hp_online_state;
FILE *fd;
sys = cxt->syscpu; /* /sys/devices/system/cpu/ */
@@ -112,6 +158,7 @@ static int cputype_read_topology(struct lscpu_cxt *cxt, struct lscpu_cputype *ct
DBG(TYPE, ul_debugobj(ct, "reading %s/%s/%s topology",
ct->vendor ?: "", ct->model ?: "", ct->modelname ?:""));
+ hp_online_state = get_online_state(sys);
for (i = 0; i < cxt->npossibles; i++) {
struct lscpu_cpu *cpu = cxt->cpus[i];
@@ -127,6 +174,15 @@ static int cputype_read_topology(struct lscpu_cxt *cxt, struct lscpu_cputype *ct
"cpu%d/topology/thread_siblings", num) != 0)
continue;
+ /*
+ * Ignore cpus which are not fully online.
+ * If hp_online_state is negative/zero or rc is negative,
+ * online state could not be read correctly, skip this check.
+ */
+ rc = ul_path_readf_s32(sys, &hp_state, "cpu%d/hotplug/state", num);
+ if (hp_online_state > 0 && rc >= 0 && hp_state != hp_online_state)
+ continue;
+
/* read topology maps */
ul_path_readf_cpuset(sys, &thread_siblings, cxt->maxcpus,
"cpu%d/topology/thread_siblings", num);
@@ -163,13 +219,13 @@ static int cputype_read_topology(struct lscpu_cxt *cxt, struct lscpu_cputype *ct
/* add to topology maps */
if (thread_siblings)
- add_cpuset_to_array(ct->coremaps, &ct->ncores, thread_siblings, cxt->setsize);
+ add_cpuset_to_array(ct->coremaps, &ct->ncores, thread_siblings, cxt->setsize, cxt->maxcpus);
if (core_siblings)
- add_cpuset_to_array(ct->socketmaps, &ct->nsockets, core_siblings, cxt->setsize);
+ add_cpuset_to_array(ct->socketmaps, &ct->nsockets, core_siblings, cxt->setsize, cxt->maxcpus);
if (book_siblings)
- add_cpuset_to_array(ct->bookmaps, &ct->nbooks, book_siblings, cxt->setsize);
+ add_cpuset_to_array(ct->bookmaps, &ct->nbooks, book_siblings, cxt->setsize, cxt->maxcpus);
if (drawer_siblings)
- add_cpuset_to_array(ct->drawermaps, &ct->ndrawers, drawer_siblings, cxt->setsize);
+ add_cpuset_to_array(ct->drawermaps, &ct->ndrawers, drawer_siblings, cxt->setsize, cxt->maxcpus);
}
base-commit: fda5dc760a0501554f079558ebd95a4f91fba7cd
--
2.46.0
* Re: [PATCH] lscpu: fix incorrect number of sockets during hotplug
2024-10-18 10:43 [PATCH] lscpu: fix incorrect number of sockets during hotplug Anjali K
@ 2024-10-20 10:30 ` Anushree Mathur
2024-10-30 10:55 ` Karel Zak
1 sibling, 0 replies; 4+ messages in thread
From: Anushree Mathur @ 2024-10-20 10:30 UTC (permalink / raw)
To: util-linux; +Cc: Anjali K, Anushree Mathur
Hi,
I have verified the patch and it works fine! Here is my analysis:
lscpu output on my system without onlining or offlining any CPUs:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-383
Model name: POWER10 (raw), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 12
Socket(s): 4
Physical sockets: 2
Physical chips: 2
Physical cores/chip: 12
Before applying the patch:
While repeatedly onlining/offlining CPUs on my system, I saw that the
socket count was too high. It does not happen if I online/offline a CPU
just once.
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-3,5-7,9,11-15,17-383
Off-line CPU(s) list: 4,8,10,16
Model name: POWER10 (raw), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 9
Socket(s): 5
Physical sockets: 2
Physical chips: 2
Physical cores/chip: 12
After applying the patch:
I tried the same scenario again, onlining and offlining the CPUs in a
continuous loop, and it did not show any wrong topology:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-3,5-7,9,11-15,17-383
Off-line CPU(s) list: 4,8,10,16
Model name: POWER10 (raw), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 12
Socket(s): 4
Physical sockets: 2
Physical chips: 2
Physical cores/chip: 12
Tested-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Thanks,
Anushree Mathur
* Re: [PATCH] lscpu: fix incorrect number of sockets during hotplug
2024-10-18 10:43 [PATCH] lscpu: fix incorrect number of sockets during hotplug Anjali K
2024-10-20 10:30 ` Anushree Mathur
@ 2024-10-30 10:55 ` Karel Zak
2024-11-04 6:37 ` Anjali K
1 sibling, 1 reply; 4+ messages in thread
From: Karel Zak @ 2024-10-30 10:55 UTC (permalink / raw)
To: Anjali K; +Cc: util-linux, anushree.mathur
Hi,
sorry for the delay with the review.
On Fri, Oct 18, 2024 at 04:13:35PM GMT, Anjali K wrote:
> /* add @set to the @ary, unnecessary set is deallocated. */
> -static int add_cpuset_to_array(cpu_set_t **ary, size_t *items, cpu_set_t *set, size_t setsize)
> +static int add_cpuset_to_array(cpu_set_t **ary, size_t *items, cpu_set_t *set, size_t setsize, int maxcpus)
> {
> + cpu_set_t *common_cpus_set;
> size_t i;
>
> if (!ary)
> return -EINVAL;
>
> + common_cpus_set = CPU_ALLOC(maxcpus);
> + if (!common_cpus_set)
> + return -EINVAL;
Would it be better to allocate this only once in cputype_read_topology()
and reuse it for all the arrays and CPUs?
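A minimal sketch of that suggestion, assuming a hypothetical variant
add_cpuset_with_scratch() that is not in the posted patch: the caller
allocates one scratch set with CPU_ALLOC(maxcpus) before the CPU loop,
passes it on every call, and frees it once afterwards. Unlike the helper in
the patch, whose comment says an unnecessary set is deallocated, this
sketch leaves a rejected @set for the caller to free.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical variant, for illustration only.
 * Appends @set to @ary unless it shares a CPU with a set already there.
 * @scratch is caller-owned and reused across calls; @ary must have room
 * for one more entry.
 * Returns 1 if @set was added, 0 if it overlapped, -EINVAL on bad input.
 */
static int add_cpuset_with_scratch(cpu_set_t **ary, size_t *items,
				   cpu_set_t *set, size_t setsize,
				   cpu_set_t *scratch)
{
	size_t i;

	if (!ary || !items || !set || !scratch)
		return -EINVAL;
	assert(setsize > 0);

	for (i = 0; i < *items; i++) {
		CPU_AND_S(setsize, scratch, set, ary[i]);
		if (CPU_COUNT_S(setsize, scratch))
			return 0;	/* overlaps an existing set */
	}

	ary[(*items)++] = set;
	return 1;
}
```

The same scratch set can serve the core, socket, book and drawer arrays,
since its contents are overwritten by every CPU_AND_S() call.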
The rest looks good. Thanks!
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com