* [PATCH] lscpu: Fix issue found on CPU hot-remove @ 2012-10-23 19:49 Toshi Kani 2012-11-07 11:47 ` Karel Zak 0 siblings, 1 reply; 6+ messages in thread From: Toshi Kani @ 2012-10-23 19:49 UTC (permalink / raw) To: util-linux, kzak; +Cc: heiko.carstens, kerolasa, Toshi Kani read_basicinfo() relies on sysfs cpu directories "/sys/devices/system/cpu/cpu%d" with assumption that cpu logical number %d is always sequentially assigned for all CPUs. However, this assumption is not correct with CPU hot-remove operation since it removes a target sysfs cpu directory after it is ejected. As a result, lscpu may not recognize all CPUs. The issue can be easily reproduced on KVM or VirtualBox, which supports CPU eject operation, as follows. 1) The system has 4 CPUs $ lscpu -a -e CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE 0 0 0 0 0:0:0 yes 1 0 1 1 1:1:1 yes 2 0 2 2 2:2:2 yes 3 0 3 3 3:3:3 yes 2) Eject cpu2 # echo 1 > /sys/bus/acpi/devices/LNXCPU:02/eject 3) lscpu no longer recognizes cpu3 after cpu2 is ejected $ lscpu -a -e CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE 0 0 0 0 0:0:0 yes 1 0 1 1 1:1:1 yes The following changes are made to address this issue. - Use maxcpus to allocate and parse bitmaps. - Set desc->ncpu from cpu/present, which includes both on-line and off-line CPUs. - Add is_cpu_present() to check if a CPU is present. Ejected CPUs are not present. Signed-off-by: Toshi Kani <toshi.kani@hp.com> --- sys-utils/lscpu.c | 53 ++++++++++++++++++++++++++++++++--------------------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/sys-utils/lscpu.c b/sys-utils/lscpu.c index 25a0273..946eb41 100644 --- a/sys-utils/lscpu.c +++ b/sys-utils/lscpu.c @@ -150,7 +150,8 @@ struct lscpu_desc { int dispatching; /* none, horizontal or vertical */ int mode; /* rm, lm or/and tm */ - int ncpus; /* number of CPUs */ + int ncpus; /* number of present CPUs */ + cpu_set_t *present; /* mask with present CPUs */ cpu_set_t *online; /* mask with online CPUs */ int nnodes; /* number of NUMA modes */ @@ -204,8 +205,11 @@ struct lscpu_modifier { static int maxcpus; /* size in bits of kernel cpu mask */ #define is_cpu_online(_d, _cpu) \ - ((_d) && (_d)->online ? \ - CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) + ((_d) && (_d)->online ? \ + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) +#define is_cpu_present(_d, _cpu) \ + ((_d) && (_d)->present ? \ + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->present) : 0) /* * IDs @@ -340,10 +344,6 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) err(EXIT_FAILURE, _("error: uname failed")); desc->arch = xstrdup(utsbuf.machine); - /* count CPU(s) */ - while(path_exist(_PATH_SYS_SYSTEM "/cpu/cpu%d", desc->ncpus)) - desc->ncpus++; - /* details */ while (fgets(buf, sizeof(buf), fp) != NULL) { if (lookup(buf, "vendor", &desc->vendor)) ; @@ -391,7 +391,14 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) if (maxcpus <= 0) /* error or we are reading some /sys snapshot instead of the * real /sys, let's use any crazy number... */ - maxcpus = desc->ncpus > 2048 ? desc->ncpus : 2048; + maxcpus = 2048; + + /* get mask for present CPUs */ + if (path_exist(_PATH_SYS_SYSTEM "/cpu/present")) { + size_t setsize = CPU_ALLOC_SIZE(maxcpus); + desc->present = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/present"); + desc->ncpus = CPU_COUNT_S(setsize, desc->present); + } /* get mask for online CPUs */ if (path_exist(_PATH_SYS_SYSTEM "/cpu/online")) { @@ -626,16 +633,16 @@ read_topology(struct lscpu_desc *desc, int num) */ if (!desc->nthreads) desc->nthreads = nbooks * nsockets * ncores * nthreads; - /* For each map we make sure that it can have up to ncpus + /* For each map we make sure that it can have up to maxcpus * entries. This is because we cannot reliably calculate the * number of cores, sockets and books on all architectures. * E.g. completely virtualized architectures like s390 may * have multiple sockets of different sizes. */ - desc->coremaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); - desc->socketmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + desc->coremaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); + desc->socketmaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); if (book_siblings) - desc->bookmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + desc->bookmaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); } add_cpuset_to_array(desc->socketmaps, &desc->nsockets, core_siblings); @@ -653,7 +660,7 @@ read_polarization(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/polarization", num)) return; if (!desc->polarization) - desc->polarization = xcalloc(desc->ncpus, sizeof(int)); + desc->polarization = xcalloc(maxcpus, sizeof(int)); path_getstr(mode, sizeof(mode), _PATH_SYS_CPU "/cpu%d/polarization", num); if (strncmp(mode, "vertical:low", sizeof(mode)) == 0) desc->polarization[num] = POLAR_VLOW; @@ -673,7 +680,7 @@ read_address(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/address", num)) return; if (!desc->addresses) - desc->addresses = xcalloc(desc->ncpus, sizeof(int)); + desc->addresses = xcalloc(maxcpus, sizeof(int)); desc->addresses[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/address", num); } @@ -683,7 +690,7 @@ read_configured(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/configure", num)) return; if (!desc->configured) - desc->configured = xcalloc(desc->ncpus, sizeof(int)); + desc->configured = xcalloc(maxcpus, sizeof(int)); desc->configured[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/configure", num); } @@ -756,7 +763,7 @@ read_cache(struct lscpu_desc *desc, int num) num, i); if (!ca->sharedmaps) - ca->sharedmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + ca->sharedmaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); add_cpuset_to_array(ca->sharedmaps, &ca->nsharedmaps, map); } } @@ -982,13 +989,15 @@ print_parsable(struct lscpu_desc *desc, int cols[], int ncols, /* * Data */ - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < maxcpus; i++) { int c; if (!mod->offline && desc->online && !is_cpu_online(desc, i)) continue; if (!mod->online && desc->online && is_cpu_online(desc, i)) continue; + if (desc->present && !is_cpu_present(desc, i)) + continue; for (c = 0; c < ncols; c++) { if (mod->compat && cols[c] == COL_CACHE) { if (!desc->ncaches) @@ -1026,7 +1035,7 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, tt_define_column(tt, xstrdup(data), 0, 0); } - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < maxcpus; i++) { int c; struct tt_line *line; @@ -1034,6 +1043,8 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, continue; if (!mod->online && desc->online && is_cpu_online(desc, i)) continue; + if (desc->present && !is_cpu_present(desc, i)) + continue; line = tt_add_line(tt, NULL); @@ -1117,8 +1128,8 @@ print_summary(struct lscpu_desc *desc, struct lscpu_modifier *mod) if (!set) err(EXIT_FAILURE, _("failed to callocate cpu set")); CPU_ZERO_S(setsize, set); - for (i = 0; i < desc->ncpus; i++) { - if (!is_cpu_online(desc, i)) + for (i = 0; i < maxcpus; i++) { + if (!is_cpu_online(desc, i) && is_cpu_present(desc, i)) CPU_SET_S(i, setsize, set); } print_cpuset(mod->hex ? _("Off-line CPU(s) mask:") : @@ -1339,7 +1350,7 @@ int main(int argc, char *argv[]) read_basicinfo(desc, mod); - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < maxcpus; i++) { read_topology(desc, i); read_cache(desc, i); read_polarization(desc, i); -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] lscpu: Fix issue found on CPU hot-remove 2012-10-23 19:49 [PATCH] lscpu: Fix issue found on CPU hot-remove Toshi Kani @ 2012-11-07 11:47 ` Karel Zak 2012-11-07 23:29 ` Toshi Kani 0 siblings, 1 reply; 6+ messages in thread From: Karel Zak @ 2012-11-07 11:47 UTC (permalink / raw) To: Toshi Kani; +Cc: util-linux, heiko.carstens, kerolasa On Tue, Oct 23, 2012 at 01:49:46PM -0600, Toshi Kani wrote: > 1) The system has 4 CPUs > $ lscpu -a -e > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > 0 0 0 0 0:0:0 yes > 1 0 1 1 1:1:1 yes > 2 0 2 2 2:2:2 yes > 3 0 3 3 3:3:3 yes > > 2) Eject cpu2 > # echo 1 > /sys/bus/acpi/devices/LNXCPU:02/eject > > 3) lscpu no longer recognizes cpu3 after cpu2 is ejected > $ lscpu -a -e > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > 0 0 0 0 0:0:0 yes > 1 0 1 1 1:1:1 yes > > The following changes are made to address this issue. > - Use maxcpus to allocate and parse bitmaps. > - Set desc->ncpu from cpu/present, which includes both on-line > and off-line CPUs. In the directory tests we have a dump from some strange machine tests/ts/lscpu/dumps/sparc64-UltraSparc-T1.tar.gz where is 32 /sys/devices/system/cpu/cpu[0-9]* subdirectories, but the present mask is 0-23, all the masks from the dump: present: 0-23 possible: 0-31 online: 0-23 offline: 24-31 count: 32 (means ls -d sys/devices/system/cpu/cpu[0-9]* | wc -l) strange, right? But I guess we can ignore this dump, according to kernel Documentation/cpu-hotplug.txt the cpu[0-9]* subdirectories have to follow the present mask. Maybe the dump is broken or so.. Note that many of our old dumps in the tests/ts/lscpu/dumps don't contain /sys/devices/system/cpu/{present,online,possible} masks, but I will drop these incomplete dumps. > - desc->coremaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > - desc->socketmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > + desc->coremaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); > + desc->socketmaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); [...] > - for (i = 0; i < desc->ncpus; i++) { > + for (i = 0; i < maxcpus; i++) { > read_topology(desc, i); > read_cache(desc, i); > read_polarization(desc, i); What about to read /sys/devices/system/cpu/possible to determine the maximal possible number of CPUs on the machine. It seems like overkill to follow the maximal size of the cpu_set_t mask (=maxcpus). The "possible" mask is static and no affected by hot-plug. See the patch below. Karel diff --git a/sys-utils/lscpu.c b/sys-utils/lscpu.c index 25a0273..5b1a0af 100644 --- a/sys-utils/lscpu.c +++ b/sys-utils/lscpu.c @@ -150,7 +150,9 @@ struct lscpu_desc { int dispatching; /* none, horizontal or vertical */ int mode; /* rm, lm or/and tm */ - int ncpus; /* number of CPUs */ + int ncpuspos; /* maximal possible CPUs */ + int ncpus; /* number of present CPUs */ + cpu_set_t *present; /* mask with present CPUs */ cpu_set_t *online; /* mask with online CPUs */ int nnodes; /* number of NUMA modes */ @@ -204,8 +206,11 @@ struct lscpu_modifier { static int maxcpus; /* size in bits of kernel cpu mask */ #define is_cpu_online(_d, _cpu) \ - ((_d) && (_d)->online ? \ - CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) + ((_d) && (_d)->online ? \ + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) +#define is_cpu_present(_d, _cpu) \ + ((_d) && (_d)->present ? \ + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->present) : 0) /* * IDs @@ -334,16 +339,13 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) FILE *fp = path_fopen("r", 1, _PATH_PROC_CPUINFO); char buf[BUFSIZ]; struct utsname utsbuf; + size_t setsize; /* architecture */ if (uname(&utsbuf) == -1) err(EXIT_FAILURE, _("error: uname failed")); desc->arch = xstrdup(utsbuf.machine); - /* count CPU(s) */ - while(path_exist(_PATH_SYS_SYSTEM "/cpu/cpu%d", desc->ncpus)) - desc->ncpus++; - /* details */ while (fgets(buf, sizeof(buf), fp) != NULL) { if (lookup(buf, "vendor", &desc->vendor)) ; @@ -391,11 +393,27 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) if (maxcpus <= 0) /* error or we are reading some /sys snapshot instead of the * real /sys, let's use any crazy number... */ - maxcpus = desc->ncpus > 2048 ? desc->ncpus : 2048; + maxcpus = 2048; + + setsize = CPU_ALLOC_SIZE(maxcpus); + + if (path_exist(_PATH_SYS_SYSTEM "/cpu/possible")) { + cpu_set_t *tmp = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/possible"); + desc->ncpuspos = CPU_COUNT_S(setsize, tmp); + cpuset_free(tmp); + } else + err(EXIT_FAILURE, _("failed to determine number of CPUs: %s"), + _PATH_SYS_SYSTEM "/cpu/possible"); + + + /* get mask for present CPUs */ + if (path_exist(_PATH_SYS_SYSTEM "/cpu/present")) { + desc->present = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/present"); + desc->ncpus = CPU_COUNT_S(setsize, desc->present); + } /* get mask for online CPUs */ if (path_exist(_PATH_SYS_SYSTEM "/cpu/online")) { - size_t setsize = CPU_ALLOC_SIZE(maxcpus); desc->online = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/online"); desc->nthreads = CPU_COUNT_S(setsize, desc->online); } @@ -626,16 +644,16 @@ read_topology(struct lscpu_desc *desc, int num) */ if (!desc->nthreads) desc->nthreads = nbooks * nsockets * ncores * nthreads; - /* For each map we make sure that it can have up to ncpus + /* For each map we make sure that it can have up to ncpuspos * entries. This is because we cannot reliably calculate the * number of cores, sockets and books on all architectures. * E.g. completely virtualized architectures like s390 may * have multiple sockets of different sizes. */ - desc->coremaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); - desc->socketmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + desc->coremaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); + desc->socketmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); if (book_siblings) - desc->bookmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + desc->bookmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); } add_cpuset_to_array(desc->socketmaps, &desc->nsockets, core_siblings); @@ -653,7 +671,7 @@ read_polarization(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/polarization", num)) return; if (!desc->polarization) - desc->polarization = xcalloc(desc->ncpus, sizeof(int)); + desc->polarization = xcalloc(desc->ncpuspos, sizeof(int)); path_getstr(mode, sizeof(mode), _PATH_SYS_CPU "/cpu%d/polarization", num); if (strncmp(mode, "vertical:low", sizeof(mode)) == 0) desc->polarization[num] = POLAR_VLOW; @@ -673,7 +691,7 @@ read_address(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/address", num)) return; if (!desc->addresses) - desc->addresses = xcalloc(desc->ncpus, sizeof(int)); + desc->addresses = xcalloc(desc->ncpuspos, sizeof(int)); desc->addresses[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/address", num); } @@ -683,7 +701,7 @@ read_configured(struct lscpu_desc *desc, int num) if (!path_exist(_PATH_SYS_CPU "/cpu%d/configure", num)) return; if (!desc->configured) - desc->configured = xcalloc(desc->ncpus, sizeof(int)); + desc->configured = xcalloc(desc->ncpuspos, sizeof(int)); desc->configured[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/configure", num); } @@ -756,7 +774,7 @@ read_cache(struct lscpu_desc *desc, int num) num, i); if (!ca->sharedmaps) - ca->sharedmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); + ca->sharedmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); add_cpuset_to_array(ca->sharedmaps, &ca->nsharedmaps, map); } } @@ -982,13 +1000,15 @@ print_parsable(struct lscpu_desc *desc, int cols[], int ncols, /* * Data */ - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < desc->ncpuspos; i++) { int c; if (!mod->offline && desc->online && !is_cpu_online(desc, i)) continue; if (!mod->online && desc->online && is_cpu_online(desc, i)) continue; + if (desc->present && !is_cpu_present(desc, i)) + continue; for (c = 0; c < ncols; c++) { if (mod->compat && cols[c] == COL_CACHE) { if (!desc->ncaches) @@ -1026,7 +1046,7 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, tt_define_column(tt, xstrdup(data), 0, 0); } - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < desc->ncpuspos; i++) { int c; struct tt_line *line; @@ -1034,6 +1054,8 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, continue; if (!mod->online && desc->online && is_cpu_online(desc, i)) continue; + if (desc->present && !is_cpu_present(desc, i)) + continue; line = tt_add_line(tt, NULL); @@ -1117,8 +1139,8 @@ print_summary(struct lscpu_desc *desc, struct lscpu_modifier *mod) if (!set) err(EXIT_FAILURE, _("failed to callocate cpu set")); CPU_ZERO_S(setsize, set); - for (i = 0; i < desc->ncpus; i++) { - if (!is_cpu_online(desc, i)) + for (i = 0; i < desc->ncpuspos; i++) { + if (!is_cpu_online(desc, i) && is_cpu_present(desc, i)) CPU_SET_S(i, setsize, set); } print_cpuset(mod->hex ? _("Off-line CPU(s) mask:") : @@ -1339,7 +1361,7 @@ int main(int argc, char *argv[]) read_basicinfo(desc, mod); - for (i = 0; i < desc->ncpus; i++) { + for (i = 0; i < desc->ncpuspos; i++) { read_topology(desc, i); read_cache(desc, i); read_polarization(desc, i); -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] lscpu: Fix issue found on CPU hot-remove 2012-11-07 11:47 ` Karel Zak @ 2012-11-07 23:29 ` Toshi Kani 2012-11-08 4:32 ` Ben Hutchings 2012-11-12 15:42 ` Karel Zak 0 siblings, 2 replies; 6+ messages in thread From: Toshi Kani @ 2012-11-07 23:29 UTC (permalink / raw) To: Karel Zak, ben; +Cc: util-linux, heiko.carstens, kerolasa On Wed, 2012-11-07 at 12:47 +0100, Karel Zak wrote: > On Tue, Oct 23, 2012 at 01:49:46PM -0600, Toshi Kani wrote: > > 1) The system has 4 CPUs > > $ lscpu -a -e > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > 0 0 0 0 0:0:0 yes > > 1 0 1 1 1:1:1 yes > > 2 0 2 2 2:2:2 yes > > 3 0 3 3 3:3:3 yes > > > > 2) Eject cpu2 > > # echo 1 > /sys/bus/acpi/devices/LNXCPU:02/eject > > > > 3) lscpu no longer recognizes cpu3 after cpu2 is ejected > > $ lscpu -a -e > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > 0 0 0 0 0:0:0 yes > > 1 0 1 1 1:1:1 yes > > > > The following changes are made to address this issue. > > - Use maxcpus to allocate and parse bitmaps. > > - Set desc->ncpu from cpu/present, which includes both on-line > > and off-line CPUs. > > In the directory tests we have a dump from some strange machine > > tests/ts/lscpu/dumps/sparc64-UltraSparc-T1.tar.gz > > where is 32 /sys/devices/system/cpu/cpu[0-9]* subdirectories, but the > present mask is 0-23, all the masks from the dump: > > present: 0-23 > possible: 0-31 > online: 0-23 > offline: 24-31 > count: 32 (means ls -d sys/devices/system/cpu/cpu[0-9]* | wc -l) > > strange, right? But I guess we can ignore this dump, according to > kernel Documentation/cpu-hotplug.txt the cpu[0-9]* subdirectories > have to follow the present mask. Maybe the dump is broken or so.. Hi Karel, Thanks for the updated patch. It looks good. I have tested it on my setup and it worked fine as well. I agree that the present mask and cpu directories need to be consistent. I looked at the kernel code and found the following change, which might be the cause of this issue. ==== commit 9f13a1fd452f11c18004ba2422a6384b424ec8a9 Author: Ben Hutchings <ben@decadent.org.uk> Date: Tue Jan 10 03:04:32 2012 +0000 cpu: Register a generic CPU device on architectures that currently do not frv, h8300, m68k, microblaze, openrisc, score, um and xtensa currently do not register a CPU device. Add the config option GENERIC_CPU_DEVICES which causes a generic CPU device to be registered for each present CPU, and make all these architectures select it. Richard Weinberger <richard@nod.at> covered UML and suggested using per_cpu. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> ==== Hi Ben, In the change log above, you described that a generic CPU device to be registered for each "present" CPU. However, in the code below, it used for_each_possible_cpu(), instead of for_each_present_cpu(). Is this what you intended? +static void __init cpu_dev_register_generic(void) +{ +#ifdef CONFIG_GENERIC_CPU_DEVICES + int i; + + for_each_possible_cpu(i) { + if (register_cpu(&per_cpu(cpu_devices, i), i)) + panic("Failed to register CPU device"); + } +#endif +} + Thanks, -Toshi > Note that many of our old dumps in the tests/ts/lscpu/dumps don't > contain /sys/devices/system/cpu/{present,online,possible} masks, but > I will drop these incomplete dumps. > > > - desc->coremaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > > - desc->socketmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > > + desc->coremaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); > > + desc->socketmaps = xcalloc(maxcpus, sizeof(cpu_set_t *)); > > [...] > > > - for (i = 0; i < desc->ncpus; i++) { > > + for (i = 0; i < maxcpus; i++) { > > read_topology(desc, i); > > read_cache(desc, i); > > read_polarization(desc, i); > > What about to read /sys/devices/system/cpu/possible to determine the > maximal possible number of CPUs on the machine. It seems like > overkill to follow the maximal size of the cpu_set_t mask (=maxcpus). > > The "possible" mask is static and no affected by hot-plug. > > See the patch below. > > Karel > > > diff --git a/sys-utils/lscpu.c b/sys-utils/lscpu.c > index 25a0273..5b1a0af 100644 > --- a/sys-utils/lscpu.c > +++ b/sys-utils/lscpu.c > @@ -150,7 +150,9 @@ struct lscpu_desc { > int dispatching; /* none, horizontal or vertical */ > int mode; /* rm, lm or/and tm */ > > - int ncpus; /* number of CPUs */ > + int ncpuspos; /* maximal possible CPUs */ > + int ncpus; /* number of present CPUs */ > + cpu_set_t *present; /* mask with present CPUs */ > cpu_set_t *online; /* mask with online CPUs */ > > int nnodes; /* number of NUMA modes */ > @@ -204,8 +206,11 @@ struct lscpu_modifier { > static int maxcpus; /* size in bits of kernel cpu mask */ > > #define is_cpu_online(_d, _cpu) \ > - ((_d) && (_d)->online ? \ > - CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) > + ((_d) && (_d)->online ? \ > + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->online) : 0) > +#define is_cpu_present(_d, _cpu) \ > + ((_d) && (_d)->present ? \ > + CPU_ISSET_S((_cpu), CPU_ALLOC_SIZE(maxcpus), (_d)->present) : 0) > > /* > * IDs > @@ -334,16 +339,13 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) > FILE *fp = path_fopen("r", 1, _PATH_PROC_CPUINFO); > char buf[BUFSIZ]; > struct utsname utsbuf; > + size_t setsize; > > /* architecture */ > if (uname(&utsbuf) == -1) > err(EXIT_FAILURE, _("error: uname failed")); > desc->arch = xstrdup(utsbuf.machine); > > - /* count CPU(s) */ > - while(path_exist(_PATH_SYS_SYSTEM "/cpu/cpu%d", desc->ncpus)) > - desc->ncpus++; > - > /* details */ > while (fgets(buf, sizeof(buf), fp) != NULL) { > if (lookup(buf, "vendor", &desc->vendor)) ; > @@ -391,11 +393,27 @@ read_basicinfo(struct lscpu_desc *desc, struct lscpu_modifier *mod) > if (maxcpus <= 0) > /* error or we are reading some /sys snapshot instead of the > * real /sys, let's use any crazy number... */ > - maxcpus = desc->ncpus > 2048 ? desc->ncpus : 2048; > + maxcpus = 2048; > + > + setsize = CPU_ALLOC_SIZE(maxcpus); > + > + if (path_exist(_PATH_SYS_SYSTEM "/cpu/possible")) { > + cpu_set_t *tmp = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/possible"); > + desc->ncpuspos = CPU_COUNT_S(setsize, tmp); > + cpuset_free(tmp); > + } else > + err(EXIT_FAILURE, _("failed to determine number of CPUs: %s"), > + _PATH_SYS_SYSTEM "/cpu/possible"); > + > + > + /* get mask for present CPUs */ > + if (path_exist(_PATH_SYS_SYSTEM "/cpu/present")) { > + desc->present = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/present"); > + desc->ncpus = CPU_COUNT_S(setsize, desc->present); > + } > > /* get mask for online CPUs */ > if (path_exist(_PATH_SYS_SYSTEM "/cpu/online")) { > - size_t setsize = CPU_ALLOC_SIZE(maxcpus); > desc->online = path_cpulist(maxcpus, _PATH_SYS_SYSTEM "/cpu/online"); > desc->nthreads = CPU_COUNT_S(setsize, desc->online); > } > @@ -626,16 +644,16 @@ read_topology(struct lscpu_desc *desc, int num) > */ > if (!desc->nthreads) > desc->nthreads = nbooks * nsockets * ncores * nthreads; > - /* For each map we make sure that it can have up to ncpus > + /* For each map we make sure that it can have up to ncpuspos > * entries. This is because we cannot reliably calculate the > * number of cores, sockets and books on all architectures. > * E.g. completely virtualized architectures like s390 may > * have multiple sockets of different sizes. > */ > - desc->coremaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > - desc->socketmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > + desc->coremaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); > + desc->socketmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); > if (book_siblings) > - desc->bookmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > + desc->bookmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); > } > > add_cpuset_to_array(desc->socketmaps, &desc->nsockets, core_siblings); > @@ -653,7 +671,7 @@ read_polarization(struct lscpu_desc *desc, int num) > if (!path_exist(_PATH_SYS_CPU "/cpu%d/polarization", num)) > return; > if (!desc->polarization) > - desc->polarization = xcalloc(desc->ncpus, sizeof(int)); > + desc->polarization = xcalloc(desc->ncpuspos, sizeof(int)); > path_getstr(mode, sizeof(mode), _PATH_SYS_CPU "/cpu%d/polarization", num); > if (strncmp(mode, "vertical:low", sizeof(mode)) == 0) > desc->polarization[num] = POLAR_VLOW; > @@ -673,7 +691,7 @@ read_address(struct lscpu_desc *desc, int num) > if (!path_exist(_PATH_SYS_CPU "/cpu%d/address", num)) > return; > if (!desc->addresses) > - desc->addresses = xcalloc(desc->ncpus, sizeof(int)); > + desc->addresses = xcalloc(desc->ncpuspos, sizeof(int)); > desc->addresses[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/address", num); > } > > @@ -683,7 +701,7 @@ read_configured(struct lscpu_desc *desc, int num) > if (!path_exist(_PATH_SYS_CPU "/cpu%d/configure", num)) > return; > if (!desc->configured) > - desc->configured = xcalloc(desc->ncpus, sizeof(int)); > + desc->configured = xcalloc(desc->ncpuspos, sizeof(int)); > desc->configured[num] = path_getnum(_PATH_SYS_CPU "/cpu%d/configure", num); > } > > @@ -756,7 +774,7 @@ read_cache(struct lscpu_desc *desc, int num) > num, i); > > if (!ca->sharedmaps) > - ca->sharedmaps = xcalloc(desc->ncpus, sizeof(cpu_set_t *)); > + ca->sharedmaps = xcalloc(desc->ncpuspos, sizeof(cpu_set_t *)); > add_cpuset_to_array(ca->sharedmaps, &ca->nsharedmaps, map); > } > } > @@ -982,13 +1000,15 @@ print_parsable(struct lscpu_desc *desc, int cols[], int ncols, > /* > * Data > */ > - for (i = 0; i < desc->ncpus; i++) { > + for (i = 0; i < desc->ncpuspos; i++) { > int c; > > if (!mod->offline && desc->online && !is_cpu_online(desc, i)) > continue; > if (!mod->online && desc->online && is_cpu_online(desc, i)) > continue; > + if (desc->present && !is_cpu_present(desc, i)) > + continue; > for (c = 0; c < ncols; c++) { > if (mod->compat && cols[c] == COL_CACHE) { > if (!desc->ncaches) > @@ -1026,7 +1046,7 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, > tt_define_column(tt, xstrdup(data), 0, 0); > } > > - for (i = 0; i < desc->ncpus; i++) { > + for (i = 0; i < desc->ncpuspos; i++) { > int c; > struct tt_line *line; > > @@ -1034,6 +1054,8 @@ print_readable(struct lscpu_desc *desc, int cols[], int ncols, > continue; > if (!mod->online && desc->online && is_cpu_online(desc, i)) > continue; > + if (desc->present && !is_cpu_present(desc, i)) > + continue; > > line = tt_add_line(tt, NULL); > > @@ -1117,8 +1139,8 @@ print_summary(struct lscpu_desc *desc, struct lscpu_modifier *mod) > if (!set) > err(EXIT_FAILURE, _("failed to callocate cpu set")); > CPU_ZERO_S(setsize, set); > - for (i = 0; i < desc->ncpus; i++) { > - if (!is_cpu_online(desc, i)) > + for (i = 0; i < desc->ncpuspos; i++) { > + if (!is_cpu_online(desc, i) && is_cpu_present(desc, i)) > CPU_SET_S(i, setsize, set); > } > print_cpuset(mod->hex ? _("Off-line CPU(s) mask:") : > @@ -1339,7 +1361,7 @@ int main(int argc, char *argv[]) > > read_basicinfo(desc, mod); > > - for (i = 0; i < desc->ncpus; i++) { > + for (i = 0; i < desc->ncpuspos; i++) { > read_topology(desc, i); > read_cache(desc, i); > read_polarization(desc, i); > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] lscpu: Fix issue found on CPU hot-remove 2012-11-07 23:29 ` Toshi Kani @ 2012-11-08 4:32 ` Ben Hutchings 2012-11-08 15:25 ` Toshi Kani 2012-11-12 15:42 ` Karel Zak 1 sibling, 1 reply; 6+ messages in thread From: Ben Hutchings @ 2012-11-08 4:32 UTC (permalink / raw) To: Toshi Kani; +Cc: Karel Zak, util-linux, heiko.carstens, kerolasa [-- Attachment #1: Type: text/plain, Size: 3300 bytes --] On Wed, 2012-11-07 at 16:29 -0700, Toshi Kani wrote: > On Wed, 2012-11-07 at 12:47 +0100, Karel Zak wrote: > > On Tue, Oct 23, 2012 at 01:49:46PM -0600, Toshi Kani wrote: > > > 1) The system has 4 CPUs > > > $ lscpu -a -e > > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > > 0 0 0 0 0:0:0 yes > > > 1 0 1 1 1:1:1 yes > > > 2 0 2 2 2:2:2 yes > > > 3 0 3 3 3:3:3 yes > > > > > > 2) Eject cpu2 > > > # echo 1 > /sys/bus/acpi/devices/LNXCPU:02/eject > > > > > > 3) lscpu no longer recognizes cpu3 after cpu2 is ejected > > > $ lscpu -a -e > > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > > 0 0 0 0 0:0:0 yes > > > 1 0 1 1 1:1:1 yes > > > > > > The following changes are made to address this issue. > > > - Use maxcpus to allocate and parse bitmaps. > > > - Set desc->ncpu from cpu/present, which includes both on-line > > > and off-line CPUs. > > > > In the directory tests we have a dump from some strange machine > > > > tests/ts/lscpu/dumps/sparc64-UltraSparc-T1.tar.gz > > > > where is 32 /sys/devices/system/cpu/cpu[0-9]* subdirectories, but the > > present mask is 0-23, all the masks from the dump: > > > > present: 0-23 > > possible: 0-31 > > online: 0-23 > > offline: 24-31 > > count: 32 (means ls -d sys/devices/system/cpu/cpu[0-9]* | wc -l) > > > > strange, right? But I guess we can ignore this dump, according to > > kernel Documentation/cpu-hotplug.txt the cpu[0-9]* subdirectories > > have to follow the present mask. Maybe the dump is broken or so.. > > Hi Karel, > > Thanks for the updated patch. It looks good. I have tested it on my > setup and it worked fine as well. > > I agree that the present mask and cpu directories need to be consistent. > I looked at the kernel code and found the following change, which might > be the cause of this issue. > > ==== > commit 9f13a1fd452f11c18004ba2422a6384b424ec8a9 > Author: Ben Hutchings <ben@decadent.org.uk> > Date: Tue Jan 10 03:04:32 2012 +0000 > > cpu: Register a generic CPU device on architectures that currently > do not > > frv, h8300, m68k, microblaze, openrisc, score, um and xtensa > currently > do not register a CPU device. Add the config option > GENERIC_CPU_DEVICES > which causes a generic CPU device to be registered for each present > CPU, > and make all these architectures select it. > > Richard Weinberger <richard@nod.at> covered UML and suggested using > per_cpu. > > Signed-off-by: Ben Hutchings <ben@decadent.org.uk> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > ==== > > > Hi Ben, > > In the change log above, you described that a generic CPU device to be > registered for each "present" CPU. However, in the code below, it used > for_each_possible_cpu(), instead of for_each_present_cpu(). Is this > what you intended? [...] I think the distinction doesn't matter - the GENERIC_CPU_DEVICES code does not support hotplug and therefore can only be used if present == possible. Ben. -- Ben Hutchings The program is absolutely right; therefore, the computer must be wrong. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] lscpu: Fix issue found on CPU hot-remove 2012-11-08 4:32 ` Ben Hutchings @ 2012-11-08 15:25 ` Toshi Kani 0 siblings, 0 replies; 6+ messages in thread From: Toshi Kani @ 2012-11-08 15:25 UTC (permalink / raw) To: Ben Hutchings; +Cc: Karel Zak, util-linux, heiko.carstens, kerolasa On Thu, 2012-11-08 at 04:32 +0000, Ben Hutchings wrote: > On Wed, 2012-11-07 at 16:29 -0700, Toshi Kani wrote: > > On Wed, 2012-11-07 at 12:47 +0100, Karel Zak wrote: > > > On Tue, Oct 23, 2012 at 01:49:46PM -0600, Toshi Kani wrote: > > > > 1) The system has 4 CPUs > > > > $ lscpu -a -e > > > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > > > 0 0 0 0 0:0:0 yes > > > > 1 0 1 1 1:1:1 yes > > > > 2 0 2 2 2:2:2 yes > > > > 3 0 3 3 3:3:3 yes > > > > > > > > 2) Eject cpu2 > > > > # echo 1 > /sys/bus/acpi/devices/LNXCPU:02/eject > > > > > > > > 3) lscpu no longer recognizes cpu3 after cpu2 is ejected > > > > $ lscpu -a -e > > > > CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE > > > > 0 0 0 0 0:0:0 yes > > > > 1 0 1 1 1:1:1 yes > > > > > > > > The following changes are made to address this issue. > > > > - Use maxcpus to allocate and parse bitmaps. > > > > - Set desc->ncpu from cpu/present, which includes both on-line > > > > and off-line CPUs. > > > > > > In the directory tests we have a dump from some strange machine > > > > > > tests/ts/lscpu/dumps/sparc64-UltraSparc-T1.tar.gz > > > > > > where is 32 /sys/devices/system/cpu/cpu[0-9]* subdirectories, but the > > > present mask is 0-23, all the masks from the dump: > > > > > > present: 0-23 > > > possible: 0-31 > > > online: 0-23 > > > offline: 24-31 > > > count: 32 (means ls -d sys/devices/system/cpu/cpu[0-9]* | wc -l) > > > > > > strange, right? But I guess we can ignore this dump, according to > > > kernel Documentation/cpu-hotplug.txt the cpu[0-9]* subdirectories > > > have to follow the present mask. Maybe the dump is broken or so.. > > > > Hi Karel, > > > > Thanks for the updated patch. It looks good. I have tested it on my > > setup and it worked fine as well. > > > > I agree that the present mask and cpu directories need to be consistent. > > I looked at the kernel code and found the following change, which might > > be the cause of this issue. > > > > ==== > > commit 9f13a1fd452f11c18004ba2422a6384b424ec8a9 > > Author: Ben Hutchings <ben@decadent.org.uk> > > Date: Tue Jan 10 03:04:32 2012 +0000 > > > > cpu: Register a generic CPU device on architectures that currently > > do not > > > > frv, h8300, m68k, microblaze, openrisc, score, um and xtensa > > currently > > do not register a CPU device. Add the config option > > GENERIC_CPU_DEVICES > > which causes a generic CPU device to be registered for each present > > CPU, > > and make all these architectures select it. > > > > Richard Weinberger <richard@nod.at> covered UML and suggested using > > per_cpu. > > > > Signed-off-by: Ben Hutchings <ben@decadent.org.uk> > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > > ==== > > > > > > Hi Ben, > > > > In the change log above, you described that a generic CPU device to be > > registered for each "present" CPU. However, in the code below, it used > > for_each_possible_cpu(), instead of for_each_present_cpu(). Is this > > what you intended? > [...] > > I think the distinction doesn't matter - the GENERIC_CPU_DEVICES code > does not support hotplug and therefore can only be used if present == > possible. Hi Ben, You may be right about that on those architectures where you added GENERIC_CPU_DEVICES. We have seen strange cpu directories on Sparc64, but it does not seem to use the GENERIC_CPU_DEVICES code. So, this issue may not be related with your change. Sorry about that. Thanks, -Toshi > Ben. > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] lscpu: Fix issue found on CPU hot-remove 2012-11-07 23:29 ` Toshi Kani 2012-11-08 4:32 ` Ben Hutchings @ 2012-11-12 15:42 ` Karel Zak 1 sibling, 0 replies; 6+ messages in thread From: Karel Zak @ 2012-11-12 15:42 UTC (permalink / raw) To: Toshi Kani; +Cc: ben, util-linux, heiko.carstens, kerolasa On Wed, Nov 07, 2012 at 04:29:35PM -0700, Toshi Kani wrote: > Thanks for the updated patch. It looks good. I have tested it on my > setup and it worked fine as well. Applied, thanks. -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-11-12 15:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-10-23 19:49 [PATCH] lscpu: Fix issue found on CPU hot-remove Toshi Kani 2012-11-07 11:47 ` Karel Zak 2012-11-07 23:29 ` Toshi Kani 2012-11-08 4:32 ` Ben Hutchings 2012-11-08 15:25 ` Toshi Kani 2012-11-12 15:42 ` Karel Zak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).