* [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS
@ 2024-12-06 4:40 Ian Rogers
2024-12-06 4:40 ` [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096 Ian Rogers
` (8 more replies)
0 siblings, 9 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Prompted by Kyle Meyer's <kyle.meyer@hpe.com> report of the
MAX_NR_CPUS value being too small, initiate some clean up of its
use. Kyle's patch is at the head of the series. The additional patches
hide MAX_NR_CPUS as exposed from cpumap.h, reduce its use by removing
perf_cpu_map__read, and try to better size the temporary CPU array in
perf_cpu_map__new.
Ian Rogers (7):
perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS
libperf cpumap: Hide/reduce scope of MAX_NR_CPUS
libperf cpumap: Be tolerant of newline at the end of a cpumask
perf pmu: Remove use of perf_cpu_map__read
libperf cpumap: Remove use of perf_cpu_map__read
libperf cpumap: Remove perf_cpu_map__read
libperf cpumap: Grow array of read CPUs in smaller increments
Kyle Meyer (1):
perf: Increase MAX_NR_CPUS to 4096
tools/lib/perf/Documentation/libperf.txt | 1 -
tools/lib/perf/cpumap.c | 82 +++++-------------------
tools/lib/perf/include/internal/cpumap.h | 4 --
tools/lib/perf/include/perf/cpumap.h | 2 -
tools/lib/perf/libperf.map | 1 -
tools/perf/builtin-annotate.c | 1 +
tools/perf/builtin-diff.c | 1 +
tools/perf/builtin-kwork.c | 1 +
tools/perf/builtin-mem.c | 1 +
tools/perf/builtin-sched.c | 1 +
tools/perf/perf.h | 2 +-
tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +-
tools/perf/util/kwork.h | 1 +
tools/perf/util/pmu.c | 30 ++++++---
tools/perf/util/session.c | 1 +
tools/perf/util/svghelper.c | 1 +
16 files changed, 47 insertions(+), 87 deletions(-)
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 10:24 ` Leo Yan
2024-12-06 4:40 ` [PATCH v1 2/8] perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS Ian Rogers
` (7 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
From: Kyle Meyer <kyle.meyer@hpe.com>
Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
cpus_runtime will increase from 81960B to 163880B, and max_entries will
increase from 8192B to 16384B.
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/include/internal/cpumap.h | 2 +-
tools/perf/perf.h | 2 +-
tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
index 49649eb51ce4..3cf28522004e 100644
--- a/tools/lib/perf/include/internal/cpumap.h
+++ b/tools/lib/perf/include/internal/cpumap.h
@@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
};
#ifndef MAX_NR_CPUS
-#define MAX_NR_CPUS 2048
+#define MAX_NR_CPUS 4096
#endif
struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c004dd4e65a3..3cb40965549f 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -3,7 +3,7 @@
#define _PERF_PERF_H
#ifndef MAX_NR_CPUS
-#define MAX_NR_CPUS 2048
+#define MAX_NR_CPUS 4096
#endif
enum perf_affinity {
diff --git a/tools/perf/util/bpf_skel/kwork_top.bpf.c b/tools/perf/util/bpf_skel/kwork_top.bpf.c
index 594da91965a2..73e32e063030 100644
--- a/tools/perf/util/bpf_skel/kwork_top.bpf.c
+++ b/tools/perf/util/bpf_skel/kwork_top.bpf.c
@@ -18,7 +18,9 @@ enum kwork_class_type {
};
#define MAX_ENTRIES 102400
-#define MAX_NR_CPUS 2048
+#ifndef MAX_NR_CPUS
+#define MAX_NR_CPUS 4096
+#endif
#define PF_KTHREAD 0x00200000
#define MAX_COMMAND_LEN 16
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 2/8] perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
2024-12-06 4:40 ` [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096 Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 3/8] libperf cpumap: Hide/reduce scope of MAX_NR_CPUS Ian Rogers
` (6 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
libperf exposes MAX_NR_CPUS via
tools/lib/perf/include/internal/cpumap.h which is internal. The
preferred dependency should be the definition in
tools/perf/perf.h. Add the includes of perf.h so that MAX_NR_CPUS can
be hidden in libperf.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-annotate.c | 1 +
tools/perf/builtin-diff.c | 1 +
tools/perf/builtin-kwork.c | 1 +
tools/perf/builtin-mem.c | 1 +
tools/perf/builtin-sched.c | 1 +
tools/perf/util/kwork.h | 1 +
tools/perf/util/session.c | 1 +
tools/perf/util/svghelper.c | 1 +
8 files changed, 8 insertions(+)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index bb87e6e7687d..836ae0122dab 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -7,6 +7,7 @@
* a histogram of results, along various sorting keys.
*/
#include "builtin.h"
+#include "perf.h"
#include "util/color.h"
#include <linux/list.h>
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 82fb7773e03e..196969538e58 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -6,6 +6,7 @@
* DSOs and symbol information, sort them and produce a diff.
*/
#include "builtin.h"
+#include "perf.h"
#include "util/debug.h"
#include "util/event.h"
diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 8234410cba4c..233ca3c3895c 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -6,6 +6,7 @@
*/
#include "builtin.h"
+#include "perf.h"
#include "util/data.h"
#include "util/evlist.h"
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 651188c1d825..99d5e1491a28 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -4,6 +4,7 @@
#include <sys/stat.h>
#include <unistd.h>
#include "builtin.h"
+#include "perf.h"
#include <subcmd/parse-options.h>
#include "util/auxtrace.h"
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7049c60ebf77..26ece6e9bfd1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include "builtin.h"
+#include "perf.h"
#include "perf-sys.h"
#include "util/cpumap.h"
diff --git a/tools/perf/util/kwork.h b/tools/perf/util/kwork.h
index 76fe2a821bcf..596595946a06 100644
--- a/tools/perf/util/kwork.h
+++ b/tools/perf/util/kwork.h
@@ -1,6 +1,7 @@
#ifndef PERF_UTIL_KWORK_H
#define PERF_UTIL_KWORK_H
+#include "perf.h"
#include "util/tool.h"
#include "util/time-utils.h"
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 507e6cba9545..c06e3020a976 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -37,6 +37,7 @@
#include "arch/common.h"
#include "units.h"
#include "annotate.h"
+#include "perf.h"
#include <internal/lib.h>
static int perf_session__deliver_event(struct perf_session *session,
diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c
index 2b04f47f4db0..b1d259f590e9 100644
--- a/tools/perf/util/svghelper.c
+++ b/tools/perf/util/svghelper.c
@@ -21,6 +21,7 @@
#include <perf/cpumap.h>
#include "env.h"
+#include "perf.h"
#include "svghelper.h"
static u64 first_time, last_time;
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 3/8] libperf cpumap: Hide/reduce scope of MAX_NR_CPUS
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
2024-12-06 4:40 ` [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096 Ian Rogers
2024-12-06 4:40 ` [PATCH v1 2/8] perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 4/8] libperf cpumap: Be tolerant of newline at the end of a cpumask Ian Rogers
` (5 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Avoid redefinition of MAX_NR_CPUS as a global constant, the original
definition is tools/perf/perf.h.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/cpumap.c | 2 ++
tools/lib/perf/include/internal/cpumap.h | 4 ----
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index cae799ad44e1..3ea06865d4b0 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -11,6 +11,8 @@
#include <limits.h>
#include "internal.h"
+#define MAX_NR_CPUS 4096
+
void perf_cpu_map__set_nr(struct perf_cpu_map *map, int nr_cpus)
{
RC_CHK_ACCESS(map)->nr = nr_cpus;
diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
index 3cf28522004e..e2be2d17c32b 100644
--- a/tools/lib/perf/include/internal/cpumap.h
+++ b/tools/lib/perf/include/internal/cpumap.h
@@ -21,10 +21,6 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
struct perf_cpu map[];
};
-#ifndef MAX_NR_CPUS
-#define MAX_NR_CPUS 4096
-#endif
-
struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
int perf_cpu_map__idx(const struct perf_cpu_map *cpus, struct perf_cpu cpu);
bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b);
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 4/8] libperf cpumap: Be tolerant of newline at the end of a cpumask
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (2 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 3/8] libperf cpumap: Hide/reduce scope of MAX_NR_CPUS Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 5/8] perf pmu: Remove use of perf_cpu_map__read Ian Rogers
` (4 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
File cpumasks often have a newline that shouldn't trigger the invalid
parsing case in perf_cpu_map__new.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/cpumap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 3ea06865d4b0..20d9ee9308c6 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -240,7 +240,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
p = NULL;
start_cpu = strtoul(cpu_list, &p, 0);
if (start_cpu >= INT_MAX
- || (*p != '\0' && *p != ',' && *p != '-'))
+ || (*p != '\0' && *p != ',' && *p != '-' && *p != '\n'))
goto invalid;
if (*p == '-') {
@@ -248,7 +248,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
p = NULL;
end_cpu = strtoul(cpu_list, &p, 0);
- if (end_cpu >= INT_MAX || (*p != '\0' && *p != ','))
+ if (end_cpu >= INT_MAX || (*p != '\0' && *p != ',' && *p != '\n'))
goto invalid;
if (end_cpu < start_cpu)
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 5/8] perf pmu: Remove use of perf_cpu_map__read
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (3 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 4/8] libperf cpumap: Be tolerant of newline at the end of a cpumask Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 6/8] libperf cpumap: " Ian Rogers
` (3 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Remove use of a FILE and switch to reading a string that is then
passed to perf_cpu_map__new. Being able to remove perf_cpu_map__read
avoids duplicated parsing logic.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/pmu.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 08a9d0bd9301..891c905d08a1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -12,6 +12,7 @@
#include <stdbool.h>
#include <dirent.h>
#include <api/fs/fs.h>
+#include <api/io.h>
#include <locale.h>
#include <fnmatch.h>
#include <math.h>
@@ -748,26 +749,35 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias, int err_loc, struct lis
* Uncore PMUs have a "cpumask" file under sysfs. CPU PMUs (e.g. on arm/arm64)
* may have a "cpus" file.
*/
-static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_core)
+static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *pmu_name, bool is_core)
{
- struct perf_cpu_map *cpus;
const char *templates[] = {
"cpumask",
"cpus",
NULL
};
const char **template;
- char pmu_name[PATH_MAX];
- struct perf_pmu pmu = {.name = pmu_name};
- FILE *file;
- strlcpy(pmu_name, name, sizeof(pmu_name));
for (template = templates; *template; template++) {
- file = perf_pmu__open_file_at(&pmu, dirfd, *template);
- if (!file)
+ struct io io;
+ char buf[128];
+ char *cpumask = NULL;
+ size_t cpumask_len;
+ ssize_t ret;
+ struct perf_cpu_map *cpus;
+
+ io.fd = perf_pmu__pathname_fd(dirfd, pmu_name, *template, O_RDONLY);
+ if (io.fd < 0)
continue;
- cpus = perf_cpu_map__read(file);
- fclose(file);
+
+ io__init(&io, io.fd, buf, sizeof(buf));
+ ret = io__getline(&io, &cpumask, &cpumask_len);
+ close(io.fd);
+ if (ret < 0)
+ continue;
+
+ cpus = perf_cpu_map__new(cpumask);
+ free(cpumask);
if (cpus)
return cpus;
}
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 6/8] libperf cpumap: Remove use of perf_cpu_map__read
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (4 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 5/8] perf pmu: Remove use of perf_cpu_map__read Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 7/8] libperf cpumap: Remove perf_cpu_map__read Ian Rogers
` (2 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Remove use of a FILE and switch to reading a string that is then
passed to perf_cpu_map__new. Being able to remove perf_cpu_map__read
avoids duplicated parsing logic.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/cpumap.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 20d9ee9308c6..60ef8eea42ee 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -10,6 +10,7 @@
#include <ctype.h>
#include <limits.h>
#include "internal.h"
+#include <api/fs/fs.h>
#define MAX_NR_CPUS 4096
@@ -102,12 +103,12 @@ static struct perf_cpu_map *cpu_map__new_sysconf(void)
static struct perf_cpu_map *cpu_map__new_sysfs_online(void)
{
struct perf_cpu_map *cpus = NULL;
- FILE *onlnf;
+ char *buf = NULL;
+ size_t buf_len;
- onlnf = fopen("/sys/devices/system/cpu/online", "r");
- if (onlnf) {
- cpus = perf_cpu_map__read(onlnf);
- fclose(onlnf);
+ if (sysfs__read_str("devices/system/cpu/online", &buf, &buf_len) >= 0) {
+ cpus = perf_cpu_map__new(buf);
+ free(buf);
}
return cpus;
}
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 7/8] libperf cpumap: Remove perf_cpu_map__read
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (5 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 6/8] libperf cpumap: " Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 8/8] libperf cpumap: Grow array of read CPUs in smaller increments Ian Rogers
2024-12-06 23:03 ` [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Leo Yan
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Function is no longer used and duplicates the parsing logic from
perf_cpu_map__new. Remove to allow simplification.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/Documentation/libperf.txt | 1 -
tools/lib/perf/cpumap.c | 56 ------------------------
tools/lib/perf/include/perf/cpumap.h | 2 -
tools/lib/perf/libperf.map | 1 -
4 files changed, 60 deletions(-)
diff --git a/tools/lib/perf/Documentation/libperf.txt b/tools/lib/perf/Documentation/libperf.txt
index fcfb9499ef9c..59aabdd3cabf 100644
--- a/tools/lib/perf/Documentation/libperf.txt
+++ b/tools/lib/perf/Documentation/libperf.txt
@@ -39,7 +39,6 @@ SYNOPSIS
struct perf_cpu_map *perf_cpu_map__new_any_cpu(void);
struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list);
- struct perf_cpu_map *perf_cpu_map__read(FILE *file);
struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
struct perf_cpu_map *other);
diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 60ef8eea42ee..17413d3a2221 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -161,62 +161,6 @@ static struct perf_cpu_map *cpu_map__trim_new(int nr_cpus, const struct perf_cpu
return cpus;
}
-struct perf_cpu_map *perf_cpu_map__read(FILE *file)
-{
- struct perf_cpu_map *cpus = NULL;
- int nr_cpus = 0;
- struct perf_cpu *tmp_cpus = NULL, *tmp;
- int max_entries = 0;
- int n, cpu, prev;
- char sep;
-
- sep = 0;
- prev = -1;
- for (;;) {
- n = fscanf(file, "%u%c", &cpu, &sep);
- if (n <= 0)
- break;
- if (prev >= 0) {
- int new_max = nr_cpus + cpu - prev - 1;
-
- WARN_ONCE(new_max >= MAX_NR_CPUS, "Perf can support %d CPUs. "
- "Consider raising MAX_NR_CPUS\n", MAX_NR_CPUS);
-
- if (new_max >= max_entries) {
- max_entries = new_max + MAX_NR_CPUS / 2;
- tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu));
- if (tmp == NULL)
- goto out_free_tmp;
- tmp_cpus = tmp;
- }
-
- while (++prev < cpu)
- tmp_cpus[nr_cpus++].cpu = prev;
- }
- if (nr_cpus == max_entries) {
- max_entries += MAX_NR_CPUS;
- tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu));
- if (tmp == NULL)
- goto out_free_tmp;
- tmp_cpus = tmp;
- }
-
- tmp_cpus[nr_cpus++].cpu = cpu;
- if (n == 2 && sep == '-')
- prev = cpu;
- else
- prev = -1;
- if (n == 1 || sep == '\n')
- break;
- }
-
- if (nr_cpus > 0)
- cpus = cpu_map__trim_new(nr_cpus, tmp_cpus);
-out_free_tmp:
- free(tmp_cpus);
- return cpus;
-}
-
struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
{
struct perf_cpu_map *cpus = NULL;
diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
index 90457d17fb2f..cbb65e55fc67 100644
--- a/tools/lib/perf/include/perf/cpumap.h
+++ b/tools/lib/perf/include/perf/cpumap.h
@@ -3,7 +3,6 @@
#define __LIBPERF_CPUMAP_H
#include <perf/core.h>
-#include <stdio.h>
#include <stdbool.h>
/** A wrapper around a CPU to avoid confusion with the perf_cpu_map's map's indices. */
@@ -37,7 +36,6 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__new_online_cpus(void);
* perf_cpu_map__new_online_cpus is returned.
*/
LIBPERF_API struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list);
-LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
struct perf_cpu_map *other);
diff --git a/tools/lib/perf/libperf.map b/tools/lib/perf/libperf.map
index 2aa79b696032..fdd8304fe9d0 100644
--- a/tools/lib/perf/libperf.map
+++ b/tools/lib/perf/libperf.map
@@ -6,7 +6,6 @@ LIBPERF_0.0.1 {
perf_cpu_map__get;
perf_cpu_map__put;
perf_cpu_map__new;
- perf_cpu_map__read;
perf_cpu_map__nr;
perf_cpu_map__cpu;
perf_cpu_map__has_any_cpu_or_is_empty;
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v1 8/8] libperf cpumap: Grow array of read CPUs in smaller increments
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (6 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 7/8] libperf cpumap: Remove perf_cpu_map__read Ian Rogers
@ 2024-12-06 4:40 ` Ian Rogers
2024-12-06 23:03 ` [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Leo Yan
8 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 4:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Kyle Meyer,
Ben Gainey, linux-perf-users, linux-kernel
Instead of growing the array by 2048, grow by the larger of the
current range or 16. As ranges are typical for things like the online
CPUs this will mean a single allocation happens. While uncore CPU maps
will grow 16 at a time which is a value that is generous except say on
large servers.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/cpumap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 17413d3a2221..2237505f8f5f 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -212,7 +212,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
goto invalid;
if (nr_cpus == max_entries) {
- max_entries += MAX_NR_CPUS;
+ max_entries += max(end_cpu - start_cpu + 1, 16UL);
tmp = realloc(tmp_cpus, max_entries * sizeof(struct perf_cpu));
if (tmp == NULL)
goto invalid;
@@ -226,14 +226,15 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
cpu_list = p;
}
- if (nr_cpus > 0)
+ if (nr_cpus > 0) {
cpus = cpu_map__trim_new(nr_cpus, tmp_cpus);
- else if (*cpu_list != '\0') {
+ } else if (*cpu_list != '\0') {
pr_warning("Unexpected characters at end of cpu list ('%s'), using online CPUs.",
cpu_list);
cpus = perf_cpu_map__new_online_cpus();
- } else
+ } else {
cpus = perf_cpu_map__new_any_cpu();
+ }
invalid:
free(tmp_cpus);
out:
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-06 4:40 ` [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096 Ian Rogers
@ 2024-12-06 10:24 ` Leo Yan
2024-12-06 16:25 ` Ian Rogers
0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2024-12-06 10:24 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users, linux-kernel
Hi Ian,
On Thu, Dec 05, 2024 at 08:40:28PM -0800, Ian Rogers wrote:
>
> From: Kyle Meyer <kyle.meyer@hpe.com>
>
> Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
>
> Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
> cpus_runtime will increase from 81960B to 163880B, and max_entries will
> increase from 8192B to 16384B.
>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> Reviewed-by: Ian Rogers <irogers@google.com>
> ---
> tools/lib/perf/include/internal/cpumap.h | 2 +-
> tools/perf/perf.h | 2 +-
> tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
> 3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
> index 49649eb51ce4..3cf28522004e 100644
> --- a/tools/lib/perf/include/internal/cpumap.h
> +++ b/tools/lib/perf/include/internal/cpumap.h
> @@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
> };
>
> #ifndef MAX_NR_CPUS
> -#define MAX_NR_CPUS 2048
> +#define MAX_NR_CPUS 4096
> #endif
This series is fine for me. Just wandering if we can use a central
place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
is pointless to define exactly same macros in different headers. As
least, I think we can unify this except the kwork bpf program?
P.s. for dynamically allocating per CPU maps in eBPF program, we can
refer to the code samples/bpf/xdp_sample_user.c, but this is another
topic.
Thanks,
Leo
> struct perf_cpu_map *perf_cpu_map__alloc(int nr_cpus);
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index c004dd4e65a3..3cb40965549f 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -3,7 +3,7 @@
> #define _PERF_PERF_H
>
> #ifndef MAX_NR_CPUS
> -#define MAX_NR_CPUS 2048
> +#define MAX_NR_CPUS 4096
> #endif
>
> enum perf_affinity {
> diff --git a/tools/perf/util/bpf_skel/kwork_top.bpf.c b/tools/perf/util/bpf_skel/kwork_top.bpf.c
> index 594da91965a2..73e32e063030 100644
> --- a/tools/perf/util/bpf_skel/kwork_top.bpf.c
> +++ b/tools/perf/util/bpf_skel/kwork_top.bpf.c
> @@ -18,7 +18,9 @@ enum kwork_class_type {
> };
>
> #define MAX_ENTRIES 102400
> -#define MAX_NR_CPUS 2048
> +#ifndef MAX_NR_CPUS
> +#define MAX_NR_CPUS 4096
> +#endif
> #define PF_KTHREAD 0x00200000
> #define MAX_COMMAND_LEN 16
>
> --
> 2.47.0.338.g60cca15819-goog
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-06 10:24 ` Leo Yan
@ 2024-12-06 16:25 ` Ian Rogers
2024-12-06 23:03 ` Leo Yan
0 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2024-12-06 16:25 UTC (permalink / raw)
To: Leo Yan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users, linux-kernel
On Fri, Dec 6, 2024 at 2:24 AM Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Ian,
>
> On Thu, Dec 05, 2024 at 08:40:28PM -0800, Ian Rogers wrote:
> >
> > From: Kyle Meyer <kyle.meyer@hpe.com>
> >
> > Systems have surpassed 2048 CPUs. Increase MAX_NR_CPUS to 4096.
> >
> > Bitmaps declared with MAX_NR_CPUS bits will increase from 256B to 512B,
> > cpus_runtime will increase from 81960B to 163880B, and max_entries will
> > increase from 8192B to 16384B.
> >
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > Reviewed-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/lib/perf/include/internal/cpumap.h | 2 +-
> > tools/perf/perf.h | 2 +-
> > tools/perf/util/bpf_skel/kwork_top.bpf.c | 4 +++-
> > 3 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
> > index 49649eb51ce4..3cf28522004e 100644
> > --- a/tools/lib/perf/include/internal/cpumap.h
> > +++ b/tools/lib/perf/include/internal/cpumap.h
> > @@ -22,7 +22,7 @@ DECLARE_RC_STRUCT(perf_cpu_map) {
> > };
> >
> > #ifndef MAX_NR_CPUS
> > -#define MAX_NR_CPUS 2048
> > +#define MAX_NR_CPUS 4096
> > #endif
>
> This series is fine for me. Just wandering if we can use a central
> place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
> is pointless to define exactly same macros in different headers. As
> least, I think we can unify this except the kwork bpf program?
>
> P.s. for dynamically allocating per CPU maps in eBPF program, we can
> refer to the code samples/bpf/xdp_sample_user.c, but this is another
> topic.
Thanks Leo,
can I take this as an acked-by? Wrt a single constant I agree,
following these changes MAX_NR_CPUS is just used for a warning in
libperf's cpumap.c. I think we're agreed that getting rid of the
constant would be best. I also think the cpumap logic is duplicating
something that libc is providing in cpu_set:
https://man7.org/linux/man-pages/man3/CPU_SET.3.html
And we have more than one representation in perf for the sake of the
disk representation:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/event.h?h=perf-tools-next#n227
Just changing the int to be a s16 would lower the memory overhead,
which is why I'd kind of like the abstraction to be minimal.
Thanks,
Ian
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-06 16:25 ` Ian Rogers
@ 2024-12-06 23:03 ` Leo Yan
2024-12-07 5:24 ` Ian Rogers
0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2024-12-06 23:03 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users, linux-kernel
Hi Ian,
On Fri, Dec 06, 2024 at 08:25:06AM -0800, Ian Rogers wrote:
[...]
> > This series is fine for me. Just wandering if we can use a central
> > place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
> > is pointless to define exactly same macros in different headers. As
> > least, I think we can unify this except the kwork bpf program?
> >
> > P.s. for dynamically allocating per CPU maps in eBPF program, we can
> > refer to the code samples/bpf/xdp_sample_user.c, but this is another
> > topic.
>
> Thanks Leo,
>
> can I take this as an acked-by?
Yeah. I will give my review tags in the cover letter.
> Wrt a single constant I agree,
> following these changes MAX_NR_CPUS is just used for a warning in
> libperf's cpumap.c. I think we're agreed that getting rid of the
> constant would be best. I also think the cpumap logic is duplicating
> something that libc is providing in cpu_set.
>
> And we have more than one representation in perf for the sake of the
> disk representation:
Thanks for sharing the info.
> Just changing the int to be a s16 would lower the memory overhead,
> which is why I'd kind of like the abstraction to be minimal.
Here I am not clear what for "changing the int to be a s16". Could you
elaberate a bit for this?
Lastly, I also found multiple files use "MAX_CPUS" rather than
"MAX_NR_CPUS". Polish them in a new series?
Thanks,
Leo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
` (7 preceding siblings ...)
2024-12-06 4:40 ` [PATCH v1 8/8] libperf cpumap: Grow array of read CPUs in smaller increments Ian Rogers
@ 2024-12-06 23:03 ` Leo Yan
2024-12-09 16:04 ` Arnaldo Carvalho de Melo
8 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2024-12-06 23:03 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users, linux-kernel
On Thu, Dec 05, 2024 at 08:40:27PM -0800, Ian Rogers wrote:
>
> Prompted by Kyle Meyer's <kyle.meyer@hpe.com> report of the
> MAX_NR_CPUS value being too small, initiate some clean up of its
> use. Kyle's patch is at the head of the series. The additional patches
> hide MAX_NR_CPUS as exposed from cpumap.h, reduce its use by removing
> perf_cpu_map__read, and try to better size the temporary CPU array in
> perf_cpu_map__new.
Reviewed-by: Leo Yan <leo.yan@arm.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-06 23:03 ` Leo Yan
@ 2024-12-07 5:24 ` Ian Rogers
2024-12-09 21:35 ` David Laight
0 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2024-12-07 5:24 UTC (permalink / raw)
To: Leo Yan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users, linux-kernel
On Fri, Dec 6, 2024 at 3:03 PM Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Ian,
>
> On Fri, Dec 06, 2024 at 08:25:06AM -0800, Ian Rogers wrote:
>
> [...]
>
> > > This series is fine for me. Just wandering if we can use a central
> > > place to maintain the macro, e.g. lib/perf/include/perf/cpumap.h. It
> > > is pointless to define exactly same macros in different headers. As
> > > least, I think we can unify this except the kwork bpf program?
> > >
> > > P.s. for dynamically allocating per CPU maps in eBPF program, we can
> > > refer to the code samples/bpf/xdp_sample_user.c, but this is another
> > > topic.
> >
> > Thanks Leo,
> >
> > can I take this as an acked-by?
>
> Yeah. I will give my review tags in the cover letter.
>
> > Wrt a single constant I agree,
> > following these changes MAX_NR_CPUS is just used for a warning in
> > libperf's cpumap.c. I think we're agreed that getting rid of the
> > constant would be best. I also think the cpumap logic is duplicating
> > something that libc is providing in cpu_set.
> >
> > And we have more than one representation in perf for the sake of the
> > disk representation:
>
> Thanks for sharing the info.
>
> > Just changing the int to be a s16 would lower the memory overhead,
> > which is why I'd kind of like the abstraction to be minimal.
>
> Here I am not clear what for "changing the int to be a s16". Could you
> elaberate a bit for this?
I meant this :-)
https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/
> Lastly, I also found multiple files use "MAX_CPUS" rather than
> "MAX_NR_CPUS". Polish them in a new series?
Makes sense.
Thanks,
Ian
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS
2024-12-06 23:03 ` [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Leo Yan
@ 2024-12-09 16:04 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-12-09 16:04 UTC (permalink / raw)
To: Leo Yan
Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, James Clark, Kyle Meyer, Ben Gainey, linux-perf-users,
linux-kernel
On Fri, Dec 06, 2024 at 11:03:58PM +0000, Leo Yan wrote:
> On Thu, Dec 05, 2024 at 08:40:27PM -0800, Ian Rogers wrote:
> >
> > Prompted by Kyle Meyer's <kyle.meyer@hpe.com> report of the
> > MAX_NR_CPUS value being too small, initiate some clean up of its
> > use. Kyle's patch is at the head of the series. The additional patches
> > hide MAX_NR_CPUS as exposed from cpumap.h, reduce its use by removing
> > perf_cpu_map__read, and try to better size the temporary CPU array in
> > perf_cpu_map__new.
>
> Reviewed-by: Leo Yan <leo.yan@arm.com>
Thanks, applied to perf-tools-next,
- Arnaldo
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-07 5:24 ` Ian Rogers
@ 2024-12-09 21:35 ` David Laight
2024-12-09 21:47 ` Ian Rogers
0 siblings, 1 reply; 17+ messages in thread
From: David Laight @ 2024-12-09 21:35 UTC (permalink / raw)
To: 'Ian Rogers', Leo Yan
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
..
> > > Just changing the int to be a s16 would lower the memory overhead,
> > > which is why I'd kind of like the abstraction to be minimal.
> >
> > Here I am not clear what for "changing the int to be a s16". Could you
> > elaberate a bit for this?
>
> I meant this :-)
> https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/
How many time is this allocated?
If it is 2 bytes in a larger structure it is likely to be noise.
For a local the code is likely to be worse.
Any maths and you start forcing the compiler to mask the value
(on pretty much anything except x86).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096
2024-12-09 21:35 ` David Laight
@ 2024-12-09 21:47 ` Ian Rogers
0 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2024-12-09 21:47 UTC (permalink / raw)
To: David Laight
Cc: Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, James Clark, Kyle Meyer, Ben Gainey,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, Dec 9, 2024 at 1:36 PM David Laight <David.Laight@aculab.com> wrote:
>
> ..
> > > > Just changing the int to be a s16 would lower the memory overhead,
> > > > which is why I'd kind of like the abstraction to be minimal.
> > >
> > > Here I am not clear what for "changing the int to be a s16". Could you
> > > elaberate a bit for this?
> >
> > I meant this :-)
> > https://lore.kernel.org/lkml/20241207052133.102829-1-irogers@google.com/
>
> How many time is this allocated?
> If it is 2 bytes in a larger structure it is likely to be noise.
> For a local the code is likely to be worse.
> Any maths and you start forcing the compiler to mask the value
> (on pretty much anything except x86).
So the data structure is a sorted array of ints, this changes it to
int16s. On the 32 socket GNR with > 2048 logical CPUs, the array would
be over 8kb before and 4kb after for all online CPUs. On my more
modest desktop with 72 logical cores the size goes from 288 bytes down
to 144, a reduction of 2 cache lines. I'm not super excited about the
memory savings, but the patch is only 8 lines in difference.
Thanks,
Ian
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-12-09 21:48 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-06 4:40 [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Ian Rogers
2024-12-06 4:40 ` [PATCH v1 1/8] perf: Increase MAX_NR_CPUS to 4096 Ian Rogers
2024-12-06 10:24 ` Leo Yan
2024-12-06 16:25 ` Ian Rogers
2024-12-06 23:03 ` Leo Yan
2024-12-07 5:24 ` Ian Rogers
2024-12-09 21:35 ` David Laight
2024-12-09 21:47 ` Ian Rogers
2024-12-06 4:40 ` [PATCH v1 2/8] perf cpumap: Reduce transitive dependencies on libperf MAX_NR_CPUS Ian Rogers
2024-12-06 4:40 ` [PATCH v1 3/8] libperf cpumap: Hide/reduce scope of MAX_NR_CPUS Ian Rogers
2024-12-06 4:40 ` [PATCH v1 4/8] libperf cpumap: Be tolerant of newline at the end of a cpumask Ian Rogers
2024-12-06 4:40 ` [PATCH v1 5/8] perf pmu: Remove use of perf_cpu_map__read Ian Rogers
2024-12-06 4:40 ` [PATCH v1 6/8] libperf cpumap: " Ian Rogers
2024-12-06 4:40 ` [PATCH v1 7/8] libperf cpumap: Remove perf_cpu_map__read Ian Rogers
2024-12-06 4:40 ` [PATCH v1 8/8] libperf cpumap: Grow array of read CPUs in smaller increments Ian Rogers
2024-12-06 23:03 ` [PATCH v1 0/8] Cpumap improvements for large MAX_NR_CPUS Leo Yan
2024-12-09 16:04 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox