* [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir
@ 2025-02-07 23:24 Ian Rogers
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
glibc's opendir allocates a minimum of 32kb, when called recursively
for a directory tree the memory consumption can add up - nearly 300kb
during perf start-up when processing modules. Add a stack allocated
variant of readdir sized a little more than 1kb
v2: Remove the feature test and always use a perf supplied getdents64
to workaround an Alpine Linux issue in v1:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
As suggested by Krzysztof Łopatowski
<krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
trace performance improvements in start-up time eliminating stat
calls can achieve:
https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
Convert parse-events and hwmon_pmu to use io_dir.
v1: This was previously part of the memory saving change set:
https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
It is separated here and a feature check and syscall workaround
for missing getdents64 added.
Ian Rogers (7):
tools lib api: Add io_dir an allocation free readdir alternative
perf maps: Switch modules tree walk to io_dir__readdir
perf pmu: Switch to io_dir__readdir
perf header: Switch mem topology to io_dir__readdir
perf events: Remove scandir in thread synthesis
perf parse-events: Switch tracepoints to io_dir__readdir
perf hwmon_pmu: Switch event discovery to io_dir__readdir
tools/lib/api/Makefile | 2 +-
tools/lib/api/io_dir.h | 91 ++++++++++++++++++++++++++++++
tools/perf/util/header.c | 31 +++++-----
tools/perf/util/hwmon_pmu.c | 42 ++++++--------
tools/perf/util/machine.c | 19 +++----
tools/perf/util/parse-events.c | 32 ++++++-----
tools/perf/util/pmu.c | 46 +++++++--------
tools/perf/util/pmus.c | 30 ++++------
tools/perf/util/synthetic-events.c | 22 ++++----
9 files changed, 194 insertions(+), 121 deletions(-)
create mode 100644 tools/lib/api/io_dir.h
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-08 17:02 ` [PATCH v2 1/7] tools lib api: Add io_dir() as an allocation free readdir() alternative Markus Elfring
2025-02-19 21:51 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Namhyung Kim
2025-02-07 23:24 ` [PATCH v2 2/7] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
` (7 subsequent siblings)
8 siblings, 2 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
glibc's opendir allocates a minimum of 32kb, when called recursively
for a directory tree the memory consumption can add up - nearly 300kb
during perf start-up when processing modules. Add a stack allocated
variant of readdir sized a little more than 1kb.
As getdents64 may be missing from libc, add support using syscall.
Note, an earlier version of this patch had a feature test for
getdents64 but there were problems on certains distros where
getdents64 would be #define renamed to getdents breaking the code. The
syscall use was made uncondtional to work around this. There is
context in:
https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/api/Makefile | 2 +-
tools/lib/api/io_dir.h | 93 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+), 1 deletion(-)
create mode 100644 tools/lib/api/io_dir.h
diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
index 7f6396087b46..8665c799e0fa 100644
--- a/tools/lib/api/Makefile
+++ b/tools/lib/api/Makefile
@@ -95,7 +95,7 @@ install_lib: $(LIBFILE)
$(call do_install_mkdir,$(libdir_SQ)); \
cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
-HDRS := cpu.h debug.h io.h
+HDRS := cpu.h debug.h io.h io_dir.h
FD_HDRS := fd/array.h
FS_HDRS := fs/fs.h fs/tracing_path.h
INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
new file mode 100644
index 000000000000..c84738923c96
--- /dev/null
+++ b/tools/lib/api/io_dir.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/*
+ * Lightweight directory reading library.
+ */
+#ifndef __API_IO_DIR__
+#define __API_IO_DIR__
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+
+#if !defined(SYS_getdents64)
+#if defined(__x86_64__)
+#define SYS_getdents64 217
+#elif defined(__aarch64__)
+#define SYS_getdents64 61
+#endif
+#endif
+
+static inline ssize_t perf_getdents64(int fd, void *dirp, size_t count)
+{
+#ifdef MEMORY_SANITIZER
+ memset(dirp, 0, count);
+#endif
+ return syscall(SYS_getdents64, fd, dirp, count);
+}
+#endif
+
+struct io_dirent64 {
+ ino64_t d_ino; /* 64-bit inode number */
+ off64_t d_off; /* 64-bit offset to next structure */
+ unsigned short d_reclen; /* Size of this dirent */
+ unsigned char d_type; /* File type */
+ char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
+};
+
+struct io_dir {
+ int dirfd;
+ ssize_t available_bytes;
+ struct io_dirent64 *next;
+ struct io_dirent64 buff[4];
+};
+
+static inline void io_dir__init(struct io_dir *iod, int dirfd)
+{
+ iod->dirfd = dirfd;
+ iod->available_bytes = 0;
+}
+
+static inline void io_dir__rewinddir(struct io_dir *iod)
+{
+ lseek(iod->dirfd, 0, SEEK_SET);
+ iod->available_bytes = 0;
+}
+
+static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
+{
+ struct io_dirent64 *entry;
+
+ if (iod->available_bytes <= 0) {
+ ssize_t rc = perf_getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
+
+ if (rc <= 0)
+ return NULL;
+ iod->available_bytes = rc;
+ iod->next = iod->buff;
+ }
+ entry = iod->next;
+ iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
+ iod->available_bytes -= entry->d_reclen;
+ return entry;
+}
+
+static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
+{
+ if (dent->d_type == DT_UNKNOWN) {
+ struct stat st;
+
+ if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
+ return false;
+
+ if (S_ISDIR(st.st_mode)) {
+ dent->d_type = DT_DIR;
+ return true;
+ }
+ }
+ return dent->d_type == DT_DIR;
+}
+
+#endif
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/7] perf maps: Switch modules tree walk to io_dir__readdir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-07 23:24 ` [PATCH v2 3/7] perf pmu: Switch " Ian Rogers
` (6 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
Compared to glibc's opendir/readdir this lowers the max RSS of perf
record by 1.8MB on a Debian machine.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/machine.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2d51badfbf2e..bdd7969533c6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -37,6 +37,7 @@
#include <internal/lib.h> // page_size
#include "cgroup.h"
#include "arm64-frame-pointer-unwind-support.h"
+#include <api/io_dir.h>
#include <linux/ctype.h>
#include <symbol/kallsyms.h>
@@ -1354,25 +1355,21 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, int depth)
{
- struct dirent *dent;
- DIR *dir = opendir(dir_name);
+ struct io_dirent64 *dent;
+ struct io_dir iod;
int ret = 0;
- if (!dir) {
+ io_dir__init(&iod, open(dir_name, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (iod.dirfd < 0) {
pr_debug("%s: cannot open %s dir\n", __func__, dir_name);
return -1;
}
- while ((dent = readdir(dir)) != NULL) {
+ while ((dent = io_dir__readdir(&iod)) != NULL) {
char path[PATH_MAX];
- struct stat st;
- /*sshfs might return bad dent->d_type, so we have to stat*/
path__join(path, sizeof(path), dir_name, dent->d_name);
- if (stat(path, &st))
- continue;
-
- if (S_ISDIR(st.st_mode)) {
+ if (io_dir__is_dir(&iod, dent)) {
if (!strcmp(dent->d_name, ".") ||
!strcmp(dent->d_name, ".."))
continue;
@@ -1405,7 +1402,7 @@ static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, i
}
out:
- closedir(dir);
+ close(iod.dirfd);
return ret;
}
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 3/7] perf pmu: Switch to io_dir__readdir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
2025-02-07 23:24 ` [PATCH v2 2/7] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-07 23:24 ` [PATCH v2 4/7] perf header: Switch mem topology " Ian Rogers
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/pmu.c | 46 +++++++++++++++++-------------------------
tools/perf/util/pmus.c | 30 ++++++++++-----------------
2 files changed, 30 insertions(+), 46 deletions(-)
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index c2a15b0259cf..91ed909433cc 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -13,6 +13,7 @@
#include <dirent.h>
#include <api/fs/fs.h>
#include <api/io.h>
+#include <api/io_dir.h>
#include <locale.h>
#include <fnmatch.h>
#include <math.h>
@@ -195,19 +196,17 @@ static void perf_pmu_format__load(const struct perf_pmu *pmu, struct perf_pmu_fo
*/
static int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_load)
{
- struct dirent *evt_ent;
- DIR *format_dir;
+ struct io_dirent64 *evt_ent;
+ struct io_dir format_dir;
int ret = 0;
- format_dir = fdopendir(dirfd);
- if (!format_dir)
- return -EINVAL;
+ io_dir__init(&format_dir, dirfd);
- while ((evt_ent = readdir(format_dir)) != NULL) {
+ while ((evt_ent = io_dir__readdir(&format_dir)) != NULL) {
struct perf_pmu_format *format;
char *name = evt_ent->d_name;
- if (!strcmp(name, ".") || !strcmp(name, ".."))
+ if (io_dir__is_dir(&format_dir, evt_ent))
continue;
format = perf_pmu__new_format(&pmu->format, name);
@@ -234,7 +233,7 @@ static int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_lo
}
}
- closedir(format_dir);
+ close(format_dir.dirfd);
return ret;
}
@@ -635,14 +634,12 @@ static inline bool pmu_alias_info_file(const char *name)
*/
static int __pmu_aliases_parse(struct perf_pmu *pmu, int events_dir_fd)
{
- struct dirent *evt_ent;
- DIR *event_dir;
+ struct io_dirent64 *evt_ent;
+ struct io_dir event_dir;
- event_dir = fdopendir(events_dir_fd);
- if (!event_dir)
- return -EINVAL;
+ io_dir__init(&event_dir, events_dir_fd);
- while ((evt_ent = readdir(event_dir))) {
+ while ((evt_ent = io_dir__readdir(&event_dir))) {
char *name = evt_ent->d_name;
int fd;
FILE *file;
@@ -674,7 +671,6 @@ static int __pmu_aliases_parse(struct perf_pmu *pmu, int events_dir_fd)
fclose(file);
}
- closedir(event_dir);
pmu->sysfs_aliases_loaded = true;
return 0;
}
@@ -2221,10 +2217,9 @@ static void perf_pmu__del_caps(struct perf_pmu *pmu)
*/
int perf_pmu__caps_parse(struct perf_pmu *pmu)
{
- struct stat st;
char caps_path[PATH_MAX];
- DIR *caps_dir;
- struct dirent *evt_ent;
+ struct io_dir caps_dir;
+ struct io_dirent64 *evt_ent;
int caps_fd;
if (pmu->caps_initialized)
@@ -2235,24 +2230,21 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
if (!perf_pmu__pathname_scnprintf(caps_path, sizeof(caps_path), pmu->name, "caps"))
return -1;
- if (stat(caps_path, &st) < 0) {
+ caps_fd = open(caps_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY);
+ if (caps_fd == -1) {
pmu->caps_initialized = true;
return 0; /* no error if caps does not exist */
}
- caps_dir = opendir(caps_path);
- if (!caps_dir)
- return -EINVAL;
-
- caps_fd = dirfd(caps_dir);
+ io_dir__init(&caps_dir, caps_fd);
- while ((evt_ent = readdir(caps_dir)) != NULL) {
+ while ((evt_ent = io_dir__readdir(&caps_dir)) != NULL) {
char *name = evt_ent->d_name;
char value[128];
FILE *file;
int fd;
- if (!strcmp(name, ".") || !strcmp(name, ".."))
+ if (io_dir__is_dir(&caps_dir, evt_ent))
continue;
fd = openat(caps_fd, name, O_RDONLY);
@@ -2274,7 +2266,7 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
fclose(file);
}
- closedir(caps_dir);
+ close(caps_fd);
pmu->caps_initialized = true;
return pmu->nr_caps;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 8a0a919415d4..afd59d678fd0 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -3,10 +3,10 @@
#include <linux/list_sort.h>
#include <linux/string.h>
#include <linux/zalloc.h>
+#include <api/io_dir.h>
#include <subcmd/pager.h>
#include <sys/types.h>
#include <ctype.h>
-#include <dirent.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
@@ -235,20 +235,16 @@ static void pmu_read_sysfs(unsigned int to_read_types)
if (to_read_types & (PERF_TOOL_PMU_TYPE_PE_CORE_MASK | PERF_TOOL_PMU_TYPE_PE_OTHER_MASK)) {
int fd = perf_pmu__event_source_devices_fd();
- DIR *dir;
- struct dirent *dent;
+ struct io_dir dir;
+ struct io_dirent64 *dent;
bool core_only = (to_read_types & PERF_TOOL_PMU_TYPE_PE_OTHER_MASK) == 0;
if (fd < 0)
goto skip_pe_pmus;
- dir = fdopendir(fd);
- if (!dir) {
- close(fd);
- goto skip_pe_pmus;
- }
+ io_dir__init(&dir, fd);
- while ((dent = readdir(dir))) {
+ while ((dent = io_dir__readdir(&dir)) != NULL) {
if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
continue;
if (core_only && !is_pmu_core(dent->d_name))
@@ -257,7 +253,7 @@ static void pmu_read_sysfs(unsigned int to_read_types)
perf_pmu__find2(fd, dent->d_name);
}
- closedir(dir);
+ close(fd);
}
skip_pe_pmus:
if ((to_read_types & PERF_TOOL_PMU_TYPE_PE_CORE_MASK) && list_empty(&core_pmus)) {
@@ -721,8 +717,8 @@ bool perf_pmus__supports_extended_type(void)
char *perf_pmus__default_pmu_name(void)
{
int fd;
- DIR *dir;
- struct dirent *dent;
+ struct io_dir dir;
+ struct io_dirent64 *dent;
char *result = NULL;
if (!list_empty(&core_pmus))
@@ -732,13 +728,9 @@ char *perf_pmus__default_pmu_name(void)
if (fd < 0)
return strdup("cpu");
- dir = fdopendir(fd);
- if (!dir) {
- close(fd);
- return strdup("cpu");
- }
+ io_dir__init(&dir, fd);
- while ((dent = readdir(dir))) {
+ while ((dent = io_dir__readdir(&dir)) != NULL) {
if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
continue;
if (is_pmu_core(dent->d_name)) {
@@ -747,7 +739,7 @@ char *perf_pmus__default_pmu_name(void)
}
}
- closedir(dir);
+ close(fd);
return result ?: strdup("cpu");
}
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 4/7] perf header: Switch mem topology to io_dir__readdir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (2 preceding siblings ...)
2025-02-07 23:24 ` [PATCH v2 3/7] perf pmu: Switch " Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-07 23:24 ` [PATCH v2 5/7] perf events: Remove scandir in thread synthesis Ian Rogers
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
Switch memory_node__read and build_mem_topology from opendir/readdir
to io_dir__readdir, with smaller stack allocations. Reduces peak
memory consumption of perf record by 10kb.
Signed-off-by: Ian Rogers <irogers@google.com>
---
| 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
--git a/tools/perf/util/header.c b/tools/perf/util/header.c
index d06aa86352d3..1900965f8752 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -44,6 +44,7 @@
#include "build-id.h"
#include "data.h"
#include <api/fs/fs.h>
+#include <api/io_dir.h>
#include "asm/bug.h"
#include "tool.h"
#include "time-utils.h"
@@ -1311,11 +1312,11 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
{
unsigned int phys, size = 0;
char path[PATH_MAX];
- struct dirent *ent;
- DIR *dir;
+ struct io_dirent64 *ent;
+ struct io_dir dir;
#define for_each_memory(mem, dir) \
- while ((ent = readdir(dir))) \
+ while ((ent = io_dir__readdir(&dir)) != NULL) \
if (strcmp(ent->d_name, ".") && \
strcmp(ent->d_name, "..") && \
sscanf(ent->d_name, "memory%u", &mem) == 1)
@@ -1324,9 +1325,9 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
"%s/devices/system/node/node%lu",
sysfs__mountpoint(), idx);
- dir = opendir(path);
- if (!dir) {
- pr_warning("failed: can't open memory sysfs data\n");
+ io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (dir.dirfd < 0) {
+ pr_warning("failed: can't open memory sysfs data '%s'\n", path);
return -1;
}
@@ -1338,20 +1339,20 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
n->set = bitmap_zalloc(size);
if (!n->set) {
- closedir(dir);
+ close(dir.dirfd);
return -ENOMEM;
}
n->node = idx;
n->size = size;
- rewinddir(dir);
+ io_dir__rewinddir(&dir);
for_each_memory(phys, dir) {
__set_bit(phys, n->set);
}
- closedir(dir);
+ close(dir.dirfd);
return 0;
}
@@ -1374,8 +1375,8 @@ static int memory_node__sort(const void *a, const void *b)
static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
{
char path[PATH_MAX];
- struct dirent *ent;
- DIR *dir;
+ struct io_dirent64 *ent;
+ struct io_dir dir;
int ret = 0;
size_t cnt = 0, size = 0;
struct memory_node *nodes = NULL;
@@ -1383,14 +1384,14 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
scnprintf(path, PATH_MAX, "%s/devices/system/node/",
sysfs__mountpoint());
- dir = opendir(path);
- if (!dir) {
+ io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (dir.dirfd < 0) {
pr_debug2("%s: couldn't read %s, does this arch have topology information?\n",
__func__, path);
return -1;
}
- while (!ret && (ent = readdir(dir))) {
+ while (!ret && (ent = io_dir__readdir(&dir))) {
unsigned int idx;
int r;
@@ -1419,7 +1420,7 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
cnt += 1;
}
out:
- closedir(dir);
+ close(dir.dirfd);
if (!ret) {
*cntp = cnt;
*nodesp = nodes;
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 5/7] perf events: Remove scandir in thread synthesis
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (3 preceding siblings ...)
2025-02-07 23:24 ` [PATCH v2 4/7] perf header: Switch mem topology " Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-07 23:24 ` [PATCH v2 6/7] perf parse-events: Switch tracepoints to io_dir__readdir Ian Rogers
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
This avoids scanddir reading the directory into memory that's
allocated and instead allocates on the stack.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/synthetic-events.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 6923b0d5efed..2c1ba39dfdde 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -38,6 +38,7 @@
#include <uapi/linux/mman.h> /* To get things like MAP_HUGETLB even on older libc headers */
#include <api/fs/fs.h>
#include <api/io.h>
+#include <api/io_dir.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
@@ -767,10 +768,10 @@ static int __event__synthesize_thread(union perf_event *comm_event,
bool needs_mmap, bool mmap_data)
{
char filename[PATH_MAX];
- struct dirent **dirent;
+ struct io_dir iod;
+ struct io_dirent64 *dent;
pid_t tgid, ppid;
int rc = 0;
- int i, n;
/* special case: only send one comm event using passed in pid */
if (!full) {
@@ -802,16 +803,19 @@ static int __event__synthesize_thread(union perf_event *comm_event,
snprintf(filename, sizeof(filename), "%s/proc/%d/task",
machine->root_dir, pid);
- n = scandir(filename, &dirent, filter_task, NULL);
- if (n < 0)
- return n;
+ io_dir__init(&iod, open(filename, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (iod.dirfd < 0)
+ return -1;
- for (i = 0; i < n; i++) {
+ while ((dent = io_dir__readdir(&iod)) != NULL) {
char *end;
pid_t _pid;
bool kernel_thread = false;
- _pid = strtol(dirent[i]->d_name, &end, 10);
+ if (!isdigit(dent->d_name[0]))
+ continue;
+
+ _pid = strtol(dent->d_name, &end, 10);
if (*end)
continue;
@@ -845,9 +849,7 @@ static int __event__synthesize_thread(union perf_event *comm_event,
}
}
- for (i = 0; i < n; i++)
- zfree(&dirent[i]);
- free(dirent);
+ close(iod.dirfd);
return rc;
}
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 6/7] perf parse-events: Switch tracepoints to io_dir__readdir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (4 preceding siblings ...)
2025-02-07 23:24 ` [PATCH v2 5/7] perf events: Remove scandir in thread synthesis Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-07 23:24 ` [PATCH v2 7/7] perf hwmon_pmu: Switch event discovery " Ian Rogers
` (2 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/api/io_dir.h | 2 --
tools/perf/util/parse-events.c | 32 +++++++++++++++++++-------------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
index c84738923c96..9f07460afacc 100644
--- a/tools/lib/api/io_dir.h
+++ b/tools/lib/api/io_dir.h
@@ -89,5 +89,3 @@ static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *
}
return dent->d_type == DT_DIR;
}
-
-#endif
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6c36b98875bc..35e48fe56dfa 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -17,6 +17,7 @@
#include "strbuf.h"
#include "debug.h"
#include <api/fs/tracing_path.h>
+#include <api/io_dir.h>
#include <perf/cpumap.h>
#include <util/parse-events-bison.h>
#include <util/parse-events-flex.h>
@@ -554,8 +555,8 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
struct parse_events_terms *head_config, YYLTYPE *loc)
{
char *evt_path;
- struct dirent *evt_ent;
- DIR *evt_dir;
+ struct io_dirent64 *evt_ent;
+ struct io_dir evt_dir;
int ret = 0, found = 0;
evt_path = get_events_file(sys_name);
@@ -563,14 +564,14 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
return -1;
}
- evt_dir = opendir(evt_path);
- if (!evt_dir) {
+ io_dir__init(&evt_dir, open(evt_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (evt_dir.dirfd < 0) {
put_events_file(evt_path);
tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
return -1;
}
- while (!ret && (evt_ent = readdir(evt_dir))) {
+ while (!ret && (evt_ent = io_dir__readdir(&evt_dir))) {
if (!strcmp(evt_ent->d_name, ".")
|| !strcmp(evt_ent->d_name, "..")
|| !strcmp(evt_ent->d_name, "enable")
@@ -592,7 +593,7 @@ static int add_tracepoint_multi_event(struct parse_events_state *parse_state,
}
put_events_file(evt_path);
- closedir(evt_dir);
+ close(evt_dir.dirfd);
return ret;
}
@@ -615,17 +616,23 @@ static int add_tracepoint_multi_sys(struct parse_events_state *parse_state,
struct parse_events_error *err,
struct parse_events_terms *head_config, YYLTYPE *loc)
{
- struct dirent *events_ent;
- DIR *events_dir;
+ struct io_dirent64 *events_ent;
+ struct io_dir events_dir;
int ret = 0;
+ char *events_dir_path = get_tracing_file("events");
- events_dir = tracing_events__opendir();
- if (!events_dir) {
+ if (!events_dir_path) {
+ tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
+ return -1;
+ }
+ io_dir__init(&events_dir, open(events_dir_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ put_events_file(events_dir_path);
+ if (events_dir.dirfd < 0) {
tracepoint_error(err, errno, sys_name, evt_name, loc->first_column);
return -1;
}
- while (!ret && (events_ent = readdir(events_dir))) {
+ while (!ret && (events_ent = io_dir__readdir(&events_dir))) {
if (!strcmp(events_ent->d_name, ".")
|| !strcmp(events_ent->d_name, "..")
|| !strcmp(events_ent->d_name, "enable")
@@ -639,8 +646,7 @@ static int add_tracepoint_multi_sys(struct parse_events_state *parse_state,
ret = add_tracepoint_event(parse_state, list, events_ent->d_name,
evt_name, err, head_config, loc);
}
-
- closedir(events_dir);
+ close(events_dir.dirfd);
return ret;
}
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 7/7] perf hwmon_pmu: Switch event discovery to io_dir__readdir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (5 preceding siblings ...)
2025-02-07 23:24 ` [PATCH v2 6/7] perf parse-events: Switch tracepoints to io_dir__readdir Ian Rogers
@ 2025-02-07 23:24 ` Ian Rogers
2025-02-08 10:58 ` [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir David Laight
2025-02-19 21:54 ` Namhyung Kim
8 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-07 23:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Sam James, Jesper Juhl,
James Clark, Zhongqiu Han, Yicong Yang, Thomas Richter,
Michael Petlan, Veronika Molnarova, Anne Macedo,
Dominique Martinet, Jean-Philippe Romain, Junhao He, linux-kernel,
linux-perf-users, Krzysztof Łopatowski
Avoid DIR allocations when scanning sysfs by using io_dir for the
readdir implementation, that allocates about 1kb on the stack.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/hwmon_pmu.c | 42 +++++++++++++++----------------------
1 file changed, 17 insertions(+), 25 deletions(-)
diff --git a/tools/perf/util/hwmon_pmu.c b/tools/perf/util/hwmon_pmu.c
index acd889b2462f..3cce77fc8004 100644
--- a/tools/perf/util/hwmon_pmu.c
+++ b/tools/perf/util/hwmon_pmu.c
@@ -11,13 +11,13 @@
#include <sys/types.h>
#include <assert.h>
#include <ctype.h>
-#include <dirent.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <api/fs/fs.h>
#include <api/io.h>
+#include <api/io_dir.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/zalloc.h>
@@ -235,31 +235,22 @@ static void fix_name(char *p)
static int hwmon_pmu__read_events(struct hwmon_pmu *pmu)
{
- DIR *dir;
- struct dirent *ent;
- int dup_fd, err = 0;
+ int err = 0;
struct hashmap_entry *cur, *tmp;
size_t bkt;
+ struct io_dirent64 *ent;
+ struct io_dir dir;
if (pmu->pmu.sysfs_aliases_loaded)
return 0;
- /*
- * Use a dup-ed fd as closedir will close it. Use openat so that the
- * directory contents are refreshed.
- */
- dup_fd = openat(pmu->hwmon_dir_fd, ".", O_DIRECTORY);
+ /* Use openat so that the directory contents are refreshed. */
+ io_dir__init(&dir, openat(pmu->hwmon_dir_fd, ".", O_CLOEXEC | O_DIRECTORY | O_RDONLY));
- if (dup_fd == -1)
- return -ENOMEM;
+ if (dir.dirfd < 0)
+ return -ENOENT;
- dir = fdopendir(dup_fd);
- if (!dir) {
- close(dup_fd);
- return -ENOMEM;
- }
-
- while ((ent = readdir(dir)) != NULL) {
+ while ((ent = io_dir__readdir(&dir)) != NULL) {
enum hwmon_type type;
int number;
enum hwmon_item item;
@@ -347,7 +338,7 @@ static int hwmon_pmu__read_events(struct hwmon_pmu *pmu)
pmu->pmu.sysfs_aliases_loaded = true;
err_out:
- closedir(dir);
+ close(dir.dirfd);
return err;
}
@@ -702,8 +693,8 @@ int hwmon_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_inf
int perf_pmus__read_hwmon_pmus(struct list_head *pmus)
{
char *line = NULL;
- DIR *class_hwmon_dir;
- struct dirent *class_hwmon_ent;
+ struct io_dirent64 *class_hwmon_ent;
+ struct io_dir class_hwmon_dir;
char buf[PATH_MAX];
const char *sysfs = sysfs__mountpoint();
@@ -711,11 +702,12 @@ int perf_pmus__read_hwmon_pmus(struct list_head *pmus)
return 0;
scnprintf(buf, sizeof(buf), "%s/class/hwmon/", sysfs);
- class_hwmon_dir = opendir(buf);
- if (!class_hwmon_dir)
+ io_dir__init(&class_hwmon_dir, open(buf, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+
+ if (class_hwmon_dir.dirfd < 0)
return 0;
- while ((class_hwmon_ent = readdir(class_hwmon_dir)) != NULL) {
+ while ((class_hwmon_ent = io_dir__readdir(&class_hwmon_dir)) != NULL) {
size_t line_len;
int hwmon_dir, name_fd;
struct io io;
@@ -745,7 +737,7 @@ int perf_pmus__read_hwmon_pmus(struct list_head *pmus)
close(name_fd);
}
free(line);
- closedir(class_hwmon_dir);
+ close(class_hwmon_dir.dirfd);
return 0;
}
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (6 preceding siblings ...)
2025-02-07 23:24 ` [PATCH v2 7/7] perf hwmon_pmu: Switch event discovery " Ian Rogers
@ 2025-02-08 10:58 ` David Laight
2025-02-08 12:15 ` Ian Rogers
2025-02-19 21:54 ` Namhyung Kim
8 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2025-02-08 10:58 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, Sam James, Jesper Juhl, James Clark,
Zhongqiu Han, Yicong Yang, Thomas Richter, Michael Petlan,
Veronika Molnarova, Anne Macedo, Dominique Martinet,
Jean-Philippe Romain, Junhao He, linux-kernel, linux-perf-users,
Krzysztof Łopatowski
On Fri, 7 Feb 2025 15:24:41 -0800
Ian Rogers <irogers@google.com> wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb
Does 300kB really matter?
You seem to be trading memory allocation against system calls.
My 'gut feel' is that the system call cost will dominate.
(Unless all the directories are small.)
There might, of course, be other code in glibc that is designed
to make it all slower than necessary.
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir
2025-02-08 10:58 ` [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir David Laight
@ 2025-02-08 12:15 ` Ian Rogers
0 siblings, 0 replies; 15+ messages in thread
From: Ian Rogers @ 2025-02-08 12:15 UTC (permalink / raw)
To: David Laight
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, Kan Liang, Sam James, Jesper Juhl, James Clark,
Zhongqiu Han, Yicong Yang, Thomas Richter, Michael Petlan,
Veronika Molnarova, Anne Macedo, Dominique Martinet,
Jean-Philippe Romain, Junhao He, linux-kernel, linux-perf-users,
Krzysztof Łopatowski
On Sat, Feb 8, 2025 at 2:58 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Fri, 7 Feb 2025 15:24:41 -0800
> Ian Rogers <irogers@google.com> wrote:
>
> > glibc's opendir allocates a minimum of 32kb, when called recursively
> > for a directory tree the memory consumption can add up - nearly 300kb
> > during perf start-up when processing modules. Add a stack allocated
> > variant of readdir sized a little more than 1kb
>
> Does 300kB really matter?
For someone on a laptop, probably no. For a company running the
command a large number of times in consolidated and heavily loaded
data centers, where binaries run in memory constrained containers,
memory usage contributes to latency of the 99th percentile slow
executions, which because of the number of machines is something seen
all the time.
That said, do you need to optimize usages where you recurse over the
kernel modules? That's something you wouldn't try to do in a data
centre, so there isn't so much need.
> You seem to be trading memory allocation against system calls.
> My 'gut feel' is that the system call cost will dominate.
> (Unless all the directories are small.)
It isn't so much directories being small as the filenames within them.
getdents64 will vary the size of each entry based on the filename size
which can be up to 256 bytes. As an example, for trace points:
```
$ sudo ls -al /sys/kernel/debug/tracing/events/raw_syscalls/
total 0
drwxr-xr-x 1 root root 0 Feb 8 03:23 .
drwxr-xr-x 1 root root 0 Feb 8 03:20 ..
-rw-r----- 1 root root 0 Feb 8 03:23 enable
-rw-r----- 1 root root 0 Feb 8 03:23 filter
drwxr-xr-x 1 root root 0 Feb 8 03:23 sys_enter
drwxr-xr-x 1 root root 0 Feb 8 03:23 sys_exit
$ sudo ls -al /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter
total 0
drwxr-xr-x 1 root root 0 Feb 8 03:23 .
drwxr-xr-x 1 root root 0 Feb 8 03:23 ..
-rw-r----- 1 root root 0 Feb 8 03:23 enable
-rw-r----- 1 root root 0 Feb 8 03:23 filter
-r--r----- 1 root root 0 Feb 8 03:23 format
-r--r----- 1 root root 0 Feb 8 03:23 hist
-r--r----- 1 root root 0 Feb 8 03:23 id
-rw-r----- 1 root root 0 Feb 8 03:23 trigger
```
So each d_entry is going to cost you like 20 bytes plus the length of
the filename. So 6 entries with filenames varying up to 9 characters,
you need 24 or 32 bytes per entry after padding. If each entry were 32
bytes then you'd need a buffer to hold all the data of size 192 bytes,
the code here is using 1kb. The value used by glibc is between 32KB
and 1MB, which is pretty generous when all you need is 192 bytes.
An issue in perf is that we may recursively descend directories and
use opendir on each level, so the buffer size gets multiplied by the
depth of the tree. So at 100s of KB these memory allocations are some
of the largest seen in profiles of perf - BPF has some large
allocations too.
So why not fix opendir? I've found trying to get things landed in
glibc is slow to glacial [1], and someone once thought those buffer
sizes were good so who am I to argue? The LLVM libc and similar are
making similar smaller than glibc buffer size choices. The addition of
io_dir means we can tune for the use cases that matter to perf, such
as kernel modules, procfs, sysfs, .. in general we don't see
directories with lots of entries and we're a long way from the 256
byte file name max length. While the code could potentially increase
the number of system calls, in practice it doesn't.
Thanks,
Ian
[1] I sent patches bringing back the "stack:tid" convention in
/prod/pid/maps by naming stacks on allocation (an extra syscall). This
can be used, for example, to identify stack vs heap memory accesses.
Linux used to support this but Linus himself removed the support - the
code to do it had quadratic execution time due to searching for each
stack allocation the corresponding thread stack pointer. Fixing
opendir memory size and stack naming in glibc would be good things
imo. Were libc better we may not need this series (we still know our
use cases better than a generic implementation).
> There might, of course, be other code in glibc that is designed
> to make it all slower than necessary.
>
> David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/7] tools lib api: Add io_dir() as an allocation free readdir() alternative
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
@ 2025-02-08 17:02 ` Markus Elfring
2025-02-19 21:51 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Namhyung Kim
1 sibling, 0 replies; 15+ messages in thread
From: Markus Elfring @ 2025-02-08 17:02 UTC (permalink / raw)
To: Ian Rogers, linux-perf-users, Adrian Hunter, Alexander Shishkin,
Anne Macedo, Arnaldo Carvalho de Melo, Dominique Martinet,
Ingo Molnar, James Clark, Jean-Philippe Romain, Jesper Juhl,
Jiri Olsa, Junhao He, Kan Liang, Krzysztof Łopatowski,
Mark Rutland, Michael Petlan, Namhyung Kim, Peter Zijlstra,
Sam James, Thomas Richter, Veronika Molnarova, Yicong Yang,
Zhongqiu Han
Cc: LKML
…
> syscall use was made uncondtional to work around this. …
unconditional?
Regards,
Markus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
2025-02-08 17:02 ` [PATCH v2 1/7] tools lib api: Add io_dir() as an allocation free readdir() alternative Markus Elfring
@ 2025-02-19 21:51 ` Namhyung Kim
2025-02-19 22:21 ` Ian Rogers
1 sibling, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2025-02-19 21:51 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Sam James, Jesper Juhl, James Clark, Zhongqiu Han,
Yicong Yang, Thomas Richter, Michael Petlan, Veronika Molnarova,
Anne Macedo, Dominique Martinet, Jean-Philippe Romain, Junhao He,
linux-kernel, linux-perf-users, Krzysztof Łopatowski
On Fri, Feb 07, 2025 at 03:24:42PM -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb.
>
> As getdents64 may be missing from libc, add support using syscall.
> Note, an earlier version of this patch had a feature test for
> getdents64 but there were problems on certains distros where
> getdents64 would be #define renamed to getdents breaking the code. The
> syscall use was made uncondtional to work around this. There is
> context in:
> https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/lib/api/Makefile | 2 +-
> tools/lib/api/io_dir.h | 93 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 94 insertions(+), 1 deletion(-)
> create mode 100644 tools/lib/api/io_dir.h
>
> diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
> index 7f6396087b46..8665c799e0fa 100644
> --- a/tools/lib/api/Makefile
> +++ b/tools/lib/api/Makefile
> @@ -95,7 +95,7 @@ install_lib: $(LIBFILE)
> $(call do_install_mkdir,$(libdir_SQ)); \
> cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
>
> -HDRS := cpu.h debug.h io.h
> +HDRS := cpu.h debug.h io.h io_dir.h
> FD_HDRS := fd/array.h
> FS_HDRS := fs/fs.h fs/tracing_path.h
> INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
> diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
> new file mode 100644
> index 000000000000..c84738923c96
> --- /dev/null
> +++ b/tools/lib/api/io_dir.h
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/*
> + * Lightweight directory reading library.
> + */
> +#ifndef __API_IO_DIR__
> +#define __API_IO_DIR__
> +
> +#include <dirent.h>
> +#include <fcntl.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +
> +#if !defined(SYS_getdents64)
> +#if defined(__x86_64__)
> +#define SYS_getdents64 217
> +#elif defined(__aarch64__)
> +#define SYS_getdents64 61
> +#endif
> +#endif
> +
> +static inline ssize_t perf_getdents64(int fd, void *dirp, size_t count)
> +{
> +#ifdef MEMORY_SANITIZER
> + memset(dirp, 0, count);
> +#endif
> + return syscall(SYS_getdents64, fd, dirp, count);
Unfortunately this fails to build on my i386 vm (and probably other old
archs don't have SYS_getdents64 yet).
In file included from util/pmus.c:6:
/build/libapi/include/api/io_dir.h: In function 'perf_getdents64':
/build/libapi/include/api/io_dir.h:28:24: error: 'SYS_getdents64' undeclared (first use in this function); did you mean 'perf_getdents64'?
28 | return syscall(SYS_getdents64, fd, dirp, count);
| ^~~~~~~~~~~~~~
| perf_getdents64
> +}
> +#endif
Maybe mismatched.
Thanks,
Namhyung
> +
> +struct io_dirent64 {
> + ino64_t d_ino; /* 64-bit inode number */
> + off64_t d_off; /* 64-bit offset to next structure */
> + unsigned short d_reclen; /* Size of this dirent */
> + unsigned char d_type; /* File type */
> + char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
> +};
> +
> +struct io_dir {
> + int dirfd;
> + ssize_t available_bytes;
> + struct io_dirent64 *next;
> + struct io_dirent64 buff[4];
> +};
> +
> +static inline void io_dir__init(struct io_dir *iod, int dirfd)
> +{
> + iod->dirfd = dirfd;
> + iod->available_bytes = 0;
> +}
> +
> +static inline void io_dir__rewinddir(struct io_dir *iod)
> +{
> + lseek(iod->dirfd, 0, SEEK_SET);
> + iod->available_bytes = 0;
> +}
> +
> +static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
> +{
> + struct io_dirent64 *entry;
> +
> + if (iod->available_bytes <= 0) {
> + ssize_t rc = perf_getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
> +
> + if (rc <= 0)
> + return NULL;
> + iod->available_bytes = rc;
> + iod->next = iod->buff;
> + }
> + entry = iod->next;
> + iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
> + iod->available_bytes -= entry->d_reclen;
> + return entry;
> +}
> +
> +static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
> +{
> + if (dent->d_type == DT_UNKNOWN) {
> + struct stat st;
> +
> + if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
> + return false;
> +
> + if (S_ISDIR(st.st_mode)) {
> + dent->d_type = DT_DIR;
> + return true;
> + }
> + }
> + return dent->d_type == DT_DIR;
> +}
> +
> +#endif
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
` (7 preceding siblings ...)
2025-02-08 10:58 ` [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir David Laight
@ 2025-02-19 21:54 ` Namhyung Kim
8 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2025-02-19 21:54 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Sam James, Jesper Juhl, James Clark, Zhongqiu Han,
Yicong Yang, Thomas Richter, Michael Petlan, Veronika Molnarova,
Anne Macedo, Dominique Martinet, Jean-Philippe Romain, Junhao He,
linux-kernel, linux-perf-users, Krzysztof Łopatowski
On Fri, Feb 07, 2025 at 03:24:41PM -0800, Ian Rogers wrote:
> glibc's opendir allocates a minimum of 32kb, when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb
>
> v2: Remove the feature test and always use a perf supplied getdents64
> to workaround an Alpine Linux issue in v1:
> https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
> As suggested by Krzysztof Łopatowski
> <krzysztof.m.lopatowski@gmail.com> who also pointed to the perf
> trace performance improvements in start-up time eliminating stat
> calls can achieve:
> https://lore.kernel.org/lkml/20250206113314.335376-2-krzysztof.m.lopatowski@gmail.com/
Let me pick up Krzysztof's patch first.
Thanks,
Namhyung
> Convert parse-events and hwmon_pmu to use io_dir.
> v1: This was previously part of the memory saving change set:
> https://lore.kernel.org/lkml/20231127220902.1315692-1-irogers@google.com/
> It is separated here and a feature check and syscall workaround
> for missing getdents64 added.
>
> Ian Rogers (7):
> tools lib api: Add io_dir an allocation free readdir alternative
> perf maps: Switch modules tree walk to io_dir__readdir
> perf pmu: Switch to io_dir__readdir
> perf header: Switch mem topology to io_dir__readdir
> perf events: Remove scandir in thread synthesis
> perf parse-events: Switch tracepoints to io_dir__readdir
> perf hwmon_pmu: Switch event discovery to io_dir__readdir
>
> tools/lib/api/Makefile | 2 +-
> tools/lib/api/io_dir.h | 91 ++++++++++++++++++++++++++++++
> tools/perf/util/header.c | 31 +++++-----
> tools/perf/util/hwmon_pmu.c | 42 ++++++--------
> tools/perf/util/machine.c | 19 +++----
> tools/perf/util/parse-events.c | 32 ++++++-----
> tools/perf/util/pmu.c | 46 +++++++--------
> tools/perf/util/pmus.c | 30 ++++------
> tools/perf/util/synthetic-events.c | 22 ++++----
> 9 files changed, 194 insertions(+), 121 deletions(-)
> create mode 100644 tools/lib/api/io_dir.h
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative
2025-02-19 21:51 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Namhyung Kim
@ 2025-02-19 22:21 ` Ian Rogers
2025-02-21 6:31 ` Namhyung Kim
0 siblings, 1 reply; 15+ messages in thread
From: Ian Rogers @ 2025-02-19 22:21 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Sam James, Jesper Juhl, James Clark, Zhongqiu Han,
Yicong Yang, Thomas Richter, Michael Petlan, Veronika Molnarova,
Anne Macedo, Dominique Martinet, Jean-Philippe Romain, Junhao He,
linux-kernel, linux-perf-users, Krzysztof Łopatowski
On Wed, Feb 19, 2025 at 1:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Feb 07, 2025 at 03:24:42PM -0800, Ian Rogers wrote:
> > glibc's opendir allocates a minimum of 32kb, when called recursively
> > for a directory tree the memory consumption can add up - nearly 300kb
> > during perf start-up when processing modules. Add a stack allocated
> > variant of readdir sized a little more than 1kb.
> >
> > As getdents64 may be missing from libc, add support using syscall.
> > Note, an earlier version of this patch had a feature test for
> > getdents64 but there were problems on certains distros where
> > getdents64 would be #define renamed to getdents breaking the code. The
> > syscall use was made uncondtional to work around this. There is
> > context in:
> > https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/lib/api/Makefile | 2 +-
> > tools/lib/api/io_dir.h | 93 ++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 94 insertions(+), 1 deletion(-)
> > create mode 100644 tools/lib/api/io_dir.h
> >
> > diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
> > index 7f6396087b46..8665c799e0fa 100644
> > --- a/tools/lib/api/Makefile
> > +++ b/tools/lib/api/Makefile
> > @@ -95,7 +95,7 @@ install_lib: $(LIBFILE)
> > $(call do_install_mkdir,$(libdir_SQ)); \
> > cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
> >
> > -HDRS := cpu.h debug.h io.h
> > +HDRS := cpu.h debug.h io.h io_dir.h
> > FD_HDRS := fd/array.h
> > FS_HDRS := fs/fs.h fs/tracing_path.h
> > INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
> > diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
> > new file mode 100644
> > index 000000000000..c84738923c96
> > --- /dev/null
> > +++ b/tools/lib/api/io_dir.h
> > @@ -0,0 +1,93 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +/*
> > + * Lightweight directory reading library.
> > + */
> > +#ifndef __API_IO_DIR__
> > +#define __API_IO_DIR__
> > +
> > +#include <dirent.h>
> > +#include <fcntl.h>
> > +#include <stdlib.h>
> > +#include <unistd.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +
> > +#if !defined(SYS_getdents64)
> > +#if defined(__x86_64__)
> > +#define SYS_getdents64 217
> > +#elif defined(__aarch64__)
> > +#define SYS_getdents64 61
> > +#endif
> > +#endif
> > +
> > +static inline ssize_t perf_getdents64(int fd, void *dirp, size_t count)
> > +{
> > +#ifdef MEMORY_SANITIZER
> > + memset(dirp, 0, count);
> > +#endif
> > + return syscall(SYS_getdents64, fd, dirp, count);
>
> Unfortunately this fails to build on my i386 vm (and probably other old
> archs don't have SYS_getdents64 yet).
>
> In file included from util/pmus.c:6:
> /build/libapi/include/api/io_dir.h: In function 'perf_getdents64':
> /build/libapi/include/api/io_dir.h:28:24: error: 'SYS_getdents64' undeclared (first use in this function); did you mean 'perf_getdents64'?
> 28 | return syscall(SYS_getdents64, fd, dirp, count);
> | ^~~~~~~~~~~~~~
> | perf_getdents64
>
> > +}
> > +#endif
>
> Maybe mismatched.
So even on 32-bit systems we want getdents64 as getdents encodes the
d_type at the end of dirent making it hard to index. On i386 we know
the number of the syscall for perf trace:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl?h=perf-tools-next#n235
So we can presumably change:
```
#if !defined(SYS_getdents64)
#if defined(__x86_64__)
#define SYS_getdents64 217
#elif defined(__aarch64__)
#define SYS_getdents64 61
#endif
#endif
```
to also have:
```
#elif defined(__i386__)
#define SYS_getdents64 220
```
Could you test this so that I don't need to resend 7 patches for each
architecture you test upon? The man page says <sys/syscall.h> and
<unistd.h> should be sufficient for the code to work, so I think
addressing this is adding workarounds for distros that aren't
conformant - ie its the distro's fault the code fails to compile and
not the tool's.
Thanks,
Ian
> Thanks,
> Namhyung
>
> > +
> > +struct io_dirent64 {
> > + ino64_t d_ino; /* 64-bit inode number */
> > + off64_t d_off; /* 64-bit offset to next structure */
> > + unsigned short d_reclen; /* Size of this dirent */
> > + unsigned char d_type; /* File type */
> > + char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
> > +};
> > +
> > +struct io_dir {
> > + int dirfd;
> > + ssize_t available_bytes;
> > + struct io_dirent64 *next;
> > + struct io_dirent64 buff[4];
> > +};
> > +
> > +static inline void io_dir__init(struct io_dir *iod, int dirfd)
> > +{
> > + iod->dirfd = dirfd;
> > + iod->available_bytes = 0;
> > +}
> > +
> > +static inline void io_dir__rewinddir(struct io_dir *iod)
> > +{
> > + lseek(iod->dirfd, 0, SEEK_SET);
> > + iod->available_bytes = 0;
> > +}
> > +
> > +static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
> > +{
> > + struct io_dirent64 *entry;
> > +
> > + if (iod->available_bytes <= 0) {
> > + ssize_t rc = perf_getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
> > +
> > + if (rc <= 0)
> > + return NULL;
> > + iod->available_bytes = rc;
> > + iod->next = iod->buff;
> > + }
> > + entry = iod->next;
> > + iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
> > + iod->available_bytes -= entry->d_reclen;
> > + return entry;
> > +}
> > +
> > +static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
> > +{
> > + if (dent->d_type == DT_UNKNOWN) {
> > + struct stat st;
> > +
> > + if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
> > + return false;
> > +
> > + if (S_ISDIR(st.st_mode)) {
> > + dent->d_type = DT_DIR;
> > + return true;
> > + }
> > + }
> > + return dent->d_type == DT_DIR;
> > +}
> > +
> > +#endif
> > --
> > 2.48.1.502.g6dc24dfdaf-goog
> >
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative
2025-02-19 22:21 ` Ian Rogers
@ 2025-02-21 6:31 ` Namhyung Kim
0 siblings, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2025-02-21 6:31 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Sam James, Jesper Juhl, James Clark, Zhongqiu Han,
Yicong Yang, Thomas Richter, Michael Petlan, Veronika Molnarova,
Anne Macedo, Dominique Martinet, Jean-Philippe Romain, Junhao He,
linux-kernel, linux-perf-users, Krzysztof Łopatowski
On Wed, Feb 19, 2025 at 02:21:45PM -0800, Ian Rogers wrote:
> On Wed, Feb 19, 2025 at 1:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Feb 07, 2025 at 03:24:42PM -0800, Ian Rogers wrote:
> > > glibc's opendir allocates a minimum of 32kb, when called recursively
> > > for a directory tree the memory consumption can add up - nearly 300kb
> > > during perf start-up when processing modules. Add a stack allocated
> > > variant of readdir sized a little more than 1kb.
> > >
> > > As getdents64 may be missing from libc, add support using syscall.
> > > Note, an earlier version of this patch had a feature test for
> > > getdents64 but there were problems on certains distros where
> > > getdents64 would be #define renamed to getdents breaking the code. The
> > > syscall use was made uncondtional to work around this. There is
> > > context in:
> > > https://lore.kernel.org/lkml/20231207050433.1426834-1-irogers@google.com/
> > >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > > tools/lib/api/Makefile | 2 +-
> > > tools/lib/api/io_dir.h | 93 ++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 94 insertions(+), 1 deletion(-)
> > > create mode 100644 tools/lib/api/io_dir.h
> > >
> > > diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
> > > index 7f6396087b46..8665c799e0fa 100644
> > > --- a/tools/lib/api/Makefile
> > > +++ b/tools/lib/api/Makefile
> > > @@ -95,7 +95,7 @@ install_lib: $(LIBFILE)
> > > $(call do_install_mkdir,$(libdir_SQ)); \
> > > cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
> > >
> > > -HDRS := cpu.h debug.h io.h
> > > +HDRS := cpu.h debug.h io.h io_dir.h
> > > FD_HDRS := fd/array.h
> > > FS_HDRS := fs/fs.h fs/tracing_path.h
> > > INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
> > > diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
> > > new file mode 100644
> > > index 000000000000..c84738923c96
> > > --- /dev/null
> > > +++ b/tools/lib/api/io_dir.h
> > > @@ -0,0 +1,93 @@
> > > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > > +/*
> > > + * Lightweight directory reading library.
> > > + */
> > > +#ifndef __API_IO_DIR__
> > > +#define __API_IO_DIR__
> > > +
> > > +#include <dirent.h>
> > > +#include <fcntl.h>
> > > +#include <stdlib.h>
> > > +#include <unistd.h>
> > > +#include <sys/stat.h>
> > > +#include <sys/syscall.h>
> > > +
> > > +#if !defined(SYS_getdents64)
> > > +#if defined(__x86_64__)
> > > +#define SYS_getdents64 217
> > > +#elif defined(__aarch64__)
> > > +#define SYS_getdents64 61
> > > +#endif
> > > +#endif
> > > +
> > > +static inline ssize_t perf_getdents64(int fd, void *dirp, size_t count)
> > > +{
> > > +#ifdef MEMORY_SANITIZER
> > > + memset(dirp, 0, count);
> > > +#endif
> > > + return syscall(SYS_getdents64, fd, dirp, count);
> >
> > Unfortunately this fails to build on my i386 vm (and probably other old
> > archs don't have SYS_getdents64 yet).
> >
> > In file included from util/pmus.c:6:
> > /build/libapi/include/api/io_dir.h: In function 'perf_getdents64':
> > /build/libapi/include/api/io_dir.h:28:24: error: 'SYS_getdents64' undeclared (first use in this function); did you mean 'perf_getdents64'?
> > 28 | return syscall(SYS_getdents64, fd, dirp, count);
> > | ^~~~~~~~~~~~~~
> > | perf_getdents64
> >
> > > +}
> > > +#endif
> >
> > Maybe mismatched.
>
> So even on 32-bit systems we want getdents64 as getdents encodes the
> d_type at the end of dirent making it hard to index. On i386 we know
> the number of the syscall for perf trace:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl?h=perf-tools-next#n235
> So we can presumably change:
> ```
> #if !defined(SYS_getdents64)
> #if defined(__x86_64__)
> #define SYS_getdents64 217
> #elif defined(__aarch64__)
> #define SYS_getdents64 61
> #endif
> #endif
> ```
> to also have:
> ```
> #elif defined(__i386__)
> #define SYS_getdents64 220
> ```
> Could you test this so that I don't need to resend 7 patches for each
> architecture you test upon? The man page says <sys/syscall.h> and
> <unistd.h> should be sufficient for the code to work, so I think
> addressing this is adding workarounds for distros that aren't
> conformant - ie its the distro's fault the code fails to compile and
> not the tool's.
It fixes the issue on my machine but I'm afraid others will see the same
issue on other archs. I think <sys/syscall.h> should provide the number
for the syscall but the problem is old distros which didn't ship recent
headers. So it's a matter of how long the tool needs to support such an
old one. :(
Thanks,
Namhyung
> >
> > > +
> > > +struct io_dirent64 {
> > > + ino64_t d_ino; /* 64-bit inode number */
> > > + off64_t d_off; /* 64-bit offset to next structure */
> > > + unsigned short d_reclen; /* Size of this dirent */
> > > + unsigned char d_type; /* File type */
> > > + char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
> > > +};
> > > +
> > > +struct io_dir {
> > > + int dirfd;
> > > + ssize_t available_bytes;
> > > + struct io_dirent64 *next;
> > > + struct io_dirent64 buff[4];
> > > +};
> > > +
> > > +static inline void io_dir__init(struct io_dir *iod, int dirfd)
> > > +{
> > > + iod->dirfd = dirfd;
> > > + iod->available_bytes = 0;
> > > +}
> > > +
> > > +static inline void io_dir__rewinddir(struct io_dir *iod)
> > > +{
> > > + lseek(iod->dirfd, 0, SEEK_SET);
> > > + iod->available_bytes = 0;
> > > +}
> > > +
> > > +static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
> > > +{
> > > + struct io_dirent64 *entry;
> > > +
> > > + if (iod->available_bytes <= 0) {
> > > + ssize_t rc = perf_getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
> > > +
> > > + if (rc <= 0)
> > > + return NULL;
> > > + iod->available_bytes = rc;
> > > + iod->next = iod->buff;
> > > + }
> > > + entry = iod->next;
> > > + iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
> > > + iod->available_bytes -= entry->d_reclen;
> > > + return entry;
> > > +}
> > > +
> > > +static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
> > > +{
> > > + if (dent->d_type == DT_UNKNOWN) {
> > > + struct stat st;
> > > +
> > > + if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
> > > + return false;
> > > +
> > > + if (S_ISDIR(st.st_mode)) {
> > > + dent->d_type = DT_DIR;
> > > + return true;
> > > + }
> > > + }
> > > + return dent->d_type == DT_DIR;
> > > +}
> > > +
> > > +#endif
> > > --
> > > 2.48.1.502.g6dc24dfdaf-goog
> > >
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-02-21 6:31 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 23:24 [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir Ian Rogers
2025-02-07 23:24 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
2025-02-08 17:02 ` [PATCH v2 1/7] tools lib api: Add io_dir() as an allocation free readdir() alternative Markus Elfring
2025-02-19 21:51 ` [PATCH v2 1/7] tools lib api: Add io_dir an allocation free readdir alternative Namhyung Kim
2025-02-19 22:21 ` Ian Rogers
2025-02-21 6:31 ` Namhyung Kim
2025-02-07 23:24 ` [PATCH v2 2/7] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
2025-02-07 23:24 ` [PATCH v2 3/7] perf pmu: Switch " Ian Rogers
2025-02-07 23:24 ` [PATCH v2 4/7] perf header: Switch mem topology " Ian Rogers
2025-02-07 23:24 ` [PATCH v2 5/7] perf events: Remove scandir in thread synthesis Ian Rogers
2025-02-07 23:24 ` [PATCH v2 6/7] perf parse-events: Switch tracepoints to io_dir__readdir Ian Rogers
2025-02-07 23:24 ` [PATCH v2 7/7] perf hwmon_pmu: Switch event discovery " Ian Rogers
2025-02-08 10:58 ` [PATCH v2 0/7] Add io_dir to avoid memory overhead from opendir David Laight
2025-02-08 12:15 ` Ian Rogers
2025-02-19 21:54 ` Namhyung Kim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).