* [PATCH] perf bench: add --write-size option to sched pipe
@ 2026-05-15 11:45 Breno Leitao
2026-05-15 12:05 ` sashiko-bot
0 siblings, 1 reply; 2+ messages in thread
From: Breno Leitao @ 2026-05-15 11:45 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, James Clark
Cc: linux-perf-users, linux-kernel, kernel-team, Breno Leitao
The default ping-pong uses sizeof(int) (4 bytes) per iteration, which
exercises only the pipe-buffer merge path and keeps allocation entirely
out of the picture. That makes the bench a useful scheduler / context-
switch latency probe but unable to surface anything from the pipe
page-allocation hot path.
Add a -s/--write-size option that sets the bytes written and read per
ping-pong iteration. The buffer is allocated for each side via
struct thread_data and replaces the on-stack int previously used. The
default remains sizeof(int) so existing invocations are unchanged.
With --write-size set above PAGE_SIZE the bench drives anon_pipe_write()
through alloc_page() (or the bulk pre-alloc, if the relevant patch is
applied), which is what we want when measuring pipe locking and page
allocation work.
The bench is a ping-pong: both sides call write() before read(), so a
single write_size payload must fit entirely in the pipe buffer or both
sides deadlock waiting for the other to drain. Resize the pipe via
F_SETPIPE_SZ to match write_size (skipped at the sizeof(int) default),
and error out cleanly when the request exceeds
/proc/sys/fs/pipe-max-size.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
This patch has been valuable for testing and verifying the pipe
enhancements currently under discussion at
https://lore.kernel.org/all/20260515-fix_pipe-v1-0-b14c840c7555@debian.org/
---
tools/perf/bench/sched-pipe.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/tools/perf/bench/sched-pipe.c b/tools/perf/bench/sched-pipe.c
index 70139036d68f0..77a7e35d7d809 100644
--- a/tools/perf/bench/sched-pipe.c
+++ b/tools/perf/bench/sched-pipe.c
@@ -39,6 +39,7 @@ struct thread_data {
int epoll_fd;
bool cgroup_failed;
pthread_t pthread;
+ char *buf;
};
#define LOOPS_DEFAULT 1000000
@@ -48,6 +49,7 @@ static int loops = LOOPS_DEFAULT;
static bool threaded;
static bool nonblocking;
+static unsigned int write_size = sizeof(int);
static char *cgrp_names[2];
static struct cgroup *cgrps[2];
@@ -88,6 +90,9 @@ static const struct option options[] = {
OPT_BOOLEAN('n', "nonblocking", &nonblocking, "Use non-blocking operations"),
OPT_INTEGER('l', "loop", &loops, "Specify number of loops"),
OPT_BOOLEAN('T', "threaded", &threaded, "Specify threads/process based task setup"),
+ OPT_UINTEGER('s', "write-size", &write_size,
+ "Bytes per ping-pong write (default 4-bytes). Use larger "
+ "values to exercise the pipe page-allocation path."),
OPT_CALLBACK('G', "cgroups", NULL, "SEND,RECV",
"Put sender and receivers in given cgroups",
parse_two_cgroups),
@@ -172,14 +177,14 @@ static void exit_cgroup(int nr)
static inline int read_pipe(struct thread_data *td)
{
- int ret, m;
+ int ret;
retry:
if (nonblocking) {
ret = epoll_wait(td->epoll_fd, &td->epoll_ev, 1, -1);
if (ret < 0)
return ret;
}
- ret = read(td->pipe_read, &m, sizeof(int));
+ ret = read(td->pipe_read, td->buf, write_size);
if (nonblocking && ret < 0 && errno == EWOULDBLOCK)
goto retry;
return ret;
@@ -188,7 +193,7 @@ static inline int read_pipe(struct thread_data *td)
static void *worker_thread(void *__tdata)
{
struct thread_data *td = __tdata;
- int i, ret, m = 0;
+ int i, ret;
ret = enter_cgroup(td->nr);
if (ret < 0) {
@@ -204,10 +209,10 @@ static void *worker_thread(void *__tdata)
}
for (i = 0; i < loops; i++) {
- ret = write(td->pipe_write, &m, sizeof(int));
- BUG_ON(ret != sizeof(int));
+ ret = write(td->pipe_write, td->buf, write_size);
+ BUG_ON(ret != (int)write_size);
ret = read_pipe(td);
- BUG_ON(ret != sizeof(int));
+ BUG_ON(ret != (int)write_size);
}
return NULL;
@@ -239,6 +244,22 @@ int bench_sched_pipe(int argc, const char **argv)
BUG_ON(pipe2(pipe_1, flags));
BUG_ON(pipe2(pipe_2, flags));
+ /*
+ * On custom write_size, set the pipe size to acommodate write_size
+ */
+ if (write_size > sizeof(int) &&
+ (fcntl(pipe_1[1], F_SETPIPE_SZ, write_size) < (int)write_size ||
+ fcntl(pipe_2[1], F_SETPIPE_SZ, write_size) < (int)write_size)) {
+ fprintf(stderr, "--write-size %u exceeds /proc/sys/fs/pipe-max-size\n",
+ write_size);
+ return -1;
+ }
+
+ for (t = 0; t < nr_threads; t++) {
+ threads[t].buf = calloc(1, write_size);
+ BUG_ON(!threads[t].buf);
+ }
+
gettimeofday(&start, NULL);
for (t = 0; t < nr_threads; t++) {
@@ -287,6 +308,9 @@ int bench_sched_pipe(int argc, const char **argv)
gettimeofday(&stop, NULL);
timersub(&stop, &start, &diff);
+ for (t = 0; t < nr_threads; t++)
+ free(threads[t].buf);
+
exit_cgroup(0);
exit_cgroup(1);
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260515-perf_bench_pipe-bae2ec777c4b
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] perf bench: add --write-size option to sched pipe
2026-05-15 11:45 [PATCH] perf bench: add --write-size option to sched pipe Breno Leitao
@ 2026-05-15 12:05 ` sashiko-bot
0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-05-15 12:05 UTC (permalink / raw)
To: Breno Leitao; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Integer overflow in write_size cast bypasses pipe capacity validation and causes a deadlock.
- [High] Missing validation for write_size == 0 causes benchmark hangs or bogus results.
- [Medium] File descriptor leak on the fcntl error path.
--
commit 2c945dc323a6b416d95bca3f92808b828bcc9c74
Author: Breno Leitao <leitao@debian.org>
perf bench: add --write-size option to sched pipe
This commit adds a -s/--write-size option to the sched pipe benchmark to set
the bytes written and read per ping-pong iteration. It also resizes the pipe
using F_SETPIPE_SZ to match the requested size.
> diff --git a/tools/perf/bench/sched-pipe.c b/tools/perf/bench/sched-pipe.c
> --- a/tools/perf/bench/sched-pipe.c
> +++ b/tools/perf/bench/sched-pipe.c
[ ... ]
> @@ -48,6 +49,7 @@ static int loops = LOOPS_DEFAULT;
> static bool threaded;
>
> static bool nonblocking;
> +static unsigned int write_size = sizeof(int);
Is there a missing lower-bound check for write_size? If a user passes 0,
the vfs_write system call will return 0 immediately without invoking
pipe_write(), meaning no data is transferred and no readers are woken up.
In blocking mode, both threads could instantly spin through their loops
transferring 0 bytes, generating a falsely fast benchmark result without
measuring context switch latency. In non-blocking mode, the threads rely
on epoll_wait(), which would hang forever since the 0-byte write wouldn't
trigger EPOLLIN.
[ ... ]
> @@ -204,10 +209,10 @@ static void *worker_thread(void *__tdata)
> }
>
> for (i = 0; i < loops; i++) {
> - ret = write(td->pipe_write, &m, sizeof(int));
> - BUG_ON(ret != sizeof(int));
> + ret = write(td->pipe_write, td->buf, write_size);
> + BUG_ON(ret != (int)write_size);
Could this cast to (int) cause an unexpected crash? If write_size is larger
than INT_MAX, the cast results in a negative number, and positive return
values from write() would be compared against this negative number,
triggering the BUG_ON.
> ret = read_pipe(td);
> - BUG_ON(ret != sizeof(int));
> + BUG_ON(ret != (int)write_size);
> }
>
> return NULL;
> @@ -239,6 +244,22 @@ int bench_sched_pipe(int argc, const char **argv)
> BUG_ON(pipe2(pipe_1, flags));
> BUG_ON(pipe2(pipe_2, flags));
>
> + /*
> + * On custom write_size, set the pipe size to acommodate write_size
> + */
> + if (write_size > sizeof(int) &&
> + (fcntl(pipe_1[1], F_SETPIPE_SZ, write_size) < (int)write_size ||
> + fcntl(pipe_2[1], F_SETPIPE_SZ, write_size) < (int)write_size)) {
Does this signed cast bypass the pipe capacity validation when write_size
exceeds INT_MAX?
If fcntl() fails because the size exceeds system limits and returns -1, the
check evaluates as -1 < (negative number). Since -1 is not less than any
negative number, this evaluates to false, silently bypassing the error
handler.
The benchmark would then proceed with default 64KB pipes but attempt to
write >2GB payloads into them. Because the ping-pong design requires both
threads to fully execute write() before read(), both threads would fill
their respective pipes and block indefinitely, resulting in a deadlock.
> + fprintf(stderr, "--write-size %u exceeds /proc/sys/fs/pipe-max-size\n",
> + write_size);
> + return -1;
Does this return path leak the pipe file descriptors? The benchmark allocates
four file descriptors via pipe2() just above this block. If the early return
is taken, it appears to exit without passing through a cleanup path to close
the acquired file descriptors.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260515-perf_bench_pipe-v1-1-3c5b805ba178@debian.org?part=1
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-05-15 12:05 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 11:45 [PATCH] perf bench: add --write-size option to sched pipe Breno Leitao
2026-05-15 12:05 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox