From: Namhyung Kim <namhyung@kernel.org>
To: Breno Leitao <leitao@debian.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v3] perf bench: add --write-size option to sched pipe
Date: Wed, 27 May 2026 10:31:10 -0700 [thread overview]
Message-ID: <ahcqXqbjNlU-viPj@google.com> (raw)
In-Reply-To: <20260527-perf_bench_pipe-v3-1-9eee9465d673@debian.org>
On Wed, May 27, 2026 at 06:42:59AM -0700, Breno Leitao wrote:
> The default ping-pong uses sizeof(int) (4 bytes) per iteration, which
> exercises only the pipe-buffer merge path and keeps allocation entirely
> out of the picture. That makes the bench a useful scheduler / context-
> switch latency probe but unable to surface anything from the pipe
> page-allocation hot path.
>
> Add a -s/--write-size option that sets the bytes written and read per
> ping-pong iteration. The buffer is allocated for each side via
> struct thread_data and replaces the on-stack int previously used. The
> default remains sizeof(int) so existing invocations are unchanged.
>
> With --write-size set above PAGE_SIZE the bench drives anon_pipe_write()
> through alloc_page() (or the bulk pre-alloc, if the relevant patch is
> applied), which is what we want when measuring pipe locking and page
> allocation work.
>
> The bench is a ping-pong: both sides call write() before read(), so a
> single write_size payload must fit entirely in the pipe buffer or both
> sides deadlock waiting for the other to drain. Resize the pipe via
> F_SETPIPE_SZ to match write_size (skipped at the sizeof(int) default),
> and error out cleanly when the request exceeds
> /proc/sys/fs/pipe-max-size.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> This patch has been valuable for testing and verifying the pipe
> enhancements currently under discussion at
> https://lore.kernel.org/all/20260515-fix_pipe-v1-0-b14c840c7555@debian.org/
> ---
> Changes in v3:
> - Loop on short read()/write() in the worker via new
> pipe_xread()/pipe_xwrite() helpers, instead of asserting that the
> full payload is always transferred in one call (suggested by
> Namhyung).
> - Link to v2: https://patch.msgid.link/20260521-perf_bench_pipe-v2-1-720b6ff7f0fa@debian.org
>
> Changes in v2:
> - Reject --write-size == 0 to avoid a zero-byte ping-pong that spins
> (blocking mode) or hangs on epoll_wait (non-blocking mode).
> - Validate --write-size <= INT_MAX and drop the (int) casts in the
> read/write BUG_ON and fcntl(F_SETPIPE_SZ) checks, so the comparisons
> are unambiguous regardless of the requested size.
> - Fix "acommodate" typo in the pipe-resize comment.
> - Link to v1: https://patch.msgid.link/20260515-perf_bench_pipe-v1-1-3c5b805ba178@debian.org
>
> To: Peter Zijlstra <peterz@infradead.org>
> To: Ingo Molnar <mingo@redhat.com>
> To: Arnaldo Carvalho de Melo <acme@kernel.org>
> To: Namhyung Kim <namhyung@kernel.org>
> To: Mark Rutland <mark.rutland@arm.com>
> To: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> To: Jiri Olsa <jolsa@kernel.org>
> To: Ian Rogers <irogers@google.com>
> To: Adrian Hunter <adrian.hunter@intel.com>
> To: James Clark <james.clark@linaro.org>
> Cc: linux-perf-users@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> tools/perf/bench/sched-pipe.c | 102 ++++++++++++++++++++++++++++++++++++------
> 1 file changed, 89 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/bench/sched-pipe.c b/tools/perf/bench/sched-pipe.c
> index 70139036d68f..7a14abc36047 100644
> --- a/tools/perf/bench/sched-pipe.c
> +++ b/tools/perf/bench/sched-pipe.c
> @@ -22,6 +22,7 @@
> #include <string.h>
> #include <errno.h>
> #include <fcntl.h>
> +#include <limits.h>
> #include <assert.h>
> #include <sys/epoll.h>
> #include <sys/time.h>
> @@ -39,6 +40,7 @@ struct thread_data {
> int epoll_fd;
> bool cgroup_failed;
> pthread_t pthread;
> + char *buf;
> };
>
> #define LOOPS_DEFAULT 1000000
> @@ -48,6 +50,7 @@ static int loops = LOOPS_DEFAULT;
> static bool threaded;
>
> static bool nonblocking;
> +static unsigned int write_size = sizeof(int);
> static char *cgrp_names[2];
> static struct cgroup *cgrps[2];
>
> @@ -88,6 +91,8 @@ static const struct option options[] = {
> OPT_BOOLEAN('n', "nonblocking", &nonblocking, "Use non-blocking operations"),
> OPT_INTEGER('l', "loop", &loops, "Specify number of loops"),
> OPT_BOOLEAN('T', "threaded", &threaded, "Specify threads/process based task setup"),
> + OPT_UINTEGER('s', "write-size", &write_size,
> + "Bytes per ping-pong write (default 4-bytes). Use larger values to exercise the pipe page-allocation path."),
> OPT_CALLBACK('G', "cgroups", NULL, "SEND,RECV",
> "Put sender and receivers in given cgroups",
> parse_two_cgroups),
> @@ -170,25 +175,57 @@ static void exit_cgroup(int nr)
> free(cgrp_names[nr]);
> }
>
> +/*
> + * Loop on short read()/write(): the kernel may return fewer bytes than
> + * requested, and in non-blocking mode the writer can transiently hit
> + * EWOULDBLOCK while the peer is still draining a full pipe (capacity is
> + * sized to write_size).
> + */
> +static inline int write_pipe(struct thread_data *td)
> +{
> + unsigned int done = 0;
> + int ret;
> +
> + while (done < write_size) {
> + ret = write(td->pipe_write, td->buf + done, write_size - done);
> + if (ret < 0) {
> + if (nonblocking && errno == EWOULDBLOCK)
> + continue;
Don't we also need the blocking part?
> + return ret;
> + }
> + done += ret;
> + }
> + return done;
> +}
> +
> static inline int read_pipe(struct thread_data *td)
> {
> - int ret, m;
> -retry:
> - if (nonblocking) {
> - ret = epoll_wait(td->epoll_fd, &td->epoll_ev, 1, -1);
> - if (ret < 0)
> + unsigned int done = 0;
> + int ret;
> +
> + while (done < write_size) {
> + if (nonblocking) {
> + ret = epoll_wait(td->epoll_fd, &td->epoll_ev, 1, -1);
> + if (ret < 0)
> + return ret;
> + }
> + ret = read(td->pipe_read, td->buf + done, write_size - done);
> + if (ret < 0) {
> + if (nonblocking && errno == EWOULDBLOCK)
> + continue;
> + return ret;
> + }
> + if (ret == 0)
> return ret;
Maybe it doesn't matter.. but shouldn't it return 'done' instead?
> + done += ret;
> }
> - ret = read(td->pipe_read, &m, sizeof(int));
> - if (nonblocking && ret < 0 && errno == EWOULDBLOCK)
> - goto retry;
> - return ret;
> + return done;
> }
>
> static void *worker_thread(void *__tdata)
> {
> struct thread_data *td = __tdata;
> - int i, ret, m = 0;
> + int i, ret;
>
> ret = enter_cgroup(td->nr);
> if (ret < 0) {
> @@ -204,15 +241,38 @@ static void *worker_thread(void *__tdata)
> }
>
> for (i = 0; i < loops; i++) {
> - ret = write(td->pipe_write, &m, sizeof(int));
> - BUG_ON(ret != sizeof(int));
> + ret = write_pipe(td);
> + BUG_ON(ret < 0 || (unsigned int)ret != write_size);
> ret = read_pipe(td);
> - BUG_ON(ret != sizeof(int));
> + BUG_ON(ret < 0 || (unsigned int)ret != write_size);
Nit: maybe comparing to the write_size is enough as it cannot be
negative or bigger than INTMAX.
Thanks,
Namhyung
> }
>
> return NULL;
> }
>
> +/*
> + * On a custom write_size, resize the pipes so a single payload fits.
> + */
> +static int resize_pipes(int wfd1, int wfd2)
> +{
> + int r1, r2;
> +
> + if (write_size <= sizeof(int))
> + return 0;
> +
> + r1 = fcntl(wfd1, F_SETPIPE_SZ, write_size);
> + r2 = fcntl(wfd2, F_SETPIPE_SZ, write_size);
> + if (r1 < 0 || r2 < 0 ||
> + (unsigned int)r1 < write_size ||
> + (unsigned int)r2 < write_size) {
> + fprintf(stderr,
> + "--write-size %u exceeds /proc/sys/fs/pipe-max-size\n",
> + write_size);
> + return -1;
> + }
> + return 0;
> +}
> +
> int bench_sched_pipe(int argc, const char **argv)
> {
> struct thread_data threads[2] = {};
> @@ -233,12 +293,25 @@ int bench_sched_pipe(int argc, const char **argv)
>
> argc = parse_options(argc, argv, options, bench_sched_pipe_usage, 0);
>
> + if (write_size == 0 || write_size > INT_MAX) {
> + fprintf(stderr, "--write-size must be in 1..%d\n", INT_MAX);
> + return -1;
> + }
> +
> if (nonblocking)
> flags |= O_NONBLOCK;
>
> BUG_ON(pipe2(pipe_1, flags));
> BUG_ON(pipe2(pipe_2, flags));
>
> + if (resize_pipes(pipe_1[1], pipe_2[1]) < 0)
> + return -1;
> +
> + for (t = 0; t < nr_threads; t++) {
> + threads[t].buf = calloc(1, write_size);
> + BUG_ON(!threads[t].buf);
> + }
> +
> gettimeofday(&start, NULL);
>
> for (t = 0; t < nr_threads; t++) {
> @@ -287,6 +360,9 @@ int bench_sched_pipe(int argc, const char **argv)
> gettimeofday(&stop, NULL);
> timersub(&stop, &start, &diff);
>
> + for (t = 0; t < nr_threads; t++)
> + free(threads[t].buf);
> +
> exit_cgroup(0);
> exit_cgroup(1);
>
>
> ---
> base-commit: e7e28506af98ce4e1059e5ec59334b335c00a246
> change-id: 20260515-perf_bench_pipe-bae2ec777c4b
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
next prev parent reply other threads:[~2026-05-27 17:31 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 13:42 [PATCH v3] perf bench: add --write-size option to sched pipe Breno Leitao
2026-05-27 14:07 ` sashiko-bot
2026-05-27 17:31 ` Namhyung Kim [this message]
2026-06-01 11:10 ` Breno Leitao
2026-06-03 5:22 ` Namhyung Kim
2026-06-03 10:25 ` Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahcqXqbjNlU-viPj@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=james.clark@linaro.org \
--cc=jolsa@kernel.org \
--cc=kernel-team@meta.com \
--cc=leitao@debian.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.