From: Oleg Nesterov <oleg@redhat.com>
To: "Sapkal, Swapnil" <swapnil.sapkal@amd.com>
Cc: Manfred Spraul <manfred@colorfullife.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Christian Brauner <brauner@kernel.org>,
David Howells <dhowells@redhat.com>,
WangYuli <wangyuli@uniontech.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
K Prateek Nayak <kprateek.nayak@amd.com>,
"Shenoy, Gautham Ranjal" <gautham.shenoy@amd.com>,
Neeraj.Upadhyay@amd.com
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full
Date: Mon, 24 Feb 2025 15:24:32 +0100
Message-ID: <20250224142329.GA19016@redhat.com>
In-Reply-To: <e813814e-7094-4673-bc69-731af065a0eb@amd.com>
Hi Sapkal,
On 02/24, Sapkal, Swapnil wrote:
>
> We saw hang in hackbench in our weekly regression testing on mainline
> kernel. The bisect pointed to this commit.
OMG. This patch caused a lot of "hackbench performance degradation" reports,
but hang??
Just in case, did you use
https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/hackbench/hackbench.c
?
OK, I gave up ;) I'll send the revert patch tomorrow (can't do this today)
even if I still don't see how this patch can be wrong.
> Whenever I compare the case where was_full would have been set but
> wake_writer was not set, I see the following pattern:
>
> ret = 100 (Read was successful)
> pipe_full() = 1
> total_len = 0
> buf->len != 0
>
> total_len is computed using iov_iter_count() while the buf->len is the
> length of the buffer corresponding to tail(pipe->bufs[tail & mask].len).
> Looking at pipe_write(), there seems to be a case where the writer can make
> progress when (chars && !was_empty) which only looks at iov_iter_count().
> Could it be the case that there is still room in the buffer but we are not
> waking up the writer?
I don't think so, but perhaps I am totally confused.
If the writer sleeps on pipe->wr_wait, it has already tried to write into
the pipe->bufs[head - 1] buffer before the sleep.
Yes, the reader can read from that buffer, but this won't make it more "writable"
for this particular writer: the room left is "PAGE_SIZE - (buf->offset + buf->len)",
and a read advances buf->offset by exactly as much as it shrinks buf->len.
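
To make the argument above concrete, here is a minimal userspace sketch of that
bookkeeping. It is not the kernel code: sketch_buf, sketch_append_room(),
sketch_consume() and SKETCH_PAGE_SIZE are hypothetical names, and a 4096-byte
page is assumed. It only models the claim that consuming bytes from the front
of a buffer leaves offset + len, and so the room at the end of the page,
unchanged.

```c
#include <stddef.h>

/* Assumption: 4096-byte pages; the real PAGE_SIZE depends on the arch. */
enum { SKETCH_PAGE_SIZE = 4096 };

/* Hypothetical stand-in for the two fields of a pipe buffer used here. */
struct sketch_buf {
	size_t offset;	/* start of the unread data within the page */
	size_t len;	/* number of unread bytes */
};

/* Room a writer could still append at the end of this buffer's page. */
static size_t sketch_append_room(const struct sketch_buf *buf)
{
	return SKETCH_PAGE_SIZE - (buf->offset + buf->len);
}

/* Consume n bytes from the front, the way a reader would (n <= len). */
static void sketch_consume(struct sketch_buf *buf, size_t n)
{
	buf->offset += n;
	buf->len -= n;
}
```

Starting from a completely filled page (offset 0, len 4096), a call such as
sketch_consume(&buf, 100) leaves sketch_append_room(&buf) at 0, which matches
the "ret = 100, pipe_full() = 1, buf->len != 0" pattern quoted above.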
I even wrote the test-case, let me quote my old email below.
Thanks,
Oleg.
--------------------------------------------------------------------------------
Meanwhile I wrote a stupid test-case below.
Without the patch

	State: S (sleeping)
	voluntary_ctxt_switches: 74
	nonvoluntary_ctxt_switches: 5
	State: S (sleeping)
	voluntary_ctxt_switches: 4169
	nonvoluntary_ctxt_switches: 5
	finally release the buffer
	wrote next char!

With the patch

	State: S (sleeping)
	voluntary_ctxt_switches: 74
	nonvoluntary_ctxt_switches: 3
	State: S (sleeping)
	voluntary_ctxt_switches: 74
	nonvoluntary_ctxt_switches: 3
	finally release the buffer
	wrote next char!

As you can see, without this patch pipe_read() wakes the writer up
4095 times for no reason (4169 - 74 voluntary context switches). Each
time the writer burns a bit of CPU and blocks again, until the last
read(fd[0], &c, 1) finally releases the buffer.
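
The 4095 figure can be reproduced with a small userspace model. This is only a
sketch under stated assumptions, not the kernel implementation: it assumes 16
slots of 4096 bytes each, all model_* names are hypothetical, and the two wakeup
rules are simplified to "wake if the pipe was full when the read started"
(unpatched) versus "wake only if the read released a slot of a full pipe"
(patched).

```c
#include <stdbool.h>

/* Assumptions: 16 slots, 4096-byte pages, pipe filled completely. */
enum { MODEL_SLOTS = 16, MODEL_PAGE = 4096 };

struct model_pipe {
	unsigned int head, tail;	/* free-running slot counters */
	unsigned int len[MODEL_SLOTS];	/* unread bytes in each slot */
};

static bool model_full(const struct model_pipe *p)
{
	return p->head - p->tail >= MODEL_SLOTS;
}

/* Fill every slot with a whole page of data. */
static void model_fill(struct model_pipe *p)
{
	p->head = p->tail = 0;
	while (!model_full(p))
		p->len[p->head++ % MODEL_SLOTS] = MODEL_PAGE;
}

/* Read one byte; return true if this read would wake the writer. */
static bool model_read_byte(struct model_pipe *p, bool patched)
{
	bool was_full = model_full(p);
	bool wake_writer = false;
	unsigned int *len;

	if (p->head == p->tail)		/* empty: nothing to read */
		return false;

	len = &p->len[p->tail % MODEL_SLOTS];
	if (--*len == 0) {
		/* The slot is drained: note fullness, then release it. */
		wake_writer = model_full(p);
		p->tail++;
	}
	return patched ? wake_writer : was_full;
}

/* Count writer wakeups over a number of 1-byte reads of a full pipe. */
static unsigned int model_count_wakeups(unsigned int reads, bool patched)
{
	struct model_pipe p;
	unsigned int i, wakeups = 0;

	model_fill(&p);
	for (i = 0; i < reads; i++)
		wakeups += model_read_byte(&p, patched);
	return wakeups;
}
```

In this model, model_count_wakeups(4095, false) counts one wakeup per read,
while model_count_wakeups(4095, true) counts none; only the 4096th read, which
drains the first slot, wakes the writer in the patched variant.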
Oleg.
-------------------------------------------------------------------------------
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
	int fd[2], nb, cnt;
	char cmd[1024], c = 0;

	assert(pipe(fd) == 0);

	// fill the pipe one byte at a time in non-blocking mode
	nb = 1; assert(ioctl(fd[1], FIONBIO, &nb) == 0);
	while (write(fd[1], &c, 1) == 1);
	assert(errno == EAGAIN);
	nb = 0; assert(ioctl(fd[1], FIONBIO, &nb) == 0);

	// The pipe is full, the next write() will block.

	sprintf(cmd, "grep -e State -e ctxt_switches /proc/%d/status", getpid());

	if (!fork()) {
		// wait until the parent sleeps in pipe_write()
		usleep(10000);
		system(cmd);

		// trigger 4095 unnecessary wakeups
		for (cnt = 0; cnt < 4095; ++cnt) {
			assert(read(fd[0], &c, 1) == 1);
			usleep(1000);
		}
		system(cmd);

		// this should actually wake the writer
		printf("finally release the buffer\n");
		assert(read(fd[0], &c, 1) == 1);
		return 0;
	}

	// parent: blocks here until the child releases a buffer
	assert(write(fd[1], &c, 1) == 1);
	printf("wrote next char!\n");
	return 0;
}