From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
Manfred Spraul <manfred@colorfullife.com>,
Christian Brauner <brauner@kernel.org>,
David Howells <dhowells@redhat.com>,
WangYuli <wangyuli@uniontech.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Swapnil Sapkal <swapnil.sapkal@amd.com>,
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full
Date: Sun, 2 Feb 2025 18:01:32 +0100 [thread overview]
Message-ID: <20250202170131.GA6507@redhat.com> (raw)
In-Reply-To: <CAHk-=wioaHG2P0KH=1zP0Zy=CcQb_JxZrksSS2+-FwcptHtntw@mail.gmail.com>
On 01/31, Linus Torvalds wrote:
>
> On Fri, 31 Jan 2025 at 01:50, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >
> > On my 3rd Generation EPYC system (2 x 64C/128T), I see that on reverting
> > the changes on the above mentioned commit, sched-messaging sees a
> > regression up until the 8 group case which contains 320 tasks, however
> > with 16 groups (640 tasks), the revert helps with performance.
>
> I suspect that the extra wakeups just end up perturbing timing, and
> then you just randomly get better performance on that particular
> test-case and machine.
>
> I'm not sure this is worth worrying about, unless there's a real load
> somewhere that shows this regression.
Well yes, but the problem is that people seem to believe that hackbench
is the "real" workload, even in the "overloaded" case...
And if we do care about performance... Could you look at the trivial patch
at the end? I don't think {a,c,m}time make any sense in the !fifo case, but
as you explained before they are visible to fstat() so we probably shouldn't
remove file_accessed/file_update_time unconditionally.
This patch does help if I change hackbench to uses pipe2(O_NOATIME) instead
of pipe(). And in fact it helps even in the simplest case:
static char buf[17 * 4096];
static struct timeval TW, TR;
int wr(int fd, int size)
{
int c, r;
struct timeval t0, t1;
gettimeofday(&t0, NULL);
for (c = 0; (r = write(fd, buf, size)) > 0; c += r);
gettimeofday(&t1, NULL);
timeradd(&TW, &t1, &TW);
timersub(&TW, &t0, &TW);
return c;
}
int rd(int fd, int size)
{
int c, r;
struct timeval t0, t1;
gettimeofday(&t0, NULL);
for (c = 0; (r = read(fd, buf, size)) > 0; c += r);
gettimeofday(&t1, NULL);
timeradd(&TR, &t1, &TR);
timersub(&TR, &t0, &TR);
return c;
}
int main(int argc, const char *argv[])
{
int fd[2], nb = 1, noat, loop, size;
assert(argc == 4);
noat = atoi(argv[1]) ? O_NOATIME : 0;
loop = atoi(argv[2]);
size = atoi(argv[3]);
assert(pipe2(fd, noat) == 0);
assert(ioctl(fd[0], FIONBIO, &nb) == 0);
assert(ioctl(fd[1], FIONBIO, &nb) == 0);
assert(size <= sizeof(buf));
while (loop--)
assert(wr(fd[1], size) == rd(fd[0], size));
printf("TW = %lu.%03lu\n", TW.tv_sec, TW.tv_usec/1000);
printf("TR = %lu.%03lu\n", TR.tv_sec, TR.tv_usec/1000);
return 0;
}
Now,
/# for i in 1 2 3; do /host/tmp/test 0 10000 100; done
TW = 7.692
TR = 5.704
TW = 7.930
TR = 5.858
TW = 7.685
TR = 5.697
/#
/# for i in 1 2 3; do /host/tmp/test 1 10000 100; done
TW = 6.432
TR = 4.533
TW = 6.612
TR = 4.638
TW = 6.409
TR = 4.523
Oleg.
---
diff --git a/fs/pipe.c b/fs/pipe.c
index a3f5fd7256e9..14b2c0f8b616 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1122,6 +1122,9 @@ int create_pipe_files(struct file **res, int flags)
}
}
+ if (flags & O_NOATIME)
+ inode->i_flags |= S_NOCMTIME;
+
f = alloc_file_pseudo(inode, pipe_mnt, "",
O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT)),
&pipefifo_fops);
@@ -1134,7 +1137,7 @@ int create_pipe_files(struct file **res, int flags)
f->private_data = inode->i_pipe;
f->f_pipe = 0;
- res[0] = alloc_file_clone(f, O_RDONLY | (flags & O_NONBLOCK),
+ res[0] = alloc_file_clone(f, O_RDONLY | (flags & (O_NONBLOCK | O_NOATIME)),
&pipefifo_fops);
if (IS_ERR(res[0])) {
put_pipe_info(inode, inode->i_pipe);
@@ -1154,7 +1157,7 @@ static int __do_pipe_flags(int *fd, struct file **files, int flags)
int error;
int fdw, fdr;
- if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT | O_NOTIFICATION_PIPE))
+ if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT | O_NOATIME | O_NOTIFICATION_PIPE))
return -EINVAL;
error = create_pipe_files(files, flags);
next prev parent reply other threads:[~2025-02-02 17:02 UTC|newest]
Thread overview: 109+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-02 14:07 [PATCH] pipe_read: don't wake up the writer if the pipe is still full Oleg Nesterov
2025-01-02 16:20 ` WangYuli
2025-01-02 16:46 ` Oleg Nesterov
2025-01-04 8:42 ` Christian Brauner
2025-01-31 9:49 ` K Prateek Nayak
2025-01-31 13:23 ` Oleg Nesterov
2025-01-31 20:06 ` Linus Torvalds
2025-02-02 17:01 ` Oleg Nesterov [this message]
2025-02-02 18:39 ` Linus Torvalds
2025-02-02 19:32 ` Oleg Nesterov
2025-02-04 11:17 ` Christian Brauner
2025-02-03 9:05 ` K Prateek Nayak
2025-02-04 13:49 ` Oleg Nesterov
2025-02-24 9:26 ` Sapkal, Swapnil
2025-02-24 14:24 ` Oleg Nesterov
2025-02-24 18:36 ` Linus Torvalds
2025-02-25 14:26 ` Oleg Nesterov
2025-02-25 11:57 ` Oleg Nesterov
2025-02-26 5:55 ` Sapkal, Swapnil
2025-02-26 11:38 ` Oleg Nesterov
2025-02-26 17:56 ` Sapkal, Swapnil
2025-02-26 18:12 ` Oleg Nesterov
2025-03-03 13:00 ` Alexey Gladkov
2025-03-03 15:46 ` K Prateek Nayak
2025-03-03 17:18 ` Alexey Gladkov
2025-02-26 13:18 ` Mateusz Guzik
2025-02-26 13:21 ` Mateusz Guzik
2025-02-26 17:16 ` Oleg Nesterov
2025-02-27 16:18 ` Sapkal, Swapnil
2025-02-27 16:34 ` Mateusz Guzik
2025-02-27 21:12 ` Oleg Nesterov
2025-02-28 5:58 ` Sapkal, Swapnil
2025-02-28 14:30 ` Oleg Nesterov
2025-02-28 16:33 ` Oleg Nesterov
2025-03-03 9:46 ` Sapkal, Swapnil
2025-03-03 14:37 ` Mateusz Guzik
2025-03-03 14:51 ` Mateusz Guzik
2025-03-03 15:31 ` K Prateek Nayak
2025-03-03 17:54 ` Mateusz Guzik
2025-03-03 18:11 ` Linus Torvalds
2025-03-03 18:33 ` Mateusz Guzik
2025-03-03 18:55 ` Linus Torvalds
2025-03-03 19:06 ` Mateusz Guzik
2025-03-03 20:27 ` Oleg Nesterov
2025-03-03 20:46 ` Linus Torvalds
2025-03-04 5:31 ` K Prateek Nayak
2025-03-04 6:32 ` Linus Torvalds
2025-03-04 12:54 ` Oleg Nesterov
2025-03-04 13:25 ` Oleg Nesterov
2025-03-04 18:28 ` Linus Torvalds
2025-03-04 22:11 ` Oleg Nesterov
2025-03-05 4:40 ` K Prateek Nayak
2025-03-05 4:52 ` Linus Torvalds
2025-03-04 13:51 ` [PATCH] fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex K Prateek Nayak
2025-03-04 18:36 ` Alexey Gladkov
2025-03-04 19:03 ` Linus Torvalds
2025-03-05 15:31 ` [PATCH] pipe_read: don't wake up the writer if the pipe is still full Rasmus Villemoes
2025-03-05 16:50 ` Linus Torvalds
2025-03-06 9:48 ` Rasmus Villemoes
2025-03-06 14:42 ` Rasmus Villemoes
2025-03-05 16:40 ` Linus Torvalds
2025-03-06 8:35 ` Rasmus Villemoes
2025-03-06 17:59 ` Linus Torvalds
2025-03-06 9:28 ` Rasmus Villemoes
2025-03-06 11:39 ` [RFC PATCH 0/3] pipe: Convert pipe->{head,tail} to unsigned short K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 1/3] fs/pipe: Limit the slots in pipe_resize_ring() K Prateek Nayak
2025-03-06 12:28 ` Oleg Nesterov
2025-03-06 15:26 ` K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 2/3] fs/splice: Atomically read pipe->{head,tail} in opipe_prep() K Prateek Nayak
2025-03-06 11:39 ` [RFC PATCH 3/3] treewide: pipe: Convert all references to pipe->{head,tail,max_usage,ring_size} to unsigned short K Prateek Nayak
2025-03-06 12:32 ` Oleg Nesterov
2025-03-06 12:41 ` Oleg Nesterov
2025-03-06 15:33 ` K Prateek Nayak
2025-03-06 18:04 ` Linus Torvalds
2025-03-06 14:27 ` Rasmus Villemoes
2025-03-03 18:32 ` [PATCH] pipe_read: don't wake up the writer if the pipe is still full K Prateek Nayak
2025-03-04 5:22 ` K Prateek Nayak
2025-03-03 16:49 ` Oleg Nesterov
2025-03-04 5:06 ` Hillf Danton
2025-03-04 5:35 ` K Prateek Nayak
2025-03-04 10:29 ` Hillf Danton
2025-03-04 12:34 ` Oleg Nesterov
2025-03-04 23:35 ` Hillf Danton
2025-03-04 23:49 ` Oleg Nesterov
2025-03-05 4:56 ` Hillf Danton
2025-03-05 11:44 ` Oleg Nesterov
2025-03-05 22:46 ` Hillf Danton
2025-03-06 9:30 ` Oleg Nesterov
2025-03-07 6:08 ` Hillf Danton
2025-03-07 6:24 ` K Prateek Nayak
2025-03-07 10:46 ` Hillf Danton
2025-03-07 11:29 ` Oleg Nesterov
2025-03-07 12:34 ` Oleg Nesterov
2025-03-07 23:56 ` Hillf Danton
2025-03-09 14:01 ` K Prateek Nayak
2025-03-09 17:02 ` Oleg Nesterov
2025-03-10 10:49 ` Hillf Danton
2025-03-10 11:09 ` Oleg Nesterov
2025-03-10 11:37 ` Hillf Danton
2025-03-10 12:43 ` Oleg Nesterov
2025-03-10 23:33 ` Hillf Danton
2025-03-11 0:26 ` Linus Torvalds
2025-03-11 6:54 ` Oleg Nesterov
[not found] ` <20250311112922.3342-1-hdanton@sina.com>
2025-03-11 11:53 ` Oleg Nesterov
2025-03-07 11:26 ` Oleg Nesterov
2025-02-27 12:50 ` Oleg Nesterov
2025-02-27 13:52 ` Oleg Nesterov
2025-02-27 15:59 ` Mateusz Guzik
2025-02-27 16:28 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250202170131.GA6507@redhat.com \
--to=oleg@redhat.com \
--cc=Neeraj.Upadhyay@amd.com \
--cc=brauner@kernel.org \
--cc=dhowells@redhat.com \
--cc=gautham.shenoy@amd.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=manfred@colorfullife.com \
--cc=swapnil.sapkal@amd.com \
--cc=torvalds@linux-foundation.org \
--cc=wangyuli@uniontech.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).