* Bug with splice to a pipe preventing a process exit
@ 2025-01-22 2:08 Kir Kolyshkin
2025-01-22 13:31 ` Christian Brauner
0 siblings, 1 reply; 4+ messages in thread
From: Kir Kolyshkin @ 2025-01-22 2:08 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara
Cc: Kir Kolyshkin, linux-fsdevel, Aleksa Sarai, Andrei Vagin
While checking if the tool I'm co-maintaining [1] works OK when compiled
with the future release of golang (1.24, tested with go1.24rc2), I found
out it's not [2], and the issue is caused by Go using sendfile more [3].
I came up with the following simple reproducer:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/un.h>
int main() {
int sks[2];
int pipefd[2];
if (pipe(pipefd) == -1) {
perror("pipe");
exit(1);
}
pid_t pid = fork();
if (pid == -1) {
perror("fork");
exit(1);
}
if (pid == 0) {
// Child process.
close(pipefd[1]); // Close write end.
// Minimal process that just exits after some time.
sleep(1);
_exit(0); // <-- The child hangs here.
}
// Parent process.
close(pipefd[0]); // Close read end.
printf("PID1=%d\n", getpid());
printf("PID2=%d\n", pid);
printf("ps -f -p $PID1,$PID2\n");
printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
#ifdef TEST_USE_STDIN
int in_fd = STDIN_FILENO;
#else
socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
int in_fd = sks[0];
#endif
// Copy from in_fd to pipe.
ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
if (ret == -1) {
perror("sendfile");
}
// Wait for child
int status;
waitpid(pid, &status, 0);
close(pipefd[1]); // Close write end.
return 0;
}
To reproduce, compile and run the above code, and when it hangs (instead
of exiting), copy its output to a shell in another terminal. Here's what
I saw:
[kir@kir-tp1 linux]$ PID1=2174401
PID2=2174402
ps -f -p $PID1,$PID2
sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
UID PID PPID C STIME TTY TIME CMD
kir 2174401 63304 0 17:34 pts/1 00:00:00 ./repro
kir 2174402 2174401 0 17:34 pts/1 00:00:00 [repro]
==> /proc/2174401/stack <==
[<0>] unix_stream_read_generic+0x792/0xc90
[<0>] unix_stream_splice_read+0x6f/0xb0
[<0>] splice_file_to_pipe+0x65/0xd0
[<0>] do_sendfile+0x176/0x440
[<0>] __x64_sys_sendfile64+0xb3/0xd0
[<0>] do_syscall_64+0x82/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
==> /proc/2174401/syscall <==
40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
==> /proc/2174402/stack <==
[<0>] pipe_release+0x1f/0x100
[<0>] __fput+0xde/0x2a0
[<0>] task_work_run+0x59/0x90
[<0>] do_exit+0x309/0xab0
[<0>] do_group_exit+0x30/0x80
[<0>] __x64_sys_exit_group+0x18/0x20
[<0>] x64_sys_call+0x14b4/0x14c0
[<0>] do_syscall_64+0x82/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
==> /proc/2174402/syscall <==
231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
Presumably, what happens here is the child process is stuck in the
exit_group syscall, being blocked by parent's splice which holds the
lock to the pipe (in splice_file_to_pipe).
To me, the code above looks valid, and the kernel behavior seems to
be a bug. In particular, if the process is exiting, the pipe it was
using is now being closed, and splice (or sendfile) should return.
If this is not a kernel bug, then the code above is not correct; in
this case, please suggest how to fix it.
Regards,
Kir.
----
[1]: https://github.com/opencontainers/runc
[2]: https://github.com/opencontainers/runc/pull/4598
[3]: https://go-review.googlesource.com/c/go/+/603295
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug with splice to a pipe preventing a process exit
2025-01-22 2:08 Bug with splice to a pipe preventing a process exit Kir Kolyshkin
@ 2025-01-22 13:31 ` Christian Brauner
2025-01-23 2:30 ` Kirill Kolyshkin
2025-01-28 16:36 ` Andrei Vagin
0 siblings, 2 replies; 4+ messages in thread
From: Christian Brauner @ 2025-01-22 13:31 UTC (permalink / raw)
To: Kir Kolyshkin, Jens Axboe
Cc: Alexander Viro, Jan Kara, linux-fsdevel, Aleksa Sarai,
Andrei Vagin
On Tue, Jan 21, 2025 at 06:08:41PM -0800, Kir Kolyshkin wrote:
> While checking if the tool I'm co-maintaining [1] works OK when compiled
> with the future release of golang (1.24, tested with go1.24rc2), I found
> out it's not [2], and the issue is caused by Go using sendfile more [3].
>
> I came up with the following simple reproducer:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/sendfile.h>
> #include <sys/wait.h>
> #include <sys/socket.h>
> #include <sys/un.h>
>
> int main() {
> int sks[2];
> int pipefd[2];
> if (pipe(pipefd) == -1) {
> perror("pipe");
> exit(1);
> }
>
> pid_t pid = fork();
> if (pid == -1) {
> perror("fork");
> exit(1);
> }
>
> if (pid == 0) {
> // Child process.
> close(pipefd[1]); // Close write end.
>
> // Minimal process that just exits after some time.
> sleep(1);
>
> _exit(0); // <-- The child hangs here.
> }
>
> // Parent process.
> close(pipefd[0]); // Close read end.
>
> printf("PID1=%d\n", getpid());
> printf("PID2=%d\n", pid);
> printf("ps -f -p $PID1,$PID2\n");
> printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
>
> #ifdef TEST_USE_STDIN
> int in_fd = STDIN_FILENO;
> #else
> socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
> int in_fd = sks[0];
> #endif
> // Copy from in_fd to pipe.
> ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
> if (ret == -1) {
> perror("sendfile");
> }
>
> // Wait for child
> int status;
> waitpid(pid, &status, 0);
>
> close(pipefd[1]); // Close write end.
> return 0;
> }
>
> To reproduce, compile and run the above code, and when it hangs (instead
> of exiting), copy its output to a shell in another terminal. Here's what
> I saw:
>
> [kir@kir-tp1 linux]$ PID1=2174401
> PID2=2174402
> ps -f -p $PID1,$PID2
> sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
> UID PID PPID C STIME TTY TIME CMD
> kir 2174401 63304 0 17:34 pts/1 00:00:00 ./repro
> kir 2174402 2174401 0 17:34 pts/1 00:00:00 [repro]
> ==> /proc/2174401/stack <==
> [<0>] unix_stream_read_generic+0x792/0xc90
> [<0>] unix_stream_splice_read+0x6f/0xb0
> [<0>] splice_file_to_pipe+0x65/0xd0
> [<0>] do_sendfile+0x176/0x440
> [<0>] __x64_sys_sendfile64+0xb3/0xd0
> [<0>] do_syscall_64+0x82/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> ==> /proc/2174401/syscall <==
> 40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
>
> ==> /proc/2174402/stack <==
> [<0>] pipe_release+0x1f/0x100
> [<0>] __fput+0xde/0x2a0
> [<0>] task_work_run+0x59/0x90
> [<0>] do_exit+0x309/0xab0
> [<0>] do_group_exit+0x30/0x80
> [<0>] __x64_sys_exit_group+0x18/0x20
> [<0>] x64_sys_call+0x14b4/0x14c0
> [<0>] do_syscall_64+0x82/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> ==> /proc/2174402/syscall <==
> 231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
>
> Presumably, what happens here is the child process is stuck in the
> exit_group syscall, being blocked by parent's splice which holds the
> lock to the pipe (in splice_file_to_pipe).
Splice is notoriously problematic when interacting with pipes due to how
it holds the pipe lock. We've had handwavy discussions how to improve
this but nothing ever materialized.
The gist here seems to me that unix_stream_read_generic() is waiting on
data to read from the write-side of the socketpair(). Until you close
that fd or provide data you'll simply hang forever.
Similar with STDIN_FILENO fwiw. If you never enter any character you
simply hang forever waiting for input.
So imho the way the program is written is buggy.
But Jens might be able to provide more details.
>
> To me, the code above looks valid, and the kernel behavior seems to
> be a bug. In particular, if the process is exiting, the pipe it was
> using is now being closed, and splice (or sendfile) should return.
>
> If this is not a kernel bug, then the code above is not correct; in
> this case, please suggest how to fix it.
>
> Regards,
> Kir.
>
> ----
> [1]: https://github.com/opencontainers/runc
> [2]: https://github.com/opencontainers/runc/pull/4598
> [3]: https://go-review.googlesource.com/c/go/+/603295
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug with splice to a pipe preventing a process exit
2025-01-22 13:31 ` Christian Brauner
@ 2025-01-23 2:30 ` Kirill Kolyshkin
2025-01-28 16:36 ` Andrei Vagin
1 sibling, 0 replies; 4+ messages in thread
From: Kirill Kolyshkin @ 2025-01-23 2:30 UTC (permalink / raw)
To: Christian Brauner
Cc: Jens Axboe, Alexander Viro, Jan Kara, linux-fsdevel, Aleksa Sarai,
Andrei Vagin
On Wed, Jan 22, 2025 at 5:31 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jan 21, 2025 at 06:08:41PM -0800, Kir Kolyshkin wrote:
> > While checking if the tool I'm co-maintaining [1] works OK when compiled
> > with the future release of golang (1.24, tested with go1.24rc2), I found
> > out it's not [2], and the issue is caused by Go using sendfile more [3].
> >
> > I came up with the following simple reproducer:
> >
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/sendfile.h>
> > #include <sys/wait.h>
> > #include <sys/socket.h>
> > #include <sys/un.h>
> >
> > int main() {
> > int sks[2];
> > int pipefd[2];
> > if (pipe(pipefd) == -1) {
> > perror("pipe");
> > exit(1);
> > }
> >
> > pid_t pid = fork();
> > if (pid == -1) {
> > perror("fork");
> > exit(1);
> > }
> >
> > if (pid == 0) {
> > // Child process.
> > close(pipefd[1]); // Close write end.
> >
> > // Minimal process that just exits after some time.
> > sleep(1);
> >
> > _exit(0); // <-- The child hangs here.
> > }
> >
> > // Parent process.
> > close(pipefd[0]); // Close read end.
> >
> > printf("PID1=%d\n", getpid());
> > printf("PID2=%d\n", pid);
> > printf("ps -f -p $PID1,$PID2\n");
> > printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
> >
> > #ifdef TEST_USE_STDIN
> > int in_fd = STDIN_FILENO;
> > #else
> > socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
> > int in_fd = sks[0];
> > #endif
> > // Copy from in_fd to pipe.
> > ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
> > if (ret == -1) {
> > perror("sendfile");
> > }
> >
> > // Wait for child
> > int status;
> > waitpid(pid, &status, 0);
> >
> > close(pipefd[1]); // Close write end.
> > return 0;
> > }
> >
> > To reproduce, compile and run the above code, and when it hangs (instead
> > of exiting), copy its output to a shell in another terminal. Here's what
> > I saw:
> >
> > [kir@kir-tp1 linux]$ PID1=2174401
> > PID2=2174402
> > ps -f -p $PID1,$PID2
> > sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
> > UID PID PPID C STIME TTY TIME CMD
> > kir 2174401 63304 0 17:34 pts/1 00:00:00 ./repro
> > kir 2174402 2174401 0 17:34 pts/1 00:00:00 [repro]
> > ==> /proc/2174401/stack <==
> > [<0>] unix_stream_read_generic+0x792/0xc90
> > [<0>] unix_stream_splice_read+0x6f/0xb0
> > [<0>] splice_file_to_pipe+0x65/0xd0
> > [<0>] do_sendfile+0x176/0x440
> > [<0>] __x64_sys_sendfile64+0xb3/0xd0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174401/syscall <==
> > 40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
> >
> > ==> /proc/2174402/stack <==
> > [<0>] pipe_release+0x1f/0x100
> > [<0>] __fput+0xde/0x2a0
> > [<0>] task_work_run+0x59/0x90
> > [<0>] do_exit+0x309/0xab0
> > [<0>] do_group_exit+0x30/0x80
> > [<0>] __x64_sys_exit_group+0x18/0x20
> > [<0>] x64_sys_call+0x14b4/0x14c0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174402/syscall <==
> > 231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
> >
> > Presumably, what happens here is the child process is stuck in the
> > exit_group syscall, being blocked by parent's splice which holds the
> > lock to the pipe (in splice_file_to_pipe).
>
> Splice is notoriously problematic when interacting with pipes due to how
> it holds the pipe lock. We've had handwavy discussions how to improve
> this but nothing ever materialized.
>
> The gist here seems to me that unix_stream_read_generic() is waiting on
> data to read from the write-side of the socketpair(). Until you close
> that fd or provide data you'll simply hang forever.
My thinking is splice should also return upon closing the other end of
the pipe it should writes to (i.e. pipefd[0] which the child is supposed
to read from), as the pipe consumer is gone. The program above
does just that -- it tries to close the end of the pipe it's supposed to
read from (implicitly, upon exit) -- alas, the close is blocked by that
very splice.
The parent could also close the reading fd (in_fd), as per your
suggestion, but it makes sense to do so only once the child has
exited -- and again, it's not possible because splice causes it
to block during exit, and so the parent couldn't know the child
is done. Surely, we can add another mechanism so the child
can tell the parent that it's about done.
In real life, the parent is "runc exec" and the child is any binary
executed in a container. Splice is used to forward data from
runc's stdin to that container process. It looks like it's impossible
to use splice in this scenario.
From a (perhaps overly naive) userspace perspective, a pipe has
two ends (file descriptors), and using one should not block the other
from closing.
> Similar with STDIN_FILENO fwiw. If you never enter any character you
> simply hang forever waiting for input.
>
> So imho the way the program is written is buggy.
Alas, nothing in the splice(2) man page prepared me for this
(and I guess it is not an easy task to explain this limitation in there).
I thought about ways to make it work, while still using splice, and
taking into account the child execs an arbitrary binary, but could
not come up with anything not overly complicated.
I still hope this can be fixed in the kernel (by using some finer-grain
locking maybe), but for now, I guess, Go's io.Copy should stop using
sendfile if either file descriptor refers to a pipe.
> But Jens might be able to provide more details.
>
> >
> > To me, the code above looks valid, and the kernel behavior seems to
> > be a bug. In particular, if the process is exiting, the pipe it was
> > using is now being closed, and splice (or sendfile) should return.
> >
> > If this is not a kernel bug, then the code above is not correct; in
> > this case, please suggest how to fix it.
> >
> > Regards,
> > Kir.
> >
> > ----
> > [1]: https://github.com/opencontainers/runc
> > [2]: https://github.com/opencontainers/runc/pull/4598
> > [3]: https://go-review.googlesource.com/c/go/+/603295
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug with splice to a pipe preventing a process exit
2025-01-22 13:31 ` Christian Brauner
2025-01-23 2:30 ` Kirill Kolyshkin
@ 2025-01-28 16:36 ` Andrei Vagin
1 sibling, 0 replies; 4+ messages in thread
From: Andrei Vagin @ 2025-01-28 16:36 UTC (permalink / raw)
To: Christian Brauner
Cc: Kir Kolyshkin, Jens Axboe, Alexander Viro, Jan Kara,
linux-fsdevel, Aleksa Sarai
On Wed, Jan 22, 2025 at 02:31:35PM +0100, Christian Brauner wrote:
> On Tue, Jan 21, 2025 at 06:08:41PM -0800, Kir Kolyshkin wrote:
> > While checking if the tool I'm co-maintaining [1] works OK when compiled
> > with the future release of golang (1.24, tested with go1.24rc2), I found
> > out it's not [2], and the issue is caused by Go using sendfile more [3].
> >
> > I came up with the following simple reproducer:
> >
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/sendfile.h>
> > #include <sys/wait.h>
> > #include <sys/socket.h>
> > #include <sys/un.h>
> >
> > int main() {
> > int sks[2];
> > int pipefd[2];
> > if (pipe(pipefd) == -1) {
> > perror("pipe");
> > exit(1);
> > }
> >
> > pid_t pid = fork();
> > if (pid == -1) {
> > perror("fork");
> > exit(1);
> > }
> >
> > if (pid == 0) {
> > // Child process.
> > close(pipefd[1]); // Close write end.
> >
> > // Minimal process that just exits after some time.
> > sleep(1);
> >
> > _exit(0); // <-- The child hangs here.
> > }
> >
> > // Parent process.
> > close(pipefd[0]); // Close read end.
> >
> > printf("PID1=%d\n", getpid());
> > printf("PID2=%d\n", pid);
> > printf("ps -f -p $PID1,$PID2\n");
> > printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
> >
> > #ifdef TEST_USE_STDIN
> > int in_fd = STDIN_FILENO;
> > #else
> > socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
> > int in_fd = sks[0];
> > #endif
> > // Copy from in_fd to pipe.
> > ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
> > if (ret == -1) {
> > perror("sendfile");
> > }
> >
> > // Wait for child
> > int status;
> > waitpid(pid, &status, 0);
> >
> > close(pipefd[1]); // Close write end.
> > return 0;
> > }
> >
> > To reproduce, compile and run the above code, and when it hangs (instead
> > of exiting), copy its output to a shell in another terminal. Here's what
> > I saw:
> >
> > [kir@kir-tp1 linux]$ PID1=2174401
> > PID2=2174402
> > ps -f -p $PID1,$PID2
> > sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
> > UID PID PPID C STIME TTY TIME CMD
> > kir 2174401 63304 0 17:34 pts/1 00:00:00 ./repro
> > kir 2174402 2174401 0 17:34 pts/1 00:00:00 [repro]
> > ==> /proc/2174401/stack <==
> > [<0>] unix_stream_read_generic+0x792/0xc90
> > [<0>] unix_stream_splice_read+0x6f/0xb0
> > [<0>] splice_file_to_pipe+0x65/0xd0
> > [<0>] do_sendfile+0x176/0x440
> > [<0>] __x64_sys_sendfile64+0xb3/0xd0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174401/syscall <==
> > 40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
> >
> > ==> /proc/2174402/stack <==
> > [<0>] pipe_release+0x1f/0x100
> > [<0>] __fput+0xde/0x2a0
> > [<0>] task_work_run+0x59/0x90
> > [<0>] do_exit+0x309/0xab0
> > [<0>] do_group_exit+0x30/0x80
> > [<0>] __x64_sys_exit_group+0x18/0x20
> > [<0>] x64_sys_call+0x14b4/0x14c0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174402/syscall <==
> > 231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
> >
> > Presumably, what happens here is the child process is stuck in the
> > exit_group syscall, being blocked by parent's splice which holds the
> > lock to the pipe (in splice_file_to_pipe).
>
> Splice is notoriously problematic when interacting with pipes due to how
> it holds the pipe lock. We've had handwavy discussions how to improve
> this but nothing ever materialized.
>
> The gist here seems to me that unix_stream_read_generic() is waiting on
> data to read from the write-side of the socketpair(). Until you close
> that fd or provide data you'll simply hang forever.
>
> Similar with STDIN_FILENO fwiw. If you never enter any character you
> simply hang forever waiting for input.
>
> So imho the way the program is written is buggy.
> But Jens might be able to provide more details.
Christian,
This is just a reproducer for the problem. I can easily imagine a real
life use case where this issue will be triggered. Basically, we have
one task that calls sendfile or splice to the pipe and another task
reading from the other end of the pipe. Then something (like OOM) kills
the second task, but it doesn’t exit and it is stuck in an uncertain
state. IMHO, this is the kernel issue that has to be fixed.
Here is another issue with this pipe lock. One task calls `write` to a
pipe while another task is blocked in `sendfile` to the same pipe. The
second task holds the pipe lock and the first task is waiting for it in
the TASK_UNINTERRUPTIBLE state. It will be unkillable and it will be
impossible to attach to the first task with ptrace to find out what it
is doing... This looks like an even more critical issue.
One possible solution is splitting the splice process into two stages.
The first stage does splice in non-blocking mode. The second stage polls
file descriptors and re-executes the first stage when both are ready.
Polling doesn't take locks, and while a small race condition exists
between stages, any descriptor state change triggers just another
iteration.
Here is a draft patch just to demonstate the idea. There are still a few
things to be adjusted. For example, SPLICE_F_NONBLOCK doesn't gurante
that do_splice_read will not block. It works for sockets, but it doesn't
work for tty-s right now. I'd like to know if the overall idea looks
reasonable.
---
diff --git a/fs/internal.h b/fs/internal.h
index e7f02ae1e098..8022055fc997 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -254,8 +254,7 @@ int do_statx_fd(int fd, unsigned int flags, unsigned int mask,
/*
* fs/splice.c:
*/
-ssize_t splice_file_to_pipe(struct file *in,
- struct pipe_inode_info *opipe,
+ssize_t splice_file_to_pipe(struct file *in, struct file *out,
loff_t *offset,
size_t len, unsigned int flags);
diff --git a/fs/read_write.c b/fs/read_write.c
index a6133241dfb8..2f69039ec978 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1366,7 +1366,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (fd_file(out)->f_flags & O_NONBLOCK)
fl |= SPLICE_F_NONBLOCK;
- retval = splice_file_to_pipe(fd_file(in), opipe, &pos, count, fl);
+ retval = splice_file_to_pipe(fd_file(in), fd_file(out), &pos, count, fl);
}
if (retval > 0) {
diff --git a/fs/splice.c b/fs/splice.c
index 2898fa1e9e63..61fc419e87a9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1282,18 +1282,75 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
struct pipe_inode_info *opipe,
size_t len, unsigned int flags);
-ssize_t splice_file_to_pipe(struct file *in,
- struct pipe_inode_info *opipe,
+struct splice_poll_wait {
+ poll_table pt;
+ wait_queue_entry_t wait;
+ wait_queue_head_t *wait_address;
+};
+
+static void spw_queue(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
+{
+ struct splice_poll_wait *spw;
+
+ spw = container_of(pt, struct splice_poll_wait, pt);
+ spw->wait_address = wqh;
+ add_wait_queue(wqh, &spw->wait);
+}
+
+static int spw_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
+{
+ return default_wake_function(wait, mode, sync, key);
+}
+
+ssize_t splice_file_to_pipe(struct file *in, struct file *out,
loff_t *offset,
size_t len, unsigned int flags)
{
+ struct pipe_inode_info *opipe = get_pipe_info(out, true);
+ unsigned poll_flag = 0;
ssize_t ret;
+ if (file_can_poll(in) && !(in->f_flags & O_NONBLOCK) &&
+ !(flags & SPLICE_F_NONBLOCK))
+ poll_flag = SPLICE_F_NONBLOCK;
+
+try_again:
pipe_lock(opipe);
- ret = wait_for_space(opipe, flags);
+ ret = wait_for_space(opipe, flags | poll_flag);
if (!ret)
- ret = do_splice_read(in, offset, opipe, len, flags);
+ ret = do_splice_read(in, offset, opipe, len, flags | poll_flag);
pipe_unlock(opipe);
+
+ if (ret == -EAGAIN && poll_flag) {
+ struct splice_poll_wait spw_in, spw_out;
+ __poll_t in_events, out_events;
+
+ init_waitqueue_func_entry(&spw_in.wait, spw_wakeup);
+ spw_in.wait.private = current;
+ init_poll_funcptr(&spw_in.pt, spw_queue);
+
+ init_waitqueue_func_entry(&spw_out.wait, spw_wakeup);
+ spw_out.wait.private = current;
+ init_poll_funcptr(&spw_out.pt, spw_queue);
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
+ }
+ in_events = vfs_poll(in, &spw_in.pt);
+ out_events = vfs_poll(out, &spw_out.pt);
+ if ((in_events & (EPOLLIN | POLLRDHUP | POLLERR)) == 0 ||
+ (out_events & (EPOLLOUT | POLLERR)) == 0) {
+ schedule();
+ }
+
+ remove_wait_queue(spw_in.wait_address, &spw_in.wait);
+ remove_wait_queue(spw_out.wait_address, &spw_out.wait);
+ __set_current_state(TASK_RUNNING);
+ goto try_again;
+ }
+out:
if (ret > 0)
wakeup_pipe_readers(opipe);
return ret;
@@ -1376,7 +1433,7 @@ ssize_t do_splice(struct file *in, loff_t *off_in, struct file *out,
if (out->f_flags & O_NONBLOCK)
flags |= SPLICE_F_NONBLOCK;
- ret = splice_file_to_pipe(in, opipe, &offset, len, flags);
+ ret = splice_file_to_pipe(in, out, &offset, len, flags);
if (!off_in)
in->f_pos = offset;
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-01-28 16:36 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-22 2:08 Bug with splice to a pipe preventing a process exit Kir Kolyshkin
2025-01-22 13:31 ` Christian Brauner
2025-01-23 2:30 ` Kirill Kolyshkin
2025-01-28 16:36 ` Andrei Vagin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox