linux-fsdevel.vger.kernel.org archive mirror
From: Christian Brauner <brauner@kernel.org>
To: "Ahelenia Ziemiańska" <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Howells <dhowells@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: Re: Pending splice(file -> FIFO) always blocks read(FIFO), regardless of O_NONBLOCK on read side?
Date: Mon, 26 Jun 2023 11:32:16 +0200
Message-ID: <20230626-vorverlegen-setzen-c7f96e10df34@brauner>
In-Reply-To: <qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw>

On Mon, Jun 26, 2023 at 03:12:09AM +0200, Ahelenia Ziemiańska wrote:
> Hi! (starting with get_maintainers.pl fs/splice.c,
>      idk if that's right though)
> 
> Per fs/splice.c:
>  * The traditional unix read/write is extended with a "splice()" operation
>  * that transfers data buffers to or from a pipe buffer.
> so I expect splice() to work just about the same as read()/write()
> (and, to a large extent, it does so).
> 
> Thus, a refresher on pipe read() semantics
> (quoting Issue 8 Draft 3; Linux when writing with write()):
> 60746  When attempting to read from an empty pipe or FIFO:
> 60747  • If no process has the pipe open for writing, read( ) shall return 0 to indicate end-of-file.
> 60748  • If some process has the pipe open for writing and O_NONBLOCK is set, read( ) shall return
> 60749    −1 and set errno to [EAGAIN].
> 60750  • If some process has the pipe open for writing and O_NONBLOCK is clear, read( ) shall
> 60751    block the calling thread until some data is written or the pipe is closed by all processes that
> 60752    had the pipe open for writing.
> 
> However, I've observed that this is not the case when splicing from
> something that sleeps on read to a pipe, and that in that case all
> readers block, /including/ ones that are reading from fds with
> O_NONBLOCK set!
> 
> As an example, consider these two programs:
> -- >8 --
> // wr.c
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> int main() {
>   while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0)
>     ;
>   fprintf(stderr, "wr: %m\n");
> }
> -- >8 --
> 
> -- >8 --
> // rd.c
> #define _GNU_SOURCE
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> int main() {
>   fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
> 
>   char buf[64 * 1024] = {};
>   for (ssize_t rd;;) {
> #if 1
>     while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR)
>       ;
> #else
>     while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) == -1 &&
>            errno == EINTR)
>       ;
> #endif
>     fprintf(stderr, "rd=%zd: %m\n", rd);
>     write(1, buf, rd);
> 
>     errno = 0;
>     sleep(1);
>   }
> }
> -- >8 --
> 
> Thus:
> -- >8 --
> a$ make rd wr
> a$ mkfifo fifo
> a$ ./rd < fifo                           b$ echo qwe > fifo
> rd=4: Success
> qwe
> rd=0: Success
> rd=0: Success                            b$ sleep 2 > fifo
> rd=-1: Resource temporarily unavailable
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success                            
> rd=-1: Resource temporarily unavailable  b$ /bin/cat > fifo
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            abc
> abc
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            def
> def
> rd=0: Success                            ^D
> rd=0: Success
> rd=0: Success                            b$ ./wr > fifo
> -- >8 --
> and nothing. Until you actually type a line (or a few) into teletype b
> so that the splice completes, at which point so does the read.
> 
> An even simpler case is 
> -- >8 --
> $ ./wr | ./rd
> abc
> def
> rd=8: Success
> abc
> def
> ghi
> jkl
> rd=8: Success
> ghi
> jkl
> ^D
> wr: Success
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success
> -- >8 --
> 
> splice flags don't do anything.
> Tested on bookworm (6.1.27-1) and Linus' HEAD (v6.4-rc7-234-g547cc9be86f4).
> 
> You could say this is a "denial of service", since this is a valid
> way of following pipes (and, sans SIGIO, the only portable one),

splice() may block on either of the two file descriptors if it doesn't
have O_NONBLOCK set, even if SPLICE_F_NONBLOCK is passed.

SPLICE_F_NONBLOCK in splice_file_to_pipe() is only relevant if the pipe
is full. If the pipe isn't full then the write is attempted. That of
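course involves reading the data to splice from the source file. If the
source file isn't O_NONBLOCK that read may block while holding
pipe_lock(). A read() on the pipe has to take the same lock first, so it
waits for the splice to finish no matter whether the reader's own fd has
O_NONBLOCK set.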

If you raise O_NONBLOCK on the source fd in wr.c then your problems go
away. This is pretty long-standing behavior. Splice would have to be
refactored to not rely on pipe_lock(). That's likely major work with a
good portion of regressions if the past is any indication.
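
To make the O_NONBLOCK suggestion concrete, here is a minimal sketch of how
the loop in wr.c could wait for input in poll() instead of inside splice().
The helper names set_nonblock() and splice_when_ready() are made up for
illustration and are not part of any kernel or libc API. The sketch assumes
the destination pipe fd is left blocking, so EAGAIN can only come from the
source side, and that the source supports both poll() and non-blocking reads.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd, keeping its other status flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblock(int fd) {
  int fl = fcntl(fd, F_GETFL);
  if (fl == -1)
    return -1;
  return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: splice up to len bytes from `in` (which must already
 * be O_NONBLOCK) into the pipe `out`. When the source has no data, sleep in
 * poll() rather than in splice(), so nothing is held on the pipe meanwhile.
 * Assumes `out` is a blocking fd. Returns what splice() returns. */
ssize_t splice_when_ready(int in, int out, size_t len) {
  for (;;) {
    ssize_t n = splice(in, NULL, out, NULL, len, 0);
    if (n >= 0)
      return n; /* bytes moved, or 0 at end of input */
    if (errno == EINTR)
      continue;
    if (errno != EAGAIN)
      return -1; /* real error, errno left for the caller */

    /* Source is empty for now: wait until it becomes readable. */
    struct pollfd pfd = {.fd = in, .events = POLLIN};
    if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
      return -1;
  }
}
```

With these, main() in wr.c would call set_nonblock(0) once and then loop on
splice_when_ready(0, 1, 128 * 1024 * 1024) > 0. Note that O_NONBLOCK is a
property of the open file description, so setting it on an inherited stdin
also affects every other process sharing that description.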

If you need the ability to read fully asynchronously from a pipe with
splice right now, then io_uring will at least allow you to punt that read
into an async worker thread, as far as I can tell.

