linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Christian Brauner <brauner@kernel.org>
Cc: "Ahelenia Ziemiańska" <nabijaczleweli@nabijaczleweli.xyz>,
	"David Howells" <dhowells@redhat.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Pending splice(file -> FIFO) excludes all other FIFO operations forever (was: ... always blocks read(FIFO), regardless of O_NONBLOCK on read side?)
Date: Fri, 7 Jul 2023 13:57:06 -0600	[thread overview]
Message-ID: <42294e31-9cc8-3c5a-c28f-cfa3854fbe69@kernel.dk> (raw)
In-Reply-To: <CAHk-=whHXogGiPkGFwQQBtn364M4caVNcBTs7hLNfa_X67ouzA@mail.gmail.com>

On 7/7/23 1:10?PM, Linus Torvalds wrote:
> On Fri, 7 Jul 2023 at 10:21, Christian Brauner <brauner@kernel.org> wrote:
>>
>> Forgot to say, fwiw, I've been running this through the LTP splice,
>> pipe, and ipc tests without issues. A hanging reader can be signaled
>> away cleanly with this.
> 
> So that patch still has a couple of "wait for this" cases remaining.
> 
> In particular, when we do a read, and we do have pipe buffers, both
> the read() system call and a number of internal splice functions will
> go "Ahh, I have data", and then do pipe_buf_confirm() and read it.
> 
> Which then results in pipe_buf_confirm() blocking. It now blocks
> interruptibly, which is much nicer, but several of these users *could*
> just do a non-blocking confirmation instead, and wait for pipe
> readability.
> 
> HOWEVER, that's slightly less trivial than you'd expect, because the
> "wait for readability" needs to be done without the pipe lock held -
> so you can't actually check the pipe buffer state at that point (since
> you need the pipe lock to look up the buffer).
> 
> That's true even of "trivial" cases like actual user-space "read()
> with O_NONBLOCK and poll()" situations.
> 
> Now, the solution to all this is *fairly* straightforward:
> 
>  (a) don't use "!pipe_empty()" for a readability check.
> 
>      We already have "pipe_readable()", but it's hidden in fs/pipe.c,
> so all the splice() code ended up writing the "does this pipe have
> data" using "!pipe_empty()" instead.
> 
>  (b) make "pipe_buf_confirm()" take a "non-blocking" boolean argument,
> and if it is non-blocking but hits one of those blocked pages, set
> "pipe->not_ready", and return -EAGAIN.
> 
>      This is ok, because "pipe_buf_confirm()" is always under the pipe
> lock, and we'll just clear "pipe->not_ready" under the pipe lock after
> finalizing all those pages (and before waking up readers)
> 
>  (c) make "pipe_wait_readable()" and "poll()" know about this all, so
> that we wait properly for a pipe that was not ready to become ready
> 
> This all makes *most* users deal properly with these blocking events.
> In particular, things like splice_to_socket() can now do the whole
> proper "wait without holding the pipe lock" sequence, even when the
> pipe is not empty, just in this blocked state.
> 
> This *may* also make all the cases Jens had with io_uring and splicing
> JustWork(tm).

Exactly! I was reading this thread with excitement just now, would be
nice to get rid of that kludge.

> NOTE! NOTE! NOTE! Once more, this "feels right to me", and I'd argue
> that the basic approach is fairly straightfoward. The patch is also
> not horrendous. It all makes a fair amount of sense. BUT! I haven't
> tested this, and like the previous patch, I really would want people
> to think about this a lot.
> 
> Comments? Jens?

I'll take a closer look at this, but won't be until Monday most likely.
But the approach seems sane, and going in a more idiomatic direction
than before. So seems promising.

-- 
Jens Axboe


  reply	other threads:[~2023-07-07 19:57 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26  1:12 Pending splice(file -> FIFO) always blocks read(FIFO), regardless of O_NONBLOCK on read side? Ahelenia Ziemiańska
2023-06-26  9:32 ` Christian Brauner
2023-06-26 11:59   ` Pending splice(file -> FIFO) excludes all other FIFO operations forever (was: ... always blocks read(FIFO), regardless of O_NONBLOCK on read side?) Ahelenia Ziemiańska
2023-06-26 15:56     ` Christian Brauner
2023-06-26 16:14       ` Ahelenia Ziemiańska
2023-07-06 21:56         ` Linus Torvalds
2023-07-07 17:21           ` Christian Brauner
2023-07-07 19:10             ` Linus Torvalds
2023-07-07 19:57               ` Jens Axboe [this message]
2023-07-07 22:41               ` Ahelenia Ziemiańska
2023-07-07 22:57                 ` Linus Torvalds
2023-07-08  0:30                   ` Ahelenia Ziemiańska
2023-07-08 20:06                     ` Linus Torvalds
2023-07-09  1:03                       ` Ahelenia Ziemiańska
2023-07-09 22:33                         ` Ahelenia Ziemiańska
2023-07-10 13:22                           ` Ahelenia Ziemiańska
2023-07-08  0:00           ` Matthew Wilcox
2023-07-08  0:07             ` Linus Torvalds
2023-07-08  0:21               ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42294e31-9cc8-3c5a-c28f-cfa3854fbe69@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nabijaczleweli@nabijaczleweli.xyz \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).