Re: [GIT PULL] pipe: nonblocking rw for io_uring

public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] pipe: nonblocking rw for io_uring
Date: Mon, 24 Apr 2023 15:22:04 -0600	[thread overview]
Message-ID: <6882b74e-874a-c116-62ac-564104c5ad34@kernel.dk> (raw)
In-Reply-To: <CAHk-=wgyL9OujQ72er7oXt_VsMeno4bMKCTydBT1WSaagZ_5CA@mail.gmail.com>

On 4/24/23 3:05?PM, Linus Torvalds wrote:
> On Fri, Apr 21, 2023 at 7:02?AM Christian Brauner <brauner@kernel.org> wrote:
>>
>> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT
>> for pipes ensuring that all places can deal with non-blocking requests.
>>
>> To this end, pass down the information that this is a nonblocking
>> request so that pipe locking, allocation, and buffer checking correctly
>> deal with those.
> 
> Ok, I pulled this, but then I unpulled it again.
> 
> Doing conditional locking for O_NONBLOCK and friends is not ok. Yes,
> it's been done, and I may even have let some slip through, but it's
> just WRONG.
> 
> There is absolutely no situation where a "ok, so the lock on this data
> structure was taken, we'll go to some async model" is worth it.
> 
> Every single time I've seen this, it's been some developer who thinks
> that O_NONBLOCk is somehow some absolute "this cannot schedule" thing.
> And every single time it's been broken and horrid crap that just made
> everything more complicated and slowed things down.
> 
> If some lock wait is a real problem, then the lock needs to be just
> fixed. Not "ok, let's make a non-blocking version and fall back if
> it's held".
> 
> Note that FMODE_NOWAIT does not mean (and *CANNOT* mean) that you'd
> somehow be able to do the IO in some atomic context anyway. Many of
> our kernel locks don't even support that (eg mutexes).
> 
> So thinking that FMODE_NOWAIT is that kind of absolute is the wrong
> kind of thinking entirely.
> 
> FMODE_NOWAIT should mean that no *IO* gets done. And yes, that might
> mean that allocations fail too. But not this kind of "let's turn
> locking into 'trylock' stuff".
> 
> The whoe flag is misnamed. It should have been FMODE_NOIO, the same
> way we have IOCB_NOIO.
> 
> If you want FMODE_ATOMIC, then that is something entirely and
> completely different, and is probably crazy.
> 
> We have done it in one area (RCU pathname lookup), and it was worth it
> there, and it was a *huge* undertaking. It was worth it, but it was
> worth it because it was a serious thing with some serious design and a
> critical area.
> 
> Not this kind of "conditional trylock" garbage which just means that
> people will require 'poll()' to now add the lock to the waitqueue, or
> make all callers go into some "let's use a different thread instead"
> logic.

I'm fully agreeing with you on the semantics for
FMODE_NOWAIT/IOCB_NOWAIT, and also agree that they are misnamed and
really should be _NOIO. There is no conceptual misunderstanding there,
nor do I care about atomic semantics. Obviously the above is for
io_uring to be able to improve the performance on pipes, because right
now everything is punted to io-wq and that severly hampers performance
with how pipes are generally used. io_uring doesn't care about atomic
context, but it very much cares about NOT sleeping for IO during issue
as that has adverse performance implications.

If we don't ever wait for IO with the pipe lock held, then we can skip
the conditional locking. But with splice, that's not at all the case! We
most certainly wait for IO there with the pipe lock held.

And yes I totally agree that conditional locking is not pretty, but for
some cases it really is a necessary evil. People have complained about
io_uring pipe performance, and I ran some testing myself. Being able to
sanely read/write from/to pipes without punting to io-wq is an easy 10x
improvement for that use case. How can I enable FMODE_NOWAIT on pipes if
we have splice holding the pipe lock over IO? It's not feasible.

So please reconsider, doing IO to/from pipes efficiently is actually
kind of useful for io_uring...

-- 
Jens Axboe

next prev parent reply	other threads:[~2023-04-24 21:22 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-21 14:01 [GIT PULL] pipe: nonblocking rw for io_uring Christian Brauner
2023-04-24 21:05 ` Linus Torvalds
2023-04-24 21:22   ` Jens Axboe [this message]
2023-04-24 21:37     ` Linus Torvalds
2023-04-24 21:55       ` Jens Axboe
2023-04-24 22:00         ` Linus Torvalds
2023-04-24 22:05           ` Jens Axboe
2023-04-24 22:03         ` Jens Axboe
2023-04-24 21:58       ` Linus Torvalds
2023-04-24 22:07         ` Jens Axboe
2023-04-24 22:44           ` Jens Axboe
2023-04-25  3:16             ` Linus Torvalds
2023-04-25 13:46               ` Jens Axboe
2023-04-25 17:20               ` Linus Torvalds
2023-04-25 19:49                 ` Peter Zijlstra
2023-04-25 19:58                   ` Linus Torvalds
2023-04-25 20:10                     ` Jens Axboe
2023-04-25 20:29                     ` Linus Torvalds
     [not found]                     ` <978690c4-1d25-46e8-3375-45940ec1ea51@huaweicloud.com>
2023-05-08  8:39                       ` Peter Zijlstra
2023-05-08 10:16                         ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6882b74e-874a-c116-62ac-564104c5ad34@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox