From: Jens Axboe <axboe@kernel.dk>
To: Christoph Hellwig <hch@infradead.org>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Michael Kelley <mikelley@microsoft.com>
Subject: Re: [PATCH] block: don't allow multiple bios for IOCB_NOWAIT issue
Date: Mon, 16 Jan 2023 11:30:54 -0700
Message-ID: <9bdd64c6-2c8e-cfc9-059b-2922ce685994@kernel.dk>
In-Reply-To: <74da71bf-d352-0aad-3cb5-3d65cba5bc24@kernel.dk>
On 1/16/23 11:15 AM, Jens Axboe wrote:
> On 1/16/23 11:03 AM, Jens Axboe wrote:
>>>> + /*
>>>> + * We're doing more than a bio worth of IO (> 256 pages), and we
>>>> + * cannot guarantee that one of the sub bios will not fail getting
>>>> + * issued FOR NOWAIT as error results are coalesced across all of
>>>> + * them. Be safe and ask for a retry of this from blocking context.
>>>> + */
>>>> + if (iocb->ki_flags & IOCB_NOWAIT)
>>>> + return -EAGAIN;
>>>> return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages));
>>>
>>> If the I/O is to a huge page we could easily end up with a single
>>> bio here.
>>
>> True - we can push the decision making further down potentially, but
>> honestly not sure it's worth the effort.
>
> And the same goes for page merges, fwiw. We could probably do something
> like the below (totally untested); the downside is that by then we've
> already mapped the pages and allocated a bio.
The earlier version was missing a blk_finish_plug() on the -EAGAIN path,
but apart from that it works in testing.
The question is just whether we end up punting anyway in the majority of
cases; if we do, then it's slower than it was before. If we end up
skipping some -EAGAINs, then it's better. Even without huge pages, I see
fio runs where every NOWAIT attempt still gets punted (a 1:1 ratio), and
runs where we now do zero punting. This must be down to memory layout:
if the pages merge well enough within the bvecs, the whole request fits
in a single bio and we don't punt at all.
I'm leaning towards the patch below being the better fix, even though
its worst case is worse: we may still end up punting, and then we've
allocated and mapped the data twice.
diff --git a/block/fops.c b/block/fops.c
index a03cb732c2a7..1a371f50cb13 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -221,6 +221,15 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
bio_endio(bio);
break;
}
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (iov_iter_count(iter)) {
+ bio_release_pages(bio, false);
+ bio_put(bio);
+ blk_finish_plug(&plug);
+ return -EAGAIN;
+ }
+ bio->bi_opf |= REQ_NOWAIT;
+ }
if (is_read) {
if (dio->flags & DIO_SHOULD_DIRTY)
@@ -228,9 +237,6 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
} else {
task_io_account_write(bio->bi_iter.bi_size);
}
- if (iocb->ki_flags & IOCB_NOWAIT)
- bio->bi_opf |= REQ_NOWAIT;
-
dio->size += bio->bi_iter.bi_size;
pos += bio->bi_iter.bi_size;
--
Jens Axboe