From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org,
Dylan Yudaken <dylany@fb.com>
Subject: Re: [PATCH] btrfs: don't allow large NOWAIT direct reads
Date: Fri, 19 Aug 2022 17:50:16 +0100 [thread overview]
Message-ID: <20220819165016.GB3012163@falcondesktop> (raw)
In-Reply-To: <882730e60b58b8d970bd8bc3a670e598184eefef.1660924379.git.josef@toxicpanda.com>
On Fri, Aug 19, 2022 at 11:53:39AM -0400, Josef Bacik wrote:
> Dylan and Jens reported a problem where they had an io_uring test that
> was returning short reads, and bisected it to ee5b46a353af ("btrfs:
> increase direct io read size limit to 256 sectors").
>
> The root cause is their test was doing larger reads via io_uring with
> NOWAIT and async. This was triggering a page fault during the direct
> read, however the first page was able to work just fine and thus we
> submitted a 4k read for a larger iocb.
>
> Btrfs allows for partial IO's in this case specifically because we don't
> allow page faults, and thus we'll attempt to do any io that we can,
> submit what we could, come back and fault in the rest of the range and
> try to do the remaining IO.
>
> However for !is_sync_kiocb() we'll call ->ki_complete() as soon as the
> partial dio is done, which is incorrect. In the sync case we can exit
> the iomap code, submit more io's, and return with the amount of IO we
> were able to complete successfully.
>
> We were always doing short reads in this case, but for NOWAIT we were
> getting saved by the fact that we were limiting direct reads to
> sectorsize, and if we were larger than that we would return EAGAIN.
>
> Fix the regression by simply returning EAGAIN in the NOWAIT case with
> larger reads, that way io_uring can retry and get the larger IO and have
> the fault logic handle everything properly.
>
> This still leaves the AIO short read case, but that existed before this
> change. The way to properly fix this would be to handle partial iocb
> completions, but that's a lot of work, for now deal with the regression
> in the most straightforward way possible.
>
> Reported-by: Dylan Yudaken <dylany@fb.com>
> Fixes: ee5b46a353af ("btrfs: increase direct io read size limit to 256 sectors")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Looks good, thanks.
> ---
> fs/btrfs/inode.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5101111c5557..b39673e49732 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7694,6 +7694,20 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
> const u64 data_alloc_len = length;
> bool unlock_extents = false;
>
> + /*
> + * We could potentially fault if we have a buffer > PAGE_SIZE, and if
> + * we're NOWAIT we may submit a bio for a partial range and return
> + * EIOCBQUEUED, which would result in an errant short read.
> + *
> + * The best way to handle this would be to allow for partial completions
> + * of iocb's, so we could submit the partial bio, return and fault in
> + * the rest of the pages, and then submit the io for the rest of the
> + * range. However we don't have that currently, so simply return
> + * -EAGAIN at this point so that the normal path is used.
> + */
> + if (!write && (flags & IOMAP_NOWAIT) && length > PAGE_SIZE)
> + return -EAGAIN;
> +
> /*
> * Cap the size of reads to that usually seen in buffered I/O as we need
> * to allocate a contiguous array for the checksums.
> --
> 2.26.3
>
next prev parent reply other threads:[~2022-08-19 17:35 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-19 15:53 [PATCH] btrfs: don't allow large NOWAIT direct reads Josef Bacik
2022-08-19 16:50 ` Filipe Manana [this message]
2022-08-19 22:11 ` Qu Wenruo
2022-08-22 15:58 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220819165016.GB3012163@falcondesktop \
--to=fdmanana@kernel.org \
--cc=dylany@fb.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox