linux-fsdevel.vger.kernel.org archive mirror
From: "Will Manley" <will@williammanley.net>
To: linux-fsdevel@vger.kernel.org
Cc: "Dave Chinner" <david@fromorbit.com>,
	"Kent Overstreet" <kent.overstreet@gmail.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Jens Axboe" <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org, "Alice Ryhl" <alice@ryhl.io>,
	br0adcast <br0adcast.007@gmail.com>
Subject: BUG: preadv2(.., RWF_NOWAIT) returns spurious EOF
Date: Mon, 24 May 2021 11:42:52 +0100
Message-ID: <fea8b16d-5a69-40f9-b123-e84dcd6e8f2e@www.fastmail.com>

Hi All

We've seen preadv2(..., -1, RWF_NOWAIT) return 0 at offset 4096 in a file much larger than 4096 bytes. This breaks code that reads an entire file. A return value of 0 signals end-of-file, so the caller concludes that it has already read the whole file and ends up with truncated data. We came across this while investigating a bug reported against the Rust async I/O library tokio. The latest tokio release uses RWF_NOWAIT for file I/O, and that change has caused problems for users.

https://github.com/tokio-rs/tokio/issues/3803
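The following sketch shows why the 0 return is harmful. It is a minimal C read-to-EOF loop of the kind that breaks. It is not tokio's actual code, which is written in Rust. The loop tries preadv2(..., -1, RWF_NOWAIT) first. If that attempt fails, for example with EAGAIN when the data is not in the page cache, it falls back to a blocking read(). The names read_chunk, read_to_end and read_file are hypothetical. Because the loop trusts any 0 return as end-of-file, a spurious 0 at offset 4096 would make it stop there and hand back a truncated result.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* kernel value; older libc headers may lack it */
#endif

/* Read one chunk at the current file position: try a non-blocking
 * page-cache read first, then fall back to an ordinary blocking read. */
static ssize_t read_chunk(int fd, char *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    /* offset -1 means "use and advance the current file position" */
    ssize_t n = preadv2(fd, &iov, 1, -1, RWF_NOWAIT);
    if (n >= 0)
        return n; /* 0 is trusted as EOF: this is the value the bug corrupts */

    /* EAGAIN: the data isn't in the page cache; EOPNOTSUPP: RWF_NOWAIT
     * isn't supported here. Retry any failure as a blocking read, which
     * reports genuine errors itself. */
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Read until EOF or until cap bytes are stored. Returns bytes read or -1. */
ssize_t read_to_end(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read_chunk(fd, buf + total, cap - total);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* EOF, or a spurious 0 on an affected kernel */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical convenience wrapper: read a whole file by path. */
ssize_t read_file(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = read_to_end(fd, buf, cap);
    close(fd);
    return n;
}
```

One possible mitigation is an untested assumption, not something verified against 5.9 or 5.10. A caller could treat a 0 from the RWF_NOWAIT attempt as inconclusive and confirm it with a blocking read() before reporting EOF. At a true EOF, this costs one extra syscall per file.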

The issue is readily reproducible. We've tested on armv7, i686 and x86_64 with the ext4 filesystem.  Here's the strace output:

preadv2(9, [{iov_base=..., iov_len=32}], 1, -1, RWF_NOWAIT) = 32
preadv2(9, [{iov_base=..., iov_len=32}], 1, -1, RWF_NOWAIT) = 32
preadv2(9, [{iov_base=..., iov_len=64}], 1, -1, RWF_NOWAIT) = 64
preadv2(9, [{iov_base=..., iov_len=128}], 1, -1, RWF_NOWAIT) = 128
preadv2(9, [{iov_base=..., iov_len=256}], 1, -1, RWF_NOWAIT) = 256
preadv2(9, [{iov_base=..., iov_len=512}], 1, -1, RWF_NOWAIT) = 512
preadv2(9, [{iov_base=..., iov_len=1024}], 1, -1, RWF_NOWAIT) = 1024
preadv2(9, [{iov_base=..., iov_len=2048}], 1, -1, RWF_NOWAIT) = 2048
preadv2(9, [{iov_base="", iov_len=4096}], 1, -1, RWF_NOWAIT) = 0
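The read sizes above are two 32-byte reads followed by doubling. They add up to exactly 4096 bytes (32 + 32 + 64 + ... + 2048) before the call that returns 0. Below is a hypothetical reproducer that mimics this pattern. It uses RWF_NOWAIT only, with no blocking fallback, so it reports what the kernel actually returns. The function name nowait_probe_file and the output format are assumptions for illustration.

On an affected kernel, with the file's data already in the page cache, it would be expected to stop at 4096 bytes. On an unaffected kernel it should return the full file size. It returns -1 if RWF_NOWAIT is unsupported or the data isn't cached.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* kernel value; older libc headers may lack it */
#endif

/* Hypothetical reproducer: mimic the read sizes in the strace (32, 32, 64,
 * 128, ...) using RWF_NOWAIT at the current file position (offset -1) with no
 * blocking fallback. Returns the bytes read before the first 0 return, or -1
 * if open() or any preadv2() call fails (e.g. EAGAIN or EOPNOTSUPP). */
ssize_t nowait_probe_file(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t total = 0, len = 32;
    ssize_t result = 0;
    for (int call = 0; total < cap; call++) {
        if (call >= 2)
            len *= 2; /* two 32-byte reads, then doubling */
        size_t want = len < cap - total ? len : cap - total;
        struct iovec iov = { .iov_base = buf + total, .iov_len = want };
        ssize_t n = preadv2(fd, &iov, 1, -1, RWF_NOWAIT);
        if (n < 0) {
            result = -1;
            break;
        }
        printf("preadv2(iov_len=%zu) at offset %zu = %zd\n", want, total, n);
        if (n == 0)
            break; /* a read-to-EOF caller would stop here */
        total += (size_t)n;
    }
    if (result == 0)
        result = (ssize_t)total;
    close(fd);
    return result;
}
```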

I'm not certain that the offset being 4096 is what triggers it. It may instead be that the data is being written into an uncommitted page.

The bug is present in Linux 5.9 and 5.10 and is fixed in Linux 5.11. A bisect shows that it was introduced in

    efa8480a831 fs: RWF_NOWAIT should imply IOCB_NOIO

and fixed in

    06c0444290 mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig

Although this is already fixed, I thought it was worth reporting because the fix appears to be incidental. The fix's commit message doesn't mention fixing a bug, so I wonder whether the underlying issue still exists.

Our current plan is to add a uname check and disable the RWF_NOWAIT optimisation on 5.9 and 5.10. Since we don't understand the root cause, I thought it best to check with you first. Is there a better way to detect whether a kernel has this bug?
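Here is a minimal sketch of such a uname check. It assumes the affected range is exactly 5.9.x and 5.10.x, as found by the bisect. The names release_may_be_affected and should_disable_rwf_nowait are hypothetical.

This is only a heuristic. Stable and distribution kernels may carry backported changes, so a version string can't confirm whether the bug is actually present. The sketch errs toward disabling RWF_NOWAIT when it can't tell, because the fallback is only slower, not incorrect.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Heuristic only: the affected range (5.9 and 5.10) comes from the bisect in
 * this report. Stable or distribution kernels may carry backported changes,
 * so the release string cannot prove the bug is present or absent. */
int release_may_be_affected(const char *release)
{
    int major = 0, minor = 0;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 1; /* cannot parse the version: be conservative */
    return major == 5 && (minor == 9 || minor == 10);
}

/* Returns 1 if RWF_NOWAIT reads should be avoided on the running kernel. */
int should_disable_rwf_nowait(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 1; /* cannot tell: be conservative */
    return release_may_be_affected(u.release);
}
```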

There's more information at https://github.com/tokio-rs/tokio/issues/3803

Thanks

Will