From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-fsdevel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: BLKZEROOUT + pread should return zeroes, right?
Date: Mon, 13 Oct 2014 23:02:42 -0700
Message-ID: <20141014060242.GA22878@birch.djwong.org>
In-Reply-To: <20141014042711.GJ5267@dastard>
On Tue, Oct 14, 2014 at 03:27:11PM +1100, Dave Chinner wrote:
> On Mon, Oct 13, 2014 at 08:01:32PM -0700, Darrick J. Wong wrote:
> > Hi everyone,
> >
> > What's the intended behavior if I issue BLKZEROOUT against a range of disk
> > sectors and immediately re-read the sectors into a buffer?
>
> Should return zeros.
>
> [...]
>
> > I boiled the whole thing down into the attached test program, which can
> > reproduce the symptoms in a few loop iterations. If I insert "sleep(1);"
> > before the pread64, I pread zeroes every time; otherwise, I only pread zeroes
> > part of the time. If I call "ioctl(fd, BLKFLSBUF);" before the BLKZEROOUT, the
> > chances of preading zeroes increase dramatically, but are still not 100%.
>
> Hint #1: buffered IO == data in page cache.
> Hint #2: BLKZEROOUT operates at the bio level.

Yeah, I forgot about that little quirk where the page cache is left in the
dark. Thank you for the sanity check, Dave.

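To make the two hints concrete, here is a minimal sketch of a zero-then-verify
check that keeps the page cache out of the picture by opening the device with
O_DIRECT. It is not the test program attached to the original message; the
helper names (all_zero, zeroout_and_verify) and the 4096-byte buffer alignment
are assumptions made for illustration. Start and length are assumed to be
multiples of the 512-byte sector size, and for O_DIRECT also of the device's
logical block size.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t for pread() */
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>           /* BLKZEROOUT */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return 1 if every byte of buf[0..len) is zero, else 0. */
static int all_zero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * Hypothetical helper: zero [start, start + len) on the block device at
 * `path`, then read the range back with direct IO.
 * Returns 1 if the range reads back as zeroes, 0 if not, -1 on any error.
 */
static int zeroout_and_verify(const char *path, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };   /* BLKZEROOUT takes {start, length} */
    void *buf = NULL;
    int ret = -1;

    /* O_DIRECT: reads go to the device instead of the page cache. */
    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd < 0)
        return -1;

    /* Direct IO needs an aligned buffer; 4096 is an assumed safe alignment. */
    if (posix_memalign(&buf, 4096, len) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 0xff, len);               /* poison so stale data is visible */

    if (ioctl(fd, BLKZEROOUT, range) < 0)
        goto out;
    if (pread(fd, buf, len, (off_t)start) != (ssize_t)len)
        goto out;

    ret = all_zero(buf, len);
out:
    free(buf);
    close(fd);
    return ret;
}
```

Calling zeroout_and_verify() on a scratch device destroys the data in that
range, so it should only ever be pointed at a disposable test device.
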
> > So, uh, is this a bug? Or is that just how BLKZEROOUT works? Or did I fubar
> > the ioctl call?
>
> Broken usage, IMO. If you are going to use the block layer ioctls to
> manipulate data in the block device, you should be using direct IO
> for all your data IO to the block device. Otherwise, coherency
> problems occur....

So... if these ioctls require direct IO read and write for any sane use model,
why doesn't the kernel fail the request if the fd isn't in O_DIRECT mode? Or,
if we do want to allow the ioctls to run on an fd that's opened in buffered IO
mode, can we simply invalidate that part of the page cache after calling
ZEROOUT?

Something idiotic like fsync_bdev() -> blkdev_issue_zeroout() ->
invalidate_bdev() -> invalidate_inode_pages2() seems to smooth things over,
but that's a big dumb hammer.
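
For comparison, a rough userspace approximation of that sequence can be
sketched with existing ioctls. This is not the kernel change described above:
it assumes that BLKFLSBUF syncs the device and then drops clean page cache
pages for the whole device, which is weaker than invalidate_inode_pages2()
and needs CAP_SYS_ADMIN. The helper name zeroout_coherent is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>           /* BLKZEROOUT, BLKFLSBUF */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush dirty pages, zero the range at the block
 * layer, then ask the kernel to drop cached pages for the device so a
 * later buffered read has to go back to the disk.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int zeroout_coherent(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };

    /* The ioctl argument is two 64-bit values: byte offset and byte length. */
    assert(sizeof(range) == 16);

    /* BLKZEROOUT rejects offsets or lengths that are not 512-byte aligned. */
    if ((start | len) & 511) {
        errno = EINVAL;
        return -1;
    }

    /* Write back dirty cached data first so it cannot land after the zeroes. */
    if (fsync(fd) < 0)
        return -1;

    /* Zero the range below the page cache. */
    if (ioctl(fd, BLKZEROOUT, range) < 0)
        return -1;

    /* Whole-device flush and invalidate: the big dumb hammer. */
    if (ioctl(fd, BLKFLSBUF, 0) < 0)
        return -1;

    return 0;
}
```

A buffered pread() issued after zeroout_coherent() returns 0 would be expected
to see zeroes, but pages that are dirty or mapped at invalidation time may
survive, so this remains a sketch rather than a guarantee.
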

Tired of this for now, going to bed.

--D

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com