From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clm@fb.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT
Date: Sat, 9 Aug 2014 14:17:00 +1000
Message-ID: <20140809041700.GH26465@dastard>
In-Reply-To: <53E5885A.9090302@fb.com>
On Fri, Aug 08, 2014 at 10:32:58PM -0400, Chris Mason wrote:
>
>
> On 08/08/2014 08:36 PM, Dave Chinner wrote:
> > On Fri, Aug 08, 2014 at 10:35:38AM -0400, Chris Mason wrote:
> >>
> >> xfs is using truncate_pagecache_range to invalidate the page cache
> >> during DIO reads. This is different from the other filesystems, which only
> >> invalidate pages during DIO writes.
> >
> > Historical oddity thanks to wrapper functions that were kept way
> > longer than they should have been.
> >
> >> truncate_pagecache_range is meant to be used when we are freeing the
> >> underlying data structs from disk, so it will zero any partial ranges
> >> in the page. This means a DIO read can zero out part of the page cache
> >> page, and it is possible the page will stay in cache.
> >
> > commit fb59581 ("xfs: remove xfs_flushinval_pages") also removed
> > the offset masks that seem to be the issue here. Classic case of a
> > regression caused by removing 10+ year old code that was not clearly
> > documented and didn't appear important.
> >
> > The real question is why aren't fsx and other corner case data
> > integrity tools tripping over this?
> >
>
> My question too. Maybe not mixing buffered/direct for partial pages?
> Does fsx only do 4K O_DIRECT?
No. xfstests::tests/generic/091 is supposed to cover this exact
case. It runs fsx with direct IO aligned to sector boundaries
amongst other things.
$ ./lsqa.pl tests/generic/091
FS QA Test No. 091
fsx exercising direct IO -- sub-block sizes and concurrent buffered IO
$
>
> >> buffered reads will find an up to date page with zeros instead of the
> >> data actually on disk.
> >>
> >> This patch fixes things by leaving the page cache alone during DIO
> >> reads.
> >>
> >> We discovered this when our buffered IO program for distributing
> >> database indexes was finding zero filled blocks. I think writes
> >> are broken too, but I'll leave that for a separate patch because I don't
> >> fully understand what XFS needs to happen during a DIO write.
> >>
> >> Test program:
> >
> > Encapsulate it in a generic xfstest, please, and send it to
> > fstests@vger.kernel.org.
>
> This test prog was looking for races, which we really don't have. It
> can be much shorter to just look for the improper zeroing from a single
> thread. I can send it either way.
Doesn't matter, as long as we have something that exercises this
case....
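
For illustration only, here is a rough single-threaded sketch of the kind
of check discussed above. It is not Chris's test program and not an
xfstest. The function name, the 0xAB fill pattern, the 4096-byte page
size, and the 512-byte read at offset 512 are all assumptions picked for
the example. It returns 1 if a buffered re-read no longer matches what
was written, 0 if the data is intact, and -1 if the check could not run
(for example, when the filesystem rejects O_DIRECT).

```c
#define _GNU_SOURCE           /* asks glibc to expose O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SZ 4096          /* assumed page size: one full page of data */
#define DIO_OFF 512           /* sector aligned, but not page aligned */
#define DIO_LEN 512

/* Hypothetical helper: 1 = cached data changed, 0 = intact, -1 = could not run. */
int check_partial_page_zeroing(const char *path)
{
    char pattern[FILE_SZ], check[FILE_SZ];
    void *dio_buf = NULL;
    int fd, dfd = -1, ret = -1;

    memset(pattern, 0xAB, sizeof(pattern));

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Buffered write plus fsync: the data is on disk and in the page cache. */
    if (pwrite(fd, pattern, FILE_SZ, 0) != FILE_SZ || fsync(fd) != 0)
        goto out;
    /* Buffered read, so the page is known to be cached and up to date. */
    if (pread(fd, check, FILE_SZ, 0) != FILE_SZ)
        goto out;

#ifndef O_DIRECT
    goto out;                 /* headers do not expose O_DIRECT: report -1 */
#else
    /* Small direct read that starts in the middle of the cached page. */
    dfd = open(path, O_RDONLY | O_DIRECT);
    if (dfd < 0)
        goto out;
    if (posix_memalign(&dio_buf, 4096, DIO_LEN) != 0) {
        dio_buf = NULL;
        goto out;
    }
    if (pread(dfd, dio_buf, DIO_LEN, DIO_OFF) != DIO_LEN)
        goto out;

    /* Buffered re-read: it must still match the pattern that was written. */
    memset(check, 0, sizeof(check));
    if (pread(fd, check, FILE_SZ, 0) != FILE_SZ)
        goto out;
    ret = memcmp(check, pattern, FILE_SZ) != 0;
#endif
out:
    free(dio_buf);
    if (dfd >= 0)
        close(dfd);
    close(fd);
    unlink(path);
    return ret;
}
```

Calling check_partial_page_zeroing() with a scratch path on the
filesystem under test would be expected to return 0 on a kernel without
the zeroing problem. A real fstests case would also need the usual
harness scaffolding, which this sketch leaves out.
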
> > Besides, XFS's direct IO semantics are far saner, more predictable
> > and hence are more widely useful than the generic code. As such,
> > we're not going to regress semantics that have been unchanged
> > over 20 years just to match whatever insanity the generic Linux code
> > does right now.
> >
> > Go on, call me a deranged monkey on some serious mind-controlling
> > substances. I don't care. :)
>
> The deranged part is invalidating pos -> -1 on a huge file because of a
> single 512b direct read. But, if you mix O_DIRECT and buffered you get
> what the monkeys give you and like it.
That's a historical artifact - it predates the range interfaces that
Linux has grown over the years, and every time we've changed it to
match the I/O range subtle problems have arisen. Those are usually
due to other bugs we knew nothing about at the time, but that's the
way it goes...
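
To put rough numbers on the "pos -> -1" versus "just the I/O range"
difference, here is a small userspace sketch. It assumes 4096-byte pages,
and both helper names are made up for the example. It only counts how
many page-sized slots each range spans. In the kernel only pages that are
actually cached would be touched, so these are upper bounds.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT 12   /* assumption: 4096-byte pages */

/* Page slots spanned by the I/O itself: [pos, pos + len). */
uint64_t pages_in_io_range(uint64_t pos, uint64_t len)
{
    if (len == 0)
        return 0;
    uint64_t first = pos >> ASSUMED_PAGE_SHIFT;
    uint64_t last = (pos + len - 1) >> ASSUMED_PAGE_SHIFT;
    return last - first + 1;
}

/* Page slots spanned from pos to end of file, i.e. a "pos -> -1" range. */
uint64_t pages_to_eof(uint64_t pos, uint64_t file_size)
{
    if (pos >= file_size)
        return 0;
    uint64_t first = pos >> ASSUMED_PAGE_SHIFT;
    uint64_t last = (file_size - 1) >> ASSUMED_PAGE_SHIFT;
    return last - first + 1;
}
```

For a 512-byte read at offset 0 of a 1 TiB file, pages_in_io_range()
gives 1 while pages_to_eof() gives 268435456.
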
> > I think the fix should probably just be:
> >
> > - truncate_pagecache_range(VFS_I(ip), pos, -1);
> > + invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> > + pos >> PAGE_CACHE_SHIFT, -1);
> >
>
> I'll retest with this in the morning. The invalidate is basically what
> we had before with the masking & PAGE_CACHE_SHIFT.
Yup. Thanks for finding these issues, Chris!
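
As a rough userspace illustration of the arithmetic behind the suggested
change, assuming 4096-byte pages (the constant and helper names below are
made up for the example and are not kernel code): a byte-based truncate
starting at pos zeroes the tail of the page that contains pos, while the
page-index form "pos >> PAGE_CACHE_SHIFT" names whole pages only, so there
is no partial page left to zero.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT 12                         /* assumption: 4096-byte pages */
#define ASSUMED_PAGE_SIZE  (1ULL << ASSUMED_PAGE_SHIFT)

/* First page index covered by a range that starts at byte offset pos. */
uint64_t first_page_index(uint64_t pos)
{
    return pos >> ASSUMED_PAGE_SHIFT;
}

/*
 * Bytes of the first page that a byte-granular truncate starting at pos
 * would zero: from pos to the end of that page, or 0 when pos is already
 * page aligned.
 */
uint64_t partial_zeroed_bytes(uint64_t pos)
{
    uint64_t off = pos & (ASSUMED_PAGE_SIZE - 1);
    return off ? ASSUMED_PAGE_SIZE - off : 0;
}
```

With pos = 512, first_page_index() is 0 and partial_zeroed_bytes() is
3584: that is the part of a cached page a 512-byte direct read at
offset 512 could end up zeroing with the truncate-style call.
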
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs