From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clm@fb.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH v2] xfs: don't zero partial page cache pages during O_DIRECT
Date: Wed, 20 Aug 2014 12:36:49 +1000 [thread overview]
Message-ID: <20140820023649.GV20518@dastard>
In-Reply-To: <20140820021950.GU20518@dastard>
On Wed, Aug 20, 2014 at 12:19:50PM +1000, Dave Chinner wrote:
> On Tue, Aug 19, 2014 at 09:54:14PM -0400, Chris Mason wrote:
> > On 08/19/2014 06:35 PM, Dave Chinner wrote:
> > > On Tue, Aug 19, 2014 at 03:24:48PM -0400, Chris Mason wrote:
> > >> On 08/11/2014 09:17 PM, Dave Chinner wrote:
> > >>> On Sat, Aug 09, 2014 at 08:57:00AM -0400, Chris Mason wrote:
> > >>>>
> > >>>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > >>>> index 1f66779..023d575 100644
> > >>>> --- a/fs/xfs/xfs_file.c
> > >>>> +++ b/fs/xfs/xfs_file.c
> > >>>> @@ -295,7 +295,8 @@ xfs_file_read_iter(
> > >>>> 				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
> > >>>> 				return ret;
> > >>>> 			}
> > >>>> -			truncate_pagecache_range(VFS_I(ip), pos, -1);
> > >>>> +			invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> > >>>> +					pos >> PAGE_CACHE_SHIFT, -1);
> > >>>> 		}
> > >>>> 		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> > >>>> 	}
> > >>>
> > >>> I added the WARN_ON_ONCE(ret) check to this and I am seeing it fire
> > >>> occasionally. It always fires immediately before some other ASSERT()
> > >>> that fires due to a block map/page cache inconsistency. It usually
> > >>> fires in a test that runs fsx or fsstress. The fsx failures are new
> > >>> regressions caused by this patch. e.g. generic/263 hasn't failed for
> > >>> months on any of my systems and this patch causes it to fail
> > >>> reliably on my 1k block size test config.
> > >>>
> > >>> I'm going to assume at this point that this is uncovering some other
> > >>> existing bug, but it means I'm not going to push this fix until I
> > >>> understand what is actually happening here. It is possible that what
> > >>> I'm seeing is related to Brian's collapse range bug fixes, but until
> > >>> I applied this direct IO patch I'd never seen fsx throw ASSERTs in
> > >>> xfs_bmap_shift_extents()....
> > >>>
> > >>> Either way, more testing and understanding is needed.
> > >>
> > >> Do you have the output from fsx and the command line args it used? For
> > >> my device, it picks:
> > >>
> > >> -r 4096 -t 512 -w 512 -Z
> > >>
> > >> And for a blocksize 1024 test I did mkfs.xfs -b size=1024
> > >
> > > I'm running:
> > >
> > > $ mkfs.xfs -f -m crc=1,finobt=1 -b size=1k /dev/vda
> > > $ mount /dev/vda /mnt/test
> > > $ ltp/fsx -o 128000 -l 500000 -r 4096 -t 512 -w 512 -Z -d /mnt/test/foo
> > >
> > >> But I can't trigger failures with or without the invalidate_inode_pages2
> > >> change. I was hoping to trigger on 3.16, and then jump back to 3.10 +
> > >> my patch to see if the patch alone was at fault.
> > >
> > > I am seeing failures at operation 1192.
> > >
> > > Yesterday, I found a new class of bufferhead state coherency issues
> > > to do with EOF handling that are causing the problems. Basically,
> > > when the page cache marks a page dirty, the generic code marks all
> > > the buffers on the page dirty, even when they are beyond EOF.
> > >
> > > As a result, when we go to invalidate the page that spans EOF, it
> > > cannot be invalidated because there are dirty buffers on the page.
> > > Those buffers persist in that state because they are beyond EOF,
> > > have no blocks allocated to them, and cannot be written. And so when
> > > we do a direct IO that spans the current EOF, it now fails to
> > > invalidate that page and so triggers the warning.
> > >
> > > Worse is that it appears that these bufferheads can leak into the
> > > interior blocks of the file when the file is extended, leading to
> > > all sorts of other ASSERT failures (which I've been seeing for a
> > > while now).
> > >
> > > I've got the above fsx command to run for somewhere between 100k and
> > > 110k operations with the fixes I currently have, but I haven't found
> > > the cause of the dirty buffer beyond EOF state leaking into the
> > > interior of the file from extend operations yet.
> > >
> > > Once I have something that passes a few million fsx ops....
> >
> > I have to admit, I'm not sure where this leaves us in terms of safely
> > applying my patch to our 3.10 or mainline kernel... Failing to
> > invalidate the page and zeroing the page are really both wrong.
>
> Well, yes they are, and that's one of the reasons why I wanted to
> ensure we caught failures. As such, I think this invalidation bug
> goes back a *long time* because we've never, ever checked for
> invalidation failure before this patch.
>
> However, I think I've got a self-contained fix that can be
> backported for the invalidation problem now - it's well past 10
> million fsx ops at this point with both read and write DIO
> invalidation fixes included. I'll post my series when I've done
> some more robust testing on it....
There are more issues. generic/247 on a 4k block size filesystem just
triggered an invalidation failure on a direct IO write. That test is
specifically mixing DIO overwrite with mmap overwrite on the same
file, so there are clearly more issues in play here than I first
thought.
So, more debugging needed to determine if this is a result of the
usual, currently unsolvable page fault vs DIO races...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 19+ messages
2014-08-08 14:35 [PATCH] xfs: don't zero partial page cache pages during O_DIRECT Chris Mason
2014-08-08 15:17 ` Chris Mason
2014-08-08 16:04 ` [PATCH RFC] xfs: use invalidate_inode_pages2_range for DIO writes Chris Mason
2014-08-09 0:48 ` Dave Chinner
2014-08-09 2:42 ` Chris Mason
2014-08-08 20:39 ` [PATCH] xfs: don't zero partial page cache pages during O_DIRECT Brian Foster
2014-08-09 0:36 ` Dave Chinner
2014-08-09 2:32 ` Chris Mason
2014-08-09 3:19 ` Eric Sandeen
2014-08-09 4:17 ` Dave Chinner
2014-08-09 12:57 ` [PATCH v2] " Chris Mason
2014-08-11 13:29 ` Brian Foster
2014-08-12 1:17 ` Dave Chinner
2014-08-19 19:24 ` Chris Mason
2014-08-19 22:35 ` Dave Chinner
2014-08-20 1:54 ` Chris Mason
2014-08-20 2:19 ` Dave Chinner
2014-08-20 2:36 ` Dave Chinner [this message]
2014-08-20 4:41 ` Dave Chinner