Re: fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH]

linux-fsdevel.vger.kernel.org archive mirror

From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH]
Date: Wed, 30 Mar 2016 09:37:43 +1100
Message-ID: <20160329223743.GE30721@dastard>
In-Reply-To: <20160329060410.GA2383@kmo-pixel>

On Mon, Mar 28, 2016 at 10:04:10PM -0800, Kent Overstreet wrote:
> On Tue, Mar 29, 2016 at 04:15:58PM +1100, Dave Chinner wrote:
> > On Mon, Mar 28, 2016 at 08:25:46PM -0800, Kent Overstreet wrote:
> > > Bit of previous discussion:
> > > http://thread.gmane.org/gmane.linux.file-systems/101201/
> > > 
> > > The underlying issue is that we have no mechanism for invalidating a range of
> > > the pagecache and then _keeping it invalidated_ while we Do Stuff. 
> > > 
> > > The fallocate INSERT_RANGE/COLLAPSE_RANGE situation seems likely to be worse
> > > than I initially thought. I've been digging into this in the course of bcachefs
> > > testing - I was hitting assertions that meant state hanging off the page cache
> > > (in this case, allocation information, i.e. whether we needed to reserve space
> > > on write) was inconsistent with the btree in writepages().
> > > 
> > > Well, bcachefs isn't the only filesystem that hangs additional state off the
> > > pagecache, and the situation today is that an unprivileged user can cause
> > > inconsistencies there by just doing buffered reads concurrently with
> > > INSERT_RANGE/COLLAPSE_RANGE. I highly highly doubt this is an issue of just
> > > "oops, you corrupted your file because you were doing stupid stuff" - who knows
> > > what internal invariants are getting broken here, and I don't particularly care
> > > to find out.
> > 
> > I'd like to see a test case for this. Concurrent IO and/or page
> > faults should not run at the same time as fallocate on XFS. Hence I'd
> > like to see the test cases that demonstrate buffered reads are
> > causing corruption during insert/collapse range operations. We use
> > the same locking strategy for fallocate as we use for truncate and
> > all the other internal extent manipulation operations, so if there's
> > something wrong, we need to fix it.
> 
> It's entirely possible I'm wrong about XFS - your fault path locking looked
> correct, and I did see you had extra locking in your buffered read path but I
> thought it was a different lock. I'll recheck later, but for the moment I'm just
> going to assume I misspoke (and tbh always found xfs's locking to be quite
> rigorous).

There are two locks: XFS_IOLOCK for read/write/splice IO path vs
truncate/fallocate exclusion, and XFS_MMAPLOCK for page fault vs
truncate/fallocate exclusion.
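
A toy, single-threaded model of that two-lock arrangement, for
readers following along. Everything here (toy_rwsem, io_begin,
fault_begin, extent_op_begin) is hypothetical and is not XFS code:
the "locks" only record who holds them and report "would block"
instead of sleeping. IO takes the IO lock shared, faults take the
mmap lock shared, and an extent manipulation takes both exclusive,
IO lock first, so it excludes both paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock: tracks holders only and never sleeps. */
struct toy_rwsem {
    int readers;   /* number of shared holders */
    bool writer;   /* true while held exclusive */
};

static bool toy_try_read(struct toy_rwsem *s)
{
    if (s->writer)
        return false;          /* a real rwsem would sleep here */
    s->readers++;
    return true;
}

static void toy_read_unlock(struct toy_rwsem *s) { s->readers--; }

static bool toy_try_write(struct toy_rwsem *s)
{
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

static void toy_write_unlock(struct toy_rwsem *s) { s->writer = false; }

/* Hypothetical inode carrying the two locks described above. */
struct toy_inode {
    struct toy_rwsem iolock;    /* read/write/splice vs truncate/fallocate */
    struct toy_rwsem mmaplock;  /* page faults vs truncate/fallocate */
};

/* read/write/splice path: IO lock shared. */
static bool io_begin(struct toy_inode *ip) { return toy_try_read(&ip->iolock); }
static void io_end(struct toy_inode *ip) { toy_read_unlock(&ip->iolock); }

/* Page fault path: mmap lock shared. */
static bool fault_begin(struct toy_inode *ip) { return toy_try_read(&ip->mmaplock); }
static void fault_end(struct toy_inode *ip) { toy_read_unlock(&ip->mmaplock); }

/* truncate/fallocate: both locks exclusive, always IO lock first. */
static bool extent_op_begin(struct toy_inode *ip)
{
    if (!toy_try_write(&ip->iolock))
        return false;
    if (!toy_try_write(&ip->mmaplock)) {
        toy_write_unlock(&ip->iolock);  /* back out, lock order preserved */
        return false;
    }
    return true;
}

static void extent_op_end(struct toy_inode *ip)
{
    toy_write_unlock(&ip->mmaplock);
    toy_write_unlock(&ip->iolock);
}
```

Because every entry point (IO and faults) takes one of the two locks,
an in-flight buffered read or fault is enough to hold off fallocate,
and vice versa, regardless of whether the pages are already cached.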

> ext4 uses the generic code in all the places you're hooking into though -
> .fault, .read_iter, etc.
> 
> The scheme I've got in this patch should perform quite a bit better than what
> you're doing - only locking in the slow cache miss path, vs. every time you
> touch the page cache.
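
For comparison, a toy model of the kind of scheme Kent describes:
a lock that is consulted only when a page has to be added to the
cache, so an invalidator can empty a range and keep it empty. All
names (toy_add_lock, toy_read_page, toy_invalidate_and_block) are
hypothetical; this is not the patch from this thread. For
simplicity the toy blocks additions to the whole mapping rather
than just the invalidated range, and it never sleeps.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8

/* Toy lock: many "adders" or one "blocker", never both. */
struct toy_add_lock {
    int adders;    /* tasks currently inserting pages */
    bool blocked;  /* an invalidator is keeping the cache empty */
};

struct toy_mapping {
    bool cached[NPAGES];
    char data[NPAGES];   /* one byte stands in for each cached page */
    char disk[NPAGES];   /* backing store */
    struct toy_add_lock add_lock;
};

/* Read one page. Returns -1 where the slow path would have to sleep. */
static int toy_read_page(struct toy_mapping *m, int idx)
{
    if (m->cached[idx])
        return m->data[idx];       /* fast path: cache hit, no lock at all */
    if (m->add_lock.blocked)
        return -1;                 /* miss while blocked: would sleep */
    m->add_lock.adders++;
    m->data[idx] = m->disk[idx];   /* "read from disk" and insert the page */
    m->cached[idx] = true;
    m->add_lock.adders--;
    return m->data[idx];
}

/* Drop pages in [start, end) and stop new pages being added. */
static bool toy_invalidate_and_block(struct toy_mapping *m, int start, int end)
{
    if (m->add_lock.adders > 0 || m->add_lock.blocked)
        return false;
    m->add_lock.blocked = true;
    for (int i = start; i < end; i++)
        m->cached[i] = false;
    return true;
}

static void toy_unblock(struct toy_mapping *m) { m->add_lock.blocked = false; }
```

The point of the design is that cache hits pay nothing; only a
miss, which has to go to disk anyway, touches the lock.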

I'm not sure I follow - how does this work when a fallocate
operation uses the page cache for, say, zeroing data blocks rather
than invalidating them (e.g. FALLOC_FL_ZERO_RANGE can validly zero
blocks through the page cache, so can hole punching)?  Won't the
buffered read then return a mix of real and zeroed data, depending
on who wins the race to each underlying page lock?

i.e. if the locking only occurs in the page insert slow path, then
it doesn't provide sufficient exclusion for extent manipulation
operations that use the page cache during their normal operation.
IOWs, other, higher level synchronisation methods for fallocate
are still necessary....
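
A toy replay of the race described above, assuming the only lock is
one checked on the cache-miss path. Names (zr_mapping, zr_read_page,
zr_zero_page, zr_replay) are hypothetical. The fallocate holds that
lock and zeroes cached pages in place; the reader's lookups are all
cache hits, so it never consults the lock and sees whatever state
each page is in when it gets there.

```c
#include <assert.h>
#include <stdbool.h>

#define ZR_NPAGES 4

/* Hypothetical mapping in which every page is already cached. */
struct zr_mapping {
    bool cached[ZR_NPAGES];
    char data[ZR_NPAGES];   /* one byte stands in for each cached page */
    char disk[ZR_NPAGES];   /* backing store, only read on a miss */
    bool add_blocked;       /* lock consulted only on the cache-miss path */
};

/* Buffered read of one page: a cache hit never looks at add_blocked. */
static int zr_read_page(const struct zr_mapping *m, int idx)
{
    if (m->cached[idx])
        return m->data[idx];                    /* fast path, no lock */
    return m->add_blocked ? -1 : m->disk[idx];  /* slow path honours the lock */
}

/* ZERO_RANGE or hole punch zeroing a page in place via the page cache. */
static void zr_zero_page(struct zr_mapping *m, int idx)
{
    m->data[idx] = 0;
}

/* Replay one interleaving: the fallocate zeroes pages [0, zeroed_first)
 * before the reader runs and the remaining pages afterwards. */
static void zr_replay(int zeroed_first, char out[ZR_NPAGES])
{
    struct zr_mapping m = {0};
    for (int i = 0; i < ZR_NPAGES; i++) {
        m.cached[i] = true;
        m.data[i] = 'x';
        m.disk[i] = 'x';
    }
    m.add_blocked = true;             /* fallocate holds the slow-path lock */
    for (int i = 0; i < zeroed_first; i++)
        zr_zero_page(&m, i);
    for (int i = 0; i < ZR_NPAGES; i++)
        out[i] = (char)zr_read_page(&m, i);  /* all hits: lock never checked */
    for (int i = zeroed_first; i < ZR_NPAGES; i++)
        zr_zero_page(&m, i);          /* fallocate finishes its work */
    m.add_blocked = false;
}
```

With zeroed_first = 2 the reader sees two zeroed pages followed by
two pages of old data, which is exactly the mixed result that a
higher-level lock around the whole operation would prevent.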

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 6+ messages
2016-03-29  4:25 fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH] Kent Overstreet
2016-03-29  5:15 ` Dave Chinner
2016-03-29  6:04   ` Kent Overstreet
2016-03-29 22:37     ` Dave Chinner [this message]
2016-03-29 23:46       ` Kent Overstreet
2016-03-30 23:17         ` Dave Chinner