From: Kent Overstreet <kent.overstreet@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Al Viro <viro@zeniv.linux.org.uk>, Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Theodore Ts'o <tytso@mit.edu>
Subject: Re: fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH]
Date: Tue, 29 Mar 2016 15:46:45 -0800 [thread overview]
Message-ID: <20160329234645.GA11212@kmo-pixel> (raw)
In-Reply-To: <20160329223743.GE30721@dastard>
On Wed, Mar 30, 2016 at 09:37:43AM +1100, Dave Chinner wrote:
> There are two locks the XFS_IOLOCK for read/write/splice IO path vs
> truncate/fallocate exclusion, and XFS_MMAPLOCK for page fault vs
> truncate/fallocate exclusion.
Ok, that makes sense then.
> > ext4 uses the generic code in all the places you're hooking into though -
> > .fault, .read_iter, etc.
> >
> > The scheme I've got in this patch should perform quite a bit better than what
> > you're doing - only locking in the slow cache miss path, vs. every time you
> > touch the page cache.
>
> I'm not sure I follow - how does this work when a fallocate
> operation use the page cache for, say, zeroing data blocks rather
> than invalidating them (e.g. FALLOC_FL_ZERO_RANGE can validly zero
> blocks through the page cache, so can hole punching)? Won't the
> buffered read then return a mix of real and zeroed data, depending
> who wins the race to each underlying page lock?
Yes, but does that matter? I don't know of any requirement that the entire
fallocate be atomic, and writes > PAGE_SIZE aren't generally atomic wrt reads
either (but looking at the code it looks like XFS is special there too...)
> i.e. if the locking only occurs in the page insert slow path, then
> it doesn't provide sufficient exclusion for extent manipulation
> operations that use the page cache during their normal operation.
> IOWs, other, higher level synchronisation methods for fallocate
> are still necessary....
Well, I would argue it _is_ enough for internal consistency, if you want to
provide stronger atomicity guarantees to userspace (not that there's anything
wrong with that) I think I'd argue that really that's a different issue.
Also the situation seems to be that XFS provides stronger guarantees than any
other Linux filesystem, which is sucky from an application point of view because
they don't want to have to depend on running on XFS for correctness. It seems
like it'd actually be pretty easy to lift your locking scheme to the generic VFS
code, and then possibly even have it be controlled with a per inode flag so that
applications that don't need the locking aren't paying for it.
next prev parent reply other threads:[~2016-03-29 23:46 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-29 4:25 fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH] Kent Overstreet
2016-03-29 5:15 ` Dave Chinner
2016-03-29 6:04 ` Kent Overstreet
2016-03-29 22:37 ` Dave Chinner
2016-03-29 23:46 ` Kent Overstreet [this message]
2016-03-30 23:17 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160329234645.GA11212@kmo-pixel \
--to=kent.overstreet@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=riel@redhat.com \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).