From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <matthew@wil.cx>
Cc: Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org
Subject: Re: hole-punch vs fault
Date: Mon, 2 Dec 2013 08:52:21 +1100 [thread overview]
Message-ID: <20131201215221.GW10988@dastard> (raw)
In-Reply-To: <20131129131154.GA11106@parisc-linux.org>
On Fri, Nov 29, 2013 at 06:11:55AM -0700, Matthew Wilcox wrote:
> On Fri, Nov 29, 2013 at 09:12:46AM +1100, Dave Chinner wrote:
> > Indeed, we've already discussed this at length - if we are going to
> > introduce a new method of synchronisation between mmap and
> > holepunch, then we cannot ignore all the other mmap vs IO
> > synchronisation problems we have that are caused by the same issue.
> > Hacking workarounds into the holepunch/mmap paths is not a robust or
> > maintainable solution.
>
> This isn't hacking in a workaround. This is a solution to the problem
> that assumes holepunches are rare, and it's OK to make the fault path retry
> if a holepunch happened.
Workloads that use hole punches tend to make extensive use of them,
so I suspect that what is OK for you is going to be a fairly
significant problem for others...
> I agree it needs to be documented and the exact
> details of what is protected by i_damaged_mutex need to be clear.
>
> > a) it is an entire file lock and truncate/holepunch do not
> > affect the entire scope of the file. i.e. there is no
> > reason why holepunch cannot run concurrently with IO if
> > they are to different ranges of a file.
>
> Yes, if we care about the scalability of truncate and holepunch. I really
> don't think that's an important workload.
See my comment above: workloads that use hole punching tend to care
about scalability. i.e. workloads that use preallocation and hole
punching typically also care about IO scalability.
For example, consider sparse image files on loopback devices, where
the mounted filesystem uses discard to punch holes in the backing
file and keep its space usage minimal. In these workloads we most
certainly care about the scalability of hole punching w.r.t. other
IO going to the same file (probably direct IO). Right now a hole
punch is effectively an unqueuable operation - it causes a pipeline
stall - just like TRIM on SATA....
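For reference, the operation in question is just fallocate(2) with
FALLOC_FL_PUNCH_HOLE. A minimal, untested userspace sketch of punching
a hole in a backing file (error handling kept to the bare minimum):

/*
 * Punch a hole in a file with fallocate(2).  This is the same
 * operation a loop device with discard enabled ends up issuing
 * against its sparse image file.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <file> <offset> <length>\n",
			argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	off_t offset = atoll(argv[2]);
	off_t length = atoll(argv[3]);

	/*
	 * PUNCH_HOLE must be combined with KEEP_SIZE: the file length
	 * is unchanged, but the blocks backing [offset, offset+length)
	 * are freed and subsequent reads of that range return zeroes.
	 */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, length) < 0)
		perror("fallocate");

	close(fd);
	return 0;
}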
> > b) it increases the struct inode by at least 5%. If we are
> > going to increase the inode by that much, we better make
> > sure we solve more than one isolated, relatively rare
> > corner case behaviour.
>
> About 7% (568 -> 608 bytes) with my .config (SMP, SPIN_ON_OWNER,
> !DEBUG_MUTEXES, !DEBUG_LOCK_ALLOC). I rearranged the order (thanks,
> pahole!) to put i_damage in the hole created by private_lock.
Which is a huge price to pay for solving a problem that the majority
of users will never hit or care about. As I say to everyone who
thinks they can just add stuff to the struct inode without caring
about the size increase: filesystem developers will kill to save
4 bytes in the struct inode, so you'd better have a damn good reason
for adding 40+ bytes to the structure....
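To illustrate the packing point with a toy example (the field names
are made up and this is nothing like the real struct inode layout): on
64 bit, a 4 byte field can ride for free in an existing alignment
hole, but an 8 byte mutex cannot fit in any hole and has to grow the
structure. pahole is exactly the tool that shows you where those
holes are.

/* Toy illustration only - not the real struct inode. */
struct toy_inode {
	unsigned int	flags;		/* 4 bytes			*/
					/* 4 byte padding hole		*/
	void		*private_ptr;	/* 8 bytes, 8-byte aligned	*/
};					/* sizeof() == 16 on x86-64	*/

struct toy_inode_plus {
	unsigned int	flags;		/* 4 bytes			*/
	unsigned int	new_field;	/* fills the hole for free	*/
	void		*private_ptr;
};					/* still 16 bytes		*/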
> I'm solving two cases here, one is holepunch vs fault for regular
> files. The other is truncate vs fault for XIP files, which is
> probably a little less rare. I think the same mechanism can be
> used when a filesystem is doing a defragment or otherwise moving
> blocks around on an XIP block device.
Well, you kind of raised that as a secondary issue in passing, not
as the problem you actually need to solve.
> Look, I'm not attached to the i_damage solution,
> but I do need something to protect against truncate vs fault on XIP.
> Fixing holepunch vs fault too just felt like the right thing to do.
So, your real problem is that ext2-xip has a truncate vs fault race
because it lacks a page lock to serialise page faults against
truncate. In fact, it appears to bypass the page cache completely
for both reads and writes (writes go directly to block device
memory), so it would seem that all the assumptions we've made about
serialisation on page locks throughout every filesystem are not
valid for XIP?
That seems like an XIP design fault, yes? Why should we be
hacking crap into the struct inode to fix an XIP design flaw?
Indeed, XIP could make use of the structures we already
have in the struct inode/address_space to behave more like a normal
filesystem. We already use the mapping tree to store
per-page information that is used to serialise truncate vs page
faults, so why not make XIP do exactly the same thing?
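i.e. something like the pattern the generic fault path already uses
(a rough sketch from memory, not the exact mainline code - the
function name and the elided readpage path are placeholders):

static int sketch_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct address_space *mapping = vma->vm_file->f_mapping;
	struct inode *inode = mapping->host;
	struct page *page;
	pgoff_t size;

retry:
	page = find_get_page(mapping, vmf->pgoff);
	if (!page)
		return VM_FAULT_SIGBUS;	/* real code reads the page in here */
	lock_page(page);

	/*
	 * Truncate/invalidate locks each page, removes it from the
	 * mapping tree and clears page->mapping before freeing the
	 * blocks behind it.  So if the page is still attached to this
	 * mapping while we hold the page lock, its backing store
	 * cannot disappear underneath us.
	 */
	if (page->mapping != mapping) {
		unlock_page(page);
		page_cache_release(page);
		goto retry;			/* raced with truncate */
	}

	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
	if (unlikely(vmf->pgoff >= size)) {
		unlock_page(page);
		page_cache_release(page);
		return VM_FAULT_SIGBUS;		/* beyond EOF */
	}

	vmf->page = page;			/* returned locked */
	return VM_FAULT_LOCKED;
}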
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com