Re: hole-punch vs fault

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <matthew@wil.cx>
Cc: Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org
Subject: Re: hole-punch vs fault
Date: Fri, 29 Nov 2013 09:12:46 +1100	[thread overview]
Message-ID: <20131128221246.GQ8803@dastard> (raw)
In-Reply-To: <20131128044454.GL24288@parisc-linux.org>

On Wed, Nov 27, 2013 at 09:44:54PM -0700, Matthew Wilcox wrote:
> On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote:
> > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > > > The second is to put some kind of generation counter in the inode or
> > > > address_space that is incremented on every deallocation.  The fault path
> > > > would read the generation counter before calling find_get_page() and then
> > > > check it hasn't changed after getting the page lock (goto retry_find).  I
> > > > like that one a little more since we can fix it all in common code, and I
> > > > think it'll be lower overhead in the fault path.
> > >   I'm not sure I understand how this should work. After pagecache is
> > > truncated, fault can come and happily instantiate the page again. After
> > > that fault is done, hole punching awakes and removes blocks from under that
> > > page. So checking the generation counter after you get the page lock seems
> > > useless to me.
> > 
> > Yeah, I don't think the page lock is enough (maybe someone can convince
> > me otherwise).  How does this look?  Checking the generation number
> > after we get i_mutex ensures that the truncate has finished running.
> 
> Ohh.  We can't take i_mutex because that's an inversion with mmap_sem.
> So option 1 is to convince ourselves that page lock is enough (and that
> doesn't solve the problem for ext2-xip).
> 
> Option 2 is to do something similar to seqcount_t.  We can't use
> seqcount_t as-is because it spins in read_seqcount_begin, and it could
> be spinning for a very long time while we read a page in from disc!

Option 2 is really any other method of synchronisation. It doesn't
matter is it's a seqlock like construct, or a more complex
shared/exclusive range lock - it's still the "same" solution.

Indeed, we've already discussed this at length - if we are going to
introduce a new method of synchronisation between mmap and
holepunch, then we cannot ignore all the other mmap vs IO
synchronisation problems we have that are caused by the same issue.
Hacking workarounds into the holepunch/mmap paths is not a robust or
maintainable solution.

> How about something like this:
> 
> typedef struct seqtex {
> 	unsigned		sequence;
> 	spinlock_t		wait_lock;
> 	struct list_head	wait_list;
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> 	struct lockdep_map	dep_map;
> #endif
> }

The biggest problem I see here is that

	a) it is a entire file lock and truncate/holepunch do not
	   affect the entire scope of the file. i.e. there is no
	   reason why holepunch cannot run concurrently with IO if
	   they are to different ranges of a file.

	b) it increases the struct inode by at least 5%. If we are
	   going to increase the inode by that much, we better make
	   sure we solve more than one isolated, relatively rare
	   corner case behaviour.

We need to do better than playing whack-a-mole with a big hammer
here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2013-11-28 22:12 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-27 13:48 hole-punch vs fault Matthew Wilcox
2013-11-27 22:19 ` Jan Kara
2013-11-28  2:33   ` Matthew Wilcox
2013-11-28  3:30     ` Matthew Wilcox
2013-11-28  4:22     ` Dave Chinner
2013-11-28  4:44     ` Matthew Wilcox
2013-11-28 12:24       ` Matthew Wilcox
2013-11-28 22:12       ` Dave Chinner [this message]
2013-11-29 13:11         ` Matthew Wilcox
2013-12-01 21:52           ` Dave Chinner
2013-12-02  8:33             ` Jan Kara
2013-12-02 15:58               ` Matthew Wilcox
2013-12-02 20:11                 ` Jan Kara
2013-12-02 20:13                 ` Matthew Wilcox
2013-12-02 23:13                   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131128221246.GQ8803@dastard \
    --to=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=matthew@wil.cx \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).