All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Josef Bacik <josef@redhat.com>,
	linux-fsdevel@vger.kernel.org, chris.mason@oracle.com,
	hch@infradead.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] new ->perform_write fop
Date: Thu, 20 May 2010 16:48:08 +1000	[thread overview]
Message-ID: <20100520064808.GF2516@laptop> (raw)
In-Reply-To: <20100519235054.GS8120@dastard>

On Thu, May 20, 2010 at 09:50:54AM +1000, Dave Chinner wrote:
> > As I said, we can have a dumb fallback path for filesystems that
> > don't implement hole punching. Clear the blocks past i size, and
> > zero out the allocated but not initialized blocks.
> > 
> > There does not have to be pagecache allocated in order to do this,
> > you could do direct IO from the zero page in order to do it.
> 
> I don't see that as a good solution - it's once again a fairly
> complex way of dealing with the problem, especially as it now means
> that direct io would fall back to buffered which would fall back to
> direct IO....

Well it wouldn't use the full direct IO path. It has the block, just
build a bio with the source zero page and write it out. If the fs
requires anything more fancy than that, tough, it should just
implement hole punching.

 
> > Hole punching is not only useful there, it is already exposed to
> > userspace via MADV_REMOVE.
> 
> That interface is *totally broken*.

Why?

> It has all the same problems as
> vmtruncate() for removing file blocks (because it uses vmtruncate).
> It also has the fundamental problem of being called un the mmap_sem,
> which means that inode locks and therefore de-allocation cannot be
> executed without the possibility of deadlocks.

None of that is an API problem, it's all implementation. Yes fadivse
would be a much better API, but the madvise API is still there.
Implementation wise: it does not use vmtruncate; it has no mmap_sem
problem.


> Fundamentally, hole
> punching is an inode operation, not a VM operation....

VM acts as a handle to inode operations. It's no big deal.


> > An API that doesn't require that, though, should be less overhead
> > and simpler.
> > 
> > Is it really going to be a problem to implement block hole punching
> > in ext4 and gfs2?
> 
> I can't follow the ext4 code - it's an intricate maze of weird entry
> and exit points, so I'm not even going to attempt to comment on it.
> 
> The gfs2 code is easier to follow and it looks like it would require
> a redesign and rewrite of the block truncation implementation as it
> appears to assume that blocks are only ever removed from the end of
> the file - I don't think the recursive algorithms for trimming the
> indirect block trees can be easily modified for punching out
> arbitrary ranges of blocks easily. I could be wrong, though, as I'm
> not a gfs2 expert....

I'm far more in favour of doing the interfaces right, and making
the filesystems fix themselves to use it.


  reply	other threads:[~2010-05-20  6:48 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-12 21:24 [RFC] new ->perform_write fop Josef Bacik
2010-05-13  1:39 ` Josef Bacik
2010-05-13 15:36   ` Christoph Hellwig
2010-05-14  1:00   ` Dave Chinner
2010-05-14  3:30     ` Josef Bacik
2010-05-14  5:50       ` Nick Piggin
2010-05-14  7:20         ` Dave Chinner
2010-05-14  7:33           ` Nick Piggin
2010-05-14  6:41       ` Dave Chinner
2010-05-14  7:22         ` Nick Piggin
2010-05-14  8:38           ` Dave Chinner
2010-05-14 13:33             ` Chris Mason
2010-05-18  6:36             ` Nick Piggin
2010-05-18  8:05               ` Dave Chinner
2010-05-18 10:43                 ` Nick Piggin
2010-05-18 10:43                   ` Nick Piggin
2010-05-18 12:27                   ` Dave Chinner
2010-05-18 12:27                     ` Dave Chinner
2010-05-18 15:09                     ` Nick Piggin
2010-05-19 23:50                       ` Dave Chinner
2010-05-20  6:48                         ` Nick Piggin [this message]
2010-05-20 20:12                         ` Jan Kara
2010-05-20 23:05                           ` Dave Chinner
2010-05-21  9:05                             ` Steven Whitehouse
2010-05-21 13:50                             ` Josef Bacik
2010-05-21 13:50                               ` Josef Bacik
2010-05-21 14:23                               ` Nick Piggin
2010-05-21 15:19                                 ` Josef Bacik
2010-05-24  3:29                                   ` Nick Piggin
2010-05-22  0:31                               ` Dave Chinner
2010-05-21 18:58                             ` Jan Kara
2010-05-22  0:27                               ` Dave Chinner
2010-05-24  9:20                                 ` Jan Kara
2010-05-24  9:33                                   ` Nick Piggin
2010-06-05 15:05                                   ` tytso
2010-06-06  7:59                                     ` Nick Piggin
2010-06-06  7:59                                       ` Nick Piggin
2010-05-21 15:15           ` Christoph Hellwig
2010-05-22  2:31             ` Nick Piggin
2010-05-22  8:37               ` Dave Chinner
2010-05-24  3:09                 ` Nick Piggin
2010-05-24  5:53                   ` Dave Chinner
2010-05-24  6:55                     ` Nick Piggin
2010-05-24 10:21                       ` Dave Chinner
2010-06-01  6:27                         ` Nick Piggin
2010-05-24 18:40                       ` Joel Becker
2010-05-17 23:35         ` Jan Kara
2010-05-18  1:21           ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100520064808.GF2516@laptop \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=josef@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.