Re: [RFC] Ext3 online defrag

From: Dave Kleikamp <shaggy@austin.ibm.com>
To: David Chinner <dgc@sgi.com>
Cc: Jeff Garzik <jeff@garzik.org>, Alex Tomas <alex@clusterfs.com>,
	Theodore Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC] Ext3 online defrag
Date: Tue, 24 Oct 2006 11:26:26 -0500
Message-ID: <1161707186.20134.26.camel@kleikamp.austin.ibm.com>
In-Reply-To: <20061024160128.GF11034@melbourne.sgi.com>

On Wed, 2006-10-25 at 02:01 +1000, David Chinner wrote:
> On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote:
> > On Tue, 2006-10-24 at 23:59 +1000, David Chinner wrote:
> > > That's the wrong way to look at it. If you want the userspace
> > > process to specify a location, then you should preallocate it first
> > > before doing anything else. There is no need to clutter a simple
> > > data mover interface with all sorts of unnecessary error handling.
> > 
> > You are implying that the 2-step interface, creating a new inode then
> > swapping the contents, is the only way to implement this.
> 
> No, it's not the only way to implement it, but it seems the cleanest
> way to me when you have to consider crash recovery. With a temporary
> inode, you can create it, hold a reference and then unlink it so
> that any crash at that point will free the inode and any extents
> it has on it.
> 
> The only way I can see anything different working is having the
> filesystem hold extents somewhere internally that provides us the
> same recovery guarantees while we copy the data and insert the new
> extents.  This is obviously a filesystem specific solution and is
> more complex to implement than a swap extent transaction. It
> probably also needs on disk format changes to support properly....

This is definitely filesystem-dependent.  I would think allocating an
extent would be like any other allocation done by the filesystem, and
there are already recovery mechanisms for that.
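
The temporary-inode pattern described in the quoted text above (create
the inode, hold a reference, then unlink it) can be illustrated from
userspace. This is only a minimal sketch of the idea, not the proposed
kernel interface; the helper name open_unlinked_temp is hypothetical.
Because the name is removed while the descriptor stays open, the
filesystem frees the inode and its blocks when the last reference goes
away, including after a crash.

```python
import os
import tempfile


def open_unlinked_temp(directory):
    """Create a temporary file in `directory`, unlink it, return (fd, old_path).

    Hypothetical helper for illustration only. The open descriptor is the
    sole remaining reference to the inode once the name is removed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    # Remove the directory entry; the inode lives on until fd is closed.
    os.unlink(path)
    return fd, path
```

The temporary file has to live on the same filesystem as the file being
defragmented, which is why the sketch takes the directory as a parameter.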

> > > Once you've separated the destination allocation from the data
> > > mover, the mover is basically a splice copy from source to
> > > destination, an fsync and then an atomic swap blocks/extents operation.
> > > Most of this code is generic, and a per-fs swap-extents vector
> > > could be easily provided for the one bit that is not....
> > 
> > The benefit of having such a simple data mover is negated by moving the
> > complexity into the allocator.
> 
> What complexity does it introduce that the allocator doesn't already
> have or needs to provide for the single call interface to work?

I don't see it as any more or less complex than a single interface.

> > A single interface that would move a part of a file at a time has the
> > advantage that a large file which is only fragmented in a few areas does
> > not need to be completely moved.
> 
> And the two-step process can do exactly this as well - splice can
> work on any offset within the file...

I wasn't aware of that.  That makes your proposal sound a lot better.
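
A rough sketch of the copy-and-fsync part of such a data mover, limited
to one byte range of the file, might look like the following. Two
assumptions to note: it uses pread/pwrite as a portable stand-in for the
splice copy discussed above, and the function name copy_range is
hypothetical. The atomic swap-extents step is filesystem-specific and is
deliberately left out.

```python
import os


def copy_range(src_fd, dst_fd, offset, length, chunk_size=1 << 20):
    """Copy `length` bytes at `offset` from src_fd to the same offset in dst_fd.

    Returns the number of bytes copied (less than `length` if the source
    ends early). Hypothetical helper; pread/pwrite stand in for splice.
    """
    copied = 0
    while copied < length:
        want = min(chunk_size, length - copied)
        buf = os.pread(src_fd, want, offset + copied)
        if not buf:
            break  # reached end of the source file
        written = 0
        while written < len(buf):
            # pwrite may write fewer bytes than requested, so loop.
            written += os.pwrite(dst_fd, buf[written:], offset + copied + written)
        copied += len(buf)
    # Make the copy durable before any swap of blocks/extents would happen.
    os.fsync(dst_fd)
    return copied
```

Only the fragmented region needs to be passed as offset and length; the
rest of a large file is never read or rewritten.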

> > > The allocation interface, OTOH, is anything but simple and is really
> > > a filesystem specific interface. Seems logical to me to separate
> > > the two. 
> > 
> > So what then is the benefit of having a simple generic data mover if
> > every file system needs to implement its own interface to allocate a
> > copy of the data?
> 
> I assume you meant "....allocate the space to store the copy of the data."

Yeah.

> The allocation interface needs to be able to be extended
> independently of the data mover interface. XFS already exposes
> allocation ioctls to userspace for preallocation and we've got plans
> to extend this further to allow userspace controlled allocation for
> smart defrag tools for XFS. Tying allocation to the data mover
> just makes the interface less flexible and harder to do anything
> smart with....

Okay.  It would be nice to standardize the interface so we don't have
every filesystem introducing new ioctls.
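
For comparison, one allocation call that is already standardized across
filesystems is posix_fallocate(). The sketch below shows only that
generic preallocation step; it does not let userspace choose where the
blocks are placed, which is the filesystem-specific part discussed
above. The helper name preallocate is hypothetical.

```python
import os


def preallocate(path, length):
    """Create (or open) `path` and reserve `length` bytes of space for it.

    Returns the open file descriptor. Hypothetical helper; placement of
    the allocated blocks is left entirely to the filesystem.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ensures space is allocated for the range [0, length).
        os.posix_fallocate(fd, 0, length)
    except OSError:
        os.close(fd)
        raise
    return fd
```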

> Cheers,
> 
> Dave.
-- 
David Kleikamp
IBM Linux Technology Center
