From: Andreas Dilger <adilger@clusterfs.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC] Ext3 online defrag
Date: Mon, 23 Oct 2006 09:14:47 -0600
Message-ID: <20061023151447.GL3509@schatzie.adilger.int>
In-Reply-To: <20061023141641.GA29649@thunk.org>
On Oct 23, 2006 10:16 -0400, Theodore Tso wrote:
> As a suggestion, I would pass the inode number and inode generation
> number into the ext3_file_move_data array:
>
> struct ext3_file_move_data {
> int extents;
> struct ext3_reloc_extent __user *ext_array;
> };
>
> This will be much more efficient for the userspace relocator, since it
> won't need to translate from an inode number to a pathname, and then
> try to open the file before relocating it.
>
> I'd also use an explicit 64-bit block numbers type so that we don't
> have to worry about the ABI changing when we support 64-bit block
> numbers.
I would in fact go so far as to allow only a single extent to be specified
per call. This avoids passing any pointers as part of the interface
(hello ioctl police :-), and also makes the kernel code simpler.
I don't think the syscall/ioctl overhead is significant compared to the
journal and IO overhead.

Also, I would specify both the source extent and the target extent in
the inode. First, this allows defragmenting only part of the file
instead of (it appears) requiring the whole file to be relocated, which
would be a killer if the file being defragmented is larger than free
space. Second, it provides a level of insurance that what the kernel
is relocating matches what userspace thinks it is doing. It would
protect against problems if the kernel ever does block relocation
itself (e.g. merging fragments into a single extent on (re)write, or for
snapshot/COW).
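
As a rough sketch of the two points above (this is not the interface from
the patch): a fixed-size, pointer-free request describing exactly one
extent, plus the sanity check the kernel side could make before moving
anything. The names `ext3_reloc_one`, `current_mapping` and
`reloc_matches()` are hypothetical, as are the exact fields; the inode
number, generation and 64-bit block numbers follow the suggestions quoted
earlier in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request: one extent per call, no pointers, and only
 * fixed-width fields, so the layout is the same for 32-bit and 64-bit
 * callers. */
struct ext3_reloc_one {
	uint64_t ino;		/* inode to operate on */
	uint64_t src_logical;	/* first file-relative block of the extent */
	uint64_t src_phys;	/* where userspace believes it lives now */
	uint64_t dst_phys;	/* where userspace wants it to go */
	uint32_t generation;	/* guards against inode number reuse */
	uint32_t len;		/* extent length in blocks */
};

static_assert(sizeof(struct ext3_reloc_one) == 40,
	      "request layout must not depend on pointer size");

/* Hypothetical stand-in for what a block-mapping lookup would report
 * for the range starting at src_logical. */
struct current_mapping {
	uint64_t phys;		/* physical block backing src_logical */
	uint32_t generation;	/* current inode generation */
	uint32_t len;		/* contiguous blocks mapped from phys */
};

/* Return 1 only if the request still describes the file as it is now. */
static int reloc_matches(const struct ext3_reloc_one *req,
			 const struct current_mapping *cur)
{
	if (req->len == 0)
		return 0;	/* nothing to move */
	if (req->generation != cur->generation)
		return 0;	/* inode was freed and reused */
	if (req->src_phys != cur->phys)
		return 0;	/* blocks were already moved by someone else */
	return req->len <= cur->len;
}
```

With a check like this, a stale request (for example after the kernel has
merged fragments on rewrite) is rejected rather than relocating the wrong
blocks, and userspace can simply re-read the mapping and retry.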
> The other problem I see with this patch is that there will be cache
> coherency problems between the buffer cache and the page cache. I
> think you will want to pull the data blocks of the file into the page
> cache, and then write them out from the page cache, and only *then*
> update the indirect blocks and commit the transaction.
Alternatively (and maybe even better), treat it as O_DIRECT and ensure
the page cache is flushed. This also avoids polluting the whole page
cache while running a defragmenter on the filesystem.
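
A userspace analogue of that idea, as a sketch only: write back a file's
dirty pages and then ask the kernel to drop its cached pages, so that a
pass over the file does not leave the page cache full of its data.
`flush_file_cache()` is a hypothetical helper; `fsync(2)` and
`posix_fadvise(2)` with `POSIX_FADV_DONTNEED` are existing calls, and the
latter is only advisory. The in-kernel relocation path would need its own
equivalent.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Write back dirty pages for fd, then ask the kernel to drop the
 * file's cached pages. Returns 0 on success, -1 on failure. */
static int flush_file_cache(int fd)
{
	if (fsync(fd) != 0)
		return -1;
	/* offset 0 with len 0 covers the whole file; posix_fadvise()
	 * returns an error number directly rather than setting errno. */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
		return -1;
	return 0;
}
```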
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.