* Ext2/3 block remapping tool
@ 2007-04-26 19:29 Jan Kara
2007-04-27 18:09 ` Andreas Dilger
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-04-26 19:29 UTC (permalink / raw)
To: linux-fsdevel
Hello,
I've been lately playing with remapping ext2/ext3 blocks (especially how
much it can give us in terms of speed of things like KDE start). For that
I've written two simple tools (you can get them from
ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
e2block2file to transform (preparsed) output from blktrace into a list
of accessed files and offsets accessed
e2remapblocks to use output from e2block2file and remap blocks into big
chunks in the order in which they were accessed.
(see README in the tools archive for more details)
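To make the first step concrete, here is a minimal sketch of the kind of translation e2block2file is described as doing: turning physical block numbers from a trace into (file, offset) pairs. It is not the tool's actual code. The input layout (a dict from path to its list of physical blocks), the function names, and the sample paths are all assumptions made for illustration; the real tool reads preparsed blktrace output and the filesystem itself.

```python
# Hypothetical sketch only: not the actual e2block2file code or formats.

def build_reverse_map(file_blocks):
    """Map each physical block number to (path, logical block index)."""
    reverse = {}
    for path, blocks in file_blocks.items():
        for logical, physical in enumerate(blocks):
            reverse[physical] = (path, logical)
    return reverse


def blocks_to_file_accesses(trace, file_blocks):
    """Translate physical block numbers (in access order) into
    (path, logical block) pairs, keeping only the first access of each."""
    reverse = build_reverse_map(file_blocks)
    seen = set()
    accesses = []
    for physical in trace:
        hit = reverse.get(physical)
        if hit is None or hit in seen:
            continue  # block belongs to no known file, or was already recorded
        seen.add(hit)
        accesses.append(hit)
    return accesses


# Made-up example data: index in each list is the logical block in the file.
file_blocks = {
    "/usr/lib/libfoo.so": [5000, 5001, 5002, 5003],
    "/etc/foo.conf": [120],
}
trace = [5002, 120, 5000, 5002, 9999]
print(blocks_to_file_accesses(trace, file_blocks))
```

The resulting ordered list is the sort of input a remapping step could consume.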
So far the tools (especially e2remapblocks ;) work on an unmounted
filesystem. The ultimate goal is to be able to do similar things for
mounted filesystems but I wanted to see whether block remapping is worth it
and what kernel interfaces would be useful for achieving the goal.
BTW, the results for KDE startup are as follows:
The root partition was about 4.8 GB with around 1 GB free. System has
1GB mem. All measurements (except for warmcache) were performed after
sync; echo 3 >/proc/sys/vm/drop_caches
Ordinary start: 19.2 20.3 19.5 19.8 19.3; avg. 19.62
Start with all data cached: 7 7.6 7.3 7.1 7.1; avg. 7.22
Start with fcache (see thread http://lkml.org/lkml/2006/5/15/46 for details
on fcache):
11.3 11 10.3 10.8 10.6; avg. 10.8
Start with blocks remapped with e2remapblocks:
13.5 15 13 14.5 14.5; avg. 14.1
(after remapping, data was stored in 20 contiguous extents on disk)
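As a quick cross-check of the figures above, the following sketch recomputes the averages and expresses each optimization as a share of the gap between an ordinary (cold) start and a fully cached start. The run times are copied from the message; the "share of the gap" metric and the dictionary keys are my own framing, not something the original measurements claim.

```python
# Run times copied from the message; units are whatever the message used.
runs = {
    "ordinary": [19.2, 20.3, 19.5, 19.8, 19.3],
    "warm cache": [7, 7.6, 7.3, 7.1, 7.1],
    "fcache": [11.3, 11, 10.3, 10.8, 10.6],
    "remapped": [13.5, 15, 13, 14.5, 14.5],
}

avg = {name: sum(times) / len(times) for name, times in runs.items()}

# The best any layout change could hope for: cold start minus warm start.
headroom = avg["ordinary"] - avg["warm cache"]

for name in ("fcache", "remapped"):
    saved = avg["ordinary"] - avg[name]
    print(f"{name}: avg {avg[name]:.2f}, saves {saved:.2f} "
          f"({100 * saved / avg['ordinary']:.0f}% of the ordinary start, "
          f"{100 * saved / headroom:.0f}% of the cold-to-warm gap)")
```

By this measure remapping recovers somewhat less than half of the cold-to-warm gap, and fcache roughly seven tenths of it.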
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Ext2/3 block remapping tool
2007-04-26 19:29 Ext2/3 block remapping tool Jan Kara
@ 2007-04-27 18:09 ` Andreas Dilger
2007-04-30 10:12 ` Jan Kara
2007-04-30 12:09 ` Theodore Tso
0 siblings, 2 replies; 9+ messages in thread
From: Andreas Dilger @ 2007-04-27 18:09 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-fsdevel
On Apr 26, 2007 21:29 +0200, Jan Kara wrote:
> I've been lately playing with remapping ext2/ext3 blocks (especially how
> much it can give us in terms of speed of things like KDE start). For that
> I've written two simple tools (you can get them from
> ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
> e2block2file to transform (preparsed) output from blktrace into a list
> of accessed files and offsets accessed
> e2remapblocks to use output from e2block2file and remap blocks into big
> chunks in the order in which they were accessed.
Does it map the whole file contiguously, or does it interleave blocks of the
file in the order they are accessed? I would hope that it maps the whole
file contiguously, and let readahead work properly to fetch the whole file.
Also, keeping the file contiguous avoids fragmentation later if that file is
updated, deleted, etc, and conflicts with allocator/defrag/etc.
> (see README in the tools archive for more details)
>
> So far the tools (especially e2remapblocks ;) work on an unmounted
> filesystem. The ultimate goal is to be able to do similar things for
> mounted filesystems but I wanted to see whether block remapping is worth it
> and what kernel interfaces would be useful for achieving the goal.
I'd prefer that such functionality be integrated with Takashi's online
defrag tool, since it needs virtually the same functionality. For that
matter, this is also very similar to the block-mapped -> extents tool
from Aneesh. It doesn't make sense to have so many separate tools for
users, especially if they start interfering with each other (i.e. defrag
undoes the remapping done by your tool).
> BTW, the results for KDE startup are as follows:
> The root partition was about 4.8 GB with around 1 GB free. System has
> 1GB mem. All measurements (except for warmcache) were performed after
> sync; echo 3 >/proc/sys/vm/drop_caches
>
> Ordinary start: 19.2 20.3 19.5 19.8 19.3; avg. 19.62
> Start with all data cached: 7 7.6 7.3 7.1 7.1; avg. 7.22
> Start with fcache (see thread http://lkml.org/lkml/2006/5/15/46 for details
> on fcache):
> 11.3 11 10.3 10.8 10.6; avg. 10.8
> Start with blocks remapped with e2remapblocks:
> 13.5 15 13 14.5 14.5; avg. 14.1
> (after remapping, data was stored in 20 contiguous extents on disk)
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: Ext2/3 block remapping tool
2007-04-27 18:09 ` Andreas Dilger
@ 2007-04-30 10:12 ` Jan Kara
2007-04-30 12:09 ` Theodore Tso
1 sibling, 0 replies; 9+ messages in thread
From: Jan Kara @ 2007-04-30 10:12 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Dilger
On Fri 27-04-07 12:09:42, Andreas Dilger wrote:
> On Apr 26, 2007 21:29 +0200, Jan Kara wrote:
> > I've been lately playing with remapping ext2/ext3 blocks (especially how
> > much it can give us in terms of speed of things like KDE start). For that
> > I've written two simple tools (you can get them from
> > ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
> > e2block2file to transform (preparsed) output from blktrace into a list
> > of accessed files and offsets accessed
> > e2remapblocks to use output from e2block2file and remap blocks into big
> > chunks in the order in which they were accessed.
>
> Does it map the whole file contiguously, or does it interleave blocks of the
> file in the order they are accessed? I would hope that it maps the whole
> file contiguously, and let readahead work properly to fetch the whole file.
> Also, keeping the file contiguous avoids fragmentation later if that file is
> updated, deleted, etc, and conflicts with allocator/defrag/etc.
No, it does interleave blocks of different files. Reading the whole file
is exactly what you often don't want. During startup KDE (which was my
benchmark) accesses basically two things: shared libraries and config files / icons.
Config files and icons usually fit into a single block so just mapping them
in the right order close together is fine. On the other hand, shared
libraries are large and you usually need just a few blocks scattered all
over them. So here we just remap those few blocks we need...
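The interleaving described above can be sketched as follows: walk the access list once and give each (file, block) pair the next free target block the first time it appears, regardless of which file it belongs to. This is only an illustration of the placement order, not e2remapblocks itself; the function name, the starting block number, and the file names are invented for the example.

```python
# Hypothetical sketch of access-order placement; not the e2remapblocks code.

def plan_remap(accesses, first_free_block):
    """Assign consecutive target blocks in first-access order.

    accesses: iterable of (path, logical_block) pairs in access order.
    Returns a dict {(path, logical_block): target_physical_block}.
    """
    plan = {}
    next_block = first_free_block
    for key in accesses:
        if key not in plan:  # a block is placed once, at its first access
            plan[key] = next_block
            next_block += 1
    return plan


# Made-up trace: a few scattered library blocks mixed with one-block files.
accesses = [
    ("libkdecore.so", 0),
    ("kdeglobals", 0),
    ("libkdecore.so", 37),
    ("icon.png", 0),
    ("libkdecore.so", 0),    # repeat access, already placed
    ("libkdecore.so", 212),
]
plan = plan_remap(accesses, first_free_block=100000)
for key, target in plan.items():
    print(key, "->", target)
```

Only the library blocks that were actually touched get moved; the rest of the library stays where it was, which is why the file as a whole ends up non-contiguous.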
I see the downsides of this approach. If the file is rewritten, you
lose the tight packing, but this is not going to happen often. I'm more
seriously concerned about the possibility that this optimization of
startup time may hurt running performance or, more probably, the
performance of other apps...
> > (see README in the tools archive for more details)
> >
> > So far the tools (especially e2remapblocks ;) work on an unmounted
> > filesystem. The ultimate goal is to be able to do similar things for
> > mounted filesystems but I wanted to see whether block remapping is worth it
> > and what kernel interfaces would be useful for achieving the goal.
>
> I'd prefer that such functionality be integrated with Takashi's online
> defrag tool, since it needs virtually the same functionality. For that
Yes, definitely these two have quite similar needs and I'd like to have
just one tool in the end.
> matter, this is also very similar to the block-mapped -> extents tool
> from Aneesh. It doesn't make sense to have so many separate tools for
> users, especially if they start interfering with each other (i.e. defrag
> undoes the remapping done by your tool).
Agreed.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Ext2/3 block remapping tool
2007-04-27 18:09 ` Andreas Dilger
2007-04-30 10:12 ` Jan Kara
@ 2007-04-30 12:09 ` Theodore Tso
2007-04-30 12:29 ` Jan Kara
2007-05-01 6:01 ` Andreas Dilger
1 sibling, 2 replies; 9+ messages in thread
From: Theodore Tso @ 2007-04-30 12:09 UTC (permalink / raw)
To: Jan Kara, linux-fsdevel, linux-ext4
On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> I'd prefer that such functionality be integrated with Takashi's online
> defrag tool, since it needs virtually the same functionality. For that
> matter, this is also very similar to the block-mapped -> extents tool
> from Aneesh. It doesn't make sense to have so many separate tools for
> users, especially if they start interfering with each other (i.e. defrag
> undoes the remapping done by your tool).
Yep, in fact, I'm really glad that Jan is working on the remapping
tool because if the on-line defrag kernel interfaces don't have the
right support for it, then that means we need to fix the on-line
defrag patches. :-)
While we're at it, someone want to start thinking about on-line
shrinking of ext4 filesystems? Again, the same block remapping
interfaces for defrag and file access optimizations should also be
useful for shrinking filesystems (even if some of the files that need
to be relocated are being actively used). If not, that probably means
we got the interface wrong.
- Ted
* Re: Ext2/3 block remapping tool
2007-04-30 12:09 ` Theodore Tso
@ 2007-04-30 12:29 ` Jan Kara
2007-05-01 6:01 ` Andreas Dilger
1 sibling, 0 replies; 9+ messages in thread
From: Jan Kara @ 2007-04-30 12:29 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-fsdevel, linux-ext4
On Mon 30-04-07 08:09:30, Theodore Tso wrote:
> On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> > I'd prefer that such functionality be integrated with Takashi's online
> > defrag tool, since it needs virtually the same functionality. For that
> > matter, this is also very similar to the block-mapped -> extents tool
> > from Aneesh. It doesn't make sense to have so many separate tools for
> > users, especially if they start interfering with each other (i.e. defrag
> > undoes the remapping done by your tool).
>
> Yep, in fact, I'm really glad that Jan is working on the remapping
> tool because if the on-line defrag kernel interfaces don't have the
> right support for it, then that means we need to fix the on-line
> defrag patches. :-)
;-) That was exactly the reason why I wrote the userspace program - so
that I have something in hand when we start discussing what the kernel
interface will look like.
> While we're at it, someone want to start thinking about on-line
> shrinking of ext4 filesystems? Again, the same block remapping
> interfaces for defrag and file access optimizations should also be
> useful for shrinking filesystems (even if some of the files that need
> to be relocated are being actively used). If not, that probably means
> we got the interface wrong.
Yes, that's a good idea. Currently it seems to me that block+inode
relocation (which we also need for defrag) would be enough to support
filesystem shrinking. Actually, in some ancient times (like 6-7 years ago)
I wrote online filesystem shrinking for ext2. By now the patch is probably
unusably obsolete, but I can still dig it out and look at what functions I
needed at that time.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Ext2/3 block remapping tool
2007-04-30 12:09 ` Theodore Tso
2007-04-30 12:29 ` Jan Kara
@ 2007-05-01 6:01 ` Andreas Dilger
2007-05-01 15:28 ` Theodore Tso
1 sibling, 1 reply; 9+ messages in thread
From: Andreas Dilger @ 2007-05-01 6:01 UTC (permalink / raw)
To: Theodore Tso; +Cc: Jan Kara, linux-fsdevel, linux-ext4
On Apr 30, 2007 08:09 -0400, Theodore Tso wrote:
> On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> > I'd prefer that such functionality be integrated with Takashi's online
> > defrag tool, since it needs virtually the same functionality. For that
> > matter, this is also very similar to the block-mapped -> extents tool
> > from Aneesh. It doesn't make sense to have so many separate tools for
> > users, especially if they start interfering with each other (i.e. defrag
> > undoes the remapping done by your tool).
>
> While we're at it, someone want to start thinking about on-line
> shrinking of ext4 filesystems? Again, the same block remapping
> interfaces for defrag and file access optimizations should also be
> useful for shrinking filesystems (even if some of the files that need
> to be relocated are being actively used). If not, that probably means
> we got the interface wrong.
Except one other issue with online shrinking is that we need to move
inodes on occasion and this poses a bunch of other problems over just
remapping the data blocks.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: Ext2/3 block remapping tool
2007-05-01 6:01 ` Andreas Dilger
@ 2007-05-01 15:28 ` Theodore Tso
2007-05-01 18:52 ` Andreas Dilger
0 siblings, 1 reply; 9+ messages in thread
From: Theodore Tso @ 2007-05-01 15:28 UTC (permalink / raw)
To: Jan Kara, linux-fsdevel, linux-ext4
On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
> Except one other issue with online shrinking is that we need to move
> inodes on occasion and this poses a bunch of other problems over just
> remapping the data blocks.
Well, I did say "necessary", and not "sufficient". But yes, moving
inodes, especially if the inode is currently open gets interesting. I
don't think there are that many user space applications that would
notice or care if the st_ino of an open file changed out from under
them, but there are obviously userspace applications, such as tar,
that would most definitely care.
- Ted
* Re: Ext2/3 block remapping tool
2007-05-01 15:28 ` Theodore Tso
@ 2007-05-01 18:52 ` Andreas Dilger
2007-05-01 22:18 ` Theodore Tso
0 siblings, 1 reply; 9+ messages in thread
From: Andreas Dilger @ 2007-05-01 18:52 UTC (permalink / raw)
To: Theodore Tso; +Cc: Jan Kara, linux-fsdevel, linux-ext4
On May 01, 2007 11:28 -0400, Theodore Tso wrote:
> On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
> > Except one other issue with online shrinking is that we need to move
> > inodes on occasion and this poses a bunch of other problems over just
> > remapping the data blocks.
>
> Well, I did say "necessary", and not "sufficient". But yes, moving
> inodes, especially if the inode is currently open gets interesting. I
> don't think there are that many user space applications that would
> notice or care if the st_ino of an open file changed out from under
> them, but there are obviously userspace applications, such as tar,
> that would most definitely care.
I think "rm -r" does a LOT of this kind of operation, like:
stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)
I think "find" does the same to avoid security problems with malicious
path manipulation.
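The stat/chdir/stat pattern above can be illustrated with a small sketch: record the identity of a directory before entering it, then confirm that "." is the same object afterwards. This is not the actual code of rm or find; `safe_chdir` is a hypothetical helper, and the temporary directory is there only to make the example runnable. If an online shrink changed the directory's inode number between the two calls, a check like this would fail.

```python
import os
import tempfile

# Hypothetical helper illustrating the pattern; not taken from rm or find.
def safe_chdir(name):
    """chdir into `name` only if it is the same directory we just stat'ed."""
    before = os.lstat(name)   # identity of the entry before entering it
    os.chdir(name)
    after = os.stat(".")      # identity of where we actually ended up
    if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
        raise RuntimeError(f"{name!r} changed identity during traversal")
    return after


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "foo"))
    start = os.getcwd()
    os.chdir(top)
    try:
        st = safe_chdir("foo")
        print("entered foo, inode", st.st_ino)
    finally:
        os.chdir(start)       # leave the temp dir before it is removed
```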
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: Ext2/3 block remapping tool
2007-05-01 18:52 ` Andreas Dilger
@ 2007-05-01 22:18 ` Theodore Tso
0 siblings, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2007-05-01 22:18 UTC (permalink / raw)
To: Jan Kara, linux-fsdevel, linux-ext4
On Tue, May 01, 2007 at 12:52:49PM -0600, Andreas Dilger wrote:
> I think "rm -r" does a LOT of this kind of operation, like:
>
> stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)
>
> I think "find" does the same to avoid security problems with malicious
> path manipulation.
Yep, so if you're doing an rm -rf (or any other recursive descent)
while we're doing an on-line shrink, it's going to fail. I suppose we
could have an in-core inode mapping table that would continue to remap
inode numbers until the next reboot. I'm not sure we would want to
keep the inode remapping indefinitely, although if we don't it could
also end up screwing up NFS as well. Not sure I care, though. :-)
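A toy model of the in-core mapping table idea, purely to show the bookkeeping involved: the table remembers the inode number userspace originally saw and keeps resolving it to wherever the inode currently lives, even after repeated moves. The class and method names are invented for this sketch, it is ordinary userspace Python rather than kernel code, and it ignores real problems such as a new file later being allocated the old number.

```python
# Hypothetical sketch of an in-memory inode remapping table.
class InodeRemapTable:
    def __init__(self):
        self._to_current = {}    # number userspace saw -> current on-disk number
        self._to_reported = {}   # current on-disk number -> number userspace saw

    def record_move(self, old, new):
        """Note that the inode currently stored at `old` has moved to `new`."""
        # If `old` was itself the target of an earlier move, keep the
        # number that userspace saw first.
        reported = self._to_reported.pop(old, old)
        self._to_current[reported] = new
        self._to_reported[new] = reported

    def resolve(self, reported):
        """Where does the inode that userspace knows as `reported` live now?"""
        return self._to_current.get(reported, reported)

    def reported(self, current):
        """Which number should stat() keep reporting for this inode?"""
        return self._to_reported.get(current, current)


table = InodeRemapTable()
table.record_move(12, 500)   # shrink relocates inode 12
table.record_move(500, 42)   # and later relocates it again
print(table.resolve(12), table.reported(42))
```

Because the table lives only in memory, it disappears at reboot, which matches the "until the next reboot" lifetime mentioned above and is also why long-lived NFS file handles would still be at risk.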
- Ted