From: Andreas Dilger <adilger@clusterfs.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Valerie Clement <valerie.clement@bull.net>,
ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Large File Deletion Comparison (ext3, ext4, XFS)
Date: Fri, 27 Apr 2007 14:33:11 -0600
Message-ID: <20070427203311.GI5967@schatzie.adilger.int>
In-Reply-To: <20070427183345.GJ24852@thunk.org>
On Apr 27, 2007 14:33 -0400, Theodore Tso wrote:
> > Here are the results obtained with a not very fragmented 100-GB file:
> >
> >                  |  ext3        ext4 + extents   xfs
> > ------------------------------------------------------------
> > nb of fragments  |  796         798              15
> > elapsed time     |  2m0.306s    0m11.127s        0m0.553s
> >                  |
> > blks read        |  206600      6416             352
> > blks written     |  13592       13064            104
> > ------------------------------------------------------------
>
> The metablockgroups feature should help the file fragmentation level
> with extents. It's easy enough to enable this for ext4 (we just need
> to remove some checks in ext4_check_descriptors), so we should just do
> it.
While I agree that the META_BG feature would help here (100GB / 128MB
is in fact the 800 fragments shown), I don't think that is the major
performance hit.
The fact that we need to read 6000 blocks and write 13000 blocks is the
more serious part. I assume that since there are only 800 fragments
there should be only 800 extents. We can fit (4096 / 12 - 1) = 340
extents into each block, and 4 index blocks into the inode, so this
should allow all 800 extents in only 3 index blocks. It would be useful
to know where those 6416 block reads are going in the extent case.
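The extent arithmetic above can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the paragraph: a 4096-byte block, 12-byte entries with one entry-sized slot taken by the header, and 4 entries in the inode. The function names are made up for illustration.

```python
import math

BLOCK_SIZE = 4096   # bytes per filesystem block (assumed)
ENTRY_SIZE = 12     # bytes per extent entry; the header is taken to be the same size
INODE_SLOTS = 4     # entries that fit directly in the inode

def extents_per_block(block_size=BLOCK_SIZE, entry_size=ENTRY_SIZE):
    """Entries that fit in one block once the header slot is subtracted."""
    return block_size // entry_size - 1

def extent_blocks_needed(n_extents):
    """Blocks required to hold n_extents extent entries."""
    return math.ceil(n_extents / extents_per_block())

print(extents_per_block())        # 340
print(extent_blocks_needed(800))  # 3, which fits under the 4 inode slots
```

If the 800-extent assumption holds, the extent tree itself accounts for only a handful of the 6416 reads.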
I suspect that is because the "tail first" truncation mechanism of ext3
causes it to zero out FAR more blocks than needed. With extents and a
default 128MB journal we should be able to truncate + unlink a file with
only writes to the inode and the 800 bitmap + gdt blocks. The reads should
also be limited to the bitmap blocks and extent indexes (gdt being read at
mount time).
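As a rough estimate of what "only writes to the inode and the 800 bitmap + gdt blocks" would amount to, here is a small sketch. It assumes 4096-byte blocks, one bitmap block per 128MB group, and 32-byte group descriptors packed contiguously; the descriptor size and the function names are assumptions for illustration, not taken from the measurements.

```python
import math

BLOCK_SIZE = 4096                             # bytes per block (assumed)
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE             # one bitmap block maps 32768 blocks
GROUP_BYTES = BLOCKS_PER_GROUP * BLOCK_SIZE   # 128MB per block group
DESC_SIZE = 32                                # bytes per group descriptor (assumed)

def groups_spanned(file_bytes):
    """Block groups needed to hold a file laid out back to back."""
    return math.ceil(file_bytes / GROUP_BYTES)

def expected_writes(file_bytes):
    """Inode block + one bitmap block per group + the gdt blocks covering them."""
    groups = groups_spanned(file_bytes)
    gdt_blocks = math.ceil(groups * DESC_SIZE / BLOCK_SIZE)
    return 1 + groups + gdt_blocks

size = 100 * 2**30
print(groups_spanned(size))   # 800
print(expected_writes(size))  # 808
```

Under these assumptions the estimate is about 808 blocks written, against the 13064 reported for ext4 + extents.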
What is needed is for truncate to walk the inode block tree (extents or
indirect blocks) and count the bitmaps + gdt blocks dirtied, and then try
to do the whole truncate under a single transaction. That avoids any need
for truncate to be "restartable" and then there is no need to zero out the
indirect blocks from the end one-at-a-time.
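The counting pass described above could look roughly like the following. It is a user-space sketch, not kernel code: extents are plain (start_block, length) tuples, `max_credits` is a hypothetical per-transaction block limit, and the figure of a quarter of the journal used in the example is an assumption for illustration.

```python
BLOCKS_PER_GROUP = 32768        # 4096-byte blocks, 128MB groups (assumed)
DESCS_PER_BLOCK = 4096 // 32    # 128 descriptors per gdt block (assumed 32-byte descriptors)

def blocks_dirtied(extents):
    """Return (bitmap_blocks, gdt_blocks) touched when freeing the given extents."""
    groups = set()
    for start, length in extents:
        first = start // BLOCKS_PER_GROUP
        last = (start + length - 1) // BLOCKS_PER_GROUP
        groups.update(range(first, last + 1))
    gdt = {g // DESCS_PER_BLOCK for g in groups}
    return len(groups), len(gdt)

def fits_in_one_transaction(extents, max_credits):
    """Check whether inode + bitmap + gdt blocks fit under max_credits."""
    bitmaps, gdt = blocks_dirtied(extents)
    needed = 1 + bitmaps + gdt  # 1 for the inode block
    return needed <= max_credits, needed

# 800 extents, each covering one whole block group.
extents = [(g * BLOCKS_PER_GROUP, BLOCKS_PER_GROUP) for g in range(800)]
journal_blocks = 128 * 2**20 // 4096   # 128MB journal = 32768 blocks
print(fits_in_one_transaction(extents, journal_blocks // 4))  # (True, 808)
```

If the count came out above the limit, the truncate would still have to be split, so this only removes the restart logic for files whose metadata fits.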
Doing the bitmap read/write will definitely be more efficient with META_BG,
but that doesn't explain the other 19k blocks undergoing IO.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.