Re: Large File Deletion Comparison (ext3, ext4, XFS)

From: Andreas Dilger <adilger@clusterfs.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Valerie Clement <valerie.clement@bull.net>,
	ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Large File Deletion Comparison (ext3, ext4, XFS)
Date: Fri, 27 Apr 2007 14:33:11 -0600
Message-ID: <20070427203311.GI5967@schatzie.adilger.int>
In-Reply-To: <20070427183345.GJ24852@thunk.org>

On Apr 27, 2007  14:33 -0400, Theodore Tso wrote:
> > Here are the results obtained with a not very fragmented 100-GB file:
> > 
> >                  |     ext3       ext4 + extents      xfs
> > ------------------------------------------------------------
> >  nb of fragments |     796             798             15
> >  elapsed time    |  2m0.306s        0m11.127s       0m0.553s
> >                  |
> >  blks read       |  206600            6416            352
> >  blks written    |   13592           13064            104
> > ------------------------------------------------------------
> 
> The metablockgroups feature should help the file fragmentation level
> with extents.  It's easy enough to enable this for ext4 (we just need
> to remove some checks in ext4_check_descriptors), so we should just do
> it.

I agree that the META_BG feature would help in this case (100GB / 128MB
is in fact the 800 fragments shown), but I don't think that is the major
performance hit.
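
As a quick check of the 800 figure, here is a small sketch. It assumes 4 KiB
blocks (so one bitmap block covers 8 * 4096 blocks) and reads "100 GB" as
100 GiB; the function names are made up for illustration.

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem blocks


def group_span_bytes(block_size=BLOCK_SIZE):
    # One bitmap block holds 8 * block_size bits, one bit per block,
    # so a block group covers that many blocks of block_size bytes each.
    return 8 * block_size * block_size


def groups_spanned(file_bytes, block_size=BLOCK_SIZE):
    # Ceiling division: minimum number of block groups the data must cover.
    return -(-file_bytes // group_span_bytes(block_size))


print(group_span_bytes() // 2**20)   # group size in MiB
print(groups_spanned(100 * 2**30))   # groups needed for a 100 GiB file
```

With those assumptions the group size comes out as 128 MiB and the file
spans 800 groups, which matches the fragment counts in the table.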

The fact that we need to read 6000 blocks and write 13000 blocks is the
more serious part.  I assume that since there are only 800 fragments
there should be only 800 extents.  We can fit (4096 / 12 - 1) = 340
extents into each block, and 4 index blocks into the inode, so this
should allow all 800 extents in only 3 index blocks.  It would be useful
to know where those 6416 block reads are going in the extent case.
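
The extent arithmetic above can be sketched as follows. The 12-byte entry
size and the "- 1" for the header come from the formula in the text; the
60-byte in-inode area is an assumption added here (15 * 4 bytes of i_block),
and the helper names are hypothetical.

```python
BLOCK_SIZE = 4096
ENTRY_SIZE = 12        # one extent or index entry; the header takes one slot
INODE_AREA = 60        # assumption: i_block area in the inode, 15 * 4 bytes


def entries_per_area(area_bytes):
    # Number of 12-byte slots, minus one slot used by the header.
    return area_bytes // ENTRY_SIZE - 1


def extent_blocks_needed(n_extents, block_size=BLOCK_SIZE):
    # Ceiling division over the per-block extent capacity.
    per_block = entries_per_area(block_size)
    return -(-n_extents // per_block)


print(entries_per_area(BLOCK_SIZE))   # extents per 4 KiB block
print(entries_per_area(INODE_AREA))   # entries that fit in the inode
print(extent_blocks_needed(800))      # blocks needed for 800 extents
```

Under these assumptions a block holds 340 extents, the inode holds 4
entries, and 800 extents need 3 blocks, so a single level of indexing
from the inode is enough.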

I suspect the extra IO is because the "tail first" truncation mechanism of
ext3 causes it to zero out FAR more blocks than needed.  With extents and a
default 128MB journal we should be able to truncate + unlink a file with
only writes to the inode and the 800 bitmap + gdt blocks.  The reads should
also be limited to the bitmap blocks and extent indexes (gdt being read at
mount time).
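
A rough estimate of that minimum IO, for comparison with the measured
numbers. This is only a sketch: it assumes 4 KiB blocks and 32-byte
ext3-style group descriptors, counts the inode as one block write, and
ignores the superblock and journal overhead. expected_io() is a
hypothetical helper, not a kernel function.

```python
import math

BLOCK_SIZE = 4096
GROUP_DESC_SIZE = 32   # assumption: ext3-style 32-byte group descriptors


def expected_io(n_groups, n_extent_blocks):
    # gdt blocks needed to hold the descriptors of the touched groups,
    # assuming the groups are contiguous so their descriptors pack densely.
    gdt_blocks = math.ceil(n_groups * GROUP_DESC_SIZE / BLOCK_SIZE)
    writes = 1 + n_groups + gdt_blocks   # inode + bitmaps + gdt
    reads = n_groups + n_extent_blocks   # bitmaps + extent blocks
    return reads, writes


reads, writes = expected_io(n_groups=800, n_extent_blocks=3)
print(reads, writes)
```

This gives about 803 reads and 808 writes, against the 6416 reads and
13064 writes measured for ext4 + extents.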

What is needed is for truncate to walk the inode block tree (extents or
indirect blocks) and count the bitmaps + gdt blocks dirtied, and then try
and do the whole truncate under a single transaction.  That avoids any need
for truncate to be "restartable" and then there is no need to zero out the
indirect blocks from the end one-at-a-time.
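
The counting pass could look roughly like the sketch below. It is written in
Python for brevity and is not kernel code: the extent list format, both
function names, and the max_fraction limit on how much of the journal one
transaction may use are assumptions made for illustration, as are 4 KiB
blocks and 32-byte group descriptors.

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # one bitmap block per group
DESCS_PER_BLOCK = BLOCK_SIZE // 32   # assumption: 32-byte descriptors


def blocks_dirtied(extents):
    """Count metadata blocks a truncate would dirty.

    extents: iterable of (first_physical_block, length_in_blocks).
    """
    groups = set()
    for start, length in extents:
        first = start // BLOCKS_PER_GROUP
        last = (start + length - 1) // BLOCKS_PER_GROUP
        groups.update(range(first, last + 1))
    # Each gdt block holds the descriptors of DESCS_PER_BLOCK groups.
    gdt_blocks = {g // DESCS_PER_BLOCK for g in groups}
    return 1 + len(groups) + len(gdt_blocks)   # inode + bitmaps + gdt


def fits_in_one_transaction(extents, journal_blocks, max_fraction=0.25):
    # max_fraction is a hypothetical cap on a single transaction's size.
    return blocks_dirtied(extents) <= journal_blocks * max_fraction


# One extent per block group, 800 groups, 128 MiB journal of 4 KiB blocks.
extents = [(g * BLOCKS_PER_GROUP + 1000, 30000) for g in range(800)]
print(blocks_dirtied(extents))
print(fits_in_one_transaction(extents, journal_blocks=128 * 2**20 // BLOCK_SIZE))
```

If the count fits, the whole truncate can go in one transaction; if not,
the caller would have to fall back to a restartable truncate.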

Doing the bitmap read/write will definitely be more efficient with META_BG,
but that doesn't explain the other 19k blocks undergoing IO.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
