From: Andreas Dilger <adilger@sun.com>
To: Andreas Schlick <schlick@lavabit.com>
Cc: Theodore Tso <tytso@mit.edu>, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 1/1] dir shrink (was Re: ext3/ext4 directories don't shrink after deleting lots of files)
Date: Sat, 22 Aug 2009 21:10:39 -0600
Message-ID: <20090823031039.GF5931@webber.adilger.int>
In-Reply-To: <200908221620.50103.schlick@lavabit.com>
On Aug 22, 2009 16:20 +0200, Andreas Schlick wrote:
> I'd like to try it. It looks like a nice starting project.
> Following your outline the first version of the patch tries to remove an
> empty block at the end of a non-htree directory.
> I'd appreciate it if you checked it and gave me suggestions for improving it.
Adding the extra "dc" to each of the functions probably isn't necessary,
and it makes the API messier. A better approach would probably be to
just do this within ext4_delete_entry(), analogous to how ext4_add_entry()
might add a new block at any time.
It would be even better if this could be done repeatedly if there are
more empty blocks at the end (i.e. they were not previously at the end
of the file), but that gets into trouble with the transactions. It isn't
easy to remove an intermediate block, because this will result in a hole
in the directory (a no-no), and there is no safe way to reorder the
blocks in the directory.
> At the moment I am looking at the dir_index code, so I can extend it to htree
> directories. Please let me know if you want me to port it to ext3, although
> personally I think it is better to do so at later point.
For dir_index what is important is that you don't have any holes in the
hash space, nor in the logical directory blocks. One possibility, in
the case where the direntry being removed is the last one[*] in its
block, is to remove that block, move the last block of the directory to
the freed logical offset, and update the htree index to reflect this.
Note that the htree index only records the starting hash value for each
block, so all that would need to be done to remove any mention of the
deleted block is to memmove() the following index entries over the
deleted block's entry, and the hash buckets will still be correct. Also,
the logical block number in the index entry for the moved (formerly last)
block would need to be changed to reflect its new position.
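As a rough illustration of the index update described above, here is a
minimal userspace sketch. It is not kernel code: struct idx_entry and
idx_remove_and_repoint() are hypothetical names, the array is a flat,
single-level model of the index, and it assumes the last directory block
has already been copied into the freed slot. It also assumes the model
stores an explicit hash of 0 in the first entry, so removing entry 0
resets the new first entry to 0 to keep the hash space covered.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an htree index entry:
 * the starting hash of a leaf block and its logical block number. */
struct idx_entry {
    uint32_t hash;
    uint32_t block;
};

/*
 * Drop the index entry at 'pos' from an array of 'count' entries, then
 * repoint the entry that referenced 'last_block' to 'freed_block'.
 * Returns the new entry count.
 */
static unsigned idx_remove_and_repoint(struct idx_entry *e, unsigned count,
                                       unsigned pos, uint32_t freed_block,
                                       uint32_t last_block)
{
    unsigned i;

    /* Slide the following entries down over the deleted one. */
    memmove(&e[pos], &e[pos + 1], (count - pos - 1) * sizeof(*e));
    count--;

    /* Model assumption: the first entry always starts at hash 0. */
    if (pos == 0 && count > 0)
        e[0].hash = 0;

    /* The former last block now lives at the freed logical offset. */
    if (freed_block != last_block) {
        for (i = 0; i < count; i++) {
            if (e[i].block == last_block) {
                e[i].block = freed_block;
                break;
            }
        }
    }
    return count;
}
```

Because each entry only carries a starting hash, the entry preceding the
removed one implicitly takes over its hash range. The sketch ignores
multi-level trees, where the moved block could itself be an index block
whose parent pointer would need the same repointing.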
[*] This is easily determined in ext4_delete_entry() because it always
walks the block until it finds the entry, and if there are valid
entries before the one being deleted the block is not empty. Tracking
this takes basically no extra effort. If no valid entries are before
the one being deleted, and if the length of the entry after it fills
the rest of the space in the block then the block is empty.
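The check in [*] might look roughly like the following userspace
sketch. struct dirent_hdr and block_empty_after_delete() are
hypothetical, simplified stand-ins (host byte order, entry located by
byte offset rather than by name), not the kernel's structures. It reads
the second condition as "the record being deleted runs to the end of
the block", which is an assumption about the wording above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry header (not the kernel's
 * struct ext4_dir_entry_2); the name bytes follow the header. */
struct dirent_hdr {
    uint32_t inode;     /* 0 means the slot is unused */
    uint16_t rec_len;   /* distance to the next entry */
    uint8_t  name_len;
    uint8_t  file_type;
};

/*
 * Walk 'block' until the entry at byte offset 'target' is reached.
 * Returns 1 if deleting it would leave the block empty, 0 if not,
 * and -1 if the target is not found or the block is malformed.
 */
static int block_empty_after_delete(const uint8_t *block, size_t blocksize,
                                    size_t target)
{
    size_t off = 0;
    int live_before = 0;

    while (off + sizeof(struct dirent_hdr) <= blocksize) {
        struct dirent_hdr h;

        memcpy(&h, block + off, sizeof(h));
        if (h.rec_len < sizeof(h) || off + h.rec_len > blocksize)
            return -1;
        if (off == target)
            /* Empty only if nothing live precedes the target and its
             * record covers everything up to the end of the block. */
            return !live_before && off + h.rec_len == blocksize;
        if (h.inode != 0)
            live_before = 1;    /* a valid entry precedes the target */
        off += h.rec_len;
    }
    return -1;
}
```

The only state carried through the walk is the live_before flag, which
is why tracking it adds essentially nothing to the existing search loop.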
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.