public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: attempt to shrink directory on dentry removal
Date: Fri, 23 Aug 2019 22:31:10 -0400	[thread overview]
Message-ID: <20190824023110.GB19348@mit.edu> (raw)
In-Reply-To: <20190821182740.97127-1-harshadshirwadkar@gmail.com>

On Wed, Aug 21, 2019 at 11:27:40AM -0700, Harshad Shirwadkar wrote:
> As of now, we only support non-indexed directories and indexed
> directories with no intermediate dx nodes. 

From my testing, it doen't work on non-indexed directories; the
problem is in the is_empty_dirent_block() function.

> This technique can also be used to remove intermediate dx nodes. But
> it needs a little more interesting logic to make that happen since
> we don't store directory entry name in intermediate nodes.

Actually, that's not the problem.  An empty dx node is not allowed; in
a two-level htree directory, if removing an empty leaf node becomes
causes its parent dx node to become empty, we need to remove the
parent dx node immediately.  Which is fine, because when we did the
dx_probe, we know the path from the root to the leaf node.

But removing the parent dx node means we need to be able to remove an
empty block from the directory, regardless whether it is at the end of
the directory or not.  OK, so how to do that?

First of all, how to do find any directory block's parent?  If it is a
leaf node, we can simply calculate the hash on the first directory
entry (whether it is "dead" or not), and then do a lookup based on
that hash.  For an intermediate dx node, we can do the same thing,
since a dx node is simply a list of hashes and block numbers.  The
first hash in the dx node can be used to do a htree lookup, and that
will give us the full path from the root of the htree to the dx node.

So, suppose we delete a file, and that causes a leaf node to become
empty.  We can actually shrink the directory right then and there, and
not wait until the last block in the directory beomes empty.  How do we do that?

1) We'll call the logical block number of that empty leaf block Empty,
   and the block number of the parent of that empty leaf block Parent(Empty).
   We'll also call the logical block number of the last entry in the
   directory, Last.

2) Remove the pointer to Empty from Parent(Empty).  If that causes
   Parent(Empty) to become empty, we also need to remove the
   reference to Parent(Empty) in Parent(Parent(Empty)).  (And for 3
   level htree directories, which are present in Largedir
   directories, we might also need to do that one more level up.)

3) To free the directory block Empty, if Empty != Last, we copy the
   contents of Last into the directory block Empty, and then determine
   Parent(Last), and find the pointer to Last, and update it to be
   Empty.  At this point, the last directory block is not in use, so
   we can release the last directory block, and shrink the size of the
   directory by one block.

4) If we need to free Parent(Empty) because it was emptied by step 2,
   follow the procedure in step 3.

This has a couple of benefits.  The first is that it will work for
large dx directories, where directory shrinking is most needed.
Secondly, also allows the directory shrinking to take gradually place
as the directory is emptied, instead waiting until last directory
block is empty.  (And for multi-level dx directories that have been
optimized via e2fsck -fD, the above is needed to allow shrinking from
the end won't work at all.  Why that is the case is left as an
exercise to the reader.  Hint: try creating a very large directory and
optimize it using e2fsck -fD, and see where the index nodes end up.)

BTW, for non-indexed directories, we must always shrink from the end.
We can't play games with swapping with the last directory block,
unless we can guarantee that (a) there are no open fd's on the
directories (since there might be telldir/seekdir cookies that have to
stay valid), and (b) the directory isn't exported via NFS.  Given that
establishes these guarantees is tricky, and most of the file systems
we care about will have indexing enabled, I'm not *that* concerned
about making directory shrinking working well for non-indexed
directories.  (Which will tend to only exist from ext2 file systems.)


+static inline bool is_empty_dirent_block(struct inode *dir,
+					 struct buffer_head *bh)
+{
+	struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)bh->b_data;
+	int	csum_size = 0;
+
+	if (ext4_has_metadata_csum(dir->i_sb))
+		csum_size = sizeof(struct ext4_dir_entry_tail);

The dx_tail only exists for indexed directories.  So the if statement
should read:

	if (ext4_has_metadata_csum(dir->i_sb) && is_dx(dir))


> +	/*
> +	 * If i was 0 when we began above loop, we would have overwritten count
> +	 * and limit values sin ce those values live in dx_entry->hash of the

s/sin ce/since/


> +	/*
> +	 * Read blocks from directory in reverse orders and clean them up one by
> +	 * one!
> +	 */

s/reverse orders/reverse order/


> +		info = &((struct dx_root *)frames[0].bh->b_data)->info;
> +		if (info->indirect_levels > 0) {
> +			/*
> +			 * We don't shrink in this case. That's because the
> +			 * block that we just read could very well be an index
> +			 * block. If it's an index block, we need to do special
> +			 * handling to delete the block. 

Please delete this comment, and replace it with "we don't support dx
directories with a depth > 2 for now".  See the above discussion...

Cheers,

					- Ted

  reply	other threads:[~2019-08-24  2:31 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 18:27 [PATCH] ext4: attempt to shrink directory on dentry removal Harshad Shirwadkar
2019-08-24  2:31 ` Theodore Y. Ts'o [this message]
2019-08-26  2:46   ` harshad shirwadkar
2019-08-26  3:19     ` Theodore Y. Ts'o
2019-08-26  5:07 ` Andreas Dilger
2019-08-26 21:46   ` harshad shirwadkar
2019-08-27  2:38     ` Theodore Y. Ts'o
2019-08-27 20:53     ` Andreas Dilger
2019-08-27  2:27   ` Theodore Y. Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190824023110.GB19348@mit.edu \
    --to=tytso@mit.edu \
    --cc=harshadshirwadkar@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox