From: Erez Zadok <ezk@cs.sunysb.edu>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
viro@ftp.linux.org.uk, hch@infradead.org,
Erez Zadok <ezk@cs.sunysb.edu>,
Himanshu Kanda <hkanda@cs.sunysb.edu>
Subject: [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files
Date: Tue, 1 Apr 2008 17:06:48 -0400 [thread overview]
Message-ID: <12070840193574-git-send-email-ezk@cs.sunysb.edu> (raw)
In-Reply-To: <12070840162442-git-send-email-ezk@cs.sunysb.edu>
Optimize the unlinking of non-dir objects in unionfs by deleting all
possible lower inode objects from all writable lower branches. This may
consume a bit more processing, but on average reduces overall inode
consumption and further saves a lot by reducing the total number of
whiteouts needed. We create a whiteout now only if we could not delete all
lower objects, or if one of the lower branches was explicitly marked
read-only.
Signed-off-by: Himanshu Kanda <hkanda@cs.sunysb.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
Documentation/filesystems/unionfs/concepts.txt | 25 ++++++
fs/unionfs/lookup.c | 8 +-
fs/unionfs/unlink.c | 107 +++++++++++++++---------
3 files changed, 97 insertions(+), 43 deletions(-)
diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 8d9a1c5..0bc1a19 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -62,6 +62,31 @@ simplest solution is to take the instance from the highest priority
(numerically lowest value) and "hide" the others.
+Unlinking:
+=========
+
+Unlink operation on non-directory instances is optimized to remove the
+maximum possible objects in case multiple underlying branches have the same
+file name. The unlink operation will first try to delete file instances
+from highest priority branch and then move further to delete from remaining
+branches in order of their decreasing priority. Consider a case (F..D..F),
+where F is a file and D is a directory of the same name; here, some
+intermediate branch could have an empty directory instance with the same
+name, so this operation also tries to delete this directory instance and
+proceed further to delete from next possible lower priority branch. The
+unionfs unlink operation will smoothly delete the files with same name from
+all possible underlying branches. In case if some error occurs, it creates
+whiteout in highest priority branch that will hide file instance in rest of
+the branches. An error could occur either if an unlink operations in any of
+the underlying branch failed or if a branch has no write permission.
+
+This unlinking policy is known as "delete all" and it has the benefit of
+overall reducing the number of inodes used by duplicate files, and further
+reducing the total number of inodes consumed by whiteouts. The cost is of
+extra processing, but testing shows this extra processing is well worth the
+savings.
+
+
Copyup:
=======
diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 755158e..7618716 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -264,7 +264,9 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
* branches, but we have to skip non-dirs (to avoid, say,
* calling readdir on a regular file).
*/
- if (!S_ISDIR(lower_dentry->d_inode->i_mode) && dentry_count) {
+ if ((lookupmode != INTERPOSE_PARTIAL) &&
+ !S_ISDIR(lower_dentry->d_inode->i_mode) &&
+ dentry_count) {
dput(lower_dentry);
continue;
}
@@ -295,10 +297,6 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
continue;
if (dentry_count == 1)
goto out_positive;
- /* This can only happen with mixed D-*-F-* */
- BUG_ON(!S_ISDIR(unionfs_lower_dentry(dentry)->
- d_inode->i_mode));
- continue;
}
opaque = is_opaque_dir(dentry, bindex);
diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index 6e93da3..c66bb3e 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -18,7 +18,32 @@
#include "union.h"
-/* unlink a file by creating a whiteout */
+/*
+ * Helper function for Unionfs's unlink operation.
+ *
+ * The main goal of this function is to optimize the unlinking of non-dir
+ * objects in unionfs by deleting all possible lower inode objects from the
+ * underlying branches having same dentry name as the non-dir dentry on
+ * which this unlink operation is called. This way we delete as many lower
+ * inodes as possible, and save space. Whiteouts need to be created in
+ * branch0 only if unlinking fails on any of the lower branch other than
+ * branch0, or if a lower branch is marked read-only.
+ *
+ * Also, while unlinking a file, if we encounter any dir type entry in any
+ * intermediate branch, then we remove the directory by calling vfs_rmdir.
+ * The following special cases are also handled:
+
+ * (1) If an error occurs in branch0 during vfs_unlink, then we return
+ * appropriate error.
+ *
+ * (2) If we get an error during unlink in any of other lower branch other
+ * than branch0, then we create a whiteout in branch0.
+ *
+ * (3) If a whiteout already exists in any intermediate branch, we delete
+ * all possible inodes only up to that branch (this is an "opaqueness"
+ * as as per Documentation/filesystems/unionfs/concepts.txt).
+ *
+ */
static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
{
struct dentry *lower_dentry;
@@ -30,51 +55,57 @@ static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
if (err)
goto out;
- bindex = dbstart(dentry);
-
- lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
- if (!lower_dentry)
- goto out;
+ /* trying to unlink all possible valid instances */
+ for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+ if (!lower_dentry || !lower_dentry->d_inode)
+ continue;
+
+ lower_dir_dentry = lock_parent(lower_dentry);
+
+ /* avoid destroying the lower inode if the object is in use */
+ dget(lower_dentry);
+ err = is_robranch_super(dentry->d_sb, bindex);
+ if (!err) {
+ /* see Documentation/filesystems/unionfs/issues.txt */
+ lockdep_off();
+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
+ err = vfs_unlink(lower_dir_dentry->d_inode,
+ lower_dentry);
+ else
+ err = vfs_rmdir(lower_dir_dentry->d_inode,
+ lower_dentry);
+ lockdep_on();
+ }
- lower_dir_dentry = lock_parent(lower_dentry);
+ /* if lower object deletion succeeds, update inode's times */
+ if (!err)
+ unionfs_copy_attr_times(dentry->d_inode);
+ dput(lower_dentry);
+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
+ unlock_dir(lower_dir_dentry);
- /* avoid destroying the lower inode if the file is in use */
- dget(lower_dentry);
- err = is_robranch_super(dentry->d_sb, bindex);
- if (!err) {
- /* see Documentation/filesystems/unionfs/issues.txt */
- lockdep_off();
- err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
- lockdep_on();
+ if (err)
+ break;
}
- /* if vfs_unlink succeeded, update our inode's times */
- if (!err)
- unionfs_copy_attr_times(dentry->d_inode);
- dput(lower_dentry);
- fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
- unlock_dir(lower_dir_dentry);
-
- if (err && !IS_COPYUP_ERR(err))
- goto out;
/*
- * We create whiteouts if (1) there was an error unlinking the main
- * file; (2) there is a lower priority file with the same name
- * (dbopaque); (3) the branch in which the file is not the last
- * (rightmost0 branch. The last rule is an optimization to avoid
- * creating all those whiteouts if there's no chance they'd be
- * masking any lower-priority branch, as well as unionfs is used
- * with only one branch (using only one branch, while odd, is still
- * possible).
+ * Create the whiteout in branch 0 (highest priority) only if (a)
+ * there was an error in any intermediate branch other than branch 0
+ * due to failure of vfs_unlink/vfs_rmdir or (b) a branch marked or
+ * mounted read-only.
*/
if (err) {
- if (dbstart(dentry) == 0)
+ if ((bindex == 0) ||
+ ((bindex == dbstart(dentry)) &&
+ (!IS_COPYUP_ERR(err))))
goto out;
- err = create_whiteout(dentry, dbstart(dentry) - 1);
- } else if (dbopaque(dentry) != -1) {
- err = create_whiteout(dentry, dbopaque(dentry));
- } else if (dbstart(dentry) < sbend(dentry->d_sb)) {
- err = create_whiteout(dentry, dbstart(dentry));
+ else {
+ if (!IS_COPYUP_ERR(err))
+ pr_debug("unionfs: lower object deletion "
+ "failed in branch:%d\n", bindex);
+ err = create_whiteout(dentry, sbstart(dentry->d_sb));
+ }
}
out:
--
1.5.2.2
next prev parent reply other threads:[~2008-04-01 21:08 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
2008-04-01 21:06 ` Erez Zadok [this message]
2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
2008-04-02 16:54 ` Erez Zadok
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=12070840193574-git-send-email-ezk@cs.sunysb.edu \
--to=ezk@cs.sunysb.edu \
--cc=akpm@linux-foundation.org \
--cc=hch@infradead.org \
--cc=hkanda@cs.sunysb.edu \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@ftp.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.