[PATCH 00/74] Union mounts version something or other

* [PATCH 00/74] Union mounts version something or other
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora

Hi union mounts fans(?),

Here's my current union mounts patch set, against 2.6.36-rc5.  I'm
busy with other things[1] and unlikely to put in significant work on
union mounts in the next year.  I'm happy to answer questions from
anyone else working on them.

As always, git trees for the kernel, util-linux, and e2fsprogs, lots
of documentation, and LWN articles describing the various problems
unioning file systems will encounter are here:

http://valerieaurora.org/union/

The devkit linked from that page includes my User-mode Linux testing
environment and a root file system image.  The README tells you
how to run the test suite automatically (yes, an automated test suite
- with Makefile and version control and comments and stuff!).

I took a quick look at the current overlayfs patch set, and it's
small, clean, and easy to understand.  If it does what people need, I
say ship it.

Thanks to everyone who reviewed and submitted patches for union mounts!

-VAL

[1] http://adainitiative.org

---

Felix Fietkau (2):
  whiteout: jffs2 whiteout support
  fallthru: jffs2 fallthru support

Jan Blunck (9):
  VFS: Make lookup_hash() return a struct path
  autofs4: Save autofs trigger's vfsmount in super block info
  whiteout/NFSD: Don't return information about whiteouts to userspace
  whiteout: Add vfs_whiteout() and whiteout inode operation
  whiteout: Allow removal of a directory with whiteouts
  whiteout: tmpfs whiteout support
  union-mount: Introduce MNT_UNION and MS_UNION flags
  union-mount: Free union stack on removal of topmost dentry from
    dcache
  union-mount: Create IS_MNT_UNION()

Valerie Aurora (63):
  VFS: Comment follow_mount() and friends
  Documentation: Fix trivial typo in filesystems/sharedsubtree.txt
  whiteout: Define opaque inode flags and operations
  ext2: Add ext2_dirent_in_use()
  ext2: Split ext2_add_entry() from ext2_add_link()
  whiteout: ext2 whiteout support
  fallthru: Basic fallthru definitions
  fallthru: ext2 fallthru support
  fallthru: tmpfs fallthru support
  VFS: Add hard read-only users count to superblock
  VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
  VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
  VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
  union-mount: Union mounts documentation
  union-mount: Add CONFIG_UNION_MOUNT option
  union-mount: Create union_stack structure
  union-mount: Add two superblock fields for union mounts
  union-mount: Add union_alloc()
  union-mount: Add union_find_dir()
  union-mount: Create d_free_unions()
  union-mount: Create union_add_dir()
  union-mount: Add union_create_topmost_dir()
  union-mount: Create needs_lookup_union()
  union-mount: Create check_topmost_union_mnt()
  union-mount: Add clone_union_tree() and put_union_sb()
  union-mount: Create build_root_union()
  union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
  union-mount: Prevent improper union-related remounts
  union-mount: Prevent topmost file system from being mounted elsewhere
  union-mount: Prevent bind mounts of union mounts
  union-mount: Implement union mount
  union-mount: Temporarily disable some syscalls
  union-mount: Basic infrastructure of __lookup_union()
  union-mount: Process negative dentries in __lookup_union()
  union-mount: Return files found in lower layers in __lookup_union()
  union-mount: Build union stack in __lookup_union()
  union-mount: Follow mount in __lookup_union()
  union-mount: Add lookup_union()
  union-mount: Add do_lookup_union() wrapper for __lookup_union()
  union-mount: Call union lookup functions in lookup path
  union-mount: Create whiteout on unlink()
  union-mount: Create whiteout on rmdir()
  union-mount: Set opaque flag on new directories in unioned file
    systems
  union-mount: Copy up directory entries on first readdir()
  union-mount: Add generic_readdir_fallthru() helper
  fallthru: ext2 support for lookup of d_type/d_ino in fallthrus
  fallthru: tmpfs support for lookup of d_type/d_ino in fallthrus
  fallthru: jffs2 support for lookup of d_type/d_ino in fallthrus
  VFS: Split inode_permission() and create path_permission()
  VFS: Create user_path_nd() to lookup both parent and target
  union-mount: In-kernel file copyup routines
  union-mount: Implement union-aware access()/faccessat()
  union-mount: Implement union-aware link()
  union-mount: Implement union-aware rename()
  union-mount: Implement union-aware writable open()
  union-mount: Implement union-aware chown()
  union-mount: Implement union-aware truncate()
  union-mount: Implement union-aware chmod()/fchmodat()
  union-mount: Implement union-aware lchown()
  union-mount: Implement union-aware utimensat()
  union-mount: Implement union-aware setxattr()
  union-mount: Implement union-aware lsetxattr()

 Documentation/filesystems/sharedsubtree.txt |    4 +-
 Documentation/filesystems/union-mounts.txt  |  751 +++++++++++++++++++++++++
 Documentation/filesystems/vfs.txt           |   16 +-
 fs/Kconfig                                  |   13 +
 fs/Makefile                                 |    1 +
 fs/autofs4/autofs_i.h                       |    1 +
 fs/autofs4/init.c                           |   11 +-
 fs/autofs4/root.c                           |    6 +
 fs/compat.c                                 |    9 +
 fs/dcache.c                                 |   32 +-
 fs/ext2/dir.c                               |  116 ++++-
 fs/ext2/ext2.h                              |    3 +
 fs/ext2/inode.c                             |   11 +-
 fs/ext2/namei.c                             |   85 +++-
 fs/ext2/super.c                             |    6 +
 fs/jffs2/dir.c                              |  117 ++++-
 fs/jffs2/fs.c                               |    4 +
 fs/jffs2/super.c                            |    2 +-
 fs/libfs.c                                  |   20 +-
 fs/namei.c                                  |  807 ++++++++++++++++++++++++---
 fs/namespace.c                              |  394 +++++++++++--
 fs/nfsd/nfs3xdr.c                           |    5 +
 fs/nfsd/nfs4xdr.c                           |    5 +
 fs/nfsd/nfsxdr.c                            |    4 +
 fs/open.c                                   |  116 ++++-
 fs/pnode.c                                  |    5 +-
 fs/pnode.h                                  |    3 +
 fs/readdir.c                                |   18 +
 fs/super.c                                  |    9 +
 fs/union.c                                  |  714 ++++++++++++++++++++++++
 fs/union.h                                  |  105 ++++
 fs/utimes.c                                 |   14 +-
 fs/xattr.c                                  |   65 ++-
 include/linux/dcache.h                      |   37 ++-
 include/linux/ext2_fs.h                     |    8 +
 include/linux/fs.h                          |   45 ++
 include/linux/jffs2.h                       |    8 +
 include/linux/mount.h                       |    4 +
 include/linux/namei.h                       |    2 +
 kernel/audit_tree.c                         |   10 +-
 mm/shmem.c                                  |  193 ++++++-
 41 files changed, 3551 insertions(+), 228 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt
 create mode 100644 fs/union.c
 create mode 100644 fs/union.h



* [PATCH 01/74] VFS: Comment follow_mount() and friends
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add comments to the VFS follow_mount() family of functions describing
what the directions "up" and "down" mean and how reference counts are
handled.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
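To make the "up"/"down" vocabulary concrete, here is a minimal user-space
sketch.  It is not kernel code: struct toy_mount and the toy_* helpers are
hypothetical names invented for illustration, and the mountpoint dentry is
elided so that each mount simply has one parent and at most one mount
stacked on top of it.  Return values mirror the comments added by this
patch (1 if the position moved, 0 if it did not).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount; not the kernel type. */
struct toy_mount {
	const char *dev;          /* label only, e.g. "/dev/sda1" */
	struct toy_mount *parent; /* "up": towards /, NULL for the root mount */
	struct toy_mount *child;  /* "down": mount stacked directly on top */
};

/* Mount @upper on top of @lower (the mountpoint dentry is elided). */
static void toy_stack(struct toy_mount *lower, struct toy_mount *upper)
{
	assert(lower->child == NULL && upper->parent == NULL);
	lower->child = upper;
	upper->parent = lower;
}

/* Like follow_up(): 1 if we went up a level, 0 if already at the root. */
static int toy_follow_up(struct toy_mount **pos)
{
	if ((*pos)->parent == NULL)
		return 0;
	*pos = (*pos)->parent;
	return 1;
}

/* Like follow_down(): traverse exactly one layer, away from the root. */
static int toy_follow_down(struct toy_mount **pos)
{
	if ((*pos)->child == NULL)
		return 0;
	*pos = (*pos)->child;
	return 1;
}

/* Like __follow_mount(): keep going down until nothing is stacked on top. */
static int toy_follow_mount(struct toy_mount **pos)
{
	int moved = 0;

	while (toy_follow_down(pos))
		moved = 1;
	return moved;
}
```

With rootfs, /dev/sda1, /dev/sda2 and /dev/sda3 stacked in that order,
toy_follow_down() from rootfs stops at /dev/sda1, while toy_follow_mount()
ends at /dev/sda3, the mount that is actually visible.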
 fs/namei.c     |   43 +++++++++++++++++++++++++++++++++++++++----
 fs/namespace.c |   16 ++++++++++++++--
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 24896e8..db0e7ce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -591,6 +591,17 @@ loop:
 	return err;
 }
 
+/*
+ * follow_up - Find the mountpoint of path's vfsmount
+ *
+ * Given a path, find the mountpoint of its source file system.
+ * Replace @path with the path of the mountpoint in the parent mount.
+ * Up is towards /.
+ *
+ * Return 1 if we went up a level and 0 if we were already at the
+ * root.
+ */
+
 int follow_up(struct path *path)
 {
 	struct vfsmount *parent;
@@ -612,8 +623,22 @@ int follow_up(struct path *path)
 	return 1;
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * __follow_mount - Return the most recent mount at this mountpoint
+ *
+ * Given a mountpoint, find the most recently mounted file system at
+ * this mountpoint and return the path to its root dentry.  This is
+ * the file system that is visible, and it is in the direction of VFS
+ * "down" - away from the root of the mount tree.  See comments to
+ * lookup_mnt() for an example of "down."
+ *
+ * Does not decrement the refcount on the given mount even if it
+ * follows it to another mount and returns that path instead.
+ *
+ * Returns 0 if path was unchanged, 1 if we followed it to another mount.
+ *
+ * No need for dcache_lock, as serialization is taken care of in
+ * namespace.c.
  */
 static int __follow_mount(struct path *path)
 {
@@ -632,6 +657,12 @@ static int __follow_mount(struct path *path)
 	return res;
 }
 
+/*
+ * Like __follow_mount, but no return value and drops references to
+ * both mnt and dentry of the given path if it follows to another
+ * mount.
+ */
+
 static void follow_mount(struct path *path)
 {
 	while (d_mountpoint(path->dentry)) {
@@ -645,8 +676,12 @@ static void follow_mount(struct path *path)
 	}
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * Like follow_mount(), but traverses only one layer instead of
+ * continuing until it runs out.
+ *
+ * No need for dcache_lock, as serialization is taken care of in
+ * namespace.c.
  */
 int follow_down(struct path *path)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index a72eaab..745feaf 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -451,8 +451,20 @@ struct vfsmount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry,
 }
 
 /*
- * lookup_mnt increments the ref count before returning
- * the vfsmount struct.
+ * lookup_mnt - Return the first child mount mounted at path
+ *
+ * "First" means first mounted chronologically.  If you create the
+ * following mounts:
+ *
+ * mount /dev/sda1 /mnt
+ * mount /dev/sda2 /mnt
+ * mount /dev/sda3 /mnt
+ *
+ * Then lookup_mnt() on the base /mnt dentry in the root mount will
+ * return successively the root dentry and vfsmount of /dev/sda1, then
+ * /dev/sda2, then /dev/sda3, then NULL.
+ *
+ * lookup_mnt takes a reference to the found vfsmount.
  */
 struct vfsmount *lookup_mnt(struct path *path)
 {
-- 
1.7.0.4



* [PATCH 02/74] VFS: Make lookup_hash() return a struct path
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Jan Blunck, Valerie Aurora, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
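The change of calling convention in this patch, from a dentry pointer that
may encode an error to an int status plus a struct path filled in for the
caller, can be sketched in user space as follows.  Everything here is an
illustrative assumption: the toy_* names, the TOY_NAME_MAX limit and the
-ENAMETOOLONG failure are invented for the example, and the error-pointer
helpers are local imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ERRNO 4095
#define TOY_NAME_MAX  8	/* arbitrary small limit for the example */

/* Local imitations of the kernel's pointer-encoded error helpers. */
static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_dentry { const char *name; };
struct toy_path   { const char *mnt; struct toy_dentry *dentry; };

static struct toy_dentry toy_result;

/* Old style: result and error share a single pointer return value. */
static struct toy_dentry *toy_lookup_old(const char *name)
{
	if (strlen(name) > TOY_NAME_MAX)
		return toy_err_ptr(-ENAMETOOLONG);
	toy_result.name = name;
	return &toy_result;
}

/* New style: int status; mnt and dentry are returned together in *path. */
static int toy_lookup_new(const char *mnt, const char *name,
			  struct toy_path *path)
{
	struct toy_dentry *dentry = toy_lookup_old(name);

	if (toy_is_err(dentry)) {
		/* Never leave an error pointer behind in the path. */
		path->mnt = NULL;
		path->dentry = NULL;
		return (int)toy_ptr_err(dentry);
	}
	path->mnt = mnt;
	path->dentry = dentry;
	return 0;
}
```

A caller of the new style checks a plain int and then reads both fields of
the path, which is the shape the converted call sites in the diff take.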
 fs/namei.c |  113 ++++++++++++++++++++++++++++++-----------------------------
 1 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db0e7ce..99fc88b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1154,7 +1154,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
 }
 
 static struct dentry *__lookup_hash(struct qstr *name,
-		struct dentry *base, struct nameidata *nd)
+				    struct dentry *base, struct nameidata *nd)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1194,14 +1194,22 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static struct dentry *lookup_hash(struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
 	int err;
 
 	err = exec_permission(nd->path.dentry->d_inode);
 	if (err)
-		return ERR_PTR(err);
-	return __lookup_hash(&nd->last, nd->path.dentry, nd);
+		return err;
+	path->mnt = nd->path.mnt;
+	path->dentry =  __lookup_hash(name, nd->path.dentry, nd);
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
+		path->dentry = NULL;
+		path->mnt = NULL;
+	}
+	return err;
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -1682,12 +1690,9 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 
 	/* OK, it's O_CREAT */
 	mutex_lock(&dir->d_inode->i_mutex);
+	error = lookup_hash(nd, &nd->last, path);
 
-	path->dentry = lookup_hash(nd);
-	path->mnt = nd->path.mnt;
-
-	error = PTR_ERR(path->dentry);
-	if (IS_ERR(path->dentry)) {
+	if (error) {
 		mutex_unlock(&dir->d_inode->i_mutex);
 		goto exit;
 	}
@@ -1939,7 +1944,8 @@ EXPORT_SYMBOL(filp_open);
  */
 struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 {
-	struct dentry *dentry = ERR_PTR(-EEXIST);
+	struct path path;
+	int err;
 
 	mutex_lock_nested(&nd->path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	/*
@@ -1947,7 +1953,7 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * (foo/., foo/.., /////)
 	 */
 	if (nd->last_type != LAST_NORM)
-		goto fail;
+		return ERR_PTR(-EEXIST);
 	nd->flags &= ~LOOKUP_PARENT;
 	nd->flags |= LOOKUP_CREATE | LOOKUP_EXCL;
 	nd->intent.open.flags = O_EXCL;
@@ -1955,11 +1961,11 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	/*
 	 * Do the final lookup.
 	 */
-	dentry = lookup_hash(nd);
-	if (IS_ERR(dentry))
-		goto fail;
+	err = lookup_hash(nd, &nd->last, &path);
+	if (err)
+		return ERR_PTR(err);
 
-	if (dentry->d_inode)
+	if (path.dentry->d_inode)
 		goto eexist;
 	/*
 	 * Special case - lookup gave negative, but... we had foo/bar/
@@ -1968,15 +1974,14 @@ struct dentry *lookup_create(struct nameidata *nd, int is_dir)
 	 * been asking for (non-existent) directory. -ENOENT for you.
 	 */
 	if (unlikely(!is_dir && nd->last.name[nd->last.len])) {
-		dput(dentry);
-		dentry = ERR_PTR(-ENOENT);
+		dput(path.dentry);
+		return ERR_PTR(-ENOENT);
 	}
-	return dentry;
+
+	return path.dentry;
 eexist:
-	dput(dentry);
-	dentry = ERR_PTR(-EEXIST);
-fail:
-	return dentry;
+	path_put_conditional(&path, nd);
+	return ERR_PTR(-EEXIST);
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
@@ -2211,7 +2216,7 @@ static long do_rmdir(int dfd, const char __user *pathname)
 {
 	int error = 0;
 	char * name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 
 	error = user_path_parent(dfd, pathname, &nd, &name);
@@ -2233,21 +2238,20 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (error)
 		goto exit2;
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-	error = security_path_rmdir(&nd.path, dentry);
+	error = security_path_rmdir(&nd.path, path.dentry);
 	if (error)
 		goto exit4;
-	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+	error = vfs_rmdir(nd.path.dentry->d_inode, path.dentry);
 exit4:
 	mnt_drop_write(nd.path.mnt);
 exit3:
-	dput(dentry);
+	path_put_conditional(&path, &nd);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 exit1:
@@ -2303,7 +2307,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 {
 	int error;
 	char *name;
-	struct dentry *dentry;
+	struct path path;
 	struct nameidata nd;
 	struct inode *inode = NULL;
 
@@ -2318,26 +2322,25 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_hash(&nd);
-	error = PTR_ERR(dentry);
-	if (!IS_ERR(dentry)) {
+	error = lookup_hash(&nd, &nd.last, &path);
+	if (!error) {
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
-		inode = dentry->d_inode;
+		inode = path.dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-		error = security_path_unlink(&nd.path, dentry);
+		error = security_path_unlink(&nd.path, path.dentry);
 		if (error)
 			goto exit3;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, path.dentry);
 exit3:
 		mnt_drop_write(nd.path.mnt);
 	exit2:
-		dput(dentry);
+		path_put_conditional(&path, &nd);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
@@ -2348,8 +2351,8 @@ exit1:
 	return error;
 
 slashes:
-	error = !dentry->d_inode ? -ENOENT :
-		S_ISDIR(dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
+	error = !path.dentry->d_inode ? -ENOENT :
+		S_ISDIR(path.dentry->d_inode->i_mode) ? -EISDIR : -ENOTDIR;
 	goto exit2;
 }
 
@@ -2687,7 +2690,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
 	struct dentry *old_dir, *new_dir;
-	struct dentry *old_dentry, *new_dentry;
+	struct path old, new;
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
 	char *from;
@@ -2721,16 +2724,15 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 
 	trap = lock_rename(new_dir, old_dir);
 
-	old_dentry = lookup_hash(&oldnd);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
+	error = lookup_hash(&oldnd, &oldnd.last, &old);
+	if (error)
 		goto exit3;
 	/* source must exist */
 	error = -ENOENT;
-	if (!old_dentry->d_inode)
+	if (!old.dentry->d_inode)
 		goto exit4;
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
-	if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
+	if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
 			goto exit4;
@@ -2739,32 +2741,31 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	}
 	/* source should not be ancestor of target */
 	error = -EINVAL;
-	if (old_dentry == trap)
+	if (old.dentry == trap)
 		goto exit4;
-	new_dentry = lookup_hash(&newnd);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
+	error = lookup_hash(&newnd, &newnd.last, &new);
+	if (error)
 		goto exit4;
 	/* target should not be an ancestor of source */
 	error = -ENOTEMPTY;
-	if (new_dentry == trap)
+	if (new.dentry == trap)
 		goto exit5;
 
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
 		goto exit5;
-	error = security_path_rename(&oldnd.path, old_dentry,
-				     &newnd.path, new_dentry);
+	error = security_path_rename(&oldnd.path, old.dentry,
+				     &newnd.path, new.dentry);
 	if (error)
 		goto exit6;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	error = vfs_rename(old_dir->d_inode, old.dentry,
+				   new_dir->d_inode, new.dentry);
 exit6:
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
-	dput(new_dentry);
+	path_put_conditional(&new, &newnd);
 exit4:
-	dput(old_dentry);
+	path_put_conditional(&old, &oldnd);
 exit3:
 	unlock_rename(new_dir, old_dir);
 exit2:
-- 
1.7.0.4



* [PATCH 03/74] autofs4: Save autofs trigger's vfsmount in super block info
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Jan Blunck, Valerie Aurora, autofs, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

XXX - This is broken and included just to make union mounts work.  Ian
Kent and David Howells are working on a long-term solution that will
replace abuse of ->follow_link() to trigger an automount with a new
op.

Original commit message:

This is a bugfix/replacement for commit
051d381259eb57d6074d02a6ba6e90e744f1a29f:

    During a path walk if an autofs trigger is mounted on a dentry,
    when the follow_link method is called, the nameidata struct
    contains the vfsmount and mountpoint dentry of the parent mount
    while the dentry that is passed in is the root of the autofs
    trigger mount.  I believe it is impossible to get the vfsmount of
    the trigger mount, within the follow_link method, when only the
    parent vfsmount and the root dentry of the trigger mount are
    known.

The solution in that commit was to replace the path embedded in the
parent's nameidata with the path of the link itself in
__do_follow_link().  This is a relatively harmless misuse of the
field, but union mounts ran into a bug during follow_link() caused by
the nameidata containing the wrong path (we count on it being what it
is in all other places: the path of the parent).

A cleaner and easier to understand solution is to save the necessary
vfsmount in the autofs superblock info when it is mounted.  Then we
can easily update the vfsmount in autofs4_follow_link().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
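The reference handling behind "save the vfsmount at mount time, then
retarget the nameidata path in follow_link" can be modelled with plain
counters.  This is a user-space sketch under stated assumptions, not the
autofs4 code: struct toy_ref, toy_get(), toy_put() and toy_retarget() are
hypothetical names, and the sketch takes the new references before dropping
the old ones as a toy-model precaution for the case where old and new are
the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a vfsmount or dentry. */
struct toy_ref {
	int count;
};

/* Take a reference and return the object, in the style of mntget()/dget(). */
static struct toy_ref *toy_get(struct toy_ref *r)
{
	r->count++;
	return r;
}

/* Drop a reference, in the style of mntput()/dput(). */
static void toy_put(struct toy_ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/* Hypothetical stand-in for the path embedded in struct nameidata. */
struct toy_path {
	struct toy_ref *mnt;
	struct toy_ref *dentry;
};

/*
 * Point @path at (@mnt, @dentry).  The path owns one reference on each of
 * its members, so the swap must leave every counter balanced.
 */
static void toy_retarget(struct toy_path *path, struct toy_ref *mnt,
			 struct toy_ref *dentry)
{
	struct toy_ref *old_mnt = path->mnt;
	struct toy_ref *old_dentry = path->dentry;

	path->mnt = toy_get(mnt);
	path->dentry = toy_get(dentry);
	toy_put(old_dentry);
	toy_put(old_mnt);
}
```

In this model the saved mount keeps the reference held on behalf of the
superblock info, and the path gains its own reference on top of it.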
 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 ++++++++++-
 fs/autofs4/root.c     |    6 ++++++
 fs/namei.c            |    7 ++-----
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 3d283ab..de3af64 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -133,6 +133,7 @@ struct autofs_sb_info {
 	int reghost_enabled;
 	int needs_reghost;
 	struct super_block *sb;
+	struct vfsmount *mnt;
 	struct mutex wq_mutex;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
diff --git a/fs/autofs4/init.c b/fs/autofs4/init.c
index 9722e4b..5e0dcd7 100644
--- a/fs/autofs4/init.c
+++ b/fs/autofs4/init.c
@@ -17,7 +17,16 @@
 static int autofs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	struct autofs_sb_info *sbi;
+	int ret;
+
+	ret = get_sb_nodev(fs_type, flags, data, autofs4_fill_super, mnt);
+	if (ret)
+		return ret;
+
+	sbi = autofs4_sbi(mnt->mnt_sb);
+	sbi->mnt = mnt;
+	return 0;
 }
 
 static struct file_system_type autofs_fs_type = {
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index cb1bd38..fb21c56 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -225,6 +225,12 @@ static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 	DPRINTK("dentry=%p %.*s oz_mode=%d nd->flags=%d",
 		dentry, dentry->d_name.len, dentry->d_name.name, oz_mode,
 		nd->flags);
+
+	dput(nd->path.dentry);
+	mntput(nd->path.mnt);
+	nd->path.mnt = mntget(sbi->mnt);
+	nd->path.dentry = dget(dentry);
+
 	/*
 	 * For an expire of a covered direct or offset mount we need
 	 * to break out of follow_down() at the autofs mount trigger
diff --git a/fs/namei.c b/fs/namei.c
index 99fc88b..47ee4b9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -533,11 +533,8 @@ __do_follow_link(struct path *path, struct nameidata *nd, void **p)
 	touch_atime(path->mnt, dentry);
 	nd_set_link(nd, NULL);
 
-	if (path->mnt != nd->path.mnt) {
-		path_to_nameidata(path, nd);
-		dget(dentry);
-	}
-	mntget(path->mnt);
+	if (path->mnt == nd->path.mnt)
+		mntget(nd->path.mnt);
 	nd->last_type = LAST_BIND;
 	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
 	error = PTR_ERR(*p);
-- 
1.7.0.4



* [PATCH 04/74] Documentation: Fix trivial typo in filesystems/sharedsubtree.txt
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

This typo is easy to ignore unless you have spent a great deal of time
thinking about how to eliminate duplicate dentries in unions.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 Documentation/filesystems/sharedsubtree.txt |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt
index fc0e39a..4ede421 100644
--- a/Documentation/filesystems/sharedsubtree.txt
+++ b/Documentation/filesystems/sharedsubtree.txt
@@ -62,10 +62,10 @@ replicas continue to be exactly same.
 	# mount /dev/sd0  /tmp/a
 
 	#ls /tmp/a
-	t1 t2 t2
+	t1 t2 t3
 
 	#ls /mnt/a
-	t1 t2 t2
+	t1 t2 t3
 
 	Note that the mount has propagated to the mount at /mnt as well.
 
-- 
1.7.0.4



* [PATCH 05/74] whiteout/NFSD: Don't return information about whiteouts to userspace
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Jan Blunck, David Woodhouse, Valerie Aurora, linux-nfs,
	Neil Brown, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Userspace isn't ready for handling another file type, so silently drop
whiteout directory entries before they leave the kernel.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: linux-nfs@vger.kernel.org
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
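The filtering pattern used in each callback touched by this patch can be
sketched in user space as follows.  The toy_* types and TOY_BUF_SLOTS are
hypothetical; only the idea is taken from the patch: a whiteout entry makes
the fill callback report success without emitting anything, so the
directory walk continues and the caller never sees the entry.  The d_type
values are the conventional dirent ones (DT_DIR 4, DT_REG 8, DT_WHT 14),
redefined locally so the sketch does not depend on a particular libc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_DT_DIR 4
#define TOY_DT_REG 8
#define TOY_DT_WHT 14

#define TOY_BUF_SLOTS 8

/* Hypothetical on-disk directory entry. */
struct toy_dirent {
	const char *name;
	unsigned int d_type;
};

/* Hypothetical user buffer that collects the visible names. */
struct toy_buf {
	const char *names[TOY_BUF_SLOTS];
	size_t count;
};

/* Fill callback: 0 means "keep going", negative means "stop". */
static int toy_filldir(struct toy_buf *buf, const char *name,
		       unsigned int d_type)
{
	if (d_type == TOY_DT_WHT)
		return 0;	/* silently drop the whiteout */

	if (buf->count == TOY_BUF_SLOTS)
		return -EINVAL;	/* buffer full */
	buf->names[buf->count++] = name;
	return 0;
}

/* Walk the entries as a readdir implementation would. */
static size_t toy_readdir(const struct toy_dirent *ents, size_t n,
			  struct toy_buf *buf)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_filldir(buf, ents[i].name, ents[i].d_type) < 0)
			break;
	return buf->count;
}
```

Returning 0 rather than an error matters: in this model an error return
would end the walk early and hide every entry after the whiteout.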
 fs/compat.c       |    9 +++++++++
 fs/nfsd/nfs3xdr.c |    5 +++++
 fs/nfsd/nfs4xdr.c |    5 +++++
 fs/nfsd/nfsxdr.c  |    4 ++++
 fs/readdir.c      |    9 +++++++++
 5 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 718c706..825bb7b 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -914,6 +914,9 @@ static int compat_fillonedir(void *__buf, const char *name, int namlen,
 	struct compat_old_linux_dirent __user *dirent;
 	compat_ulong_t d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -986,6 +989,9 @@ static int compat_filldir(void *__buf, const char *name, int namlen,
 	int reclen = ALIGN(offsetof(struct compat_linux_dirent, d_name) +
 		namlen + 2, sizeof(compat_long_t));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -1075,6 +1081,9 @@ static int compat_filldir64(void * __buf, const char * name, int namlen, loff_t
 		sizeof(u64));
 	u64 off;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 2a533a0..9b96f5a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -885,6 +885,11 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
 	int		elen;		/* estimated entry length in words */
 	int		num_entry_words = 0;	/* actual number of words */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
+
 	if (cd->offset) {
 		u64 offset64 = offset;
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1a468bb..136944d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2283,6 +2283,11 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 		return 0;
 	}
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
+
 	if (cd->offset)
 		xdr_encode_hyper(cd->offset, (u64) offset);
 
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 4ce005d..0e57d4b 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -503,6 +503,10 @@ nfssvc_encode_entry(void *ccdv, const char *name,
 			namlen, name, offset, ino);
 	 */
 
+	if (d_type == DT_WHT) {
+		cd->common.err = nfs_ok;
+		return 0;
+	}
 	if (offset > ~((u32) 0)) {
 		cd->common.err = nfserr_fbig;
 		return -EINVAL;
diff --git a/fs/readdir.c b/fs/readdir.c
index 356f715..de703d6 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -77,6 +77,9 @@ static int fillonedir(void * __buf, const char * name, int namlen, loff_t offset
 	struct old_linux_dirent __user * dirent;
 	unsigned long d_ino;
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	if (buf->result)
 		return -EINVAL;
 	d_ino = ino;
@@ -155,6 +158,9 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
 	int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2,
 		sizeof(long));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
@@ -241,6 +247,9 @@ static int filldir64(void * __buf, const char * name, int namlen, loff_t offset,
 	int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
 		sizeof(u64));
 
+	if (d_type == DT_WHT)
+		return 0;
+
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
 		return -EINVAL;
-- 
1.7.0.4



* [PATCH 06/74] whiteout: Define opaque inode flags and operations
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (4 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 05/74] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 07/74] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Opaque directories are the directory equivalent of whiteouts.  Define
the generic opaque inode flags and operations.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 include/linux/fs.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
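
As a rough illustration of how the new in-memory flag is meant to be tested, here is a minimal userspace sketch. The flag values are copied from the hunks below; struct toy_inode and toy_set_opaque() are hypothetical stand-ins and not part of this patch. It also checks, by plain arithmetic, that FS_OPAQUE_FL as defined here lies outside the FS_FL_USER_VISIBLE mask.

```c
/* Flag values copied from the hunks in this patch. */
#define S_PRIVATE		512
#define S_OPAQUE		1024
#define FS_OPAQUE_FL		0x00800000
#define FS_FL_USER_VISIBLE	0x0003DFFF

/* Hypothetical stand-in for the one struct inode field the macro reads. */
struct toy_inode {
	unsigned int i_flags;
};

/* Same shape as the macro added by this patch. */
#define IS_OPAQUE(inode)	((inode)->i_flags & S_OPAQUE)

/* Hypothetical helper: mark a directory inode opaque in its i_flags. */
static void toy_set_opaque(struct toy_inode *inode)
{
	inode->i_flags |= S_OPAQUE;
}
```

Setting the bit leaves neighbouring flags such as S_PRIVATE untouched, since each flag occupies its own bit.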

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 76041b6..7c0f305 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -235,6 +235,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_OPAQUE	1024	/* Directory is opaque */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -269,6 +270,7 @@ struct inodes_stat_t {
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
+#define IS_OPAQUE(inode)	((inode)->i_flags & S_OPAQUE)
 
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
@@ -351,8 +353,11 @@ struct inodes_stat_t {
 #define FS_NOTAIL_FL			0x00008000 /* file tail should not be merged */
 #define FS_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
 #define FS_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
+/* 0x00040000 is used by ext4 */
 #define FS_EXTENT_FL			0x00080000 /* Extents */
 #define FS_DIRECTIO_FL			0x00100000 /* Use direct i/o */
+/* 0x00200000 and 0x00400000 also used by ext4 */
+#define FS_OPAQUE_FL			0x00800000 /* Dir is opaque */
 #define FS_RESERVED_FL			0x80000000 /* reserved for ext2 lib */
 
 #define FS_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */
-- 
1.7.0.4



* [PATCH 07/74] whiteout: Add vfs_whiteout() and whiteout inode operation
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (5 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 06/74] whiteout: Define opaque inode flags and operations Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 08/74] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
                   ` (39 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Jan Blunck, David Woodhouse, Valerie Aurora, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Whiteout a given directory entry.  File systems that support whiteouts
must implement the new ->whiteout() directory inode operation.

XXX - Only whiteout when there is a matching entry in a lower layer.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 Documentation/filesystems/vfs.txt |   10 +++++-
 fs/dcache.c                       |    4 ++-
 fs/namei.c                        |   73 ++++++++++++++++++++++++++++++++++++-
 include/linux/dcache.h            |    7 ++++
 include/linux/fs.h                |    2 +
 5 files changed, 93 insertions(+), 3 deletions(-)
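
To make the purpose of a whiteout concrete, the following is a small userspace model of a two-layer lookup. It is only a sketch: the toy_* types and functions are hypothetical and do not correspond to the kernel structures touched by this patch. It shows the one property the patch relies on, namely that a whiteout in the top layer stops lookup from dropping down to the lower layer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds for a two-layer toy model; not kernel types. */
enum toy_kind { TOY_FILE, TOY_WHITEOUT };

struct toy_entry {
	const char *name;
	enum toy_kind kind;
};

struct toy_layer {
	const struct toy_entry *entries;
	size_t count;
};

/* Find a name within a single layer, or return NULL. */
static const struct toy_entry *toy_find(const struct toy_layer *layer,
					const char *name)
{
	size_t i;

	for (i = 0; i < layer->count; i++)
		if (strcmp(layer->entries[i].name, name) == 0)
			return &layer->entries[i];
	return NULL;
}

/*
 * Look up a name in the union of top and bottom.
 * Returns 1 if the name is visible, 0 if it is not.
 */
static int toy_union_lookup(const struct toy_layer *top,
			    const struct toy_layer *bottom,
			    const char *name)
{
	const struct toy_entry *e = toy_find(top, name);

	if (e)	/* the top layer decides: a whiteout hides the name */
		return e->kind != TOY_WHITEOUT;
	/* nothing in the top layer, so drop down to the lower layer */
	return toy_find(bottom, name) != NULL;
}
```

In this model, deleting a file that exists only in the lower layer amounts to adding a TOY_WHITEOUT entry with the same name to the top layer.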

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index ed7e5ef..05c73b1 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -308,7 +308,7 @@ struct inode_operations
 -----------------------
 
 This describes how the VFS can manipulate an inode in your
-filesystem. As of kernel 2.6.22, the following members are defined:
+filesystem. As of kernel 2.6.34, the following members are defined:
 
 struct inode_operations {
 	int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
@@ -319,6 +319,7 @@ struct inode_operations {
 	int (*mkdir) (struct inode *,struct dentry *,int);
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
@@ -382,6 +383,13 @@ otherwise noted.
 	will probably need to call d_instantiate() just as you would
 	in the create() method
 
+  whiteout: called by the rmdir(2) and unlink(2) system calls on a
+        layered file system.  Only required if you want to support
+        whiteouts.  The first dentry passed in is for the entry being
+        whited out: positive if it exists, negative otherwise.  The
+        second is the dentry for the whiteout itself.  This method
+        must unlink() or rmdir() the original entry if it exists.
+
   rename: called by the rename(2) system call to rename the object to
 	have the parent and name given by the second inode and dentry.
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 83293be..28975dd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -993,8 +993,10 @@ EXPORT_SYMBOL(d_alloc_name);
 /* the caller must hold dcache_lock */
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
-	if (inode)
+	if (inode) {
+		dentry->d_flags &= ~DCACHE_WHITEOUT;
 		list_add(&dentry->d_alias, &inode->i_dentry);
+	}
 	dentry->d_inode = inode;
 	fsnotify_d_instantiate(dentry, inode);
 }
diff --git a/fs/namei.c b/fs/namei.c
index 47ee4b9..9136c68 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1338,7 +1338,6 @@ static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
 	if (!victim->d_inode)
 		return -ENOENT;
 
-	BUG_ON(victim->d_parent->d_inode != dir);
 	audit_inode_child(victim, dir);
 
 	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
@@ -2149,6 +2148,78 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, int, mode)
 	return sys_mkdirat(AT_FDCWD, pathname, mode);
 }
 
+/**
+ * vfs_whiteout: create a whiteout for the given directory entry
+ * @dir: parent inode
+ * @old_dentry: directory entry to whiteout
+ *
+ * Create a whiteout for the given directory entry.  A whiteout
+ * prevents lookup from dropping down to a lower layer of a union
+ * mounted file system.
+ *
+ * There are two important cases: (a) The directory entry to be
+ * whited-out may already exist, in which case it must first be
+ * deleted before we create the whiteout, and (b) no such directory
+ * entry exists and we only have to create the whiteout itself.
+ *
+ * The caller must pass in a dentry for the directory entry to be
+ * whited-out - a positive one if it exists, and a negative if not.
+ * When this function returns, the caller should dput() the old, now
+ * defunct dentry it passed in.  The dentry for the whiteout itself is
+ * created inside this function.
+ */
+static int vfs_whiteout(struct inode *dir, struct dentry *old_dentry, int isdir)
+{
+	struct inode *old_inode = old_dentry->d_inode;
+	struct dentry *parent, *whiteout;
+	int err = 0;
+
+	BUG_ON(old_dentry->d_parent->d_inode != dir);
+
+	if (!dir->i_op || !dir->i_op->whiteout)
+		return -EOPNOTSUPP;
+
+	/*
+	 * If the old dentry is positive, then we have to delete this
+	 * entry before we create the whiteout.  The file system
+	 * ->whiteout() op does the actual delete, but we do all the
+	 * VFS-level checks and changes here.
+	 */
+	if (old_inode) {
+		mutex_lock(&old_inode->i_mutex);
+		if (d_mountpoint(old_dentry)) {
+			mutex_unlock(&old_inode->i_mutex);
+			return -EBUSY;
+		}
+		if (isdir) {
+			dentry_unhash(old_dentry);
+			err = security_inode_rmdir(dir, old_dentry);
+		} else {
+			err = security_inode_unlink(dir, old_dentry);
+		}
+	}
+
+	parent = dget_parent(old_dentry);
+	whiteout = d_alloc_name(parent, old_dentry->d_name.name);
+
+	if (!err)
+		err = dir->i_op->whiteout(dir, old_dentry, whiteout);
+
+	if (old_inode) {
+		mutex_unlock(&old_inode->i_mutex);
+		if (!err) {
+			fsnotify_link_count(old_inode);
+			d_delete(old_dentry);
+		}
+		if (isdir)
+			dput(old_dentry);
+	}
+
+	dput(whiteout);
+	dput(parent);
+	return err;
+}
+
 /*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6a4aea3..6e66b76 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -188,6 +188,8 @@ d_iput:		no		no		no       yes
 
 #define DCACHE_CANT_MOUNT	0x0100
 
+#define DCACHE_WHITEOUT		0x0200	/* Stop lookup in a unioned file system */
+
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
 
@@ -374,6 +376,11 @@ static inline void dont_mount(struct dentry *dentry)
 	spin_unlock(&dentry->d_lock);
 }
 
+static inline int d_is_whiteout(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_WHITEOUT);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7c0f305..9d6e72f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -208,6 +208,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_WHITEOUT	(1<<25) /* FS supports whiteout filetype */
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
@@ -1523,6 +1524,7 @@ struct inode_operations {
 	int (*mkdir) (struct inode *,struct dentry *,int);
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.7.0.4



* [PATCH 08/74] whiteout: Allow removal of a directory with whiteouts
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (6 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 07/74] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 09/74] whiteout: tmpfs whiteout support Valerie Aurora
                   ` (38 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Jan Blunck, Valerie Aurora, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

do_whiteout() allows removal of a directory when it has whiteouts but
is logically empty.

XXX - This patch abuses readdir() to check if the union directory is
logically empty - that is, all the entries are whiteouts (or "." or
"..").  Currently, we have no clean VFS interface to ask the lower
file system if a directory is empty.

Possible fixes:
 - Add ->is_directory_empty() op
 - Add is_directory_empty flag to dentry (ugly dcache populate)
 - Ask underlying fs to remove it and look for an error return
 - (your idea here)

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namei.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)
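
The "logically empty" test described above can be modelled in userspace as follows. This is a sketch under stated assumptions: struct toy_dirent and the toy_* functions are hypothetical, and the d_type values are defined locally (mirroring the usual DT_DIR, DT_REG and DT_WHT values) so the example does not depend on which feature macros expose DT_WHT in <dirent.h>. It applies the same rule as the filldir callback in the patch: ".", ".." and whiteouts do not count as entries.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the usual d_type values, for illustration only. */
#define TOY_DT_DIR 4
#define TOY_DT_REG 8
#define TOY_DT_WHT 14

/* Hypothetical directory entry: just a name and a type. */
struct toy_dirent {
	const char *name;
	unsigned int d_type;
};

/* Return 1 for "." and "..", which never make a directory non-empty. */
static int toy_is_dot_or_dotdot(const char *name, size_t namlen)
{
	if (namlen == 1)
		return name[0] == '.';
	if (namlen == 2)
		return name[0] == '.' && name[1] == '.';
	return 0;
}

/* Return 1 if every entry is ".", "..", or a whiteout; 0 otherwise. */
static int toy_logically_empty(const struct toy_dirent *ents, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_is_dot_or_dotdot(ents[i].name, strlen(ents[i].name)))
			continue;
		if (ents[i].d_type == TOY_DT_WHT)
			continue;
		return 0;	/* a real entry is present */
	}
	return 1;
}
```

Unlike the kernel callback, this version stops at the first real entry; the callback in the patch keeps returning 0 so that readdir runs to completion.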

diff --git a/fs/namei.c b/fs/namei.c
index 9136c68..ce54ed4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2221,6 +2221,90 @@ static int vfs_whiteout(struct inode *dir, struct dentry *old_dentry, int isdir)
 }
 
 /*
+ * XXX - We are abusing readdir to check if a union directory is
+ * logically empty.
+ */
+static int filldir_is_empty(void *__buf, const char *name, int namlen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	int *is_empty = (int *)__buf;
+
+	switch (namlen) {
+	case 2:
+		if (name[1] != '.')
+			break;
+	case 1:
+		if (name[0] != '.')
+			break;
+		return 0;
+	}
+
+	if (d_type == DT_WHT)
+		return 0;
+
+	(*is_empty) = 0;
+	return 0;
+}
+
+static int directory_is_empty(struct path *path)
+{
+	struct file *file;
+	int err;
+	int is_empty = 1;
+
+	BUG_ON(!S_ISDIR(path->dentry->d_inode->i_mode));
+
+	/* references for the file pointer */
+	path_get(path);
+
+	file = dentry_open(path->dentry, path->mnt, O_RDONLY, current_cred());
+	if (IS_ERR(file))
+		return 0;
+
+	err = vfs_readdir(file, filldir_is_empty, &is_empty);
+
+	fput(file);
+	return is_empty;
+}
+
+static int do_whiteout(struct nameidata *nd, struct path *path, int isdir)
+{
+	struct path safe = nd->path;
+	struct dentry *dentry = path->dentry;
+	int err;
+
+	path_get(&safe);
+
+	err = may_delete(nd->path.dentry->d_inode, dentry, isdir);
+	if (err)
+		goto out;
+
+	err = -ENOTEMPTY;
+	if (isdir && !directory_is_empty(path))
+		goto out;
+
+	if (nd->path.dentry != dentry->d_parent) {
+		dentry = __lookup_hash(&path->dentry->d_name, nd->path.dentry,
+				       nd);
+		err = PTR_ERR(dentry);
+		if (IS_ERR(dentry))
+			goto out;
+
+		dput(path->dentry);
+		if (path->mnt != safe.mnt)
+			mntput(path->mnt);
+		path->mnt = nd->path.mnt;
+		path->dentry = dentry;
+	}
+
+	err = vfs_whiteout(nd->path.dentry->d_inode, dentry, isdir);
+
+out:
+	path_put(&safe);
+	return err;
+}
+
+/*
  * We try to drop the dentry early: we should have
  * a usage count of 2 if we're the only user of this
  * dentry, and if that is true (possibly after pruning
-- 
1.7.0.4



* [PATCH 09/74] whiteout: tmpfs whiteout support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (7 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 08/74] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 10/74] ext2: Add ext2_dirent_in_use() Valerie Aurora
                   ` (37 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Jan Blunck, David Woodhouse, Valerie Aurora, Hugh Dickins,
	linux-mm, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Add support for whiteout dentries to tmpfs.  This includes adding
support for whiteouts to d_genocide(), which is called to tear down
pinned tmpfs dentries.  Whiteouts have to be persistent, so they have
a pinning extra ref count that needs to be dropped by d_genocide().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: linux-mm@kvack.org
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/dcache.c |   13 +++++-
 mm/shmem.c  |  145 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 143 insertions(+), 15 deletions(-)
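
The changed skip condition in d_genocide() can be read as a small truth table. The sketch below models it in userspace; struct toy_dentry and toy_genocide_skips() are hypothetical names, and the three fields stand in for d_unhashed(), a non-NULL d_inode, and DCACHE_WHITEOUT respectively.

```c
/* Hypothetical stand-in for the three dentry properties the check reads. */
struct toy_dentry {
	int unhashed;	/* d_unhashed() would be true */
	int has_inode;	/* d_inode != NULL, i.e. a positive dentry */
	int whiteout;	/* DCACHE_WHITEOUT is set */
};

/* Return 1 if the walk should skip this dentry, 0 if it must process it. */
static int toy_genocide_skips(const struct toy_dentry *d)
{
	return d->unhashed || (!d->has_inode && !d->whiteout);
}
```

A plain negative dentry is still skipped, while a hashed whiteout is processed so that its extra pinning reference gets dropped.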

diff --git a/fs/dcache.c b/fs/dcache.c
index 28975dd..9358dbc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2337,7 +2337,18 @@ resume:
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (d_unhashed(dentry)||!dentry->d_inode)
+		/*
+		 * Skip unhashed and negative dentries, but process
+		 * positive dentries and whiteouts.  A whiteout looks
+		 * kind of like a negative dentry for purposes of
+		 * lookup, but it has an extra pinning ref count
+		 * because it can't be evicted like a negative dentry
+		 * can.  What we care about here is ref counts - and
+		 * we need to drop the ref count on a whiteout before
+		 * we can evict it.
+		 */
+		if (d_unhashed(dentry)||(!dentry->d_inode &&
+					 !d_is_whiteout(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/mm/shmem.c b/mm/shmem.c
index 080b09a..0ac3af3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1831,6 +1831,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per superblock, so it is very similar to shmem_link().
+ */
+static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
+			  struct dentry *new_dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+	struct dentry *dentry;
+
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	/* This gives us a properly initialized negative dentry */
+	dentry = simple_lookup(dir, new_dentry, NULL);
+	if (dentry && IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	/*
+	 * No ordinary (disk based) filesystem counts whiteouts as inodes;
+	 * but each new link needs a new dentry, pinning lowmem, and
+	 * tmpfs dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	if (old_dentry->d_inode) {
+		if (S_ISDIR(old_dentry->d_inode->i_mode))
+			shmem_rmdir(dir, old_dentry);
+		else
+			shmem_unlink(dir, old_dentry);
+	}
+
+	dir->i_size += BOGO_DIRENT_SIZE;
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	/* Extra pinning count for the created dentry */
+	dget(new_dentry);
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode)
+{
+	if (d_is_whiteout(dentry)) {
+		/* Re-using an existing whiteout */
+		shmem_free_inode(dir->i_sb);
+		if (S_ISDIR(inode->i_mode))
+			inode->i_flags |= S_OPAQUE;
+	} else {
+		/* New dentry */
+		dir->i_size += BOGO_DIRENT_SIZE;
+		dget(dentry); /* Extra count - pin the dentry in core */
+	}
+	/* Will clear DCACHE_WHITEOUT flag */
+	d_instantiate(dentry, inode);
+
+}
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -1859,10 +1929,8 @@ shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 #else
 		error = 0;
 #endif
-		dir->i_size += BOGO_DIRENT_SIZE;
+		shmem_d_instantiate(dir, dentry, inode);
 		dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-		d_instantiate(dentry, inode);
-		dget(dentry); /* Extra count - pin the dentry in core */
 	}
 	return error;
 }
@@ -1900,12 +1968,11 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	if (ret)
 		goto out;
 
-	dir->i_size += BOGO_DIRENT_SIZE;
+	shmem_d_instantiate(dir, dentry, inode);
+
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 	inc_nlink(inode);
 	atomic_inc(&inode->i_count);	/* New dentry reference */
-	dget(dentry);		/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -1914,21 +1981,61 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+		shmem_free_inode(dir->i_sb);
 
+	if (inode) {
+		inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+		drop_nlink(inode);
+	}
 	dir->i_size -= BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
 	return 0;
 }
 
+static void shmem_dir_unlink_whiteouts(struct inode *dir, struct dentry *dentry)
+{
+	if (!dentry->d_inode)
+		return;
+
+	/* Remove whiteouts from logically empty directory */
+	if (S_ISDIR(dentry->d_inode->i_mode) &&
+	    dentry->d_inode->i_sb->s_flags & MS_WHITEOUT) {
+		struct dentry *child, *next;
+		LIST_HEAD(list);
+
+		spin_lock(&dcache_lock);
+		list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+			spin_lock(&child->d_lock);
+			if (d_is_whiteout(child)) {
+				__d_drop(child);
+				if (!list_empty(&child->d_lru)) {
+					list_del(&child->d_lru);
+					dentry_stat.nr_unused--;
+				}
+				list_add(&child->d_lru, &list);
+			}
+			spin_unlock(&child->d_lock);
+		}
+		spin_unlock(&dcache_lock);
+
+		list_for_each_entry_safe(child, next, &list, d_lru) {
+			spin_lock(&child->d_lock);
+			list_del_init(&child->d_lru);
+			spin_unlock(&child->d_lock);
+
+			shmem_unlink(dentry->d_inode, child);
+		}
+	}
+}
+
 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	if (!simple_empty(dentry))
 		return -ENOTEMPTY;
 
+	/* Remove whiteouts from logically empty directory */
+	shmem_dir_unlink_whiteouts(dir, dentry);
 	drop_nlink(dentry->d_inode);
 	drop_nlink(dir);
 	return shmem_unlink(dir, dentry);
@@ -1937,7 +2044,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 /*
  * The VFS layer already does all the dentry stuff for rename,
  * we just have to decrement the usage count for the target if
- * it exists so that the VFS layer correctly free's it when it
+ * it exists so that the VFS layer correctly frees it when it
  * gets overwritten.
  */
 static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
@@ -1948,7 +2055,12 @@ static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct
 	if (!simple_empty(new_dentry))
 		return -ENOTEMPTY;
 
+	if (d_is_whiteout(new_dentry))
+		shmem_unlink(new_dir, new_dentry);
+
 	if (new_dentry->d_inode) {
+		/* Remove whiteouts from logically empty directory */
+		shmem_dir_unlink_whiteouts(new_dir, new_dentry);
 		(void) shmem_unlink(new_dir, new_dentry);
 		if (they_are_dirs)
 			drop_nlink(old_dir);
@@ -2013,10 +2125,8 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	dir->i_size += BOGO_DIRENT_SIZE;
+	shmem_d_instantiate(dir, dentry, inode);
 	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-	d_instantiate(dentry, inode);
-	dget(dentry);
 	return 0;
 }
 
@@ -2394,6 +2504,12 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!root)
 		goto failed_iput;
 	sb->s_root = root;
+
+#ifdef CONFIG_TMPFS
+	if (!(sb->s_flags & MS_NOUSER))
+		sb->s_flags |= MS_WHITEOUT;
+#endif
+
 	return 0;
 
 failed_iput:
@@ -2493,6 +2609,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.rmdir		= shmem_rmdir,
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
+	.whiteout       = shmem_whiteout,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.7.0.4



* [PATCH 10/74] ext2: Add ext2_dirent_in_use()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (8 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 09/74] whiteout: tmpfs whiteout support Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 11/74] ext2: Split ext2_add_entry() from ext2_add_link() Valerie Aurora
                   ` (36 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Currently ext2 checks whether a directory entry is in use by checking
whether its inode number is non-zero.  Fallthrus and whiteouts will have
a zero inode number but still be in use.  Add a function to abstract out
the directory entry in-use test.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/ext2/dir.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 7641098..79987ab 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -209,6 +209,11 @@ fail:
 	return ERR_PTR(-EIO);
 }
 
+static inline int ext2_dirent_in_use(struct ext2_dir_entry_2 *de)
+{
+	return (de->inode);
+}
+
 /*
  * NOTE! unlike strncmp, ext2_match returns 1 for success, 0 for failure.
  *
@@ -219,7 +224,7 @@ static inline int ext2_match (int len, const char * const name,
 {
 	if (len != de->name_len)
 		return 0;
-	if (!de->inode)
+	if (!ext2_dirent_in_use(de))
 		return 0;
 	return !memcmp(name, de->name, len);
 }
@@ -518,6 +523,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 				rec_len = chunk_size;
 				de->rec_len = ext2_rec_len_to_disk(chunk_size);
 				de->inode = 0;
+				de->file_type = 0;
 				goto got_it;
 			}
 			if (de->rec_len == 0) {
@@ -531,7 +537,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 				goto out_unlock;
 			name_len = EXT2_DIR_REC_LEN(de->name_len);
 			rec_len = ext2_rec_len_from_disk(de->rec_len);
-			if (!de->inode && rec_len >= reclen)
+			if (!ext2_dirent_in_use(de) && rec_len >= reclen)
 				goto got_it;
 			if (rec_len >= name_len + reclen)
 				goto got_it;
@@ -549,7 +555,7 @@ got_it:
 	err = ext2_prepare_chunk(page, pos, rec_len);
 	if (err)
 		goto out_unlock;
-	if (de->inode) {
+	if (ext2_dirent_in_use(de)) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
-- 
1.7.0.4



* [PATCH 11/74] ext2: Split ext2_add_entry() from ext2_add_link()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (9 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 10/74] ext2: Add ext2_dirent_in_use() Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 12/74] whiteout: ext2 whiteout support Valerie Aurora
                   ` (35 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Valerie Aurora, Jan Kara, linux-ext4, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Allow future code to use the guts of ext2_add_link().

Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@kernel.org
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/ext2/dir.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 79987ab..74b23b0 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -480,10 +480,7 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 	mark_inode_dirty(dir);
 }
 
-/*
- *	Parent is locked.
- */
-int ext2_add_link (struct dentry *dentry, struct inode *inode)
+int ext2_add_entry (struct dentry *dentry, struct inode *inode)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	const char *name = dentry->d_name.name;
@@ -579,6 +576,11 @@ out_unlock:
 	goto out_put;
 }
 
+int ext2_add_link (struct dentry *dentry, struct inode *inode)
+{
+	return ext2_add_entry(dentry, inode);
+}
+
 /*
  * ext2_delete_entry deletes a directory entry by merging it with the
  * previous entry. Page is up-to-date. Releases the page.
-- 
1.7.0.4



* [PATCH 12/74] whiteout: ext2 whiteout support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (10 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 11/74] ext2: Split ext2_add_entry() from ext2_add_link() Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 13/74] whiteout: jffs2 " Valerie Aurora
                   ` (34 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add support for whiteouts to ext2.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/ext2/dir.c           |   65 +++++++++++++++++++++++++++++++++++++++++------
 fs/ext2/ext2.h          |    2 +
 fs/ext2/inode.c         |   11 ++++++--
 fs/ext2/namei.c         |   63 +++++++++++++++++++++++++++++++++++++++++++--
 fs/ext2/super.c         |    4 +++
 include/linux/ext2_fs.h |    4 +++
 6 files changed, 135 insertions(+), 14 deletions(-)
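
The two rules this patch adds to the ext2 directory code can be sketched in userspace as below. All names are hypothetical, and the TOY_FT_* values are placeholders rather than the real EXT2_FT_* constants from include/linux/ext2_fs.h. The first function mirrors the extended in-use test; the second mirrors the same-name check at got_it, where regular entries may replace whiteouts and whiteouts may replace regular entries, but like may not replace like.

```c
#include <errno.h>

/* Placeholder values; the real EXT2_FT_* constants are not reproduced here. */
enum { TOY_FT_UNKNOWN = 0, TOY_FT_REG_FILE = 1, TOY_FT_WHT = 100 };

/* Hypothetical directory entry with only the two fields the rules read. */
struct toy_dirent {
	unsigned int inode;	/* 0 for free slots and for whiteouts */
	unsigned char file_type;
};

/* An entry is live if it names an inode or is a whiteout. */
static int toy_dirent_in_use(const struct toy_dirent *de)
{
	return de->inode != 0 || de->file_type == TOY_FT_WHT;
}

/*
 * Decide whether a new entry may replace an existing entry of the same name.
 * Returns 0 if the replacement is allowed, -EEXIST otherwise.
 */
static int toy_may_replace(int new_file_type, const struct toy_dirent *old)
{
	if (new_file_type == TOY_FT_WHT)
		return old->file_type == TOY_FT_WHT ? -EEXIST : 0;
	return old->file_type != TOY_FT_WHT ? -EEXIST : 0;
}
```

In the patch itself the second check only runs when ext2_match() reports a live entry with the same name, so the sketch assumes the caller has already established that.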

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 74b23b0..6fa1217 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -211,7 +211,8 @@ fail:
 
 static inline int ext2_dirent_in_use(struct ext2_dir_entry_2 *de)
 {
-	return (de->inode);
+	return (de->inode ||
+		(de->file_type == EXT2_FT_WHT));
 }
 
 /*
@@ -260,6 +261,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_FIFO]		= DT_FIFO,
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
+	[EXT2_FT_WHT]		= DT_WHT,
 };
 
 #define S_SHIFT 12
@@ -458,6 +460,26 @@ static int ext2_prepare_chunk(struct page *page, loff_t pos, unsigned len)
 	return __block_write_begin(page, pos, len, ext2_get_block);
 }
 
+/* Special version for filetype based whiteout support */
+ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
+{
+	ino_t res = 0;
+	struct ext2_dir_entry_2 *de;
+	struct page *page;
+
+	de = ext2_find_entry (dir, &dentry->d_name, &page);
+	if (de) {
+		res = le32_to_cpu(de->inode);
+		if (!res && de->file_type == EXT2_FT_WHT) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_WHITEOUT;
+			spin_unlock(&dentry->d_lock);
+		}
+		ext2_put_page(page);
+	}
+	return res;
+}
+
 /* Releases the page */
 void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 		   struct page *page, struct inode *inode, int update_times)
@@ -480,7 +502,9 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 	mark_inode_dirty(dir);
 }
 
-int ext2_add_entry (struct dentry *dentry, struct inode *inode)
+int ext2_add_entry (struct dentry *dentry, struct inode *inode,
+		    ext2_dirent *de, struct page *page,
+		    int new_file_type)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	const char *name = dentry->d_name.name;
@@ -488,8 +512,6 @@ int ext2_add_entry (struct dentry *dentry, struct inode *inode)
 	unsigned chunk_size = ext2_chunk_size(dir);
 	unsigned reclen = EXT2_DIR_REC_LEN(namelen);
 	unsigned short rec_len, name_len;
-	struct page *page = NULL;
-	ext2_dirent * de;
 	unsigned long npages = dir_pages(dir);
 	unsigned long n;
 	char *kaddr;
@@ -547,12 +569,27 @@ int ext2_add_entry (struct dentry *dentry, struct inode *inode)
 	return -EINVAL;
 
 got_it:
+	/*
+	 * Pre-existing entries with the same name are allowable
+	 * depending on the type of the entry being created.  Regular
+	 * entries replace whiteouts.  Whiteouts replace regular
+	 * entries.
+	 */
+	err = -EEXIST;
+	if (ext2_match(namelen, name, de)) {
+		if (new_file_type == EXT2_FT_WHT) {
+			if (de->file_type == EXT2_FT_WHT)
+				goto out_unlock;
+		} else if (de->file_type != EXT2_FT_WHT) {
+			goto out_unlock;
+		}
+	}
 	pos = page_offset(page) +
 		(char*)de - (char*)page_address(page);
 	err = ext2_prepare_chunk(page, pos, rec_len);
 	if (err)
 		goto out_unlock;
-	if (ext2_dirent_in_use(de)) {
+	if (ext2_dirent_in_use(de) && !ext2_match (namelen, name, de)) {
 		ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
 		de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
 		de->rec_len = ext2_rec_len_to_disk(name_len);
@@ -560,8 +597,13 @@ got_it:
 	}
 	de->name_len = namelen;
 	memcpy(de->name, name, namelen);
-	de->inode = cpu_to_le32(inode->i_ino);
-	ext2_set_de_type (de, inode);
+	if (inode) {
+		de->inode = cpu_to_le32(inode->i_ino);
+		ext2_set_de_type (de, inode);
+	} else {
+		de->inode = 0;
+		de->file_type = new_file_type;
+	}
 	err = ext2_commit_chunk(page, pos, rec_len);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
@@ -578,7 +620,14 @@ out_unlock:
 
 int ext2_add_link (struct dentry *dentry, struct inode *inode)
 {
-	return ext2_add_entry(dentry, inode);
+	ext2_dirent *de = NULL;
+	struct page *page = NULL;
+	return ext2_add_entry(dentry, inode, de, page, 0);
+}
+
+int ext2_whiteout_entry (struct dentry *dentry, ext2_dirent *de, struct page *page)
+{
+	return ext2_add_entry(dentry, NULL, de, page, EXT2_FT_WHT);
 }
 
 /*
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 416daa6..799dedb 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -102,9 +102,11 @@ extern void ext2_rsv_window_add(struct super_block *sb, struct ext2_reserve_wind
 /* dir.c */
 extern int ext2_add_link (struct dentry *, struct inode *);
 extern ino_t ext2_inode_by_name(struct inode *, struct qstr *);
+extern ino_t ext2_inode_by_dentry(struct inode *, struct dentry *);
 extern int ext2_make_empty(struct inode *, struct inode *);
 extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *, struct page **);
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_whiteout_entry (struct dentry *, struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 940c961..fb948f5 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1258,7 +1258,8 @@ void ext2_set_inode_flags(struct inode *inode)
 {
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
-	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
+	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
+			    S_OPAQUE);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
 	if (flags & EXT2_APPEND_FL)
@@ -1269,6 +1270,8 @@ void ext2_set_inode_flags(struct inode *inode)
 		inode->i_flags |= S_NOATIME;
 	if (flags & EXT2_DIRSYNC_FL)
 		inode->i_flags |= S_DIRSYNC;
+	if (flags & EXT2_OPAQUE_FL)
+		inode->i_flags |= S_OPAQUE;
 }
 
 /* Propagate flags from i_flags to EXT2_I(inode)->i_flags */
@@ -1276,8 +1279,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 {
 	unsigned int flags = ei->vfs_inode.i_flags;
 
-	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|
-			EXT2_IMMUTABLE_FL|EXT2_NOATIME_FL|EXT2_DIRSYNC_FL);
+	ei->i_flags &= ~(EXT2_SYNC_FL|EXT2_APPEND_FL|EXT2_IMMUTABLE_FL|
+			 EXT2_NOATIME_FL|EXT2_DIRSYNC_FL|EXT2_OPAQUE_FL);
 	if (flags & S_SYNC)
 		ei->i_flags |= EXT2_SYNC_FL;
 	if (flags & S_APPEND)
@@ -1288,6 +1291,8 @@ void ext2_get_inode_flags(struct ext2_inode_info *ei)
 		ei->i_flags |= EXT2_NOATIME_FL;
 	if (flags & S_DIRSYNC)
 		ei->i_flags |= EXT2_DIRSYNC_FL;
+	if (flags & S_OPAQUE)
+		ei->i_flags |= EXT2_OPAQUE_FL;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 71efb0e..08ad675 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -55,15 +55,16 @@ static inline int ext2_add_nondir(struct dentry *dentry, struct inode *inode)
  * Methods themselves.
  */
 
-static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry, struct nameidata *nd)
+static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry,
+				  struct nameidata *nd)
 {
 	struct inode * inode;
 	ino_t ino;
-	
+
 	if (dentry->d_name.len > EXT2_NAME_LEN)
 		return ERR_PTR(-ENAMETOOLONG);
 
-	ino = ext2_inode_by_name(dir, &dentry->d_name);
+	ino = ext2_inode_by_dentry(dir, dentry);
 	inode = NULL;
 	if (ino) {
 		inode = ext2_iget(dir->i_sb, ino);
@@ -307,6 +308,61 @@ static int ext2_rmdir (struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
+/*
+ * Create a whiteout for the dentry
+ */
+static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
+			 struct dentry *new_dentry)
+{
+	struct inode * inode = dentry->d_inode;
+	struct ext2_dir_entry_2 * de = NULL;
+	struct page * page = NULL;
+	int err = -ENOTEMPTY;
+
+	if (!EXT2_HAS_INCOMPAT_FEATURE(dir->i_sb,
+				       EXT2_FEATURE_INCOMPAT_FILETYPE)) {
+		ext2_error (dir->i_sb, "ext2_whiteout",
+			    "can't set whiteout filetype");
+		err = -EPERM;
+		goto out;
+	}
+
+	dquot_initialize(dir);
+
+	if (inode) {
+		if (S_ISDIR(inode->i_mode) && !ext2_empty_dir(inode))
+			goto out;
+
+		err = -ENOENT;
+		de = ext2_find_entry(dir, &dentry->d_name, &page);
+		if (!de)
+			goto out;
+		lock_page(page);
+	}
+
+	err = ext2_whiteout_entry(dentry, de, page);
+	if (err)
+		goto out;
+
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	d_add(new_dentry, NULL);
+
+	if (inode) {
+		inode->i_ctime = dir->i_ctime;
+		inode_dec_link_count(inode);
+		if (S_ISDIR(inode->i_mode)) {
+			inode->i_size = 0;
+			inode_dec_link_count(inode);
+			inode_dec_link_count(dir);
+		}
+	}
+	err = 0;
+out:
+	return err;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -409,6 +465,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.mkdir		= ext2_mkdir,
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
+	.whiteout	= ext2_whiteout,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1ec6026..b941b99 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1080,6 +1080,10 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	if (EXT2_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_HAS_JOURNAL))
 		ext2_msg(sb, KERN_WARNING,
 			"warning: mounting ext3 filesystem as ext2");
+
+	if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_WHITEOUT))
+		sb->s_flags |= MS_WHITEOUT;
+
 	if (ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY))
 		sb->s_flags |= MS_RDONLY;
 	ext2_write_super(sb);
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 2dfa707..b0fb356 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -189,6 +189,7 @@ struct ext2_group_desc
 #define EXT2_NOTAIL_FL			FS_NOTAIL_FL	/* file tail should not be merged */
 #define EXT2_DIRSYNC_FL			FS_DIRSYNC_FL	/* dirsync behaviour (directories only) */
 #define EXT2_TOPDIR_FL			FS_TOPDIR_FL	/* Top of directory hierarchies*/
+#define EXT2_OPAQUE_FL			FS_OPAQUE_FL	/* Dir is opaque */
 #define EXT2_RESERVED_FL		FS_RESERVED_FL	/* reserved for ext2 lib */
 
 #define EXT2_FL_USER_VISIBLE		FS_FL_USER_VISIBLE	/* User visible flags */
@@ -503,10 +504,12 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
+#define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
 #define EXT2_FEATURE_INCOMPAT_ANY		0xffffffff
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
+					 EXT2_FEATURE_INCOMPAT_WHITEOUT| \
 					 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -573,6 +576,7 @@ enum {
 	EXT2_FT_FIFO		= 5,
 	EXT2_FT_SOCK		= 6,
 	EXT2_FT_SYMLINK		= 7,
+	EXT2_FT_WHT		= 8,
 	EXT2_FT_MAX
 };
 
-- 
1.7.0.4



* [PATCH 13/74] whiteout: jffs2 whiteout support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (11 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 12/74] whiteout: ext2 whiteout support Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 14/74] fallthru: Basic fallthru definitions Valerie Aurora
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Felix Fietkau, Valerie Aurora, David Woodhouse, linux-mtd,
	Valerie Aurora

From: Felix Fietkau <nbd@openwrt.org>

Add support for whiteout dentries to jffs2.

XXX - David Woodhouse suggests several changes and provides an
untested patch.  See:

http://patchwork.ozlabs.org/patch/50466/

XXX - Backward compatibility?  A whiteout can only be created on a
JFFS2 file system that was deliberately mounted "-o union", so there
is some way to keep whiteouts off a file system you later want to
mount with an earlier version of the file system code (one with no
whiteout support).  However, ext2/3 has a much more robust method (an
explicit fs feature flag) to prevent such an occurrence.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
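The directory emptiness check in jffs2_whiteout() treats any dirent
with a non-zero ino as a live child.  Whiteouts are written through
jffs2_do_link() with ino 0, so they do not count.  Below is a minimal
user-space sketch of that rule, not the kernel code: struct
model_dirent and model_dir_is_empty() are hypothetical names, and the
locking done with victim_f->sem is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct jffs2_full_dirent. */
struct model_dirent {
	struct model_dirent *next;
	uint32_t ino;		/* 0 for entries that name no inode */
	const char *name;
};

/* Returns 1 if no dirent refers to a live inode, 0 otherwise. */
int model_dir_is_empty(const struct model_dirent *dents)
{
	const struct model_dirent *fd;

	for (fd = dents; fd; fd = fd->next) {
		if (fd->ino)
			return 0;
	}
	return 1;
}

/* Worked examples: an empty list, a lone whiteout, a live child. */
void model_dir_examples(void)
{
	struct model_dirent wht = { NULL, 0, "removed" };
	struct model_dirent file = { &wht, 42, "kept" };

	assert(model_dir_is_empty(NULL));
	assert(model_dir_is_empty(&wht));
	assert(!model_dir_is_empty(&file));
}
```

In this model a directory holding only whiteouts counts as empty,
which is the case the -ENOTEMPTY test in the patch has to let through.
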
 fs/jffs2/dir.c        |   72 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/jffs2/fs.c         |    4 +++
 fs/jffs2/super.c      |    2 +-
 include/linux/jffs2.h |    2 +
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index ed78a3c..a5dbf12 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -35,6 +35,8 @@ static int jffs2_mknod (struct inode *,struct dentry *,int,dev_t);
 static int jffs2_rename (struct inode *, struct dentry *,
 			 struct inode *, struct dentry *);
 
+static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+
 const struct file_operations jffs2_dir_operations =
 {
 	.read =		generic_read_dir,
@@ -57,6 +59,7 @@ const struct inode_operations jffs2_dir_inode_operations =
 	.mknod =	jffs2_mknod,
 	.rename =	jffs2_rename,
 	.check_acl =	jffs2_check_acl,
+	.whiteout =     jffs2_whiteout,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
@@ -99,8 +102,14 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
 			fd = fd_list;
 		}
 	}
-	if (fd)
-		ino = fd->ino;
+	if (fd) {
+		spin_lock(&target->d_lock);
+		if (fd->type == DT_WHT)
+			target->d_flags |= DCACHE_WHITEOUT;
+		else
+			ino = fd->ino;
+		spin_unlock(&target->d_lock);
+	}
 	mutex_unlock(&dir_f->sem);
 	if (ino) {
 		inode = jffs2_iget(dir_i->i_sb, ino);
@@ -499,6 +508,11 @@ static int jffs2_mkdir (struct inode *dir_i, struct dentry *dentry, int mode)
 		return PTR_ERR(inode);
 	}
 
+	if (dentry->d_flags & DCACHE_WHITEOUT) {
+		inode->i_flags |= S_OPAQUE;
+		ri->flags = cpu_to_je16(JFFS2_INO_FLAG_OPAQUE);
+	}
+
 	inode->i_op = &jffs2_dir_inode_operations;
 	inode->i_fop = &jffs2_dir_operations;
 
@@ -777,6 +791,60 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, int mode, de
 	return ret;
 }
 
+static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
+			   struct dentry *new_dentry)
+{
+	struct jffs2_sb_info *c = JFFS2_SB_INFO(dir->i_sb);
+	struct jffs2_inode_info *victim_f = NULL;
+	uint32_t now;
+	int ret;
+
+	/* If it's a directory, then check whether it is really empty */
+	if (old_dentry->d_inode) {
+		victim_f = JFFS2_INODE_INFO(old_dentry->d_inode);
+		if (S_ISDIR(old_dentry->d_inode->i_mode)) {
+			struct jffs2_full_dirent *fd;
+
+			mutex_lock(&victim_f->sem);
+			for (fd = victim_f->dents; fd; fd = fd->next) {
+				if (fd->ino) {
+					mutex_unlock(&victim_f->sem);
+					return -ENOTEMPTY;
+				}
+			}
+			mutex_unlock(&victim_f->sem);
+		}
+	}
+
+	now = get_seconds();
+	ret = jffs2_do_link(c, JFFS2_INODE_INFO(dir), 0, DT_WHT,
+			    new_dentry->d_name.name, new_dentry->d_name.len, now);
+	if (ret)
+		return ret;
+
+	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags |= DCACHE_WHITEOUT;
+	spin_unlock(&new_dentry->d_lock);
+	d_add(new_dentry, NULL);
+
+	if (victim_f) {
+		/* There was a victim. Kill it off nicely */
+		drop_nlink(old_dentry->d_inode);
+		/* Don't oops if the victim was a dirent pointing to an
+		   inode which didn't exist. */
+		if (victim_f->inocache) {
+			mutex_lock(&victim_f->sem);
+			if (S_ISDIR(old_dentry->d_inode->i_mode))
+				victim_f->inocache->pino_nlink = 0;
+			else
+				victim_f->inocache->pino_nlink--;
+			mutex_unlock(&victim_f->sem);
+		}
+	}
+
+	return 0;
+}
+
 static int jffs2_rename (struct inode *old_dir_i, struct dentry *old_dentry,
 			 struct inode *new_dir_i, struct dentry *new_dentry)
 {
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index 6b2964a..9492b9b 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -304,6 +304,10 @@ struct inode *jffs2_iget(struct super_block *sb, unsigned long ino)
 
 		inode->i_op = &jffs2_dir_inode_operations;
 		inode->i_fop = &jffs2_dir_operations;
+
+		if (je16_to_cpu(latest_node.flags) & JFFS2_INO_FLAG_OPAQUE)
+			inode->i_flags |= S_OPAQUE;
+
 		break;
 	}
 	case S_IFREG:
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 662bba0..69f7479 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -170,7 +170,7 @@ static int jffs2_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_op = &jffs2_super_operations;
 	sb->s_export_op = &jffs2_export_ops;
-	sb->s_flags = sb->s_flags | MS_NOATIME;
+	sb->s_flags = sb->s_flags | MS_NOATIME | MS_WHITEOUT;
 	sb->s_xattr = jffs2_xattr_handlers;
 #ifdef CONFIG_JFFS2_FS_POSIX_ACL
 	sb->s_flags |= MS_POSIXACL;
diff --git a/include/linux/jffs2.h b/include/linux/jffs2.h
index a18b719..6404e01 100644
--- a/include/linux/jffs2.h
+++ b/include/linux/jffs2.h
@@ -88,6 +88,8 @@
 #define JFFS2_INO_FLAG_USERCOMPR  2	/* User has requested a specific
 					   compression type */
 
+#define JFFS2_INO_FLAG_OPAQUE     4	/* Directory is opaque (for union mounts) */
+
 
 /* These can go once we've made sure we've caught all uses without
    byteswapping */
-- 
1.7.0.4



* [PATCH 14/74] fallthru: Basic fallthru definitions
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (12 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 13/74] whiteout: jffs2 " Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 15/74] fallthru: ext2 fallthru support Valerie Aurora
                   ` (32 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Define the fallthru dcache flag and file system op.  Mask out the
DCACHE_FALLTHRU flag on dentry creation.  Actual users and changes to
lookup come in later patches.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
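The flag handling defined here can be exercised in user space.  The
sketch below is an illustration under stated assumptions, not kernel
code: struct model_dentry and the model_*() helpers are hypothetical,
only the two flag values are taken from the dcache.h hunk, and unlike
d_is_fallthru() the helpers normalize their result to 0 or 1.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined in the include/linux/dcache.h hunk. */
#define DCACHE_WHITEOUT	0x0200
#define DCACHE_FALLTHRU	0x0400

/* Hypothetical cut-down dentry: only the fields the sketch needs. */
struct model_dentry {
	unsigned int d_flags;
	const void *d_inode;	/* NULL for a negative dentry */
};

int model_is_whiteout(const struct model_dentry *d)
{
	return (d->d_flags & DCACHE_WHITEOUT) != 0;
}

int model_is_fallthru(const struct model_dentry *d)
{
	return (d->d_flags & DCACHE_FALLTHRU) != 0;
}

/* Mirrors the __d_instantiate() hunk: a real inode clears both marks. */
void model_instantiate(struct model_dentry *d, const void *inode)
{
	if (inode)
		d->d_flags &= ~(DCACHE_WHITEOUT | DCACHE_FALLTHRU);
	d->d_inode = inode;
}

void model_dentry_examples(void)
{
	struct model_dentry d = { DCACHE_FALLTHRU | 0x0001, NULL };
	int fake_inode = 0;

	/* Instantiating with no inode leaves the fallthru mark alone. */
	model_instantiate(&d, NULL);
	assert(model_is_fallthru(&d));
	assert(!model_is_whiteout(&d));

	/* A real inode clears the mark but keeps unrelated flag bits. */
	model_instantiate(&d, &fake_inode);
	assert(!model_is_fallthru(&d));
	assert(d.d_flags == 0x0001);
}
```
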
 Documentation/filesystems/vfs.txt |    6 ++++++
 fs/dcache.c                       |    2 +-
 include/linux/dcache.h            |    7 +++++++
 include/linux/fs.h                |    2 ++
 4 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 05c73b1..ecfc4f9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -320,6 +320,7 @@ struct inode_operations {
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
 	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+	int (*fallthru) (struct inode *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
@@ -390,6 +391,11 @@ otherwise noted.
         second is the dentry for the whiteout itself.  This method
         must unlink() or rmdir() the original entry if it exists.
 
+  fallthru: called by the readdir(2) system call on a layered file
+        system.  Only required if you want to support fallthrus.
+        Fallthrus are placeholders for directory entries visible from
+        a lower-level file system.
+
   rename: called by the rename(2) system call to rename the object to
 	have the parent and name given by the second inode and dentry.
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 9358dbc..5a0e3f5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -994,7 +994,7 @@ EXPORT_SYMBOL(d_alloc_name);
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
 	if (inode) {
-		dentry->d_flags &= ~DCACHE_WHITEOUT;
+		dentry->d_flags &= ~(DCACHE_WHITEOUT|DCACHE_FALLTHRU);
 		list_add(&dentry->d_alias, &inode->i_dentry);
 	}
 	dentry->d_inode = inode;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6e66b76..19ddb8c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -190,6 +190,8 @@ d_iput:		no		no		no       yes
 
 #define DCACHE_WHITEOUT		0x0200	/* Stop lookup in a unioned file system */
 
+#define DCACHE_FALLTHRU		0x0400	/* Continue lookup below an opaque dir */
+
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
 
@@ -381,6 +383,11 @@ static inline int d_is_whiteout(struct dentry *dentry)
 	return (dentry->d_flags & DCACHE_WHITEOUT);
 }
 
+static inline int d_is_fallthru(struct dentry *dentry)
+{
+	return (dentry->d_flags & DCACHE_FALLTHRU);
+}
+
 static inline struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9d6e72f..92d248b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -209,6 +209,7 @@ struct inodes_stat_t {
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
 #define MS_WHITEOUT	(1<<25) /* FS supports whiteout filetype */
+#define MS_FALLTHRU	(1<<26) /* FS supports fallthru filetype */
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
@@ -1525,6 +1526,7 @@ struct inode_operations {
 	int (*rmdir) (struct inode *,struct dentry *);
 	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
 	int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+	int (*fallthru) (struct inode *, struct dentry *);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
 	int (*readlink) (struct dentry *, char __user *,int);
-- 
1.7.0.4



* [PATCH 15/74] fallthru: ext2 fallthru support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (13 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 14/74] fallthru: Basic fallthru definitions Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 16/74] fallthru: tmpfs " Valerie Aurora
                   ` (31 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Valerie Aurora, Jan Kara, linux-ext4, Jan Blunck,
	Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add support for fallthru directory entries to ext2.

Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
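The replacement rule that ext2_add_entry() applies once ext2_match()
has found an entry with the same name can be written as a small
decision function.  This is a user-space sketch, not the kernel code:
model_may_replace() is a hypothetical name, and a new_type of 0 stands
for a regular entry, as in the ext2_add_link() call.

```c
#include <assert.h>
#include <errno.h>

/* File type values as used by ext2 directory entries. */
enum {
	EXT2_FT_REG_FILE = 1,
	EXT2_FT_WHT      = 8,
	EXT2_FT_FALLTHRU = 9,
};

/*
 * Returns 0 if an entry of new_type may overwrite an existing entry of
 * old_type with the same name, or -EEXIST if it may not.
 */
int model_may_replace(int new_type, int old_type)
{
	if (new_type == EXT2_FT_WHT)
		return old_type == EXT2_FT_WHT ? -EEXIST : 0;
	if (new_type == EXT2_FT_FALLTHRU)
		return -EEXIST;		/* fallthrus replace nothing */
	if (old_type != EXT2_FT_WHT && old_type != EXT2_FT_FALLTHRU)
		return -EEXIST;		/* regular over regular */
	return 0;
}

void model_replace_examples(void)
{
	/* Regular entries replace whiteouts and fallthrus. */
	assert(model_may_replace(0, EXT2_FT_WHT) == 0);
	assert(model_may_replace(0, EXT2_FT_FALLTHRU) == 0);
	assert(model_may_replace(0, EXT2_FT_REG_FILE) == -EEXIST);

	/* Whiteouts replace anything except another whiteout. */
	assert(model_may_replace(EXT2_FT_WHT, EXT2_FT_REG_FILE) == 0);
	assert(model_may_replace(EXT2_FT_WHT, EXT2_FT_WHT) == -EEXIST);

	/* Fallthrus never replace an existing entry. */
	assert(model_may_replace(EXT2_FT_FALLTHRU, EXT2_FT_WHT) == -EEXIST);
}
```

Note that, as the checks are written, a whiteout is also allowed to
replace a fallthru of the same name.
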
 fs/ext2/dir.c           |   39 +++++++++++++++++++++++++++++++++++----
 fs/ext2/ext2.h          |    1 +
 fs/ext2/namei.c         |   22 ++++++++++++++++++++++
 fs/ext2/super.c         |    2 ++
 include/linux/ext2_fs.h |    4 ++++
 5 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 6fa1217..daff471 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -212,7 +212,8 @@ fail:
 static inline int ext2_dirent_in_use(struct ext2_dir_entry_2 *de)
 {
 	return (de->inode ||
-		(de->file_type == EXT2_FT_WHT));
+		(de->file_type == EXT2_FT_WHT) ||
+		(de->file_type == EXT2_FT_FALLTHRU));
 }
 
 /*
@@ -262,6 +263,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
 	[EXT2_FT_SOCK]		= DT_SOCK,
 	[EXT2_FT_SYMLINK]	= DT_LNK,
 	[EXT2_FT_WHT]		= DT_WHT,
+	[EXT2_FT_FALLTHRU]	= DT_UNKNOWN,
 };
 
 #define S_SHIFT 12
@@ -348,6 +350,18 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
 					ext2_put_page(page);
 					return 0;
 				}
+			} else if (de->file_type == EXT2_FT_FALLTHRU) {
+				int over;
+
+				offset = (char *)de - kaddr;
+				/* XXX placeholder until generic_readdir_fallthru() arrives */
+				over = filldir(dirent, de->name, de->name_len,
+					       (n<<PAGE_CACHE_SHIFT) | offset,
+					       1, DT_UNKNOWN); /* XXX */
+				if (over) {
+					ext2_put_page(page);
+					return 0;
+				}
 			}
 			filp->f_pos += ext2_rec_len_from_disk(de->rec_len);
 		}
@@ -474,6 +488,10 @@ ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
 			spin_lock(&dentry->d_lock);
 			dentry->d_flags |= DCACHE_WHITEOUT;
 			spin_unlock(&dentry->d_lock);
+		} else if (!res && de->file_type == EXT2_FT_FALLTHRU) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_FALLTHRU;
+			spin_unlock(&dentry->d_lock);
 		}
 		ext2_put_page(page);
 	}
@@ -572,15 +590,18 @@ got_it:
 	/*
 	 * Pre-existing entries with the same name are allowable
 	 * depending on the type of the entry being created.  Regular
-	 * entries replace whiteouts.  Whiteouts replace regular
-	 * entries.
+	 * entries replace whiteouts and fallthrus.  Whiteouts replace
+	 * regular entries.  Fallthrus replace nothing.
 	 */
 	err = -EEXIST;
 	if (ext2_match(namelen, name, de)) {
 		if (new_file_type == EXT2_FT_WHT) {
 			if (de->file_type == EXT2_FT_WHT)
 				goto out_unlock;
-		} else if (de->file_type != EXT2_FT_WHT) {
+		} else if (new_file_type == EXT2_FT_FALLTHRU) {
+			goto out_unlock;
+		} else if ((de->file_type != EXT2_FT_WHT) &&
+			   (de->file_type != EXT2_FT_FALLTHRU)) {
 			goto out_unlock;
 		}
 	}
@@ -631,6 +652,16 @@ int ext2_whiteout_entry (struct dentry *dentry, ext2_dirent *de, struct page *pa
 }
 
 /*
+ * Create a fallthru entry.
+ */
+int ext2_fallthru_entry (struct dentry *dentry)
+{
+	ext2_dirent *de = NULL;
+	struct page *page = NULL;
+	return ext2_add_entry(dentry, NULL, de, page, EXT2_FT_FALLTHRU);
+}
+
+/*
  * ext2_delete_entry deletes a directory entry by merging it with the
  * previous entry. Page is up-to-date. Releases the page.
  */
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 799dedb..3df5572 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -107,6 +107,7 @@ extern int ext2_make_empty(struct inode *, struct inode *);
 extern struct ext2_dir_entry_2 * ext2_find_entry (struct inode *,struct qstr *, struct page **);
 extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *);
 extern int ext2_whiteout_entry (struct dentry *, struct ext2_dir_entry_2 *, struct page *);
+extern int ext2_fallthru_entry (struct dentry *);
 extern int ext2_empty_dir (struct inode *);
 extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **);
 extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int);
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 08ad675..c65349b 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -345,6 +345,7 @@ static int ext2_whiteout(struct inode *dir, struct dentry *dentry,
 		goto out;
 
 	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags &= ~DCACHE_FALLTHRU;
 	new_dentry->d_flags |= DCACHE_WHITEOUT;
 	spin_unlock(&new_dentry->d_lock);
 	d_add(new_dentry, NULL);
@@ -363,6 +364,26 @@ out:
 	return err;
 }
 
+/*
+ * Create a fallthru entry.
+ */
+static int ext2_fallthru (struct inode *dir, struct dentry *dentry)
+{
+	int err;
+
+	dquot_initialize(dir);
+
+	err = ext2_fallthru_entry(dentry);
+	if (err)
+		return err;
+
+	d_instantiate(dentry, NULL);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
 static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct inode * new_dir,	struct dentry * new_dentry )
 {
@@ -466,6 +487,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.rmdir		= ext2_rmdir,
 	.mknod		= ext2_mknod,
 	.whiteout	= ext2_whiteout,
+	.fallthru	= ext2_fallthru,
 	.rename		= ext2_rename,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index b941b99..e88329e 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1083,6 +1083,8 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 
 	if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_WHITEOUT))
 		sb->s_flags |= MS_WHITEOUT;
+	if (EXT2_HAS_INCOMPAT_FEATURE(sb, EXT2_FEATURE_INCOMPAT_FALLTHRU))
+		sb->s_flags |= MS_FALLTHRU;
 
 	if (ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY))
 		sb->s_flags |= MS_RDONLY;
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index b0fb356..1a6f929 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -505,11 +505,14 @@ struct ext2_super_block {
 #define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008
 #define EXT2_FEATURE_INCOMPAT_META_BG		0x0010
 #define EXT2_FEATURE_INCOMPAT_WHITEOUT		0x0020
+/* ext3/4 incompat flags take up the intervening constants */
+#define EXT2_FEATURE_INCOMPAT_FALLTHRU		0x2000
 #define EXT2_FEATURE_INCOMPAT_ANY		0xffffffff
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE| \
 					 EXT2_FEATURE_INCOMPAT_WHITEOUT| \
+					 EXT2_FEATURE_INCOMPAT_FALLTHRU| \
 					 EXT2_FEATURE_INCOMPAT_META_BG)
 #define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
@@ -577,6 +580,7 @@ enum {
 	EXT2_FT_SOCK		= 6,
 	EXT2_FT_SYMLINK		= 7,
 	EXT2_FT_WHT		= 8,
+	EXT2_FT_FALLTHRU	= 9,
 	EXT2_FT_MAX
 };
 
-- 
1.7.0.4



* [PATCH 16/74] fallthru: tmpfs fallthru support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (14 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 15/74] fallthru: ext2 fallthru support Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 17/74] fallthru: jffs2 " Valerie Aurora
                   ` (30 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add support for fallthru directory entries to tmpfs.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
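Each tmpfs fallthru is charged against the superblock's inode limit
in shmem_fallthru() and given back through shmem_free_inode().  The
sketch below models only that accounting in user space.  It is an
illustration, not the kernel code: struct model_sbinfo and both
helpers are hypothetical names, and the stat_lock spinlock is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two shmem_sb_info fields involved. */
struct model_sbinfo {
	unsigned long max_inodes;	/* 0 means no limit is enforced */
	unsigned long free_inodes;
};

/* Charge one slot, as shmem_fallthru() does before creating the entry. */
int model_reserve_slot(struct model_sbinfo *sb)
{
	if (sb->max_inodes) {
		if (!sb->free_inodes)
			return -ENOSPC;
		sb->free_inodes--;
	}
	return 0;
}

/* Give one slot back, modelled on shmem_free_inode(). */
void model_release_slot(struct model_sbinfo *sb)
{
	if (sb->max_inodes)
		sb->free_inodes++;
}

void model_accounting_examples(void)
{
	struct model_sbinfo limited = { 1, 1 };
	struct model_sbinfo unlimited = { 0, 0 };

	/* The only slot is taken, so a second fallthru is refused. */
	assert(model_reserve_slot(&limited) == 0);
	assert(model_reserve_slot(&limited) == -ENOSPC);

	/* Replacing or unlinking the fallthru returns the slot. */
	model_release_slot(&limited);
	assert(limited.free_inodes == 1);

	/* With no limit configured, nothing is counted at all. */
	assert(model_reserve_slot(&unlimited) == 0);
	assert(unlimited.free_inodes == 0);
}
```
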
 fs/dcache.c |    3 +-
 fs/libfs.c  |   15 +++++++++++--
 mm/shmem.c  |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5a0e3f5..ff3f949 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2348,7 +2348,8 @@ resume:
 		 * we can evict it.
 		 */
 		if (d_unhashed(dentry)||(!dentry->d_inode &&
-					 !d_is_whiteout(dentry)))
+					 !d_is_whiteout(dentry) &&
+					 !d_is_fallthru(dentry)))
 			continue;
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
diff --git a/fs/libfs.c b/fs/libfs.c
index 0a9da95..a73423d 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -130,6 +130,7 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 	struct dentry *cursor = filp->private_data;
 	struct list_head *p, *q = &cursor->d_u.d_child;
 	ino_t ino;
+	char d_type;
 	int i = filp->f_pos;
 
 	switch (i) {
@@ -155,14 +156,22 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
 			for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				if (d_unhashed(next) || !next->d_inode)
+				if (d_unhashed(next) || (!next->d_inode && !d_is_fallthru(next)))
 					continue;
 
 				spin_unlock(&dcache_lock);
+				if (d_is_fallthru(next)) {
+					/* XXX placeholder until generic_readdir_fallthru() arrives */
+					ino = 1;
+					d_type = DT_UNKNOWN;
+				} else {
+					ino = next->d_inode->i_ino;
+					d_type = dt_type(next->d_inode);
+				}
+
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
-					    next->d_inode->i_ino, 
-					    dt_type(next->d_inode)) < 0)
+					    ino, d_type) < 0)
 					return 0;
 				spin_lock(&dcache_lock);
 				/* next is still alive */
diff --git a/mm/shmem.c b/mm/shmem.c
index 0ac3af3..abfc275 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1835,8 +1835,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
 static int shmem_unlink(struct inode *dir, struct dentry *dentry);
 
 /*
- * This is the whiteout support for tmpfs. It uses one singleton whiteout
- * inode per superblock thus it is very similar to shmem_link().
+ * Create a dentry to signify a whiteout.
  */
 static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 			  struct dentry *new_dentry)
@@ -1867,8 +1866,10 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 		spin_unlock(&sbinfo->stat_lock);
 	}
 
-	if (old_dentry->d_inode) {
-		if (S_ISDIR(old_dentry->d_inode->i_mode))
+	if (old_dentry->d_inode || d_is_fallthru(old_dentry)) {
+		/* A fallthru for a dir is treated like a regular link */
+		if (old_dentry->d_inode &&
+		    S_ISDIR(old_dentry->d_inode->i_mode))
 			shmem_rmdir(dir, old_dentry);
 		else
 			shmem_unlink(dir, old_dentry);
@@ -1885,6 +1886,48 @@ static int shmem_whiteout(struct inode *dir, struct dentry *old_dentry,
 }
 
 static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
+				struct inode *inode);
+
+/*
+ * Create a dentry to signify a fallthru.  A fallthru in tmpfs is the
+ * logical equivalent of an in-kernel readdir() cache.  It can't be
+ * deleted until the file system is unmounted.
+ */
+static int shmem_fallthru(struct inode *dir, struct dentry *dentry)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+
+	/* FIXME: this is stupid */
+	if (!(dir->i_sb->s_flags & MS_WHITEOUT))
+		return -EPERM;
+
+	if (dentry->d_inode || d_is_fallthru(dentry) || d_is_whiteout(dentry))
+		return -EEXIST;
+
+	/*
+	 * Each new link needs a new dentry, pinning lowmem, and tmpfs
+	 * dentries cannot be pruned until they are unlinked.
+	 */
+	if (sbinfo->max_inodes) {
+		spin_lock(&sbinfo->stat_lock);
+		if (!sbinfo->free_inodes) {
+			spin_unlock(&sbinfo->stat_lock);
+			return -ENOSPC;
+		}
+		sbinfo->free_inodes--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+
+	shmem_d_instantiate(dir, dentry, NULL);
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+	return 0;
+}
+
+static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 				struct inode *inode)
 {
 	if (d_is_whiteout(dentry)) {
@@ -1892,14 +1935,15 @@ static void shmem_d_instantiate(struct inode *dir, struct dentry *dentry,
 		shmem_free_inode(dir->i_sb);
 		if (S_ISDIR(inode->i_mode))
 			inode->i_mode |= S_OPAQUE;
+	} else if (d_is_fallthru(dentry)) {
+		shmem_free_inode(dir->i_sb);
 	} else {
 		/* New dentry */
 		dir->i_size += BOGO_DIRENT_SIZE;
 		dget(dentry); /* Extra count - pin the dentry in core */
 	}
-	/* Will clear DCACHE_WHITEOUT flag */
+	/* Will clear DCACHE_WHITEOUT and DCACHE_FALLTHRU flags */
 	d_instantiate(dentry, inode);
-
 }
 /*
  * File creation. Allocate an inode, and we're done..
@@ -1981,7 +2025,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (d_is_whiteout(dentry) || (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
+	if (d_is_whiteout(dentry) || d_is_fallthru(dentry) ||
+	    (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)))
 		shmem_free_inode(dir->i_sb);
 
 	if (inode) {
@@ -2506,8 +2551,10 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_root = root;
 
 #ifdef CONFIG_TMPFS
-	if (!(sb->s_flags & MS_NOUSER))
+	if (!(sb->s_flags & MS_NOUSER)) {
 		sb->s_flags |= MS_WHITEOUT;
+		sb->s_flags |= MS_FALLTHRU;
+	}
 #endif
 
 	return 0;
@@ -2610,6 +2657,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
 	.whiteout       = shmem_whiteout,
+	.fallthru       = shmem_fallthru,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
-- 
1.7.0.4


* [PATCH 17/74] fallthru: jffs2 fallthru support
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (15 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 16/74] fallthru: tmpfs " Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 18/74] VFS: Add hard read-only users count to superblock Valerie Aurora
                   ` (29 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Felix Fietkau, David Woodhouse, linux-mtd, Valerie Aurora,
	Valerie Aurora

From: Felix Fietkau <nbd@openwrt.org>

Add support for fallthru dentries to jffs2.

XXX - untested changes from David Woodhouse and Valerie Aurora.
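
For readers following along, here is a small userspace sketch of the
mapping a lookup is presumably meant to perform from the on-media
dirent type to dentry flags.  It is not the jffs2 code: every MODEL_*
name and struct lookup_model are made up for illustration, and the
flag values are arbitrary.  It assumes that the three cases are meant
to be mutually exclusive.

```c
#include <assert.h>

/* Stand-ins for DT_WHT and JFFS2_DT_FALLTHRU; names and values are illustrative. */
#define MODEL_DT_WHT		14
#define MODEL_DT_FALLTHRU	(MODEL_DT_WHT + 1)

/* Stand-ins for DCACHE_WHITEOUT and DCACHE_FALLTHRU; values are arbitrary. */
#define MODEL_DCACHE_WHITEOUT	0x1
#define MODEL_DCACHE_FALLTHRU	0x2

/* Hypothetical result of a lookup: flags for the dentry, inode number to load. */
struct lookup_model {
	unsigned int d_flags;
	unsigned int ino;
};

static struct lookup_model model_lookup(unsigned char type, unsigned int fd_ino)
{
	struct lookup_model res = { 0, 0 };

	switch (type) {
	case MODEL_DT_WHT:
		/* Whiteout: negative dentry that blocks the lower layer. */
		res.d_flags |= MODEL_DCACHE_WHITEOUT;
		break;
	case MODEL_DT_FALLTHRU:
		/* Fallthru: negative dentry that defers to the lower layer. */
		res.d_flags |= MODEL_DCACHE_FALLTHRU;
		break;
	default:
		/* Ordinary entry: there is a real inode to read. */
		res.ino = fd_ino;
		break;
	}

	/* A marked dentry never carries an inode number in this model. */
	assert(!(res.d_flags && res.ino));
	return res;
}
```

Only the ordinary case yields an inode number here; whiteouts and
fallthrus stay negative dentries that differ only in their flag.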

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/jffs2/dir.c        |   44 ++++++++++++++++++++++++++++++++++++++++----
 include/linux/jffs2.h |    6 ++++++
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index a5dbf12..dc0e01e 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -36,6 +36,7 @@ static int jffs2_rename (struct inode *, struct dentry *,
 			 struct inode *, struct dentry *);
 
 static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+static int jffs2_fallthru (struct inode *, struct dentry *);
 
 const struct file_operations jffs2_dir_operations =
 {
@@ -60,6 +61,7 @@ const struct inode_operations jffs2_dir_inode_operations =
 	.rename =	jffs2_rename,
 	.check_acl =	jffs2_check_acl,
 	.whiteout =     jffs2_whiteout,
+	.fallthru =     jffs2_fallthru,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
@@ -104,10 +106,16 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
 	}
 	if (fd) {
 		spin_lock(&target->d_lock);
-		if (fd->type == DT_WHT)
+		switch (fd->type) {
+		case DT_WHT:
 			target->d_flags |= DCACHE_WHITEOUT;
-		else
+			break;
+		case JFFS2_DT_FALLTHRU:
+			target->d_flags |= DCACHE_FALLTHRU;
+			break;
+		default:
 			ino = fd->ino;
+		}
 		spin_unlock(&target->d_lock);
 	}
 	mutex_unlock(&dir_f->sem);
@@ -132,6 +138,8 @@ static int jffs2_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct jffs2_full_dirent *fd;
 	unsigned long offset, curofs;
+	ino_t ino;
+	char d_type;
 
 	D1(printk(KERN_DEBUG "jffs2_readdir() for dir_i #%lu\n", filp->f_path.dentry->d_inode->i_ino));
 
@@ -165,13 +173,20 @@ static int jffs2_readdir(struct file *filp, void *dirent, filldir_t filldir)
 				  fd->name, fd->ino, fd->type, curofs, offset));
 			continue;
 		}
-		if (!fd->ino) {
+		if (fd->type == JFFS2_DT_FALLTHRU) {
+			/* XXX placeholder until generic_readdir_fallthru() arrives */
+			ino = 1;
+			d_type = DT_UNKNOWN;
+		} else if (!fd->ino && (fd->type != DT_WHT)) {
 			D2(printk(KERN_DEBUG "Skipping deletion dirent \"%s\"\n", fd->name));
 			offset++;
 			continue;
+		} else {
+			ino = fd->ino;
+			d_type = fd->type;
 		}
 		D2(printk(KERN_DEBUG "Dirent %ld: \"%s\", ino #%u, type %d\n", offset, fd->name, fd->ino, fd->type));
-		if (filldir(dirent, fd->name, strlen(fd->name), offset, fd->ino, fd->type) < 0)
+		if (filldir(dirent, fd->name, strlen(fd->name), offset, ino, d_type) < 0)
 			break;
 		offset++;
 	}
@@ -791,6 +806,26 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, int mode, de
 	return ret;
 }
 
+static int jffs2_fallthru (struct inode *dir, struct dentry *dentry)
+{
+	struct jffs2_sb_info *c = JFFS2_SB_INFO(dir->i_sb);
+	uint32_t now;
+	int ret;
+
+	now = get_seconds();
+	ret = jffs2_do_link(c, JFFS2_INODE_INFO(dir), 0, DT_UNKNOWN,
+			    dentry->d_name.name, dentry->d_name.len, now);
+	if (ret)
+		return ret;
+
+	d_instantiate(dentry, NULL);
+	spin_lock(&dentry->d_lock);
+	dentry->d_flags |= DCACHE_FALLTHRU;
+	spin_unlock(&dentry->d_lock);
+
+	return 0;
+}
+
 static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
 			   struct dentry *new_dentry)
 {
@@ -823,6 +858,7 @@ static int jffs2_whiteout (struct inode *dir, struct dentry *old_dentry,
 		return ret;
 
 	spin_lock(&new_dentry->d_lock);
+	new_dentry->d_flags &= ~DCACHE_FALLTHRU;
 	new_dentry->d_flags |= DCACHE_WHITEOUT;
 	spin_unlock(&new_dentry->d_lock);
 	d_add(new_dentry, NULL);
diff --git a/include/linux/jffs2.h b/include/linux/jffs2.h
index 6404e01..1749127 100644
--- a/include/linux/jffs2.h
+++ b/include/linux/jffs2.h
@@ -115,6 +115,12 @@ struct jffs2_unknown_node
 	jint32_t hdr_crc;
 };
 
+/*
+ * Non-standard directory entry type(s), for on-disk use
+ */
+
+#define                JFFS2_DT_FALLTHRU       (DT_WHT + 1)
+
 struct jffs2_raw_dirent
 {
 	jint16_t magic;
-- 
1.7.0.4


* [PATCH 18/74] VFS: Add hard read-only users count to superblock
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (16 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 17/74] fallthru: jffs2 " Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 19/74] VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors Valerie Aurora
                   ` (28 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only.  The file system can be mounted
or remounted read-write at any time.  This is a problem for union
mounts, which require the underlying file system be read-only for the
entire duration of the union mount.

Add a hard read-only users count to the superblock.  When this count
is non-zero, don't allow any read-write mounts of this super, or any
read-write remounts of existing mounts.
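
As a rough illustration of the rule above, here is a userspace model
of the check.  It is a sketch, not the kernel code: struct sb_model,
MODEL_MS_RDONLY and both function names are invented for this example,
and locking (s_umount in the real patch) is left out.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MS_RDONLY 0x1	/* stand-in for MS_RDONLY */

/* Hypothetical stand-in for the two superblock fields involved. */
struct sb_model {
	int s_flags;
	int s_hard_readonly_users;
};

/* Refuse any read-write (re)mount while hard read-only users exist. */
static int sb_model_check_rw(const struct sb_model *sb, int flags)
{
	assert(sb->s_hard_readonly_users >= 0);

	if (!(flags & MODEL_MS_RDONLY) && sb->s_hard_readonly_users)
		return -EROFS;
	return 0;
}

/* Model of a remount: apply the requested read-only state if allowed. */
static int sb_model_remount(struct sb_model *sb, int flags)
{
	int err = sb_model_check_rw(sb, flags);

	if (err)
		return err;
	sb->s_flags = (sb->s_flags & ~MODEL_MS_RDONLY) |
		      (flags & MODEL_MS_RDONLY);
	return 0;
}
```

With a non-zero count, a remount without the read-only flag fails with
-EROFS and leaves the flags untouched; once the count drops back to
zero the same request goes through.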

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/super.c         |    8 ++++++++
 include/linux/fs.h |    7 +++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 8819e3a..d02a4d6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -126,6 +126,7 @@ static inline void destroy_super(struct super_block *s)
 #ifdef CONFIG_SMP
 	free_percpu(s->s_files);
 #endif
+	BUG_ON(s->s_hard_readonly_users);
 	security_sb_free(s);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
@@ -577,6 +578,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 			return -EBUSY;
 	}
 
+	if (!(flags & MS_RDONLY) && sb->s_hard_readonly_users)
+		return -EROFS;
+
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
@@ -963,6 +967,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	WARN((mnt->mnt_sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
 		"negative value (%lld)\n", type->name, mnt->mnt_sb->s_maxbytes);
 
+	error = -EROFS;
+	if (!(flags & MS_RDONLY) && mnt->mnt_sb->s_hard_readonly_users)
+		goto out_sb;
+
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92d248b..469e0ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1388,6 +1388,13 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Number of mounts requiring that the underlying file system
+	 * never transition to read-write.  Protected by s_umount.
+	 * Decremented by free_vfsmnt() if MNT_HARD_READONLY is set.
+	 */
+	int s_hard_readonly_users;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.7.0.4


* [PATCH 19/74] VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (17 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 18/74] VFS: Add hard read-only users count to superblock Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 20/74] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
                   ` (27 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux
  Cc: viro, Valerie Aurora, Andreas Gruenbacher, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

copy_tree() can theoretically fail in a case other than ENOMEM, but
always returns NULL which is interpreted by callers as -ENOMEM.
Convert to return an explicit error.  Convert clone_mnt() for
consistency and because union mounts will add new error cases.
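
For anyone unfamiliar with the convention being adopted, here is a
userspace approximation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
idiom and of a caller that propagates the real error instead of
assuming -ENOMEM.  All model_* names are made up for this sketch, and
the refuse_shared parameter is a hypothetical way to provoke an error
other than -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical, heavily reduced stand-in for a vfsmount. */
struct mnt_model {
	int shared;
};

/* Fails with a distinct error per cause instead of a bare NULL. */
static struct mnt_model *model_clone(const struct mnt_model *old,
				     int refuse_shared)
{
	struct mnt_model *mnt;

	if (refuse_shared && old->shared)
		return model_err_ptr(-EINVAL);

	mnt = malloc(sizeof(*mnt));
	if (!mnt)
		return model_err_ptr(-ENOMEM);

	*mnt = *old;
	return mnt;
}

/* Caller pattern: test with IS_ERR, hand back PTR_ERR unchanged. */
static int model_bind(const struct mnt_model *old, int refuse_shared)
{
	struct mnt_model *mnt = model_clone(old, refuse_shared);

	if (model_is_err(mnt))
		return (int)model_ptr_err(mnt);

	free(mnt);
	return 0;
}
```

The point of the conversion is visible in model_bind(): the caller no
longer has to guess why the clone failed.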

Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix.

Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c      |  111 +++++++++++++++++++++++++++-----------------------
 fs/pnode.c          |    5 +-
 kernel/audit_tree.c |   10 ++--
 3 files changed, 68 insertions(+), 58 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 745feaf..79be922 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -593,53 +593,57 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
+	int err;
 
-	if (mnt) {
-		if (flag & (CL_SLAVE | CL_PRIVATE))
-			mnt->mnt_group_id = 0; /* not a peer of original */
-		else
-			mnt->mnt_group_id = old->mnt_group_id;
-
-		if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
-			int err = mnt_alloc_group_id(mnt);
-			if (err)
-				goto out_free;
-		}
+	mnt = alloc_vfsmnt(old->mnt_devname);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
-		} else if (!(flag & CL_PRIVATE)) {
-			if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
-
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-		}
+	if (flag & (CL_SLAVE | CL_PRIVATE))
+		mnt->mnt_group_id = 0; /* not a peer of original */
+	else
+		mnt->mnt_group_id = old->mnt_group_id;
+
+	if ((flag & CL_MAKE_SHARED) && !mnt->mnt_group_id) {
+		err = mnt_alloc_group_id(mnt);
+		if (err)
+			goto out_free;
 	}
+
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		CLEAR_MNT_SHARED(mnt);
+	} else if (!(flag & CL_PRIVATE)) {
+		if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old))
+			list_add(&mnt->mnt_share, &old->mnt_share);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+	}
+
 	return mnt;
 
  out_free:
 	free_vfsmnt(mnt);
-	return NULL;
+	return ERR_PTR(err);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -1255,11 +1259,12 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 	struct path path;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		return q;
+
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -1280,8 +1285,8 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 			path.mnt = q;
 			path.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto out;
 			br_write_lock(vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &path);
@@ -1289,7 +1294,7 @@ struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
 		}
 	}
 	return res;
-Enomem:
+out:
 	if (res) {
 		LIST_HEAD(umount_list);
 		br_write_lock(vfsmount_lock);
@@ -1297,9 +1302,11 @@ Enomem:
 		br_write_unlock(vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-	return NULL;
+	return q;
 }
 
+/* Caller should check returned pointer for errors */
+
 struct vfsmount *collect_mounts(struct path *path)
 {
 	struct vfsmount *tree;
@@ -1574,14 +1581,15 @@ static int do_loopback(struct path *path, char *old_name,
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
 
-	err = -ENOMEM;
 	if (recurse)
 		mnt = copy_tree(old_path.mnt, old_path.dentry, 0);
 	else
 		mnt = clone_mnt(old_path.mnt, old_path.dentry, 0);
 
-	if (!mnt)
+	if (IS_ERR(mnt)) {
+		err = PTR_ERR(mnt);
 		goto out;
+	}
 
 	err = graft_tree(mnt, path);
 	if (err) {
@@ -2119,10 +2127,11 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
 					CL_COPY_ALL | CL_EXPIRE);
-	if (!new_ns->root) {
+	if (IS_ERR(new_ns->root)) {
+		int err = PTR_ERR(new_ns->root);
 		up_write(&namespace_sem);
 		kfree(new_ns);
-		return ERR_PTR(-ENOMEM);
+		return ERR_PTR(err);
 	}
 	br_write_lock(vfsmount_lock);
 	list_add_tail(&new_ns->list, &new_ns->root->mnt_list);
diff --git a/fs/pnode.c b/fs/pnode.c
index 8066b8d..0710bb9 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -253,8 +253,9 @@ int propagate_mnt(struct vfsmount *dest_mnt, struct dentry *dest_dentry,
 
 		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
 
-		if (!(child = copy_tree(source, source->mnt_root, type))) {
-			ret = -ENOMEM;
+		child = copy_tree(source, source->mnt_root, type);
+		if (IS_ERR(child)) {
+			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
 			goto out;
 		}
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 7f18d3a..d32ca7a 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -596,7 +596,7 @@ void audit_trim_trees(void)
 
 		root_mnt = collect_mounts(&path);
 		path_put(&path);
-		if (!root_mnt)
+		if (IS_ERR(root_mnt))
 			goto skip_it;
 
 		spin_lock(&hash_lock);
@@ -670,8 +670,8 @@ int audit_add_tree_rule(struct audit_krule *rule)
 		goto Err;
 	mnt = collect_mounts(&path);
 	path_put(&path);
-	if (!mnt) {
-		err = -ENOMEM;
+	if (IS_ERR(mnt)) {
+		err = PTR_ERR(mnt);
 		goto Err;
 	}
 
@@ -720,8 +720,8 @@ int audit_tag_tree(char *old, char *new)
 		return err;
 	tagged = collect_mounts(&path2);
 	path_put(&path2);
-	if (!tagged)
-		return -ENOMEM;
+	if (IS_ERR(tagged))
+		return PTR_ERR(tagged);
 
 	err = kern_path(old, 0, &path1);
 	if (err) {
-- 
1.7.0.4


* [PATCH 20/74] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (18 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 19/74] VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 21/74] VFS: Add CL_NO_SLAVE " Valerie Aurora
                   ` (26 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Ram Pai, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Passing the CL_NO_SHARED flag to clone_mnt() causes the clone to fail
if the source mnt is shared.

Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |    3 +++
 fs/pnode.h     |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 79be922..df604ea 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -596,6 +596,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	struct vfsmount *mnt;
 	int err;
 
+	if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
+		return ERR_PTR(-EINVAL);
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/pnode.h b/fs/pnode.h
index 1ea4ae1..bcb3c47 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -22,6 +22,7 @@
 #define CL_COPY_ALL 		0x04
 #define CL_MAKE_SHARED 		0x08
 #define CL_PRIVATE 		0x10
+#define CL_NO_SHARED 		0x20
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
-- 
1.7.0.4


* [PATCH 21/74] VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (19 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 20/74] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 22/74] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
                   ` (25 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Ram Pai, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Passing the CL_NO_SLAVE flag to clone_mnt() causes the clone
to fail if the source mnt is a slave.

Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |    3 +++
 fs/pnode.h     |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index df604ea..454d7ad 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -599,6 +599,9 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	if ((flag & CL_NO_SHARED) && (IS_MNT_SHARED(old)))
 		return ERR_PTR(-EINVAL);
 
+	if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
+		return ERR_PTR(-EINVAL);
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/pnode.h b/fs/pnode.h
index bcb3c47..8920e47 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -23,6 +23,7 @@
 #define CL_MAKE_SHARED 		0x08
 #define CL_PRIVATE 		0x10
 #define CL_NO_SHARED 		0x20
+#define CL_NO_SLAVE 		0x40
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
-- 
1.7.0.4


* [PATCH 22/74] VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (20 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 21/74] VFS: Add CL_NO_SLAVE " Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:58 ` [PATCH 23/74] union-mount: Union mounts documentation Valerie Aurora
                   ` (24 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Ram Pai, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Passing the CL_MAKE_HARD_READONLY flag to clone_mnt() causes the clone
to fail if the source superblock is not read-only.  If it is
read-only, it increments the hard read-only users and sets the
MNT_HARD_READONLY flag in the vfsmount.  When the mount is freed via
free_vfsmnt(), automatically decrement the hard read-only users count.
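
The pairing described above can be modelled in userspace roughly as
follows.  This is a sketch under assumptions, not the patch itself:
the model_* and MODEL_* names are invented, locking is omitted, and
the counter is bumped only after the allocation has succeeded, which
is a choice made for this example so that no failure path can leave
the count raised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_MS_RDONLY		0x1	/* stand-in for MS_RDONLY */
#define MODEL_MNT_HARD_READONLY	0x8000	/* stand-in for MNT_HARD_READONLY */

struct sb_model {
	int s_flags;
	int s_hard_readonly_users;
};

struct mnt_model {
	int mnt_flags;
	struct sb_model *mnt_sb;
};

/* Dropping the mount drops its hard read-only reference, if it holds one. */
static void model_free_mnt(struct mnt_model *mnt)
{
	if (mnt->mnt_flags & MODEL_MNT_HARD_READONLY) {
		assert(mnt->mnt_sb->s_hard_readonly_users > 0);
		mnt->mnt_sb->s_hard_readonly_users--;
	}
	free(mnt);
}

/* Returns NULL and sets *err on failure; on success *err is 0. */
static struct mnt_model *model_clone_hard_ro(struct sb_model *sb, int *err)
{
	struct mnt_model *mnt;

	if (!(sb->s_flags & MODEL_MS_RDONLY)) {
		*err = -EBUSY;
		return NULL;
	}

	mnt = calloc(1, sizeof(*mnt));
	if (!mnt) {
		*err = -ENOMEM;
		return NULL;
	}

	/* Take the reference and mark the mount in one step. */
	mnt->mnt_sb = sb;
	sb->s_hard_readonly_users++;
	mnt->mnt_flags |= MODEL_MNT_HARD_READONLY;

	*err = 0;
	return mnt;
}
```

Each successful clone raises the count by one and each free lowers it
by one, so the count returns to zero once every such mount is gone.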

Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c        |   18 ++++++++++++++++++
 fs/pnode.h            |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 454d7ad..0f028e0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -417,6 +417,12 @@ EXPORT_SYMBOL(simple_set_mnt);
 void free_vfsmnt(struct vfsmount *mnt)
 {
 	kfree(mnt->mnt_devname);
+	if (mnt->mnt_flags & MNT_HARD_READONLY) {
+		BUG_ON(mnt->mnt_sb->s_hard_readonly_users <= 0);
+		down_write(&mnt->mnt_sb->s_umount);
+		mnt->mnt_sb->s_hard_readonly_users--;
+		up_write(&mnt->mnt_sb->s_umount);
+	}
 	mnt_free_id(mnt);
 #ifdef CONFIG_SMP
 	free_percpu(mnt->mnt_writers);
@@ -602,6 +608,16 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	if ((flag & CL_NO_SLAVE) && (IS_MNT_SLAVE(old)))
 		return ERR_PTR(-EINVAL);
 
+	if (flag & CL_MAKE_HARD_READONLY) {
+		down_write(&sb->s_umount);
+		if (!(sb->s_flags & MS_RDONLY)) {
+			up_write(&sb->s_umount);
+			return ERR_PTR(-EBUSY);
+		}
+		sb->s_hard_readonly_users++;
+		up_write(&sb->s_umount);
+	}
+
 	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
 		return ERR_PTR(-ENOMEM);
@@ -637,6 +653,8 @@ static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 	}
 	if (flag & CL_MAKE_SHARED)
 		set_mnt_shared(mnt);
+	if (flag & CL_MAKE_HARD_READONLY)
+		mnt->mnt_flags |= MNT_HARD_READONLY;
 
 	/* stick the duplicate mount on the same expiry list
 	 * as the original if that was on one */
diff --git a/fs/pnode.h b/fs/pnode.h
index 8920e47..dc7b468 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -24,6 +24,7 @@
 #define CL_PRIVATE 		0x10
 #define CL_NO_SHARED 		0x20
 #define CL_NO_SLAVE 		0x40
+#define CL_MAKE_HARD_READONLY	0x80
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5e7a594..44aa119 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -45,6 +45,7 @@ struct mnt_namespace;
 
 
 #define MNT_INTERNAL	0x4000
+#define MNT_HARD_READONLY	0x8000	/* has a hard read-only ref on the sb */
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.7.0.4


* [PATCH 23/74] union-mount: Union mounts documentation
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (21 preceding siblings ...)
  2011-03-23  1:58 ` [PATCH 22/74] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
@ 2011-03-23  1:58 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 24/74] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
                   ` (23 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:58 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Document design and implementation of union mounts (a.k.a. writable
overlays).

With corrections from Andreas Gruenbacher <agruen@suse.de>.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 Documentation/filesystems/union-mounts.txt |  751 ++++++++++++++++++++++++++++
 1 files changed, 751 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..5632b7f
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,751 @@
+Union mounts (a.k.a. writable overlays)
+=======================================
+
+This document describes the architecture and current status of union
+mounts, also known as writable overlays.
+
+In this document:
+ - Overview of union mounts
+ - Terminology
+ - VFS implementation
+ - Locking strategy
+ - VFS/file system interface
+ - Userland interface
+ - NFS interaction
+ - Status
+ - Contributing to union mounts
+
+Overview
+========
+
+A union mount layers one read-write file system over one or more
+read-only file systems, with all writes going to the writable file
+system.  The namespace of both file systems appears as a combined
+whole to userland, with files and directories on the writable file
+system covering up any files or directories with matching pathnames on
+the read-only file system.  The read-write file system is the
+"topmost" or "upper" file system and the read-only file systems are
+the "lower" file systems.  A few use cases:
+
+- Root file system on CD with writes saved to hard drive (LiveCD)
+- Multiple virtual machines with the same starting root file system
+- Cluster with NFS mounted root on clients
+
+Most if not all of these problems could be solved with a COW block
+device or a clustered file system (including NFS mounts).  However, for
+some use cases, sharing is more efficient and better performing if
+done at the file system namespace level.  COW block devices only
+increase their divergence as time goes on, and a fully coherent
+writable file system is unnecessary synchronization overhead if no
+other client needs to see the writes.
+
+What union mounts are not
+-------------------------
+
+Union mounts are not a general-purpose unioning file system.  They do
+not provide a generic "union of namespaces" operation for an arbitrary
+number of file systems.  Many interesting features can be implemented
+with a generic unioning facility: dynamic insertion and removal of
+branches, write policies based on space available, online upgrade,
+etc.  Some unioning file systems that do this are UnionFS and AUFS.
+
+Terminology
+===========
+
+The main physical metaphor for union mounts is that a writable file
+system is mounted "on top" of a read-only file system.  Lookups start
+at the "topmost" read-write file system and travel "down" to the
+"bottom" read-only file system only if no blocking entry exists on the
+top layer.
+
+Topmost layer: The read-write file system.  Lookups begin here.
+
+Bottom layer: The read-only file system.  Lookups end here.
+
+Path: Combination of the vfsmount and dentry structure.
+
+Follow down: Given a path from the top layer, find the corresponding
+path on the bottom layer.
+
+Follow up: Given a path from the bottom layer, find the corresponding
+path on the top layer.
+
+Whiteout: A directory entry in the top layer that prevents lookups
+from travelling down to the bottom layer.  Created on unlink()/rmdir()
+if a corresponding directory entry exists in the bottom layer.
+
+Opaque flag: A flag on a directory in the top layer that prevents
+lookups of entries in this directory from travelling down to the
+bottom layer (unless there is an explicit fallthru entry allowing that
+for a particular entry).  Set on creation of any new directory in
+the topmost layer (that is, a directory that does not have any
+matching visible directory below it).
+
+Fallthru: A directory entry which allows lookups to "fall through" to
+the bottom layer for that exact directory entry.  This serves as a
+placeholder for directory entries from the bottom layer during
+readdir().  Fallthrus override opaque flags.
+
+File copyup: Create a file on the top layer that has the same metadata
+and contents as the file with the same pathname on the bottom layer.
+
+Directory copyup: Copy up the visible directory entries from the
+bottom layer as fallthrus in the matching top layer directory.  Mark
+the directory opaque to avoid unnecessary negative lookups on the
+bottom layer.
+
+Examples
+========
+
+What happens when I...
+
+- creat() /newfile -> creates on topmost layer
+- unlink() /oldfile -> creates a whiteout on topmost layer
+- Edit /existingfile -> copies up to top layer when opened for writing
+- truncate /existingfile -> copies up to topmost layer + N bytes if specified
+- touch()/chmod()/chown()/etc. -> copies up to topmost layer
+- mkdir() /newdir -> creates opaque dir on topmost layer
+- rmdir() /olddir -> creates a whiteout on topmost layer
+- mkdir() /olddir after above -> creates opaque dir on topmost layer
+- readdir() /shareddir -> copies up entries from bottom layer as
+    fallthrus, processes duplicates and whiteouts
+- link() /oldfile /newlink -> copies up /oldfile, creates /newlink on
+    topmost layer
+- symlink() /oldfile /symlink -> nothing special
+- rename() /oldfile /newfile -> copies up /oldfile to /newfile on top layer,
+    whiteouts /oldfile
+- rename() /olddir /newdir -> EXDEV
+- rename() /topmost_only_dir /topmost_only_dir2 -> success
+- stat() /oldfile -> inode & dev from lower layer
+- stat() /newfile -> inode & dev from topmost layer
+- readdir() /shareddir -> d_ino & d_type from lower layer on fallthrus
+
+Getting to a root file system with union mounts:
+
+- Mount the base read-only file system as the root file system
+- Mount the read-only file system again on /newroot
+- Mount the read-write layer on /newroot:
+   # mount -o union /dev/sda /newroot
+- pivot_root to /newroot
+- Start init
+
+See scripts/pivot.sh in the UML devkit linked to from:
+
+http://valerieaurora.org/union/
+
+VFS implementation
+==================
+
+Union mounts are implemented as an integral part of the VFS, rather
+than as a VFS client file system (i.e., a stacked file system like
+unionfs or ecryptfs).  Implementing unioning inside the VFS eliminates
+the need for duplicate copies of VFS data structures, unnecessary
+indirection, and code duplication, but requires very maintainable, low
+overhead code.  Union mounts require no change to file systems serving
+as the read-only layer, and require some minor support from file
+systems serving as the read-write layer.  File systems that want to be
+the writable layer must implement the new ->whiteout() and
+->fallthru() inode operations, which create special dummy directory
+entries.
+
+The union mounts code must accomplish the following major tasks:
+
+1) Pass lookups through to the lower level file system.
+2) Copy files and directories up to the topmost layer when written.
+3) Create whiteouts and fallthrus as necessary.
+
+VFS objects and union mounts
+----------------------------
+
+First, some VFS basics:
+
+The VFS allows multiple mounts of the same file system.  For example,
+/dev/sda can be mounted at /usr and also at /mnt.  The same file
+system can be mounted read-only at one point and read-write at
+another.  Each of these mounts has its own vfsmount data structure in
+the kernel.  However, each underlying file system has exactly one
+in-kernel superblock structure no matter how many times it is mounted.
+All the separate vfsmounts for the same file system reference the same
+superblock data structure.
+
+Directory entries are cached by the VFS in dentry structures.  The VFS
+keeps one dentry structure for each file or directory in a file
+system, no matter how many times it is mounted.  Each dentry
+represents only one element of a path name.  When the VFS looks up a
+pathname (e.g., "/sbin/init"), the result is a combination of vfsmount
+and dentry.  This <mnt,dentry> pair is usually stored in a kernel
+structure named "path", which is simply two pointers, one to the
+vfsmount and one to the dentry.  A "struct path" is this structure; a
+pathname is a string like "/etc/fstab".
+
+In union mounts, a file system can only be the topmost layer for one
+union mount.  A file system can be part of multiple union mounts if it
+is a read-only layer.  So dentries in the read-only layers can be part
+of multiple unions, while a dentry in the read-write layer can only be
+part of one union.
+
+union_stack structure
+---------------------
+
+The first job of union mounts is to map directories from the topmost
+layer to directories with the same pathname in the lower layer.  That
+is, given the <mnt,dentry> pair for a directory pathname in the
+topmost layer, we need to find all the <mnt,dentry> pairs for the
+directory with the same pathname in the lower layer.  We do this with
+the union_stack structure, which is an array containing struct paths
+(<mnt,dentry> pointer pairs) for each directory unioned with the
+topmost directory.  The array is pointed to from the new d_union_stack
+member of struct dentry.
+
+/*
+ * The union_stack structure.  It is an array of struct paths of
+ * directories below the topmost directory in a unioned directory.  The
+ * topmost dentry has a pointer to this structure.  The topmost dentry
+ * can only be part of one union, so we can reference it from the
+ * dentry, but lower dentries can be part of multiple union stacks.
+ *
+ * The number of dirs actually allocated is kept in the superblock,
+ * s_union_count.
+ */
+struct union_stack {
+	struct path u_dirs[0];
+};
+
+This structure is flexible enough to support an arbitrary number of
+layers of unioned file systems.  Since there can be more than two
+layers, this section will talk about mapping "upper" directories to
+"lower" directories, instead of "topmost" directories to "bottom"
+directories.
+
+Traversing the union stack
+--------------------------
+
+The array of struct paths hanging off a topmost directory's dentry is
+called the union stack for that directory.  To traverse the union
+stack, loop over the layers in the union (the count is stored in
+sb->s_union_count) and call union_find_dir() for each index.  Example:
+freeing the union stack:
+
+void d_free_unions(struct dentry *topmost)
+{
+	struct path *path;
+	unsigned int i, layers = topmost->d_sb->s_union_count;
+
+	if (!IS_DIR_UNIONED(topmost))
+		return;
+
+	for (i = 0; i < layers; i++) {
+		path = union_find_dir(topmost, i);
+		if (path->mnt)
+			path_put(path);
+	}
+	kfree(topmost->d_union_stack);
+	topmost->d_union_stack = NULL;
+}
+
+Code paths
+----------
+
+Union mounts modify the following key code paths in the VFS:
+
+- mount()/umount()
+- Pathname lookup
+- Any path that modifies an existing file
+
+Mount
+-----
+
+Union mounts are created in two steps:
+
+1. Mount the read-only layer file systems read-only in the usual
+manner, all on the same mountpoint.  Submounts are permitted as long
+as they are also read-only and not shared (part of a mount propagation
+group).
+
+2. Mount the top layer with the "-o union" option at the same
+mountpoint.  All read-only file systems mounted at this mountpoint
+will be included in the union mount.
+
+The bottom layers must be read-only and the top layer must be
+read-write and support whiteouts and fallthrus.  A file system that
+supports whiteouts and fallthrus indicates this by setting the
+MS_WHITEOUT and MS_FALLTHRU flags in the superblock.  Currently, the
+top layer is forced to "noatime" to avoid a copyup on every access of
+a file.  Supporting atime with the current infrastructure would
+require a copyup on every open().  The "relatime" option would be
+equally efficient if the atime is the same or more recent than the
+mtime/ctime for every object on the read-only file system, and if the
+24-hour timeout on relatime was disabled.  However, this is probably
+not worthwhile for the majority of union mount use cases.
+
+File systems can only be union mounted at their root directories, for
+simplicity and performance.
+
+pivot_root() to a union mounted file system is supported.  The
+recommended way to get to a union mounted root file system is to boot
+with the read-only mount as the root file system, construct the union
+mount on an entirely new mount, and pivot_root() to the new union
+mount root.  Attempting to union mount the root file system later in
+boot will result in covering other file systems, e.g., /proc, which
+isn't permitted in the current code and is a bad idea anyway.
+
+Hard read-only file systems
+---------------------------
+
+Union mounts require the lower layer of the file system to be
+read-only.  However, in Linux, any individual file system may be
+mounted at multiple places in the namespace, and a file system can be
+changed from read-only to read-write while still mounted.  Thus, simply
+checking that the bottom layer is read-only at the time the writable
+overlay is mounted over it is pointless, since at any time the bottom
+layer may become read-write.
+
+We have to guarantee that a file system will be read-only for as long
+as it is the bottom layer of a union mount.  To do this, we track the
+number of hard read-only users of a file system in its VFS superblock
+structure.  When we union mount a writable overlay over a file system,
+we increment its read-only user count.  The file system can only be
+mounted read-write if its read-only users count is zero.
+
+Todo:
+
+- Support hard read-only NFS mounts.  See discussion here:
+
+  http://markmail.org/message/3mkgnvo4pswxd7lp
+
+Pathname lookup
+---------------
+
+Pathname lookup in a unioned directory traverses down the union stack
+for the parent directory, looking up each pathname element in each
+layer of the file system (according to the rules of whiteouts,
+fallthrus, and opaque flags).  At mount time, the union stack for the
+root directory of the file system is created, and the union stack
+creation for every other unioned directory in the file system is
+boot-strapped using the already-existing union stack of the
+directory's parent.  In order to simplify the code greatly, every
+visible directory on the lower file system is required to have a
+matching directory on the upper file system.  If this matching directory
+does not already exist, it is created during pathname lookup.
+Therefore, each unioned directory is the child of another unioned
+directory (or is the root directory of the file system).
+
+The actual union lookup function is called in the following code
+paths:
+
+do_lookup()->do_union_lookup()->lookup_union()->__lookup_union()
+lookup_hash()->lookup_union()->__lookup_union()
+
+__lookup_union() is where the rules of whiteouts, fallthrus, and
+opaque flags are actually implemented.  __lookup_union() returns
+either the first visible dentry, or a negative dentry from the topmost
+file system if no matching dentry exists.  If it finds a directory, it
+looks up any potential matching lower layer directories.  If it finds
+a lower layer directory, it first creates the topmost dir if necessary
+via union_create_topmost_dir(), and then calls union_add_dir() to
+append the lower directory to the end of the union stack.
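
As a rough illustration of the whiteout, fallthru, and opaque rules,
here is a simplified userspace model of looking up one name through a
stack of directories.  It assumes the usual meaning of the three
markers (a whiteout hides the name, a fallthru defers to the layer
below, an opaque directory hides lower entries it does not mention).
The types and toy_union_lookup() are hypothetical and much simpler
than __lookup_union(), which also handles directories and negative
dentries.

```c
#include <assert.h>
#include <stddef.h>

/* What one layer's directory holds for the name being looked up. */
enum toy_entry { TOY_NONE, TOY_FILE, TOY_WHITEOUT, TOY_FALLTHRU };

struct toy_layer_dir {
	enum toy_entry entry;
	int opaque;	/* nonzero: do not look below for missing names */
};

/*
 * Walk the stack from the topmost directory (index 0) downwards.
 * Returns the index of the layer that supplies the name, or -1 if the
 * name is not visible in the union.
 */
static int toy_union_lookup(const struct toy_layer_dir *stack, size_t layers)
{
	size_t i;

	assert(stack != NULL || layers == 0);
	for (i = 0; i < layers; i++) {
		switch (stack[i].entry) {
		case TOY_FILE:
			return (int)i;	/* first visible entry wins */
		case TOY_WHITEOUT:
			return -1;	/* name explicitly removed */
		case TOY_FALLTHRU:
			continue;	/* explicitly defer to lower layer */
		case TOY_NONE:
			if (stack[i].opaque)
				return -1;	/* opaque dir hides lower entries */
			break;		/* transparent: keep looking below */
		}
	}
	return -1;
}
```

Note that a fallthru entry passes the lookup down even when the
directory is opaque, which is the case this model assumes arises after
readdir() has copied entries up.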
+
+Note that not all directories in a union mount are unioned, only those
+with matching directories on the lower layer.  The macro
+IS_DIR_UNIONED() is a cheap, constant time way to check if a directory
+is unioned, while IS_MNT_UNION() checks if the entire mount is unioned
+(and therefore whether the directory in question is potentially
+unioned).
+
+Currently, lookup of a negative dentry or a directory with no matching
+directories below it requires a lookup in every directory in the union
+stack every time it is looked up.  We could avoid subsequent lookups
+by adding the equivalent of a negative dcache entry.
+
+File copyup
+-----------
+
+Any system call that alters the data or metadata of a file on the
+bottom layer, or creates or changes a hard link to it, will trigger a
+copyup of the target file from the lower layer to the topmost layer:
+
+ - open(O_WRONLY | O_RDWR | O_APPEND)
+ - truncate()/open(O_TRUNC)
+ - link()
+ - rename()
+ - chmod()
+ - chown()/lchown()
+ - utimes()
+ - setxattr()/lsetxattr()
+
+Copyup of a file due to open(O_WRONLY) has already occurred when:
+
+ - write()
+ - ftruncate()
+ - writable mmap()
+
+The following system calls will fail on an fd opened O_RDONLY:
+
+ - fchmod()
+ - fchown()
+ - fsetxattr()
+ - futimens()
+
+Contrary to common sense, the above system calls are defined to
+succeed on O_RDONLY fds.  The idea seems to be that the
+O_RDONLY/O_RDWR/O_WRONLY flags only apply to the actual file data, not
+to any form of metadata (times, owner, mode, or even extended
+attributes).  Applications making these system calls on O_RDONLY fds
+are correct according to the standard and work on non-union mounts.
+They will need to be rewritten (O_RDONLY -> O_RDWR) to work on union
+mounts.  We suspect this usage is uncommon.
+
+This deviation from the standard is due to technical limitations of the
+union mount implementation.  Specifically, we would need to replace an
+open file descriptor from the lower layer with an open file descriptor
+for a file with matching pathname and contents on the upper layer,
+which is difficult to do.  We avoid this in other system calls by
+doing the copyup before the file is opened.  Unionfs doesn't encounter
+this problem because it creates a dummy file struct which redirects or
+fans out operations to the struct files for the underlying file
+systems.
+
+From an application's point of view, the result of an in-kernel file
+copyup is the logical equivalent of another application updating the
+file via the rename() pattern: creat() a new file, copy the data over,
+make changes to the copy, and rename() over the old version.  Any
+existing open file descriptors for that file (including those in the
+same application) refer to a now invisible object that used to have
+the same pathname.  Only opens that occur after the copyup will see
+updates to the file.
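
The userspace pattern that copyup is compared to can be tried on any
ordinary file system.  The sketch below uses only standard POSIX
calls; the helper names replace_via_rename() and read_start() are
made up for this example.  It shows the rename() pattern itself, not
the in-kernel copyup.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the contents of path by writing a new file at tmp_path and
 * rename()ing it over the old one.  Returns 0 on success, -1 on error.
 */
static int replace_via_rename(const char *path, const char *tmp_path,
			      const char *data)
{
	size_t len = strlen(data);
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp_path);
		return -1;
	}
	if (close(fd) != 0 || rename(tmp_path, path) != 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}

/*
 * Read up to len - 1 bytes from the start of an already-open fd and
 * NUL-terminate the result.  Returns the byte count or -1 on error.
 */
static ssize_t read_start(int fd, char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL && len > 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

A descriptor opened before replace_via_rename() keeps returning the
old contents from read_start(), while a descriptor opened afterwards
sees the new contents.  That is the behavior an application should
expect from descriptors opened before a copyup.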
+
+Permission checks
+-----------------
+
+We want to be sure we have the correct permissions to actually succeed
+in a system call before copying a file up to avoid unnecessary IO.  At
+present, the permission check for a single system call may be spread
+out over many hundreds of lines of code (e.g., open()).  In order to
+check permissions, we occasionally need to determine if there is a
+writable overlay on top of this inode.  This requires a full path, but
+often we only have the inode at this point.  In particular,
+inode_permission() returns EROFS if the inode is on a read-only file
+system, which is the wrong answer if there is a writable overlay
+mounted on top of it.
+
+The current solution is to split out the file-system-wide permission
+checks from the per-inode permission checks.  inode_permission()
+becomes:
+
+sb_permission()
+__inode_permission()
+
+inode_permission() calls sb_permission() and __inode_permission() on
+the same path.  We create path_permission() which calls
+sb_permission() on the parent directory from the top layer, and
+__inode_permission() on the target on the lower layer.  This gets us
+the correct write permissions considering that the file will be copied
+up.
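
The split can be modelled in userspace as follows.  This is a sketch
under assumptions: the toy_* structures and functions are hypothetical
stand-ins, the real kernel functions take different arguments, and the
per-inode check is reduced to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_sb {
	int read_only;		/* file-system-wide state */
};

struct toy_inode {
	struct toy_sb *sb;
	int caller_may_write;	/* stands in for mode/ACL checks */
};

/* File-system-wide part of the check. */
static int toy_sb_permission(const struct toy_sb *sb, int want_write)
{
	return (want_write && sb->read_only) ? -EROFS : 0;
}

/* Per-inode part of the check (the document's __inode_permission()). */
static int toy_inode_only_permission(const struct toy_inode *inode,
				     int want_write)
{
	return (want_write && !inode->caller_may_write) ? -EACCES : 0;
}

/* Non-union case: both checks are made against the same object. */
static int toy_inode_permission(const struct toy_inode *inode, int want_write)
{
	int err;

	assert(inode != NULL && inode->sb != NULL);
	err = toy_sb_permission(inode->sb, want_write);
	return err ? err : toy_inode_only_permission(inode, want_write);
}

/*
 * Union case: the superblock check uses the parent directory on the
 * top layer, the per-inode check uses the target on the lower layer.
 */
static int toy_path_permission(const struct toy_inode *top_parent,
			       const struct toy_inode *lower_target,
			       int want_write)
{
	int err;

	assert(top_parent != NULL && top_parent->sb != NULL);
	assert(lower_target != NULL);
	err = toy_sb_permission(top_parent->sb, want_write);
	return err ? err : toy_inode_only_permission(lower_target, want_write);
}
```

With a read-only lower superblock and a writable top superblock,
toy_inode_permission() on the lower target reports -EROFS, while
toy_path_permission() allows the write that a copyup would make
possible.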
+
+Todo:
+
+  - Currently, we don't deal with differing directory permissions at
+    different levels of the stack.  This is a bug.
+
+Impact on non-union kernels and mounts
+--------------------------------------
+
+Union-related data structures, extra fields, and function calls are
+#ifdef'd out at the function/macro level with CONFIG_UNION_MOUNT in
+nearly all cases (see fs/union.h).  When CONFIG_UNION_MOUNT is
+enabled, struct dentry has one more pointer, reducing the size of
+dentry names stored in the dentry itself by 4 to 8 bytes.
+
+Todo:
+
+ - Do performance tests
+
+Locking strategy
+================
+
+The current union mount locking strategy is based on the following
+rules:
+
+* The lower layer file system is always read-only
+* The topmost file system is always read-write
+  => A file system can never be a topmost and lower layer at the same time
+
+Additionally, the topmost layer may only be mounted exactly once.
+Don't think of the topmost layer as a separate independent file
+system; when it is part of a union mount, it is only a file system in
+conjunction with the read-only bottom layer.  The read-only bottom
+layer is an independent file system in and of itself and can be
+mounted elsewhere, including as the bottom layer for another union
+mount.
+
+Thus, we may define a stable locking order in terms of top layer and
+bottom layer locks, since a top layer is never a bottom layer and a
+bottom layer is never a top layer.  Another simplifying assumption is
+that all directories in a pathname exist on the top layer, as they are
+created step-by-step during lookup.  This prevents us from ever having
+to walk backwards up the path creating directory entries, which can
+get complicated.  By implication, parent directory paths during any
+operation (rename(), unlink(), etc.) are from the top layer.  Dentries
+for directories from the bottom layer are only ever seen or used by
+the lookup code.
+
+The two major problems we avoid with the above rules are:
+
+Lock ordering: Imagine two union stacks with the same two file
+systems: A mounted over B, and B mounted over A.  Sometimes locks on
+objects in both A and B will have to be held simultaneously.  What
+order should they be acquired in?  Simply acquiring them from top to
+bottom will create a lock-ordering problem: one thread acquires the
+lock on an object from A and then tries for a lock on an object from
+B, while another thread grabs the lock on the object from B and then
+waits for the lock on the object from A.  Some other lock ordering
+must be defined.
+
+Movement/change/disappearance of objects on multiple layers: A variety
+of nasty corner cases arise when more than one layer is changing at
+the same time.  Changes in the directory topology and their effect on
+inheritance are of special concern.  Al Viro's canonical email on the
+subject:
+
+http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
+
+We don't try to solve any of these cases, just avoid them in the first
+place.
+
+Todo: Prevent top layer from being mounted more than once.
+
+Cross-layer interactions
+------------------------
+
+The VFS code simultaneously holds references to and/or modifies
+objects from both the top and bottom layers in the following cases:
+
+Path lookup:
+
+Grabs i_mutex on bottom layer while holding i_mutex on top layer
+directory inode.
+
+File copyup:
+
+Holds i_mutex on the parent directory from the top layer while copying
+up file from lower layer.
+
+link():
+
+File copyup of target while holding i_mutex on parent directory on top
+layer.  Followed by a normal link() operation.
+
+rename():
+
+Holds s_vfs_rename_mutex on the top layer, i_mutex of the source's
+parent dir (top layer), and i_mutex of the target's parent dir (also
+top layer) while looking up and copying the bottom layer target and
+also creating the whiteout.
+
+Notes on rename():
+
+First, renaming of directories returns EXDEV.  It's not at all
+reasonable to recursively copy directory trees and userspace has to
+handle this case anyway.  An exception is rename() of directories that
+exist only on the topmost layer; this succeeds.
+
+Rename involves three steps on a union mount: (1) copyup of the file
+from the bottom layer, (2) rename of the new top-layer copy to the
+target in the usual manner, (3) creation of a whiteout covering the
+source of the rename.
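
The three steps can be walked through with a toy two-layer directory.
This is a sketch only: names are reduced to small integer slots, and
struct toy_union_dir, toy_visible(), and toy_rename() are hypothetical
helpers that ignore locking, directories, and error handling beyond a
missing source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NAMES 4

enum toy_top { TOY_TOP_NONE, TOY_TOP_FILE, TOY_TOP_WHITEOUT };

struct toy_union_dir {
	enum toy_top top[TOY_NAMES];	/* read-write layer */
	int lower[TOY_NAMES];		/* read-only layer: nonzero = file */
};

/* Is the name visible through the union? */
static int toy_visible(const struct toy_union_dir *d, int name)
{
	assert(d != NULL && name >= 0 && name < TOY_NAMES);
	if (d->top[name] == TOY_TOP_FILE)
		return 1;
	if (d->top[name] == TOY_TOP_WHITEOUT)
		return 0;
	return d->lower[name] != 0;
}

static int toy_rename(struct toy_union_dir *d, int src, int dst)
{
	assert(dst >= 0 && dst < TOY_NAMES);
	if (!toy_visible(d, src))
		return -ENOENT;
	if (src == dst)
		return 0;

	/* (1) copy the source up if it only exists on the lower layer */
	d->top[src] = TOY_TOP_FILE;

	/* (2) ordinary rename within the top layer */
	d->top[dst] = TOY_TOP_FILE;
	d->top[src] = TOY_TOP_NONE;

	/* (3) whiteout so the lower copy does not show through */
	if (d->lower[src])
		d->top[src] = TOY_TOP_WHITEOUT;
	return 0;
}
```

Without step (3), the source name would reappear after the rename
because the read-only layer still contains it.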
+
+Directory copyup:
+
+Directory entries are copied up on the first readdir().  We hold the
+top layer directory i_mutex throughout and sequentially acquire and
+drop the i_mutex for each lower layer directory.
+
+VFS-fs interface
+================
+
+Read-only layer: No support necessary other than enforcement of really
+really read-only semantics (done by VFS for local file systems).
+
+Writable layer: Must implement two new inode operations:
+
+int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+int (*fallthru) (struct inode *, struct dentry *);
+
+And set the MS_WHITEOUT and MS_FALLTHRU flags to indicate support of
+these operations.
+
+Todo:
+
+- Implement whiteouts and fallthrus in ext3
+- Implement whiteouts and fallthrus in btrfs
+
+Supported file systems
+----------------------
+
+Any file system can be a read-only layer.  File systems must
+explicitly support whiteouts and fallthrus in order to be a read-write
+layer.  This patch set implements whiteouts for ext2, tmpfs, and
+jffs2.  We have tested ext2, tmpfs, and iso9660 as the read-only
+layer.
+
+Todo:
+ - Test corner cases of case-insensitive/oversensitive file systems
+
+NFS interaction
+===============
+
+NFS is currently not supported as either type of layer.  NFS as
+read-only layer requires support from the server to honor the
+read-only guarantee needed for the bottom layer.  To do this, the
+server needs to revoke access to clients requesting read-only file
+systems if the exported file system is remounted read-write or
+unmounted (during which arbitrary changes can occur).  Some recent
+discussion:
+
+http://markmail.org/message/3mkgnvo4pswxd7lp
+
+NFS as the read-write layer would require implementation of the
+->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
+theoretically already supported.
+
+Also, technically the requirement for a readdir() cookie that is
+stable across reboots comes only from file systems exported via NFSv2:
+
+http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
+
+Todo:
+
+- Guarantee really really read-only on NFS exports
+- Implement whiteout()/fallthru() for NFS
+
+Userland support
+================
+
+The mount command must support the "-o union" mount option and pass
+the corresponding MS_UNION flag to the kernel.  A util-linux git
+tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
+
+File system utilities must support whiteouts and fallthrus.  An
+e2fsprogs git tree with union mount support is here:
+
+git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git
+
+Currently, whiteout directory entries are not returned to userland.
+While the directory type for whiteouts, DT_WHT, has been defined for
+many years, very little userland code handles them.  Userland will
+never see fallthru directory entries.
+
+Known non-POSIX behaviors
+-------------------------
+
+- Any writing system call (unlink()/chmod()/etc.) can return ENOSPC or EIO
+
+  Most programs are not tested and don't work well under conditions of
+  ENOSPC.  The solution is to add more disk space.
+
+- Link count may be wrong for files on bottom layer with > 1 link count
+
+  A file may have more than one hard link to it.  When a file with
+  multiple hard links is copied up, any other hard links pointing to
+  the same inode will remain unchanged.  If the file is looked up via
+  one of the hard links on the read-only layer, it will have the
+  original link count (which is off by one at this point).  An
+  example:
+
+  /bin/link1 -> inode 100
+  /etc/link2 -> inode 100
+
+  inode 100 will have link count 2.
+
+  # echo "blah" > /bin/link1
+
+  Now /bin/link1 will be copied up to the topmost layer.  But
+  /etc/link2 will still point to the original inode 100, and its link
+  count will still be 2.
+
+- Link count on directories will be wrong before readdir() (fixable)
+- File copyup is the logical equivalent of an update via copy +
+  rename().  Any existing open file descriptors will continue to refer
+  to the read-only copy on the bottom layer and will not see any
+  changes that occur after the copy-up.
+- rename() of directory may fail with EXDEV
+- fchmod()/fchown()/futimens()/fsetxattr() fail on O_RDONLY fds
+
+Status
+======
+
+The current union mounts implementation is feature-complete on local
+file systems and passes an extensive union mounts test suite,
+available in the union mounts Usermode Linux-based development kit:
+
+http://valerieaurora.org/union/union_mount_devkit.tar.gz
+
+The whiteout code has had some non-trivial level of review and
+testing, but much of the code has had no external review or testing
+outside the authors' machines.
+
+The latest version is available at:
+
+git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git
+
+Check the union mounts web page for the name of the latest branch:
+
+http://valerieaurora.org/union/
+
+Todo:
+
+- Run more tests (e.g., XFS test suite)
+- Get review from VFS maintainers
+
+Non-features
+------------
+
+Features we do not currently plan to support in union mounts:
+
+Online upgrade: E.g., installing software on a file system NFS
+exported to clients while the clients are still up and running.
+Allowing the read-only bottom layer of a union mount to change
+invalidates our locking strategy.
+
+Recursive copying of directories: E.g., implementing rename() across
+layers for directories.  Doing an in-kernel copy of a single file is
+bad enough.  Recursively copying a directory is a big no-no.
+
+Read-only top layer: The readdir() strategy fundamentally requires the
+ability to create persistent directory entries on the top layer file
+system (which may be tmpfs).  However, you can union two read-only
+file systems by union mounting a third file system (such as tmpfs)
+over the two read-only file systems.  Numerous alternatives to this
+readdir() strategy (including in-kernel or in-application caching)
+exist and are compatible with union mounts with its writing-readdir()
+implementation disabled.  Creating a readdir() cookie that is stable
+across multiple readdir()s requires one of:
+
+- Write to stable storage (e.g., fallthru dentries)
+- Non-evictable kernel memory cache (doesn't handle NFS server reboot)
+- Per-application caching by glibc readdir()
+
+Often these features are supported by other unioning file systems or
+by other versions of union mounts.
+
+Contributing to union mounts
+============================
+
+The union mounts web page is here:
+
+http://valerieaurora.org/union/
+
+It links to:
+
+ - All git repositories
+ - Documentation
+ - An entire self-contained UML-based dev kit with README, etc.
+
+The best mailing list for discussing union mounts is:
+
+linux-fsdevel@vger.kernel.org
+
+http://vger.kernel.org/vger-lists.html#linux-fsdevel
+
+Thank you for reading!
-- 
1.7.0.4



* [PATCH 24/74] union-mount: Introduce MNT_UNION and MS_UNION flags
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Jan Blunck, Valerie Aurora, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

Add per mountpoint flag for Union Mount support. You need additional patches
to util-linux for that to work - see:

git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
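
For readers skimming the series, the flag handling this patch adds to
do_mount() amounts to the following.  This is a standalone userspace
sketch: the two constants are copied from the hunks in this patch,
while toy_translate_union_flag() is a hypothetical helper that does
not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the fs.h and mount.h hunks of this patch. */
#define MS_UNION	256	/* mount(2) flag seen by userspace */
#define MNT_UNION	0x10000	/* per-vfsmount flag */

/*
 * Translate MS_UNION in *flags into MNT_UNION in the returned mount
 * flags, then clear MS_UNION from *flags so it is not passed on as a
 * superblock flag.
 */
static int toy_translate_union_flag(unsigned long *flags, int mnt_flags)
{
	assert(flags != NULL);
	if (*flags & MS_UNION)
		mnt_flags |= MNT_UNION;
	*flags &= ~(unsigned long)MS_UNION;
	return mnt_flags;
}
```

The effect is that "union" is a property of the mount (vfsmount)
rather than of the superblock.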

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c        |    5 ++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0f028e0..054eb7d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -862,6 +862,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_UNION, ",union" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
@@ -2095,10 +2096,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_UNION)
+		mnt_flags |= MNT_UNION;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-		   MS_STRICTATIME);
+		   MS_STRICTATIME | MS_UNION);
 
 	if (flags & MS_REMOUNT)
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 469e0ea..dad9903 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -191,6 +191,7 @@ struct inodes_stat_t {
 #define MS_REMOUNT	32	/* Alter flags of a mounted FS */
 #define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
 #define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_UNION	256	/* Merge namespace with FS mounted below */
 #define MS_NOATIME	1024	/* Do not update access times. */
 #define MS_NODIRATIME	2048	/* Do not update directory access times */
 #define MS_BIND		4096
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 44aa119..1c69bee 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -46,6 +46,7 @@ struct mnt_namespace;
 
 #define MNT_INTERNAL	0x4000
 #define MNT_HARD_READONLY	0x8000	/* has a hard read-only ref on the sb */
+#define MNT_UNION	0x10000		/* top layer of a union mount */
 
 struct vfsmount {
 	struct list_head mnt_hash;
-- 
1.7.0.4



* [PATCH 25/74] union-mount: Add CONFIG_UNION_MOUNT option
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add CONFIG_UNION_MOUNT option.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/Kconfig |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 3d18530..0e4a3a6 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
 
+config UNION_MOUNT
+       bool "Union mounts (writable overlays) (EXPERIMENTAL)"
+       depends on EXPERIMENTAL
+       help
+         Union mounts allow you to mount a transparent writable
+	 layer over a read-only file system, for example, an ext3
+	 partition on a hard drive over a CD-ROM root file system
+	 image.
+
+	 See <file:Documentation/filesystems/union-mounts.txt> for details.
+
+	 If unsure, say N.
+
 source "fs/autofs/Kconfig"
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
-- 
1.7.0.4



* [PATCH 26/74] union-mount: Create union_stack structure
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

struct union_stack records the stack of directories unioned at this
directory.  A union_stack is an array of struct paths, dynamically
allocated when the dentry for the topmost directory is created.  The
topmost dentry contains a pointer to the union_stack.
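
A userspace sketch of this kind of dynamically sized array follows.
It is an approximation, not the kernel code: struct toy_path stands in
for struct path, the toy_* helpers are hypothetical, and the element
count is stored inside the structure (the patch keeps it in the
superblock as s_union_count) because ISO C does not allow a structure
whose only member is a flexible array.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct path (vfsmount + dentry pointers). */
struct toy_path {
	void *mnt;
	void *dentry;
};

struct toy_union_stack {
	size_t nr_dirs;			/* assumption: kept here, not in sb */
	struct toy_path u_dirs[];	/* one slot per lower layer */
};

/* Allocate a zero-filled stack with room for `layers` lower directories. */
static struct toy_union_stack *toy_union_alloc(size_t layers)
{
	struct toy_union_stack *stack;

	assert(layers > 0);
	stack = calloc(1, sizeof(*stack) + layers * sizeof(stack->u_dirs[0]));
	if (stack)
		stack->nr_dirs = layers;
	return stack;
}

/* Return the slot for one layer, or NULL if the index is out of range. */
static struct toy_path *toy_union_find_dir(struct toy_union_stack *stack,
					   size_t layer)
{
	if (stack == NULL || layer >= stack->nr_dirs)
		return NULL;
	return &stack->u_dirs[layer];
}
```

A single allocation covers the header and all slots, so freeing the
stack is one free() call, which mirrors the single kfree() in
d_free_unions().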

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/dcache.c            |    3 ++
 fs/union.h             |   54 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   22 +++++++++++++++++-
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ff3f949..566acf7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -961,6 +961,9 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+#ifdef CONFIG_UNION_MOUNT
+	dentry->d_union_stack = NULL;
+#endif
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
diff --git a/fs/union.h b/fs/union.h
new file mode 100644
index 0000000..38b26fd
--- /dev/null
+++ b/fs/union.h
@@ -0,0 +1,54 @@
+ /*
+ * VFS-based union mounts for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ * Copyright (C) 2009-2010 Red Hat, Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+#ifndef __LINUX_UNION_H
+#define __LINUX_UNION_H
+#ifdef __KERNEL__
+
+#ifdef CONFIG_UNION_MOUNT
+
+/*
+ * WARNING! Confusing terminology alert.
+ *
+ * Note that the directions "up" and "down" in union mounts are the
+ * opposite of "up" and "down" in normal VFS operation terminology.
+ * "up" in the rest of the VFS means "towards the root of the mount
+ * tree."  If you mount B on top of A, following B "up" will get you
+ * A.  In union mounts, "up" means "towards the most recently mounted
+ * layer of the union stack."  If you union mount B on top of A,
+ * following A "up" will get you to B.  Another way to put it is that
+ * "up" in the VFS means going from this mount towards the direction
+ * of its mnt->mnt_parent pointer, but "up" in union mounts means
+ * going in the opposite direction (until you run out of union
+ * layers).
+ */
+
+/*
+ * The union_stack structure.  It is an array of struct paths of
+ * directories below the topmost directory in a unioned directory.  The
+ * topmost dentry has a pointer to this structure.  The topmost dentry
+ * can only be part of one union, so we can reference it from the
+ * dentry, but lower dentries can be part of multiple union stacks.
+ *
+ * The number of dirs actually allocated is kept in the superblock,
+ * s_union_count.
+ */
+struct union_stack {
+	struct path u_dirs[0];
+};
+
+#endif	/* CONFIG_UNION_MOUNT */
+#endif	/* __KERNEL__ */
+#endif	/* __LINUX_UNION_H */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 19ddb8c..ddf3d2d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,12 +79,28 @@ full_name_hash(const unsigned char *name, unsigned int len)
  * Try to keep struct dentry aligned on 64 byte cachelines (this will
  * give reasonable cacheline footprint with larger lines without the
  * large memory footprint increase).
+ *
+ * XXX DNAME_INLINE_LEN_MIN is kind of pitiful on 64bit + union
+ * mounts.  May be worth tuning up, but either we go to 256 bytes and
+ * a wasteful 88 bytes of d_iname, or we lose 64-byte alignment.
  */
 #ifdef CONFIG_64BIT
+
+#ifdef CONFIG_UNION_MOUNT
+#define DNAME_INLINE_LEN_MIN 24 /* 192 bytes */
+#else
 #define DNAME_INLINE_LEN_MIN 32 /* 192 bytes */
+#endif /* CONFIG_UNION_MOUNT */
+
+#else
+
+#ifdef CONFIG_UNION_MOUNT
+#define DNAME_INLINE_LEN_MIN 36 /* 128 bytes */
 #else
 #define DNAME_INLINE_LEN_MIN 40 /* 128 bytes */
-#endif
+#endif /* CONFIG_UNION_MOUNT */
+
+#endif /* CONFIG_64BIT */
 
 struct dentry {
 	atomic_t d_count;
@@ -100,7 +116,9 @@ struct dentry {
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
-
+#ifdef CONFIG_UNION_MOUNT
+	struct union_stack *d_union_stack;	/* dirs in union stack */
+#endif
 	struct list_head d_lru;		/* LRU list */
 	/*
 	 * d_child and d_rcu can share memory
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 27/74] union-mount: Add two superblock fields for union mounts
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (25 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 26/74] union-mount: Create union_stack structure Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 28/74] union-mount: Add union_alloc() Valerie Aurora
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Add two fields to struct super_block to support union mounts.
s_union_lower_mnts is a pointer to a cloned vfsmount tree of all the
lower (read-only) mounts unioned with the topmost (read-write)
vfsmount.  These mounts may have submounts which will also be unioned;
hence we copy the entire vfsmount tree, not just the root vfsmounts.
s_union_count is the number of lower mounts unioned at the root of the
file system.  This count is the maximum number of directories that
will ever be unioned with a single directory.  We use it to allocate a
union stack of the correct size for each directory.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 include/linux/fs.h |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dad9903..258f99b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1396,6 +1396,18 @@ struct super_block {
 	 * Decremented by free_vfsmnt() if MNT_HARD_READONLY is set.
 	 */
 	int s_hard_readonly_users;
+
+	/*
+	 * Root of the private cloned vfsmount tree of the read-only
+	 * mounts in this union (set in topmost vfsmount only)
+	 */
+	struct vfsmount *s_union_lower_mnts;
+
+	/*
+	 * Number of layers in this union, not counting the topmost or
+	 * submounts.
+	 */
+	unsigned int s_union_count;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 28/74] union-mount: Add union_alloc()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (26 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 27/74] union-mount: Add two superblock fields for union mounts Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 29/74] union-mount: Add union_find_dir() Valerie Aurora
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

union_alloc() allocates a union stack with enough entries for the
maximum possible number of directories that might be unioned at this
point.

The union_stack may be larger than strictly necessary if this
directory does not exist on all layers, but allocating exactly the
right number would require keeping the number of layers in the
union_stack structure.  We optimize for the case of unioning two file
systems and keep the count of layers in the superblock.
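
As a rough illustration of the sizing rule above, here is a userspace
sketch, not the kernel code.  All model_* names are hypothetical
stand-ins: model_sb plays the role of the superblock holding
s_union_count, calloc() stands in for kzalloc(), and assert() stands
in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct path: one slot per lower layer. */
struct model_path {
	void *mnt;
	void *dentry;
};

/* Hypothetical stand-in for the superblock; only the layer count matters. */
struct model_sb {
	unsigned int union_count;	/* lower layers unioned at the root */
};

struct model_dentry {
	bool is_dir;
	const struct model_sb *sb;
};

/*
 * Allocate a zeroed array with one slot per possible lower layer.
 * The count is read from the superblock model, so the array itself
 * carries no length field.
 */
static struct model_path *model_union_alloc(const struct model_dentry *topmost)
{
	assert(topmost->is_dir);	/* only directories are unioned */
	return calloc(topmost->sb->union_count, sizeof(struct model_path));
}
```

A directory that exists on only one lower layer still gets
union_count slots in this model; the unused slots simply stay zeroed.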

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/Makefile |    1 +
 fs/union.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index e6ec1d3..936acf0 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
 obj-$(CONFIG_GENERIC_ACL)	+= generic_acl.o
 
 obj-y				+= quota/
+obj-$(CONFIG_UNION_MOUNT)	+= union.o
 
 obj-$(CONFIG_PROC_FS)		+= proc/
 obj-y				+= partitions/
diff --git a/fs/union.c b/fs/union.c
new file mode 100644
index 0000000..52a5c28
--- /dev/null
+++ b/fs/union.c
@@ -0,0 +1,42 @@
+ /*
+ * VFS-based union mounts for Linux
+ *
+ * Copyright (C) 2004-2007 IBM Corporation, IBM Deutschland Entwicklung GmbH.
+ * Copyright (C) 2007-2009 Novell Inc.
+ * Copyright (C) 2009-2010 Red Hat, Inc.
+ *
+ *   Author(s): Jan Blunck (j.blunck@tu-harburg.de)
+ *              Valerie Aurora <vaurora@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/fs_struct.h>
+#include <linux/slab.h>
+
+#include "union.h"
+
+/**
+ * union_alloc - allocate a union stack
+ *
+ * @topmost: path of topmost directory
+ *
+ * Allocate a union_stack large enough to contain the maximum number
+ * of layers in this union mount.
+ */
+
+static struct union_stack *union_alloc(struct path *topmost)
+{
+	unsigned int layers = topmost->dentry->d_sb->s_union_count;
+	BUG_ON(!S_ISDIR(topmost->dentry->d_inode->i_mode));
+
+	return kzalloc(sizeof(struct path) * layers, GFP_KERNEL);
+}
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 29/74] union-mount: Add union_find_dir()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (27 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 28/74] union-mount: Add union_alloc() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 30/74] union-mount: Create d_free_unions() Valerie Aurora
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

union_find_dir() returns the path of the directory at the specified
layer in a unioned directory.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/union.h b/fs/union.h
index 38b26fd..e242451 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,16 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+static inline struct path *union_find_dir(struct dentry *dentry,
+					  unsigned int layer) {
+	BUG_ON(layer >= dentry->d_sb->s_union_count);
+	return &(dentry->d_union_stack->u_dirs[layer]);
+}
+
+#else /* CONFIG_UNION_MOUNT */
+
+#define union_find_dir(x, y)		({ BUG(); (NULL); })
+
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
 #endif	/* __LINUX_UNION_H */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 30/74] union-mount: Create d_free_unions()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (28 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 29/74] union-mount: Add union_find_dir() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 31/74] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

d_free_unions() frees the union stack associated with a directory.
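
The teardown can be pictured with the following userspace sketch.  It
is an assumption-laden model, not the patch itself: model_dir.refs
stands in for the reference that path_put() drops, and the layer
count is stored in the dentry model instead of the superblock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lower directory; refs models the reference path_put() drops. */
struct model_dir {
	int refs;
};

/* Hypothetical topmost dentry. */
struct model_dentry {
	unsigned int layers;		/* stand-in for s_union_count */
	struct model_dir **stack;	/* NULL when the directory is not unioned */
};

static void model_free_unions(struct model_dentry *top)
{
	unsigned int i;

	if (!top->stack)
		return;			/* not unioned, nothing to release */

	for (i = 0; i < top->layers; i++)
		if (top->stack[i])	/* slots for absent layers stay empty */
			top->stack[i]->refs--;

	free(top->stack);
	top->stack = NULL;		/* a second call becomes a no-op */
}
```

Clearing the pointer after freeing is what lets the function be
called from several teardown paths without double-freeing.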

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.c |   25 +++++++++++++++++++++++++
 fs/union.h |    7 +++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index 52a5c28..a191bef 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -21,6 +21,7 @@
 #include <linux/mount.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
+#include <linux/namei.h>
 
 #include "union.h"
 
@@ -40,3 +41,27 @@ static struct union_stack *union_alloc(struct path *topmost)
 
 	return kzalloc(sizeof(struct path) * layers, GFP_KERNEL);
 }
+
+/**
+ * d_free_unions - free all unions for this dentry
+ *
+ * @topmost - topmost dentry of the union stack to free
+ *
+ * This must be called when freeing a dentry.
+ */
+void d_free_unions(struct dentry *topmost)
+{
+	struct path *path;
+	unsigned int i, layers = topmost->d_sb->s_union_count;
+
+	if (!IS_DIR_UNIONED(topmost))
+		return;
+
+	for (i = 0; i < layers; i++) {
+		path = union_find_dir(topmost, i);
+		if (path->mnt)
+			path_put(path);
+	}
+	kfree(topmost->d_union_stack);
+	topmost->d_union_stack = NULL;
+}
diff --git a/fs/union.h b/fs/union.h
index e242451..353f78d 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,10 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+#define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
+
+extern void d_free_unions(struct dentry *);
+
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
 	BUG_ON(layer >= dentry->d_sb->s_union_count);
@@ -57,6 +61,9 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 
 #else /* CONFIG_UNION_MOUNT */
 
+#define IS_DIR_UNIONED(x)		(0)
+
+#define d_free_unions(x)		do { } while (0)
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 31/74] union-mount: Free union stack on removal of topmost dentry from dcache
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (29 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 30/74] union-mount: Create d_free_unions() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 32/74] union-mount: Create union_add_dir() Valerie Aurora
                   ` (15 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Jan Blunck, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

If a dentry is removed from dentry cache because its usage count drops
to zero, its union stack is freed too.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/dcache.c    |   11 +++++++++++
 fs/namespace.c |    2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 566acf7..6b11519 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -34,6 +34,7 @@
 #include <linux/fs_struct.h>
 #include <linux/hardirq.h>
 #include "internal.h"
+#include "union.h"
 
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
@@ -175,6 +176,7 @@ static struct dentry *d_kill(struct dentry *dentry)
 	dentry_stat.nr_dentry--;	/* For d_free, below */
 	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	d_free_unions(dentry);
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -697,6 +699,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 					iput(inode);
 			}
 
+			d_free_unions(dentry);
 			d_free(dentry);
 
 			/* finished when we fall off the top of the tree,
@@ -1547,6 +1550,7 @@ void d_delete(struct dentry * dentry)
 	if (atomic_read(&dentry->d_count) == 1) {
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_iput(dentry);
+		d_free_unions(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
@@ -1557,6 +1561,13 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
+	/*
+	 * Remove any associated unions.  While someone still has this
+	 * directory open (ref count > 0), we could not have deleted
+	 * it unless it was empty, and therefore has no references to
+	 * directories below it.  So we don't need the unions.
+	 */
+	d_free_unions(dentry);
 	fsnotify_nameremove(dentry, isdir);
 }
 EXPORT_SYMBOL(d_delete);
diff --git a/fs/namespace.c b/fs/namespace.c
index 054eb7d..5de05e4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -36,6 +36,7 @@
 #include <asm/unistd.h>
 #include "pnode.h"
 #include "internal.h"
+#include "union.h"
 
 #define HASH_SHIFT ilog2(PAGE_SIZE / sizeof(struct list_head))
 #define HASH_SIZE (1UL << HASH_SHIFT)
@@ -1108,6 +1109,7 @@ void umount_tree(struct vfsmount *mnt, int propagate, struct list_head *kill)
 		propagate_umount(kill);
 
 	list_for_each_entry(p, kill, mnt_hash) {
+		d_free_unions(p->mnt_root);
 		list_del_init(&p->mnt_expire);
 		list_del_init(&p->mnt_list);
 		__touch_mnt_namespace(p->mnt_ns);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 32/74] union-mount: Create union_add_dir()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (30 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 31/74] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 33/74] union-mount: Add union_create_topmost_dir() Valerie Aurora
                   ` (14 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

union_add_dir() fills out the union stack for the topmost dentry with
the path of the directory in this layer of the union.
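
A minimal userspace sketch of the allocate-on-first-use pattern
described above.  The model_* names are hypothetical, calloc() stands
in for union_alloc(), and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct path. */
struct model_path {
	const char *name;
};

/* Hypothetical topmost dentry with a lazily allocated stack. */
struct model_dentry {
	unsigned int layers;		/* stand-in for s_union_count */
	struct model_path *stack;	/* NULL until the first layer is added */
};

static int model_add_dir(struct model_dentry *top,
			 const struct model_path *lower, unsigned int layer)
{
	assert(layer < top->layers);	/* layer 0 is the dir just below the top */

	/* Allocate the whole stack the first time any layer is recorded. */
	if (!top->stack)
		top->stack = calloc(top->layers, sizeof(*top->stack));
	if (!top->stack)
		return -ENOMEM;

	/* Plain struct copy; the caller already holds the reference. */
	top->stack[layer] = *lower;
	return 0;
}
```

Layers may be filled in any order in this model; slots that are never
filled keep their zeroed contents.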

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.c |   28 ++++++++++++++++++++++++++++
 fs/union.h |    2 ++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index a191bef..45552f8 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -65,3 +65,31 @@ void d_free_unions(struct dentry *topmost)
 	kfree(topmost->d_union_stack);
 	topmost->d_union_stack = NULL;
 }
+
+/**
+ * union_add_dir - Add another layer to a unioned directory
+ *
+ * @topmost - topmost directory
+ * @lower - directory in the current layer
+ * @layer - index of layer to add this at
+ *
+ * @layer counts starting at 0 for the dir below the topmost dir.
+ * Must take a reference to @lower (call path_get()) before calling
+ * this function.
+ */
+
+int union_add_dir(struct path *topmost, struct path *lower,
+		  unsigned int layer)
+{
+	struct path *path;
+	struct dentry *dentry = topmost->dentry;
+	BUG_ON(layer >= dentry->d_sb->s_union_count);
+
+	if (!dentry->d_union_stack)
+		dentry->d_union_stack = union_alloc(topmost);
+	if (!dentry->d_union_stack)
+		return -ENOMEM;
+	path = union_find_dir(dentry, layer);
+	*path = *lower;
+	return 0;
+}
diff --git a/fs/union.h b/fs/union.h
index 353f78d..bd03d67 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -52,6 +52,7 @@ struct union_stack {
 #define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
 
 extern void d_free_unions(struct dentry *);
+extern int union_add_dir(struct path *, struct path *, unsigned int);
 
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
@@ -64,6 +65,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define IS_DIR_UNIONED(x)		(0)
 
 #define d_free_unions(x)		do { } while (0)
+#define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 
 #endif	/* CONFIG_UNION_MOUNT */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 33/74] union-mount: Add union_create_topmost_dir()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (31 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 32/74] union-mount: Create union_add_dir() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 34/74] union-mount: Create IS_MNT_UNION() Valerie Aurora
                   ` (13 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

The union mounts design requires that every directory have a matching
directory on the topmost layer by the time lookup completes.  That way
we never have to double back and create a whole path's worth of
directories the first time we copy up a file in a directory, which
greatly simplifies locking and error handling.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/union.h |    3 +++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/fs/union.c b/fs/union.c
index 45552f8..446116f 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -93,3 +93,56 @@ int union_add_dir(struct path *topmost, struct path *lower,
 	*path = *lower;
 	return 0;
 }
+
+/**
+ * union_create_topmost_dir - Create a matching dir in the topmost file system
+ *
+ * @parent - parent of target on topmost layer
+ * @name - name of target
+ * @topmost - path of target on topmost layer
+ * @lower - path of source on lower layer
+ *
+ * As we look up each directory on the lower layer of a union, we
+ * create a matching directory on the topmost layer if it does not
+ * already exist.
+ *
+ * We don't use vfs_mkdir() for a few reasons: don't want to do the
+ * security check, don't want to make the dir opaque, don't need to
+ * sanitize the mode.
+ *
+ * XXX - owner is wrong, set credentials properly
+ * XXX - rmdir() directory on failure of xattr copyup
+ * XXX - not atomic w/ respect to crash
+ */
+
+int union_create_topmost_dir(struct path *parent, struct qstr *name,
+			     struct path *topmost, struct path *lower)
+{
+	struct inode *dir = parent->dentry->d_inode;
+	int mode = lower->dentry->d_inode->i_mode;
+	int error;
+
+	BUG_ON(topmost->dentry->d_inode);
+
+	/* XXX - Do we even need to check this? */
+	if (!dir->i_op->mkdir)
+		return -EPERM;
+
+	error = mnt_want_write(parent->mnt);
+	if (error)
+		return error;
+
+	error = dir->i_op->mkdir(dir, topmost->dentry, mode);
+	if (error)
+		goto out;
+
+	error = union_copyup_xattr(lower->dentry, topmost->dentry);
+	if (error)
+		dput(topmost->dentry);
+
+	fsnotify_mkdir(dir, topmost->dentry);
+out:
+	mnt_drop_write(parent->mnt);
+
+	return error;
+}
diff --git a/fs/union.h b/fs/union.h
index bd03d67..1692803 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -53,6 +53,8 @@ struct union_stack {
 
 extern void d_free_unions(struct dentry *);
 extern int union_add_dir(struct path *, struct path *, unsigned int);
+extern int union_create_topmost_dir(struct path *, struct qstr *, struct path *,
+				    struct path *);
 
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
@@ -67,6 +69,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define d_free_unions(x)		do { } while (0)
 #define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
+#define union_create_topmost_dir(w, x, y, z)	({ BUG(); (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 34/74] union-mount: Create IS_MNT_UNION()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (32 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 33/74] union-mount: Add union_create_topmost_dir() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 35/74] union-mount: Create needs_lookup_union() Valerie Aurora
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Jan Blunck, Valerie Aurora

From: Jan Blunck <jblunck@suse.de>

IS_MNT_UNION() tests whether a vfsmount is a union.  Note that a
directory in a union mounted file system is not necessarily unioned.
Use IS_DIR_UNIONED() to test that.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/union.h b/fs/union.h
index 1692803..c496823 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -49,6 +49,7 @@ struct union_stack {
 	struct path u_dirs[0];
 };
 
+#define IS_MNT_UNION(mnt)	((mnt)->mnt_flags & MNT_UNION)
 #define IS_DIR_UNIONED(dentry)	((dentry)->d_union_stack)
 
 extern void d_free_unions(struct dentry *);
@@ -64,6 +65,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 
 #else /* CONFIG_UNION_MOUNT */
 
+#define IS_MNT_UNION(x)			(0)
 #define IS_DIR_UNIONED(x)		(0)
 
 #define d_free_unions(x)		do { } while (0)
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 35/74] union-mount: Create needs_lookup_union()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (33 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 34/74] union-mount: Create IS_MNT_UNION() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 36/74] union-mount: Create check_topmost_union_mnt() Valerie Aurora
                   ` (11 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

needs_lookup_union() tests if a path could possibly require a union
lookup.
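
The test can be read as three checks in sequence.  The sketch below
models them in userspace; the struct fields and MODEL_LOOKUP_DONE are
hypothetical stand-ins for IS_DIR_UNIONED(), IS_ROOT() and the
DCACHE_UNION_LOOKUP_DONE flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LOOKUP_DONE 0x0800u	/* mirrors DCACHE_UNION_LOOKUP_DONE */

/* Hypothetical dentry model carrying only what the test inspects. */
struct model_dentry {
	bool unioned;		/* has a union stack */
	bool is_root;		/* root of its file system */
	unsigned int flags;
};

static bool model_needs_lookup_union(const struct model_dentry *parent,
				     const struct model_dentry *child)
{
	assert(parent && child);

	if (!parent->unioned)
		return false;	/* nothing below the parent to search */
	if (child->is_root)
		return false;	/* already built, or crossed into another mount */
	return !(child->flags & MODEL_LOOKUP_DONE);
}
```

Because the flag is read without a lock, a true result here is only a
hint; per the comment in the patch, lookup_union() rechecks it.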

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/union.h             |   16 ++++++++++++++++
 include/linux/dcache.h |    1 +
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/union.h b/fs/union.h
index c496823..9efb177 100644
--- a/fs/union.h
+++ b/fs/union.h
@@ -57,6 +57,21 @@ extern int union_add_dir(struct path *, struct path *, unsigned int);
 extern int union_create_topmost_dir(struct path *, struct qstr *, struct path *,
 				    struct path *);
 
+static inline int needs_lookup_union(struct path *parent_path, struct path *path)
+{
+	if (!IS_DIR_UNIONED(parent_path->dentry))
+		return 0;
+
+	/* Either already built or crossed a mountpoint to not-unioned mnt */
+	/* XXX are bind mounts root? think not */
+	if (IS_ROOT(path->dentry))
+		return 0;
+
+	/* It's okay not to have the lock; will recheck in lookup_union() */
+	/* XXX set for root dentry at mount? */
+	return !(path->dentry->d_flags & DCACHE_UNION_LOOKUP_DONE);
+}
+
 static inline struct path *union_find_dir(struct dentry *dentry,
 					  unsigned int layer) {
 	BUG_ON(layer >= dentry->d_sb->s_union_count);
@@ -72,6 +87,7 @@ static inline struct path *union_find_dir(struct dentry *dentry,
 #define union_add_dir(x, y, z)		({ BUG(); (0); })
 #define union_find_dir(x, y)		({ BUG(); (NULL); })
 #define union_create_topmost_dir(w, x, y, z)	({ BUG(); (0); })
+#define needs_lookup_union(x, y)	({ (0); })
 
 #endif	/* CONFIG_UNION_MOUNT */
 #endif	/* __KERNEL__ */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ddf3d2d..c37b621 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -209,6 +209,7 @@ d_iput:		no		no		no       yes
 #define DCACHE_WHITEOUT		0x0200	/* Stop lookup in a unioned file system */
 
 #define DCACHE_FALLTHRU		0x0400	/* Continue lookup below an opaque dir */
+#define DCACHE_UNION_LOOKUP_DONE	0x0800	/* Union lookup was called on this dentry */
 
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 36/74] union-mount: Create check_topmost_union_mnt()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (34 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 35/74] union-mount: Create needs_lookup_union() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 37/74] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
                   ` (10 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

check_topmost_union_mnt() checks that the topmost layer of a proposed
union mount is read-write, supports fallthrus and whiteouts, and isn't
mounted elsewhere.
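
For readers who want the checks and their error codes at a glance,
here is a userspace sketch.  The model_topmost struct is
hypothetical: active_refs stands in for s_active, and the two bools
stand in for the MS_WHITEOUT and MS_FALLTHRU superblock flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the topmost file system at mount time. */
struct model_topmost {
	bool read_only;
	int active_refs;	/* stand-in for s_active */
	bool whiteouts;		/* stand-in for MS_WHITEOUT */
	bool fallthrus;		/* stand-in for MS_FALLTHRU */
};

/* Returns 0 if the layer may be the top of a union, else a negative errno. */
static int model_check_topmost(const struct model_topmost *t)
{
	assert(t);

	if (t->read_only)
		return -EROFS;	/* the top layer must accept copy-ups */
	if (t->active_refs != 1)
		return -EBUSY;	/* mounted somewhere else as well */
	if (!t->whiteouts || !t->fallthrus)
		return -EINVAL;	/* file system lacks union support */
	return 0;
}
```

The order matters only for which error a caller sees first when
several conditions fail at once.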

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5de05e4..1027e8c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1396,6 +1396,46 @@ static int invent_group_ids(struct vfsmount *mnt, bool recurse)
 	return 0;
 }
 
+/**
+ * check_topmost_union_mnt - mount-time checks for union mount
+ *
+ * @topmost_mnt: vfsmount of the topmost union file system
+ * @mnt_flags: mount flags for the topmost mount
+ *
+ * Our readdir() solution of copying up directory entries requires
+ * that the topmost layer be writeable and support whiteouts and
+ * fallthrus.  The topmost file system can't be mounted elsewhere
+ * because it's Too Hard(tm).
+ */
+
+static int check_topmost_union_mnt(struct vfsmount *topmost_mnt, int mnt_flags)
+{
+	struct super_block *sb = topmost_mnt->mnt_sb;
+#ifndef CONFIG_UNION_MOUNT
+	printk(KERN_INFO "union mount: not supported by the kernel\n");
+	return -EINVAL;
+#endif
+	if (mnt_flags & MNT_READONLY)
+		return -EROFS;
+
+	if (atomic_read(&sb->s_active) != 1) {
+		printk(KERN_INFO "union mount: topmost fs mounted elsewhere\n");
+		return -EBUSY;
+	}
+
+	if (!(sb->s_flags & MS_WHITEOUT)) {
+		printk(KERN_INFO "union mount: whiteouts not supported by fs\n");
+		return -EINVAL;
+	}
+
+	if (!(sb->s_flags & MS_FALLTHRU)) {
+		printk(KERN_INFO "union mount: fallthrus not supported by fs\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 37/74] union-mount: Add clone_union_tree() and put_union_sb()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (35 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 36/74] union-mount: Create check_topmost_union_mnt() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 38/74] union-mount: Create build_root_union() Valerie Aurora
                   ` (9 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

A union mount clones the vfsmount tree of all of the read-only layers
of the union and keeps a reference to it in the vfsmount of the
topmost layer of the union.

clone_union_tree() takes the path of the proposed union mountpoint and
attempts to clone every vfsmount mounted at that same pathname, as
well as their submounts.  All these mounts must be read-only, not
slave, and not shared.
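
The search for the lowest mount stacked at the same pathname can be
sketched in userspace as follows.  The model_mnt struct is a
hypothetical reduction of struct vfsmount to the three fields the
walk uses; the sketch assumes, as in the kernel, that the namespace
root is its own parent.

```c
#include <assert.h>

/* Hypothetical reduction of struct vfsmount. */
struct model_mnt {
	struct model_mnt *parent;	/* the root mount points to itself */
	const void *root;		/* root dentry of this mount */
	const void *mountpoint;		/* dentry in the parent that it covers */
};

/*
 * Walk towards the parent for as long as this mount sits directly on
 * the parent's root directory, i.e. both are stacked at one pathname.
 */
static struct model_mnt *model_lowest_layer(struct model_mnt *mnt)
{
	assert(mnt && mnt->parent);

	while (mnt->parent->root == mnt->mountpoint) {
		if (mnt->parent == mnt)
			break;		/* reached the namespace root */
		mnt = mnt->parent;
	}
	return mnt;
}
```

With B mounted over A and A mounted on a subdirectory of the root
file system, starting from B yields A: A is the lowest mount at that
pathname because its mountpoint is not the root of its parent.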

put_union_sb() unwinds everything clone_union_tree() does.  It is
called when the superblock is deactivated.  Thus, you can lazy unmount
a union mount and when the last reference goes away, the union will be
torn down.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c        |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mount.h |    2 +
 2 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1027e8c..3da6848 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1436,6 +1436,77 @@ static int check_topmost_union_mnt(struct vfsmount *topmost_mnt, int mnt_flags)
 	return 0;
 }
 
+void put_union_sb(struct super_block *sb)
+{
+	struct vfsmount *mnt = sb->s_union_lower_mnts;
+	LIST_HEAD(umount_list);
+
+	if (!mnt)
+		return;
+	br_write_lock(vfsmount_lock);
+	umount_tree(mnt, 0, &umount_list);
+	br_write_unlock(vfsmount_lock);
+	release_mounts(&umount_list);
+	sb->s_union_lower_mnts = NULL;
+	sb->s_union_count = 0;
+}
+
+/**
+ * clone_union_tree - Clone all union-able mounts at this mountpoint
+ *
+ * @topmost - vfsmount of topmost layer
+ * @mntpnt - target of union mount
+ *
+ * Given the target mountpoint of a union mount, clone all the mounts
+ * at that mountpoint (well, pathname) that qualify as a union lower
+ * layer.  Increment the hard readonly count of the lower layer
+ * superblocks.
+ *
+ * Returns error if any of the mounts or submounts mounted on or below
+ * this pathname are unsuitable for union mounting.  This means you
+ * can't construct a union mount at the root of an existing mount
+ * without unioning it.
+ *
+ * XXX - Maybe should take # of layers to go down as an argument. But
+ * how to pass this in through mount options?  All solutions look
+ * ugly.  Currently you express your intention through mounting file
+ * systems on the same mountpoint, which is pretty elegant.
+ */
+
+static int clone_union_tree(struct vfsmount *topmost, struct path *mntpnt)
+{
+	struct vfsmount *mnt, *cloned_tree;
+
+	if (!IS_ROOT(mntpnt->dentry)) {
+		printk(KERN_INFO "union mount: mount point must be a root dir\n");
+		return -EINVAL;
+	}
+
+	/* Look for the "lowest" layer to union. */
+	mnt = mntpnt->mnt;
+	while (mnt->mnt_parent->mnt_root == mnt->mnt_mountpoint) {
+		/* Got root (mnt)? */
+		if (mnt->mnt_parent == mnt)
+			break;
+		mnt = mnt->mnt_parent;
+	}
+	/*
+	 * Clone all the read-only mounts and submounts, only if they
+	 * are not shared or slave, and increment the hard read-only
+	 * users count on each one.  If this can't be done for every
+	 * mount and submount below this one, fail.
+	 */
+	cloned_tree = copy_tree(mnt, mnt->mnt_root,
+				CL_COPY_ALL | CL_PRIVATE |
+				CL_NO_SHARED | CL_NO_SLAVE |
+				CL_MAKE_HARD_READONLY);
+	if (IS_ERR(cloned_tree))
+		return PTR_ERR(cloned_tree);
+
+	topmost->mnt_sb->s_union_lower_mnts = cloned_tree;
+	return 0;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1c69bee..2511848 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -141,4 +141,6 @@ extern void mark_mounts_for_expiry(struct list_head *mounts);
 
 extern dev_t name_to_dev_t(char *name);
 
+extern void put_union_sb(struct super_block *sb);
+
 #endif /* _LINUX_MOUNT_H */
-- 
1.7.0.4



* [PATCH 38/74] union-mount: Create build_root_union()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (36 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 37/74] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 39/74] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
                   ` (8 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

During mount(), build_root_union() creates the union stack for the
root directory.  All other directory union stacks are bootstrapped
from their parents' union stacks during path lookup.
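
The up-then-down walk described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct mock_mnt,
find_topmost() and build_stack() are hypothetical stand-ins, not the
kernel's struct vfsmount or the functions added by this patch, and no
reference counting or locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount: only what the walk needs. */
struct mock_mnt {
	const char *name;
	struct mock_mnt *parent;  /* mount this one sits on (self for the tree root) */
	int on_parent_root;       /* nonzero if mounted on the parent's root dir */
};

/*
 * Walk "up" from the root of the cloned tree: keep moving to a child
 * that is mounted on the current layer's root directory.  Returns the
 * number of layers and stores the topmost read-only layer in *top.
 */
static unsigned int find_topmost(struct mock_mnt *mnts, size_t n,
				 struct mock_mnt *root, struct mock_mnt **top)
{
	unsigned int layers = 1;
	struct mock_mnt *cur = root;
	int found;

	do {
		found = 0;
		for (size_t i = 0; i < n; i++) {
			struct mock_mnt *m = &mnts[i];

			if (m != cur && m->parent == cur && m->on_parent_root) {
				cur = m;
				layers++;
				found = 1;
				break;
			}
		}
	} while (found);

	*top = cur;
	return layers;
}

/* Walk back "down" through the parent pointers, topmost layer first. */
static void build_stack(struct mock_mnt *top, unsigned int layers,
			struct mock_mnt **stack)
{
	struct mock_mnt *m = top;

	for (unsigned int i = 0; i < layers; i++) {
		stack[i] = m;
		m = m->parent;
	}
}
```

A caller would run find_topmost() once on the cloned tree and then pass
its results to build_stack(); submounts that are not mounted on a
layer's root directory are skipped by the first walk.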

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3da6848..12563ea 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1507,6 +1507,54 @@ static int clone_union_tree(struct vfsmount *topmost, struct path *mntpnt)
 	return 0;
 }
 
+/**
+ * build_root_union - Create the union stack for the root dir
+ *
+ * @topmost_mnt: vfsmount of topmost mount
+ *
+ * Build the union stack for the root dir.  Annoyingly, we have to
+ * traverse union "up" from the root of the cloned tree to find the
+ * topmost read-only mount, and then traverse back "down" to build the
+ * stack.
+ */
+
+static int build_root_union(struct vfsmount *topmost_mnt)
+{
+	struct path lower, topmost_path;
+	struct vfsmount *mnt, *topmost_ro_mnt;
+	unsigned int i, layers = 1;
+	int err = 0;
+
+	/* Find the topmost read-only mount */
+	topmost_ro_mnt = topmost_mnt->mnt_sb->s_union_lower_mnts;
+	for (mnt = topmost_ro_mnt; mnt; mnt = next_mnt(mnt, topmost_ro_mnt)) {
+		if ((mnt->mnt_parent == topmost_ro_mnt) &&
+		    (mnt->mnt_mountpoint == topmost_ro_mnt->mnt_root)) {
+			topmost_ro_mnt = mnt;
+			layers++;
+		}
+	}
+	topmost_mnt->mnt_sb->s_union_count = layers;
+
+	/* Build the root dir's union stack from the top down */
+	topmost_path.mnt = topmost_mnt;
+	topmost_path.dentry = topmost_mnt->mnt_root;
+	mnt = topmost_ro_mnt;
+	for (i = 0; i < layers; i++) {
+		lower.mnt = mntget(mnt);
+		lower.dentry = dget(mnt->mnt_root);
+		err = union_add_dir(&topmost_path, &lower, i);
+		if (err)
+			goto out;
+		mnt = mnt->mnt_parent;
+	}
+	return 0;
+out:
+	d_free_unions(topmost_path.dentry);
+	topmost_mnt->mnt_sb->s_union_count = 0;
+	return err;
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.7.0.4



* [PATCH 39/74] union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (37 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 38/74] union-mount: Create build_root_union() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 40/74] union-mount: Prevent improper union-related remounts Valerie Aurora
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

prepare_mnt_union() ties together all the mount-time checks and setup
for union mounts.  It tests the layers for suitability and builds the
root union stack.

cleanup_mnt_union() unwinds everything prepare_mnt_union() does.
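
The setup/unwind pairing can be sketched as follows.  This is a
userspace illustration, not the patch itself: struct mock_union, the
stub steps and the fail_at parameter are all hypothetical, and exist
only to show which step undoes what when a later step fails.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state standing in for the superblock's union fields. */
struct mock_union {
	int tree_cloned;  /* set by the clone step, cleared on teardown */
	int stack_built;  /* set by the build step, cleared on teardown */
};

/* Stub steps; fail_at selects which step (1, 2 or 3) reports an error. */
static int check_topmost(int fail_at)
{
	return fail_at == 1 ? -EINVAL : 0;
}

static int clone_tree(struct mock_union *u, int fail_at)
{
	if (fail_at == 2)
		return -ENOMEM;
	u->tree_cloned = 1;
	return 0;
}

static int build_root(struct mock_union *u, int fail_at)
{
	if (fail_at == 3)
		return -ENOMEM;
	u->stack_built = 1;
	return 0;
}

static void put_tree(struct mock_union *u)
{
	u->tree_cloned = 0;
}

static int prepare(struct mock_union *u, int fail_at)
{
	int err;

	err = check_topmost(fail_at);
	if (err)
		return err;   /* nothing to undo yet */

	err = clone_tree(u, fail_at);
	if (err)
		return err;   /* a failed clone leaves nothing behind */

	err = build_root(u, fail_at);
	if (err)
		goto out;
	return 0;
out:
	put_tree(u);          /* drop the cloned tree before reporting failure */
	return err;
}

/* Mirror of prepare(): tear down in the reverse order of setup. */
static void cleanup(struct mock_union *u)
{
	u->stack_built = 0;
	put_tree(u);
}
```

Only the last step needs an unwind label here because the first two
steps either succeed or leave the state untouched.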

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 12563ea..6486386 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1555,6 +1555,49 @@ out:
 	return err;
 }
 
+/**
+ * prepare_mnt_union - do setup necessary for a union mount
+ *
+ * @topmost_mnt: vfsmount of topmost layer
+ * @mntpnt: path of requested mountpoint
+ *
+ * We union every underlying file system that is mounted on the same
+ * mountpoint (well, pathname), read-only, and not shared.  If we get
+ * at least one layer, we don't return an error, although we will
+ * complain in the kernel log if we hit a mount that can't be
+ * unioned.
+ *
+ * Caller needs namespace_sem, but can't have vfsmount_lock.
+ */
+
+static int prepare_mnt_union(struct vfsmount *topmost_mnt, struct path *mntpnt)
+{
+	int err;
+
+	err = check_topmost_union_mnt(topmost_mnt, topmost_mnt->mnt_flags);
+	if (err)
+		return err;
+
+	err = clone_union_tree(topmost_mnt, mntpnt);
+	if (err)
+		return err;
+
+	err = build_root_union(topmost_mnt);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	put_union_sb(topmost_mnt->mnt_sb);
+	return err;
+}
+
+static void cleanup_mnt_union(struct vfsmount *topmost_mnt)
+{
+	d_free_unions(topmost_mnt->mnt_root);
+	put_union_sb(topmost_mnt->mnt_sb);
+}
+
 /*
  *  @source_mnt : mount tree to be attached
  *  @nd         : place the mount tree @source_mnt is attached
-- 
1.7.0.4



* [PATCH 40/74] union-mount: Prevent improper union-related remounts
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (38 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 39/74] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 41/74] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
                   ` (6 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

A remount request must not (a) convert a union to a non-union (or vice
versa), or (b) make the topmost layer of a union read-only.

Note that we only have to worry about attempts to remount the vfsmount
of the topmost read-write layer of the union (the one with MNT_UNION
set).  The vfsmounts of the read-only layers are hidden in a cloned
tree hanging off the superblock of the topmost layer and aren't
visible to userspace.
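
The two remount rules can be condensed into a single predicate.  The
sketch below is a userspace illustration only: the DEMO_* flag values
and check_union_remount() are hypothetical and do not match the
kernel's MNT_* constants or the code in do_remount().

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values only, not the kernel's MNT_* constants. */
#define DEMO_MNT_READONLY 0x01
#define DEMO_MNT_UNION    0x02

/*
 * cur_flags: flags of the existing mount.
 * new_flags: flags requested by the remount.
 * Returns 0 if the remount is acceptable, -EINVAL otherwise.
 */
static int check_union_remount(int cur_flags, int new_flags)
{
	int was_union = !!(cur_flags & DEMO_MNT_UNION);
	int want_union = !!(new_flags & DEMO_MNT_UNION);

	/* (a) union <-> non-union conversions are refused in both directions */
	if (was_union != want_union)
		return -EINVAL;

	/* (b) the topmost layer of a union must stay writable */
	if (was_union && (new_flags & DEMO_MNT_READONLY))
		return -EINVAL;

	return 0;
}
```

Comparing the two union bits covers both directions of rule (a) in one
test, which the patch spells out as two separate checks.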

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6486386..2b8f329 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1870,6 +1870,18 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (!check_mnt(path->mnt))
 		return -EINVAL;
 
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    !(mnt_flags & MNT_UNION))
+		return -EINVAL;
+
+	if ((mnt_flags & MNT_UNION) &&
+	    !(path->mnt->mnt_flags & MNT_UNION))
+		return -EINVAL;
+
+	if ((path->mnt->mnt_flags & MNT_UNION) &&
+	    (mnt_flags & MNT_READONLY))
+		return -EINVAL;
+
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
-- 
1.7.0.4



* [PATCH 41/74] union-mount: Prevent topmost file system from being mounted elsewhere
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (39 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 40/74] union-mount: Prevent improper union-related remounts Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 42/74] union-mount: Prevent bind mounts of union mounts Valerie Aurora
                   ` (5 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

The device underlying the topmost read-write layer of a union mount
cannot be mounted anywhere else on the system.  Dentries are shared
between different mounts of the same device, and we keep a pointer to
the union stack in the dentry of the topmost directory, so that dentry
can't also be part of a different mount.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2b8f329..3ac8198 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2044,6 +2044,11 @@ int do_add_mount(struct vfsmount *newmnt, struct path *path,
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/* Top layers of union mounts can't be mounted elsewhere */
+	err = -EBUSY;
+	if (newmnt->mnt_sb->s_union_lower_mnts)
+		goto unlock;
+
 	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
-- 
1.7.0.4



* [PATCH 42/74] union-mount: Prevent bind mounts of union mounts
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (40 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 41/74] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 43/74] union-mount: Implement union mount Valerie Aurora
                   ` (4 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Prevent bind mounts of parts of union mounts.

XXX - Bind mounting parts of union mounts is probably easy to
implement, but requires some careful thought about corner cases,
extensive testing, and some refactoring of the code.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3ac8198..1581411 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1806,6 +1806,12 @@ static int do_loopback(struct path *path, char *old_name,
 	err = -EINVAL;
 	if (IS_MNT_UNBINDABLE(old_path.mnt))
 		goto out;
+	/*
+	 * XXX - Mounting a subtree of a union mount elsewhere
+	 * requires careful thought and some refactoring.
+	 */
+	if (IS_MNT_UNION(old_path.mnt))
+		goto out;
 
 	if (!check_mnt(path->mnt) || !check_mnt(old_path.mnt))
 		goto out;
-- 
1.7.0.4



* [PATCH 43/74] union-mount: Implement union mount
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (41 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 42/74] union-mount: Prevent bind mounts of union mounts Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  1:59 ` [PATCH 44/74] union-mount: Temporarily disable some syscalls Valerie Aurora
                   ` (3 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

Until this commit, a mount with the MS_UNION flag succeeded but didn't
actually union the file systems.  Now call the functions to check
the source mounts and create/destroy the per-vfsmount union structures.

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namespace.c |   13 ++++++++++++-
 fs/super.c     |    1 +
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1581411..11677d4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1675,9 +1675,17 @@ static int attach_recursive_mnt(struct vfsmount *source_mnt,
 		if (err)
 			goto out;
 	}
+
+	/* parent_path means we are moving an existing unioned mount */
+	if (!parent_path && IS_MNT_UNION(source_mnt)) {
+		err = prepare_mnt_union(source_mnt, path);
+		if (err)
+			goto out_cleanup_ids;
+	}
+
 	err = propagate_mnt(dest_mnt, dest_dentry, source_mnt, &tree_list);
 	if (err)
-		goto out_cleanup_ids;
+		goto out_cleanup_union;
 
 	br_write_lock(vfsmount_lock);
 
@@ -1702,6 +1710,9 @@ static int attach_recursive_mnt(struct vfsmount *source_mnt,
 
 	return 0;
 
+ out_cleanup_union:
+	if (!parent_path && IS_MNT_UNION(source_mnt))
+		cleanup_mnt_union(source_mnt);
  out_cleanup_ids:
 	if (IS_MNT_SHARED(dest_mnt))
 		cleanup_group_ids(source_mnt, NULL);
diff --git a/fs/super.c b/fs/super.c
index d02a4d6..78232d2 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -178,6 +178,7 @@ void deactivate_locked_super(struct super_block *s)
 	if (atomic_dec_and_test(&s->s_active)) {
 		fs->kill_sb(s);
 		put_filesystem(fs);
+		put_union_sb(s);
 		put_super(s);
 	} else {
 		up_write(&s->s_umount);
-- 
1.7.0.4



* [PATCH 44/74] union-mount: Temporarily disable some syscalls
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (42 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 43/74] union-mount: Implement union mount Valerie Aurora
@ 2011-03-23  1:59 ` Valerie Aurora
  2011-03-23  2:12 ` [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (2 subsequent siblings)
  46 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  1:59 UTC (permalink / raw)
  To: linux-fsdevel, linux; +Cc: viro, Valerie Aurora, Valerie Aurora

From: Valerie Aurora <vaurora@redhat.com>

After some of the following patches in this series, a few system calls
will crash the kernel if called on union-mounted file systems.
Temporarily disable rename(), unlink(), and rmdir() on unioned file
systems until they are correctly implemented by later patches.
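
For callers, the temporary gate amounts to the mapping sketched below.
This is a userspace illustration: enum demo_op and union_gate() are
hypothetical names, and only the errno values are taken from the diff
in this patch (-EINVAL for rmdir() and unlink(), -EXDEV for rename()).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the three temporarily disabled operations. */
enum demo_op { DEMO_RMDIR, DEMO_UNLINK, DEMO_RENAME };

/*
 * Returns 0 if the operation may proceed, or the negative errno the
 * caller would see while the operation is disabled on unioned dirs.
 */
static int union_gate(enum demo_op op, int dir_is_unioned)
{
	if (!dir_is_unioned)
		return 0;
	return op == DEMO_RENAME ? -EXDEV : -EINVAL;
}
```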

Signed-off-by: Valerie Aurora <valerie.aurora@gmail.com>
---
 fs/namei.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ce54ed4..3c00ce6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@
 #include <asm/uaccess.h>
 
 #include "internal.h"
+#include "union.h"
 
 /* [Feb-1997 T. Schoebel-Theuer]
  * Fundamental changes in the pathname lookup mechanisms (namei)
@@ -2375,6 +2376,11 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	if (error)
 		return error;
 
+	/* rmdir() on union mounts not implemented yet */
+	error = -EINVAL;
+	if (IS_DIR_UNIONED(nd.path.dentry))
+		goto exit1;
+
 	switch(nd.last_type) {
 	case LAST_DOTDOT:
 		error = -ENOTEMPTY;
@@ -2471,6 +2477,11 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	if (nd.last_type != LAST_NORM)
 		goto exit1;
 
+	/* unlink() on union mounts not implemented yet */
+	error = -EINVAL;
+	if (IS_DIR_UNIONED(nd.path.dentry))
+		goto exit1;
+
 	nd.flags &= ~LOOKUP_PARENT;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
@@ -2861,6 +2872,12 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	if (oldnd.path.mnt != newnd.path.mnt)
 		goto exit2;
 
+	/* rename() on union mounts not implemented yet */
+	error = -EXDEV;
+	if (IS_DIR_UNIONED(oldnd.path.dentry) ||
+	    IS_DIR_UNIONED(newnd.path.dentry))
+		goto exit2;
+
 	old_dir = oldnd.path.dentry;
 	error = -EBUSY;
 	if (oldnd.last_type != LAST_NORM)
-- 
1.7.0.4



* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (43 preceding siblings ...)
  2011-03-23  1:59 ` [PATCH 44/74] union-mount: Temporarily disable some syscalls Valerie Aurora
@ 2011-03-23  2:12 ` Valerie Aurora
  2011-03-24 13:43   ` Union mounts comparison with overlay file system prototype? Ric Wheeler
  2011-03-23  8:38 ` [PATCH 00/74] Union mounts version something or other Sedat Dilek
  2011-03-30 14:30 ` David Howells
  46 siblings, 1 reply; 56+ messages in thread
From: Valerie Aurora @ 2011-03-23  2:12 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: viro

And I cc'd linux@vger.kernel.org on all the patches instead of
linux-kernel@vger.kernel.org - guess I'm really out of the kernel
business now!  Anyway, if you want your replies to go to lkml, you'll
have to hand edit the cc list.

-VAL

On Tue, Mar 22, 2011 at 6:58 PM, Valerie Aurora
<valerie.aurora@gmail.com> wrote:
> Hi union mounts fans(?),
>
> Here's my current union mounts patch set, against 2.6.36-rc5.  I'm
> busy with other things[1] and unlikely to put in significant work on
> union mounts in the next year.  I'm happy to answer questions from
> anyone else working on them.
>
> As always, git trees for the kernel, util-linux, and e2fsprogs, lots
> of documentation, and LWN articles describing the various problems
> unioning file systems will encounter are here:
>
> http://valerieaurora.org/union/
>
> The devkit linked to from that page includes my Usermode Linux testing
> environment, including root file system image.  The README tells you
> how to run the test suite automatically (yes, an automated test suite
> - with Makefile and version control and comments and stuff!).
>
> I took a quick look at the current overlayfs patch set, and it's
> small, clean, and easy to understand.  If it does what people need, I
> say ship it.
>
> Thanks to everyone who reviewed and submitted patches for union mounts!
>
> -VAL
>
> [1] http://adainitiative.org
>
> ---
>
> Felix Fietkau (2):
>  whiteout: jffs2 whiteout support
>  fallthru: jffs2 fallthru support
>
> Jan Blunck (9):
>  VFS: Make lookup_hash() return a struct path
>  autofs4: Save autofs trigger's vfsmount in super block info
>  whiteout/NFSD: Don't return information about whiteouts to userspace
>  whiteout: Add vfs_whiteout() and whiteout inode operation
>  whiteout: Allow removal of a directory with whiteouts
>  whiteout: tmpfs whiteout support
>  union-mount: Introduce MNT_UNION and MS_UNION flags
>  union-mount: Free union stack on removal of topmost dentry from
>    dcache
>  union-mount: Create IS_MNT_UNION()
>
> Valerie Aurora (63):
>  VFS: Comment follow_mount() and friends
>  Documentation: Fix trivial typo in filesystems/sharedsubtree.txt
>  whiteout: Define opaque inode flags and operations
>  ext2: Add ext2_dirent_in_use()
>  ext2: Split ext2_add_entry() from ext2_add_link()
>  whiteout: ext2 whiteout support
>  fallthru: Basic fallthru definitions
>  fallthru: ext2 fallthru support
>  fallthru: tmpfs fallthru support
>  VFS: Add hard read-only users count to superblock
>  VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
>  VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
>  VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
>  VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
>  union-mount: Union mounts documentation
>  union-mount: Add CONFIG_UNION_MOUNT option
>  union-mount: Create union_stack structure
>  union-mount: Add two superblock fields for union mounts
>  union-mount: Add union_alloc()
>  union-mount: Add union_find_dir()
>  union-mount: Create d_free_unions()
>  union-mount: Create union_add_dir()
>  union-mount: Add union_create_topmost_dir()
>  union-mount: Create needs_lookup_union()
>  union-mount: Create check_topmost_union_mnt()
>  union-mount: Add clone_union_tree() and put_union_sb()
>  union-mount: Create build_root_union()
>  union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
>  union-mount: Prevent improper union-related remounts
>  union-mount: Prevent topmost file system from being mounted elsewhere
>  union-mount: Prevent bind mounts of union mounts
>  union-mount: Implement union mount
>  union-mount: Temporarily disable some syscalls
>  union-mount: Basic infrastructure of __lookup_union()
>  union-mount: Process negative dentries in __lookup_union()
>  union-mount: Return files found in lower layers in __lookup_union()
>  union-mount: Build union stack in __lookup_union()
>  union-mount: Follow mount in __lookup_union()
>  union-mount: Add lookup_union()
>  union-mount: Add do_lookup_union() wrapper for __lookup_union()
>  union-mount: Call union lookup functions in lookup path
>  union-mount: Create whiteout on unlink()
>  union-mount: Create whiteout on rmdir()
>  union-mount: Set opaque flag on new directories in unioned file
>    systems
>  union-mount: Copy up directory entries on first readdir()
>  union-mount: Add generic_readdir_fallthru() helper
>  fallthru: ext2 support for lookup of d_type/d_ino in fallthrus
>  fallthru: tmpfs support for lookup of d_type/d_ino in fallthrus
>  fallthru: jffs2 support for lookup of d_type/d_ino in fallthrus
>  VFS: Split inode_permission() and create path_permission()
>  VFS: Create user_path_nd() to lookup both parent and target
>  union-mount: In-kernel file copyup routines
>  union-mount: Implement union-aware access()/faccessat()
>  union-mount: Implement union-aware link()
>  union-mount: Implement union-aware rename()
>  union-mount: Implement union-aware writable open()
>  union-mount: Implement union-aware chown()
>  union-mount: Implement union-aware truncate()
>  union-mount: Implement union-aware chmod()/fchmodat()
>  union-mount: Implement union-aware lchown()
>  union-mount: Implement union-aware utimensat()
>  union-mount: Implement union-aware setxattr()
>  union-mount: Implement union-aware lsetxattr()
>
>  Documentation/filesystems/sharedsubtree.txt |    4 +-
>  Documentation/filesystems/union-mounts.txt  |  751 +++++++++++++++++++++++++
>  Documentation/filesystems/vfs.txt           |   16 +-
>  fs/Kconfig                                  |   13 +
>  fs/Makefile                                 |    1 +
>  fs/autofs4/autofs_i.h                       |    1 +
>  fs/autofs4/init.c                           |   11 +-
>  fs/autofs4/root.c                           |    6 +
>  fs/compat.c                                 |    9 +
>  fs/dcache.c                                 |   32 +-
>  fs/ext2/dir.c                               |  116 ++++-
>  fs/ext2/ext2.h                              |    3 +
>  fs/ext2/inode.c                             |   11 +-
>  fs/ext2/namei.c                             |   85 +++-
>  fs/ext2/super.c                             |    6 +
>  fs/jffs2/dir.c                              |  117 ++++-
>  fs/jffs2/fs.c                               |    4 +
>  fs/jffs2/super.c                            |    2 +-
>  fs/libfs.c                                  |   20 +-
>  fs/namei.c                                  |  807 ++++++++++++++++++++++++---
>  fs/namespace.c                              |  394 +++++++++++--
>  fs/nfsd/nfs3xdr.c                           |    5 +
>  fs/nfsd/nfs4xdr.c                           |    5 +
>  fs/nfsd/nfsxdr.c                            |    4 +
>  fs/open.c                                   |  116 ++++-
>  fs/pnode.c                                  |    5 +-
>  fs/pnode.h                                  |    3 +
>  fs/readdir.c                                |   18 +
>  fs/super.c                                  |    9 +
>  fs/union.c                                  |  714 ++++++++++++++++++++++++
>  fs/union.h                                  |  105 ++++
>  fs/utimes.c                                 |   14 +-
>  fs/xattr.c                                  |   65 ++-
>  include/linux/dcache.h                      |   37 ++-
>  include/linux/ext2_fs.h                     |    8 +
>  include/linux/fs.h                          |   45 ++
>  include/linux/jffs2.h                       |    8 +
>  include/linux/mount.h                       |    4 +
>  include/linux/namei.h                       |    2 +
>  kernel/audit_tree.c                         |   10 +-
>  mm/shmem.c                                  |  193 ++++++-
>  41 files changed, 3551 insertions(+), 228 deletions(-)
>  create mode 100644 Documentation/filesystems/union-mounts.txt
>  create mode 100644 fs/union.c
>  create mode 100644 fs/union.h
>
>


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (44 preceding siblings ...)
  2011-03-23  2:12 ` [PATCH 00/74] Union mounts version something or other Valerie Aurora
@ 2011-03-23  8:38 ` Sedat Dilek
  2011-03-24 22:40   ` Ben Hutchings
  2011-03-30 14:30 ` David Howells
  46 siblings, 1 reply; 56+ messages in thread
From: Sedat Dilek @ 2011-03-23  8:38 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: linux-fsdevel, viro, LKML, Felix Fietkau, hch, Miklos Szeredi,
	J. R. Okajima

On Wed, Mar 23, 2011 at 2:58 AM, Valerie Aurora
<valerie.aurora@gmail.com> wrote:
> Hi union mounts fans(?),
>
> Here's my current union mounts patch set, against 2.6.36-rc5.  I'm
> busy with other things[1] and unlikely to put in significant work on
> union mounts in the next year.  I'm happy to answer questions from
> anyone else working on them.
>
> As always, git trees for the kernel, util-linux, and e2fsprogs, lots
> of documentation, and LWN articles describing the various problems
> unioning file systems will encounter are here:
>
> http://valerieaurora.org/union/
>
> The devkit linked to from that page includes my Usermode Linux testing
> environment, including root file system image.  The README tells you
> how to run the test suite automatically (yes, an automated test suite
> - with Makefile and version control and comments and stuff!).
>
> I took a quick look at the current overlayfs patch set, and it's
> small, clean, and easy to understand.  If it does what people need, I
> say ship it.
>
> Thanks to everyone who reviewed and submitted patches for union mounts!
>
> -VAL
>
> [1] http://adainitiative.org
>
> ---
>
> Felix Fietkau (2):
>  whiteout: jffs2 whiteout support
>  fallthru: jffs2 fallthru support
>
> Jan Blunck (9):
>  VFS: Make lookup_hash() return a struct path
>  autofs4: Save autofs trigger's vfsmount in super block info
>  whiteout/NFSD: Don't return information about whiteouts to userspace
>  whiteout: Add vfs_whiteout() and whiteout inode operation
>  whiteout: Allow removal of a directory with whiteouts
>  whiteout: tmpfs whiteout support
>  union-mount: Introduce MNT_UNION and MS_UNION flags
>  union-mount: Free union stack on removal of topmost dentry from
>    dcache
>  union-mount: Create IS_MNT_UNION()
>
> Valerie Aurora (63):
>  VFS: Comment follow_mount() and friends
>  Documentation: Fix trivial typo in filesystems/sharedsubtree.txt
>  whiteout: Define opaque inode flags and operations
>  ext2: Add ext2_dirent_in_use()
>  ext2: Split ext2_add_entry() from ext2_add_link()
>  whiteout: ext2 whiteout support
>  fallthru: Basic fallthru definitions
>  fallthru: ext2 fallthru support
>  fallthru: tmpfs fallthru support
>  VFS: Add hard read-only users count to superblock
>  VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
>  VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree()
>  VFS: Add CL_NO_SLAVE flag to clone_mnt()/copy_tree()
>  VFS: Add CL_MAKE_HARD_READONLY flag to clone_mnt()/copy_tree()
>  union-mount: Union mounts documentation
>  union-mount: Add CONFIG_UNION_MOUNT option
>  union-mount: Create union_stack structure
>  union-mount: Add two superblock fields for union mounts
>  union-mount: Add union_alloc()
>  union-mount: Add union_find_dir()
>  union-mount: Create d_free_unions()
>  union-mount: Create union_add_dir()
>  union-mount: Add union_create_topmost_dir()
>  union-mount: Create needs_lookup_union()
>  union-mount: Create check_topmost_union_mnt()
>  union-mount: Add clone_union_tree() and put_union_sb()
>  union-mount: Create build_root_union()
>  union-mount: Create prepare_mnt_union() and cleanup_mnt_union()
>  union-mount: Prevent improper union-related remounts
>  union-mount: Prevent topmost file system from being mounted elsewhere
>  union-mount: Prevent bind mounts of union mounts
>  union-mount: Implement union mount
>  union-mount: Temporarily disable some syscalls
>  union-mount: Basic infrastructure of __lookup_union()
>  union-mount: Process negative dentries in __lookup_union()
>  union-mount: Return files found in lower layers in __lookup_union()
>  union-mount: Build union stack in __lookup_union()
>  union-mount: Follow mount in __lookup_union()
>  union-mount: Add lookup_union()
>  union-mount: Add do_lookup_union() wrapper for __lookup_union()
>  union-mount: Call union lookup functions in lookup path
>  union-mount: Create whiteout on unlink()
>  union-mount: Create whiteout on rmdir()
>  union-mount: Set opaque flag on new directories in unioned file
>    systems
>  union-mount: Copy up directory entries on first readdir()
>  union-mount: Add generic_readdir_fallthru() helper
>  fallthru: ext2 support for lookup of d_type/d_ino in fallthrus
>  fallthru: tmpfs support for lookup of d_type/d_ino in fallthrus
>  fallthru: jffs2 support for lookup of d_type/d_ino in fallthrus
>  VFS: Split inode_permission() and create path_permission()
>  VFS: Create user_path_nd() to lookup both parent and target
>  union-mount: In-kernel file copyup routines
>  union-mount: Implement union-aware access()/faccessat()
>  union-mount: Implement union-aware link()
>  union-mount: Implement union-aware rename()
>  union-mount: Implement union-aware writable open()
>  union-mount: Implement union-aware chown()
>  union-mount: Implement union-aware truncate()
>  union-mount: Implement union-aware chmod()/fchmodat()
>  union-mount: Implement union-aware lchown()
>  union-mount: Implement union-aware utimensat()
>  union-mount: Implement union-aware setxattr()
>  union-mount: Implement union-aware lsetxattr()
>

Shall I cry or laugh? I really don't know...

Your email reminded me of my first steps with Linux Live-CD technologies.
As a home project I created my own sidux live-CD with Enlightenment as
the window manager to get a bit familiar with the technologies/tools
behind it.
The most widely used and most effective "technology" in this area is
still AUFS in combination with SquashFS compression (see for example
the Debian / GRML / ex-sidux Live-CD frameworks).

But I also remember these "famous words" [1]:

"Note: it becomes clear that "Aufs was rejected. Let's give it up."
According to Christoph Hellwig, linux rejects all union-type filesystems
but UnionMount."

Please hold the line... Please hold the line... Please hold the line???

Whuzzz up with AUFS?
Even though there were massive changes to VFS and FS in 2.6.38+, there
is an adapted kernel patch around (for example see Debian's 2.6.38
linux-2.6 packages in the experimental branch).

From my POV OverlayFS is the new star at the skyline and should be
promoted as 1st choice, now.
I am definitely PRO for including it in 2.6.39!
And looking at code size, OverlayFS is small, while you are sending
70+ separate patches.

Union mounts never got out of "technology preview" status and were only
half-heartedly promoted in the past.
You are leaving behind outdated (unfinished?) code (IIRC the code base is
2.6.36-rc5), and furthermore u-m needs a patched user space, too.
So, seen from today and with OverlayFS as an alternative, u-m is a dead
horse to me.
But, as the BKL removal showed us... a job once started can be finished :-).

( Sorry for the rough words. )

I do not want to end this email without some light at the end of the
tunnel, so let me quote Felix [2]:

[...]
> But I'd want Al's ack on the series. And also hear who uses it and how
> it's been tested?
We're using it in OpenWrt (an Embedded Linux distribution) for devices
with tiny amounts of flash for the entire system (e.g. 4 MB).
We're using it to provide a writable on-flash root filesystem with
squashfs for the read-only part and jffs2 for the writable overlay. This
saves some precious flash space compared to using only jffs2, and it
makes it easy for users to reset their device to defaults without having
to reflash.
With a backport of v6 of this series + my fixes that went into v7 this
is working quite well on 2.6.37 and 2.6.38 - I'm using it on a few
wireless access points at home.
[...]

OverlayFS GO GO GO!

- Sedat -

[1] http://aufs.sourceforge.net/
[2] http://lkml.org/lkml/2011/3/22/294


* Union mounts comparison with overlay file system prototype?
  2011-03-23  2:12 ` [PATCH 00/74] Union mounts version something or other Valerie Aurora
@ 2011-03-24 13:43   ` Ric Wheeler
  2011-03-25 11:38     ` Szeredi Miklos
  0 siblings, 1 reply; 56+ messages in thread
From: Ric Wheeler @ 2011-03-24 13:43 UTC (permalink / raw)
  To: Valerie Aurora, miklos
  Cc: linux-fsdevel, linux-kernel, viro, Christoph Hellwig

On 03/22/2011 10:12 PM, Valerie Aurora wrote:
> And I cc'd linux@vger.kernel.org on all the patches instead of
> linux-kernel@vger.kernel.org - guess I'm really out of the kernel
> business now!  Anyway, if you want your replies to go to lkml, you'll
> have to hand edit the cc list.
>
> -VAL
>
  Val, Miklos,

Can one or both of you summarize what union mounts and overlay do better or
worse? Do we need both or just one?

Thanks!

Ric


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-23  8:38 ` [PATCH 00/74] Union mounts version something or other Sedat Dilek
@ 2011-03-24 22:40   ` Ben Hutchings
  2011-03-25  2:32     ` Sedat Dilek
  0 siblings, 1 reply; 56+ messages in thread
From: Ben Hutchings @ 2011-03-24 22:40 UTC (permalink / raw)
  To: sedat.dilek
  Cc: Valerie Aurora, linux-fsdevel, viro, LKML, Felix Fietkau, hch,
	Miklos Szeredi, J. R. Okajima

On Wed, Mar 23, 2011 at 09:38:39AM +0100, Sedat Dilek wrote:
[...]
> Whuzzz up with AUFS?
> Even though there were massive changes to VFS and FS in 2.6.38+, there is an
> adapted kernel patch around (for example see Debian's 2.6.38 linux-2.6
> packages in experimental branch).

Do not cite the Debian kernel team as supporting aufs.  It is included
only because Debian Live needs some kind of union filesystem, and only
until that appears in-tree.

> From my POV OverlayFS is the new star at the skyline and should be
> promoted as 1st choice, now.
[...]

This thread is for technical review, not marketing.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-24 22:40   ` Ben Hutchings
@ 2011-03-25  2:32     ` Sedat Dilek
  0 siblings, 0 replies; 56+ messages in thread
From: Sedat Dilek @ 2011-03-25  2:32 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Valerie Aurora, linux-fsdevel, viro, LKML, Felix Fietkau, hch,
	Miklos Szeredi, J. R. Okajima

On Thu, Mar 24, 2011 at 11:40 PM, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, Mar 23, 2011 at 09:38:39AM +0100, Sedat Dilek wrote:
> [...]
>> Whuzzz up with AUFS?
>> Even though there were massive changes to VFS and FS in 2.6.38+, there is an
>> adapted kernel patch around (for example see Debian's 2.6.38 linux-2.6
>> packages in experimental branch).
>
> Do not cite the Debian kernel team as supporting aufs.  It is included
> only because Debian Live needs some kind of union filesystem, and only
> until that appears in-tree.
>

Debian is using AUFS - that is a fact!
That is exactly what I mean... there is no *official* (and working!
and actually used) union filesystem on the kernel side.
Various distributions, including Debian (and especially the embedded
area), have used AUFS (as overlay) and SquashFS (for compression) as an
*unofficial* working solution for years.
SquashFS (and hopefully SquashFS-XZ) is in the kernel, but AUFS is not.
So, I am interested in (a new discussion and) re-thinking what the
number one choice in that area is.
BTW, in the meantime Ric Wheeler asked for a comparison between
union mounts and OverlayFS [1].
Let's wait and see.

>> From my POV OverlayFS is the new star at the skyline and should be
>> promoted as 1st choice, now.
> [...]
>
> This thread is for technical review, not marketing.
>

My POV is clear - I already gave some technical arguments against union mounts.
An official and working(!) solution is needed, one "promoted" by the
big five in Linux kernel (filesystem) development.
That means a decision on what the (next and newly preferred?) standard
union filesystem in the kernel world is.
Now there is a new problem for union mounts, as one of its main
maintainers has stopped working on it.
Even though OverlayFS is young, it is already used as a working(!)
solution in OpenWRT.
People neither need nor want a never-ending "technical preview" lasting
years; they need and want a working(!) solution. So far that has mostly
been AUFS, chosen *unofficially* (and, remember, rejected from mainline).
Personally, I have not seen or read of union mounts being used by any
distro or in the embedded world.

- Sedat -

[1] http://www.spinics.net/lists/linux-fsdevel/msg43345.html

* Re: Union mounts comparison with overlay file system prototype?
  2011-03-24 13:43   ` Union mounts comparison with overlay file system prototype? Ric Wheeler
@ 2011-03-25 11:38     ` Szeredi Miklos
  2011-03-25 12:12       ` Ric Wheeler
  0 siblings, 1 reply; 56+ messages in thread
From: Szeredi Miklos @ 2011-03-25 11:38 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Valerie Aurora, linux-fsdevel, linux-kernel, viro,
	Christoph Hellwig

On Thu, Mar 24, 2011 at 2:43 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> Can one or both of you summarize what union mounts and overlay do better
> or worse? Do we need both or just one?

The semantics are very similar, the differences are in the implementation.

Union mounts:

 - whiteout/opaque/fallthrough support in filesystems
 - whiteout operation is atomic
 - no dentry and inode duplication
 - copy up on lookup and readdir
 - does not support union of two read-only trees
 - merged directory stored in upper tree

Overlayfs

- whiteout/opaque as xattrs
- whiteout operation is not atomic
- dentry and inode duplication(*)
- only copy up on modification
- supports union of two read-only trees
- merged directory not cached(**)

(*) it's possible to eliminate inode duplication of non-directories
with some VFS modifications
(**) caching should be possible to do
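
To illustrate why a whiteout kept as an xattr is not atomic, here is a
minimal user-space sketch in C. It is not code from either patch set:
the attribute name and both helper functions are made up for this
example, and a "user." attribute is used only so that it can run
without privileges. The point is that the marker takes two separate
steps (create an object, then tag it), so there is a window in which
the object exists but is not yet recognizable as a whiteout.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical marker name, chosen for illustration only. */
#define EXAMPLE_WHITEOUT_XATTR "user.example.whiteout"

/*
 * Mark `path` as a whiteout in two separate steps.
 * Returns 0 on success, -1 on failure (errno is set).
 */
int mark_whiteout(const char *path)
{
    /* Step 1: create the empty placeholder object. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /*
     * Step 2: tag it. Between step 1 and step 2 the object exists but
     * carries no marker, which is the non-atomic window.
     */
    if (fsetxattr(fd, EXAMPLE_WHITEOUT_XATTR, "y", 1, XATTR_CREATE) < 0) {
        int saved = errno;
        close(fd);
        unlink(path);          /* undo step 1 on failure */
        errno = saved;
        return -1;
    }
    close(fd);
    return 0;
}

/*
 * Returns 1 if `path` carries the marker, 0 if it does not (or the
 * file system has no xattr support), -1 on any other error.
 */
int is_whiteout(const char *path)
{
    /* A zero-sized buffer asks only whether the attribute exists. */
    ssize_t n = lgetxattr(path, EXAMPLE_WHITEOUT_XATTR, NULL, 0);
    if (n >= 0)
        return 1;
    if (errno == ENODATA || errno == ENOTSUP)
        return 0;
    return -1;
}
```

A whiteout implemented inside the file system, as in the union mounts
list above, can instead write a single directory entry, which is why
that operation is described as atomic.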


* Re: Union mounts comparison with overlay file system prototype?
  2011-03-25 11:38     ` Szeredi Miklos
@ 2011-03-25 12:12       ` Ric Wheeler
  0 siblings, 0 replies; 56+ messages in thread
From: Ric Wheeler @ 2011-03-25 12:12 UTC (permalink / raw)
  To: Szeredi Miklos
  Cc: Valerie Aurora, linux-fsdevel, linux-kernel, viro,
	Christoph Hellwig

On 03/25/2011 07:38 AM, Szeredi Miklos wrote:
> On Thu, Mar 24, 2011 at 2:43 PM, Ric Wheeler<rwheeler@redhat.com>  wrote:
>> Can one or both of you summarize what union mounts and overlay do better
>> or worse? Do we need both or just one?
> The semantics are very similar, the differences are in the implementation.
>
> Union mounts:
>
>   - whiteout/opaque/fallthrough support in filesystems
>   - whiteout operation is atomic
>   - no dentry and inode duplication
>   - copy up on lookup and readdir
>   - does not support union of two read-only trees
>   - merged directory stored in upper tree
>
> Overlayfs
>
> - whiteout/opaque as xattrs
> - whiteout operation is not atomic
> - dentry and inode duplication(*)
> - only copy up on modification
> - supports union of two read-only trees
> - merged directory not cached(**)
>
> (*) it's possible to eliminate inode duplication of non-directories
> with some VFS modifications
> (**) caching should be possible to do

Thanks for the high level overview!

Ric


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
                   ` (45 preceding siblings ...)
  2011-03-23  8:38 ` [PATCH 00/74] Union mounts version something or other Sedat Dilek
@ 2011-03-30 14:30 ` David Howells
  2011-04-01 16:48   ` Valerie Aurora
  2011-04-21 13:09   ` David Howells
  46 siblings, 2 replies; 56+ messages in thread
From: David Howells @ 2011-03-30 14:30 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: dhowells, linux-fsdevel, linux, viro

Valerie Aurora <valerie.aurora@gmail.com> wrote:

> As always, git trees for the kernel, util-linux, and e2fsprogs, lots
> of documentation, and LWN articles describing the various problems
> unioning file systems will encounter are here:
> 
> http://valerieaurora.org/union/

The webpage there says "linked_list" is the most recent branch, but that
doesn't match with your patches posted here.  Is "+ext2_cleanup" the matching
branch?

Thanks,
David


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-30 14:30 ` David Howells
@ 2011-04-01 16:48   ` Valerie Aurora
  2011-04-21 13:09   ` David Howells
  1 sibling, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-04-01 16:48 UTC (permalink / raw)
  To: David Howells; +Cc: linux-fsdevel, linux-kernel, viro

On Wed, Mar 30, 2011 at 7:30 AM, David Howells <dhowells@redhat.com> wrote:
> Valerie Aurora <valerie.aurora@gmail.com> wrote:
>
>> As always, git trees for the kernel, util-linux, and e2fsprogs, lots
>> of documentation, and LWN articles describing the various problems
>> unioning file systems will encounter are here:
>>
>> http://valerieaurora.org/union/
>
> The webpage there says "linked_list" is the most recent branch, but that
> doesn't match with your patches posted here.  Is "+ext2_cleanup" the matching
> branch?

That's never up to date, but I just fixed it.  The branch you want is
"ext2_cleanup".  Any branch with "+" at the front is a mistake from
trying to do a git push of a rebased branch.  I deleted it.

-VAL


* Re: [PATCH 00/74] Union mounts version something or other
  2011-03-30 14:30 ` David Howells
  2011-04-01 16:48   ` Valerie Aurora
@ 2011-04-21 13:09   ` David Howells
  2011-04-24 21:48     ` Valerie Aurora
  1 sibling, 1 reply; 56+ messages in thread
From: David Howells @ 2011-04-21 13:09 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: dhowells, linux-fsdevel, linux-kernel, viro

Valerie Aurora <valerie.aurora@gmail.com> wrote:

> That's never up to date, but I just fixed it.  The branch you want is
> "ext2_cleanup".  Any branch with "+" at the front is a mistake from
> trying to do a git push of a rebased branch.  I deleted it.

Okay, I've got that pulled up to Linus's head branch.  I've mostly got the RCU
pathwalk and managed dentry stuff correctly entangled.

However, I have a few questions:

 (1) Is it meant to be possible to unionmount over a mount tree rather than
     just a single mount?  I ask because do_lookup_union() calls
     __follow_mount().

 (2) When you open a file that exists in the lower layer but not the top
     layer, am I right in thinking that f_path points to the lower layer file?

 (3) If I'm correct in (2), I presume something must intercept fchown() and
     suchlike?

 (4) I presume IS_DIR_UNIONED() only gives true on the upper layer (the one
     that was mounted -o union)?

David


* Re: [PATCH 00/74] Union mounts version something or other
  2011-04-21 13:09   ` David Howells
@ 2011-04-24 21:48     ` Valerie Aurora
  0 siblings, 0 replies; 56+ messages in thread
From: Valerie Aurora @ 2011-04-24 21:48 UTC (permalink / raw)
  To: David Howells; +Cc: linux-fsdevel, linux-kernel, viro

On Thu, Apr 21, 2011 at 6:09 AM, David Howells <dhowells@redhat.com> wrote:
> Valerie Aurora <valerie.aurora@gmail.com> wrote:
>
>> That's never up to date, but I just fixed it.  The branch you want is
>> "ext2_cleanup".  Any branch with "+" at the front is a mistake from
>> trying to do a git push of a rebased branch.  I deleted it.
>
> Okay, I've got that pulled up to Linus's head branch.  I've mostly got the RCU
> pathwalk and managed dentry stuff correctly entangled.

Awesome, thanks!

> However, I have a few questions:
>
>  (1) Is it meant to be possible to unionmount over a mount tree rather than
>     just a single mount?  I ask because do_lookup_union() calls
>     __follow_mount().

It's meant to allow mounting over a mount tree, as long as all the
mounts are read-only.   I don't recall the difference between
__follow_mount() and follow_mount() at this moment, but that code may
be left over from the time that we could only union mount a single
mount.

Think about it carefully though, and check my comments, and run the
union mounts test suite - I got this wrong a few times and added a
number of tests to make sure the mount point case is right.

>  (2) When you open a file that exists in the lower layer but not the top
>     layer, am I right in thinking that f_path points to the lower layer file?

If the file is opened read-write, then it is copied up and the f_path
points to the upper layer file.  If the file is opened read-only, then
it is not copied up and the f_path points to the lower layer file.
So, yes, for a read-only open f_path points to the lower layer file.

>  (3) If I'm correct in (2), I presume something must intercept fchown() and
>     suchlike?

There's a thread somewhere on this, hang on...

http://kerneltrap.org/mailarchive/linux-fsdevel/2010/3/29/6897953/thread

Basically, if you open the file read-write and do an fchown() on it,
it works fine because the file is copied up on open.  If you open the
file read-only and fchown() it (yes, that's permitted) then in union
mounts you will get EPERM or EBADF (I don't recall which).  Actually
supporting this would require copy-up after open, which requires an
atomic update of the struct file pointer, which is ugly and painful and
what we were trying to avoid in the first place.

I discussed this with Al and Christoph and the consensus was along the
lines of, "What?  You can do that?  POSIX is so stupid.  Yes, we don't
care if union mounts returns EPERM in this case that no one thinks
should work anyway."

If you can find a way to do this cleanly, hurray.  Otherwise the
current code works for fchown/fchmod/utimensat already in the open
read-write case.
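
For anyone who wants to see the corner case, here is a minimal sketch
in C of "open read-only, then fchown()". The helper name and the
scratch path under /tmp are made up for this example. It changes the
owner and group to the caller's own effective IDs, so it needs no
privileges. On an ordinary local file system the call is expected to
succeed, because fchown() does not require a writable descriptor; on a
union mount, as described above, the same call on a file that lives
only in the lower layer would fail instead.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch file, reopen it O_RDONLY, and call fchown() on the
 * read-only descriptor. Returns 0 if every step succeeded, otherwise
 * the errno value of the step that failed.
 */
int fchown_on_readonly_fd(void)
{
    char path[64];
    snprintf(path, sizeof path, "/tmp/fchown-ro-%ld", (long)getpid());

    /* Create the scratch file, then drop the writable descriptor. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno;
    close(fd);

    /* Reopen strictly read-only. */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        int err = errno;
        unlink(path);
        return err;
    }

    /* Ownership change through the read-only descriptor. */
    int err = 0;
    if (fchown(fd, geteuid(), getegid()) < 0)
        err = errno;

    close(fd);
    unlink(path);
    return err;
}
```

A caller can compare the return value against EPERM or EBADF to find
out which of the two a given union mount implementation reports.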

>  (4) I presume IS_DIR_UNIONED() only gives true on the upper layer (the one
>     that was mounted -o union)?

Yes.  Maybe it should be renamed...

Thanks,

-VAL

Thread overview: 56+ messages
2011-03-23  1:58 [PATCH 00/74] Union mounts version something or other Valerie Aurora
2011-03-23  1:58 ` [PATCH 01/74] VFS: Comment follow_mount() and friends Valerie Aurora
2011-03-23  1:58 ` [PATCH 02/74] VFS: Make lookup_hash() return a struct path Valerie Aurora
2011-03-23  1:58 ` [PATCH 03/74] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
2011-03-23  1:58 ` [PATCH 04/74] Documentation: Fix trivial typo in filesystems/sharedsubtree.txt Valerie Aurora
2011-03-23  1:58 ` [PATCH 05/74] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
2011-03-23  1:58 ` [PATCH 06/74] whiteout: Define opaque inode flags and operations Valerie Aurora
2011-03-23  1:58 ` [PATCH 07/74] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
2011-03-23  1:58 ` [PATCH 08/74] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
2011-03-23  1:58 ` [PATCH 09/74] whiteout: tmpfs whiteout support Valerie Aurora
2011-03-23  1:58 ` [PATCH 10/74] ext2: Add ext2_dirent_in_use() Valerie Aurora
2011-03-23  1:58 ` [PATCH 11/74] ext2: Split ext2_add_entry() from ext2_add_link() Valerie Aurora
2011-03-23  1:58 ` [PATCH 12/74] whiteout: ext2 whiteout support Valerie Aurora
2011-03-23  1:58 ` [PATCH 13/74] whiteout: jffs2 " Valerie Aurora
2011-03-23  1:58 ` [PATCH 14/74] fallthru: Basic fallthru definitions Valerie Aurora
2011-03-23  1:58 ` [PATCH 15/74] fallthru: ext2 fallthru support Valerie Aurora
2011-03-23  1:58 ` [PATCH 16/74] fallthru: tmpfs " Valerie Aurora
2011-03-23  1:58 ` [PATCH 17/74] fallthru: jffs2 " Valerie Aurora
2011-03-23  1:58 ` [PATCH 18/74] VFS: Add hard read-only users count to superblock Valerie Aurora
2011-03-23  1:58 ` [PATCH 19/74] VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors Valerie Aurora
2011-03-23  1:58 ` [PATCH 20/74] VFS: Add CL_NO_SHARED flag to clone_mnt()/copy_tree() Valerie Aurora
2011-03-23  1:58 ` [PATCH 21/74] VFS: Add CL_NO_SLAVE " Valerie Aurora
2011-03-23  1:58 ` [PATCH 22/74] VFS: Add CL_MAKE_HARD_READONLY " Valerie Aurora
2011-03-23  1:58 ` [PATCH 23/74] union-mount: Union mounts documentation Valerie Aurora
2011-03-23  1:59 ` [PATCH 24/74] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
2011-03-23  1:59 ` [PATCH 25/74] union-mount: Add CONFIG_UNION_MOUNT option Valerie Aurora
2011-03-23  1:59 ` [PATCH 26/74] union-mount: Create union_stack structure Valerie Aurora
2011-03-23  1:59 ` [PATCH 27/74] union-mount: Add two superblock fields for union mounts Valerie Aurora
2011-03-23  1:59 ` [PATCH 28/74] union-mount: Add union_alloc() Valerie Aurora
2011-03-23  1:59 ` [PATCH 29/74] union-mount: Add union_find_dir() Valerie Aurora
2011-03-23  1:59 ` [PATCH 30/74] union-mount: Create d_free_unions() Valerie Aurora
2011-03-23  1:59 ` [PATCH 31/74] union-mount: Free union stack on removal of topmost dentry from dcache Valerie Aurora
2011-03-23  1:59 ` [PATCH 32/74] union-mount: Create union_add_dir() Valerie Aurora
2011-03-23  1:59 ` [PATCH 33/74] union-mount: Add union_create_topmost_dir() Valerie Aurora
2011-03-23  1:59 ` [PATCH 34/74] union-mount: Create IS_MNT_UNION() Valerie Aurora
2011-03-23  1:59 ` [PATCH 35/74] union-mount: Create needs_lookup_union() Valerie Aurora
2011-03-23  1:59 ` [PATCH 36/74] union-mount: Create check_topmost_union_mnt() Valerie Aurora
2011-03-23  1:59 ` [PATCH 37/74] union-mount: Add clone_union_tree() and put_union_sb() Valerie Aurora
2011-03-23  1:59 ` [PATCH 38/74] union-mount: Create build_root_union() Valerie Aurora
2011-03-23  1:59 ` [PATCH 39/74] union-mount: Create prepare_mnt_union() and cleanup_mnt_union() Valerie Aurora
2011-03-23  1:59 ` [PATCH 40/74] union-mount: Prevent improper union-related remounts Valerie Aurora
2011-03-23  1:59 ` [PATCH 41/74] union-mount: Prevent topmost file system from being mounted elsewhere Valerie Aurora
2011-03-23  1:59 ` [PATCH 42/74] union-mount: Prevent bind mounts of union mounts Valerie Aurora
2011-03-23  1:59 ` [PATCH 43/74] union-mount: Implement union mount Valerie Aurora
2011-03-23  1:59 ` [PATCH 44/74] union-mount: Temporarily disable some syscalls Valerie Aurora
2011-03-23  2:12 ` [PATCH 00/74] Union mounts version something or other Valerie Aurora
2011-03-24 13:43   ` Union mounts comparison with overlay file system prototype? Ric Wheeler
2011-03-25 11:38     ` Szeredi Miklos
2011-03-25 12:12       ` Ric Wheeler
2011-03-23  8:38 ` [PATCH 00/74] Union mounts version something or other Sedat Dilek
2011-03-24 22:40   ` Ben Hutchings
2011-03-25  2:32     ` Sedat Dilek
2011-03-30 14:30 ` David Howells
2011-04-01 16:48   ` Valerie Aurora
2011-04-21 13:09   ` David Howells
2011-04-24 21:48     ` Valerie Aurora
