[PATCH 0/7] overlay filesystem: request for inclusion

linux-fsdevel.vger.kernel.org archive mirror

* [PATCH 0/7] overlay filesystem: request for inclusion
@ 2011-06-01 12:46 Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 1/7] vfs: add i_op->open() Miklos Szeredi
                   ` (8 more replies)
  0 siblings, 9 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

Linus, Al,

I'd like to ask for overlayfs to be merged into 3.1.

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v10

AFAICS the only barrier now is the upcoming changes to the VFS/open
interface.  Is there anything I can do to help get those changes done?

Thanks,
Miklos

---
Andy Whitcroft (1):
      overlayfs: add statfs support

Erez Zadok (1):
      overlayfs: implement show_options

Miklos Szeredi (4):
      vfs: add i_op->open()
      vfs: export do_splice_direct() to modules
      vfs: introduce clone_private_mount()
      overlay filesystem

Neil Brown (1):
      overlay: overlay filesystem documentation

---
 Documentation/filesystems/overlayfs.txt |  167 ++++++++
 MAINTAINERS                             |    7 +
 fs/Kconfig                              |    1 +
 fs/Makefile                             |    1 +
 fs/namespace.c                          |   17 +
 fs/open.c                               |   76 +++--
 fs/overlayfs/Kconfig                    |    4 +
 fs/overlayfs/Makefile                   |    7 +
 fs/overlayfs/copy_up.c                  |  383 +++++++++++++++++++
 fs/overlayfs/dir.c                      |  607 ++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c                    |  375 ++++++++++++++++++
 fs/overlayfs/overlayfs.h                |   62 +++
 fs/overlayfs/readdir.c                  |  558 +++++++++++++++++++++++++++
 fs/overlayfs/super.c                    |  625 +++++++++++++++++++++++++++++++
 fs/splice.c                             |    1 +
 include/linux/fs.h                      |    2 +
 include/linux/mount.h                   |    3 +
 17 files changed, 2870 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c
------------------------------------------------------------------------------
Changes from v9 to v10

- prevent d_delete() from turning upperdentry negative (reported by
  Erez Zadok)

- show mount options in /proc/mounts and friends (patch by Erez Zadok)

- fix off-by-one error in readdir (reported by Jordi Pujol)

------------------------------------------------------------------------------
Changes from v8 to v9

- support xattr on tmpfs

- fix build after split-up

- fix remove after rename (reported by Jordi Pujol)

- fix rename failure case

------------------------------------------------------------------------------
Changes from v7 to v8

- split overlayfs.c into smaller files

- fix locking for copy up (reported by Al Viro)

- locking analysis of copy up vs. directory rename added as a comment

- tested with lockdep, fixed one lock annotation

- other bug fixes

------------------------------------------------------------------------------
Changes from v6 to v7

- added patches from Felix Fietkau to fix deadlocks on jffs2

- optimized directory removal

- properly clean up after copy-up and other failures

------------------------------------------------------------------------------
Changes from v5 to v6

- optimize directory merging

  o use rbtree for weeding out duplicates

  o use a cursor for current position within the stream

- instead of f_op->open_other(), implement i_op->open()

- don't share inodes for non-directory dentries - for now.  I hope
  this can come back once RCU lookup code has settled.

- misc bug fixes

------------------------------------------------------------------------------
Changes from v4 to v5

- fix copying up if fs doesn't support xattrs (Andy Whitcroft)

- clone mounts to be used internally to access the underlying
  filesystems

------------------------------------------------------------------------------
Changes from v3 to v4

- export security_inode_permission to allow overlayfs to be modular
  (Andy Whitcroft)

- add statfs support (Andy Whitcroft)

- change BUG_ON to WARN_ON

- Revert "vfs: add flag to allow rename to same inode", instead
  introduce s_op->is_same_inode()

- overlayfs: fix rename to self

- fix whiteout after rename

------------------------------------------------------------------------------
Changes from v2 to v3

 - Minimal remount support.  As overlayfs reflects the 'readonly'
   mount status in write-access to the upper filesystem, we must
   handle remount and either drop or take write access when the ro
   status changes. (NeilBrown)

 - Use correct seek function for directories.  It is incorrect to call
   generic_llseek_file on a file from a different filesystem.  For
   that we must use the seek function that the filesystem defines,
   which is called by vfs_llseek.  Also, we only want to seek the
   realfile when is_real is true.  Otherwise we just want to update
   our own f_pos pointer, so use generic_llseek_file for
   that. (NeilBrown)

 - Initialise is_real before use.  The previous patch can use
   od->is_real before it is properly initialised if llseek is called
   before readdir.  So factor out the initialisation of is_real and
   call it from both readdir and llseek when f_pos is 0. (NeilBrown)

 - Rename ovl_fill_cache to ovl_dir_read (NeilBrown)

 - Tiny optimisation in open_other handling (NeilBrown)

 - Assorted updates to Documentation/filesystems/overlayfs.txt (NeilBrown)

 - Make copy-up work for >=4G files, make it killable during copy-up.
   Need to fix recovery after a failed/interrupted copy-up.

 - Store and reference upper/lower dentries in overlay dentries.
   Store and reference upper/lower vfsmounts in overlay superblock.

 - Add necessary barriers for setting upper dentry in copyup and for
   retrieving upper dentry locklessly.

 - Make sure the right file is used for directory fsync() after
   copy-up.

 - Add locking to ovl_dir_llseek() to prevent concurrent call of
   ovl_dir_reset() with ovl_dir_read().

 - Get rid of ovl_dentry_iput().  The VFS doesn't provide enough
   locking for this function to safely update the contents of
   ->d_fsdata.

 - After copying up a non-directory unhash the dentry.  This way the
   lower dentry ref, which is no longer necessary, can go away.  This
   revealed a use-after-free bug in truncate handling in
   fs/namei.c:finish_open().

 - Fix the case where a copy-up happens between the follow_link and
   the put_link calls.

 - Replace some WARN_ONs with BUG_ON.  Some things just _really_
   shouldn't happen.

 - Extract common code from ovl_unlink and ovl_rmdir to a helper
   function.

 - After unlink and rmdir unhash the dentry.  This will get rid of the
   lower and upper dentry references after there are no more users of
   the deleted dentry.  This is a safe replacement for the removed
   ->d_iput() functionality.

 - Added checks to unlink, rmdir and rename to verify that the
   parent-child relationship in the upper filesystem matches that of
   the overlay.  This is necessary to prevent crash and/or corruption
   if the upper filesystem topology is being modified while part of
   the overlay.

 - Optimize checking whiteout and opaque attributes.

 - Optimize copy-up on truncate: don't copy up whole file before
   truncating

 - Misc bug fixes

------------------------------------------------------------------------------
Changes from v1 to v2

 - rename "hybrid union filesystem" to "overlay filesystem" or overlayfs

 - added documentation written by Neil

 - correct st_dev for directories (reported by Neil)

 - use getattr() to get attributes from the underlying filesystems;
   this means that an overlay filesystem itself can now be the lower,
   read-only layer of another overlay

 - listxattr filters out private extended attributes

 - get write ref on the upper layer on mount unless the overlay
   itself is mounted read-only

 - raise capabilities for copy up, dealing with whiteouts and opaque
   directories.  Now the overlay works for non-root users as well

 - "rm -rf" didn't work correctly in all cases if the directory was
   copied up between opendir and the first readdir, this is now fixed
   (and the directory operations consolidated)

 - simplified copy up, this broke optimization for truncate and
   open(O_TRUNC) (now file is copied up to be immediately truncated,
   will fix)

 - st_nlink for merged directories set to 1, this is an "illegal"
   value that normal filesystems never have but some use it to
   indicate that the number of subdirectories is unknown.  Utilities
   (find, ...) seem to tolerate this well.

 - misc fixes I forgot about


* [PATCH 1/7] vfs: add i_op->open()
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 2/7] vfs: export do_splice_direct() to modules Miklos Szeredi
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Add a new inode operation i_op->open().  This is for stacked
filesystems that want to return a struct file from a different
filesystem.
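
The dispatch this enables can be modelled in userspace roughly as
follows.  This is only a sketch under simplifying assumptions: every
toy_* name is a hypothetical stand-in, not a kernel type or function,
and credentials, reference counting and error handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; NOT the real definitions. */
struct toy_file {
	const char *opened_by;	/* which side produced this file */
};

struct toy_inode;

struct toy_inode_ops {
	/* Optional hook, mirroring the idea of i_op->open(). */
	struct toy_file *(*open)(struct toy_inode *inode, int flags);
};

struct toy_inode {
	const struct toy_inode_ops *i_op;
};

struct toy_file toy_generic_file = { "generic" };
struct toy_file toy_lower_file = { "lower" };

/* Default path: the generic code sets up the file itself. */
static struct toy_file *toy_default_open(struct toy_inode *inode, int flags)
{
	(void)inode;
	(void)flags;
	return &toy_generic_file;
}

/* Stacked filesystem: hand back a file from a different filesystem. */
static struct toy_file *toy_stacked_open(struct toy_inode *inode, int flags)
{
	(void)inode;
	(void)flags;
	return &toy_lower_file;
}

const struct toy_inode_ops toy_plain_ops = { .open = NULL };
const struct toy_inode_ops toy_stacked_ops = { .open = toy_stacked_open };

/* Use the inode's own open hook if it has one, else the default. */
struct toy_file *toy_vfs_open(struct toy_inode *inode, int flags)
{
	assert(inode != NULL && inode->i_op != NULL);

	if (inode->i_op->open)
		return inode->i_op->open(inode, flags);
	return toy_default_open(inode, flags);
}
```

An inode using toy_plain_ops gets the generic file, while one using
toy_stacked_ops gets whatever its hook returns, which is the point of
the new operation.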

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/open.c          |   76 ++++++++++++++++++++++++++++++++++------------------
 include/linux/fs.h |    2 +
 2 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index b52cf01..84fa16a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -666,8 +666,7 @@ static inline int __get_file_write_access(struct inode *inode,
 	return error;
 }
 
-static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
-					struct file *f,
+static struct file *__dentry_open(struct path *path, struct file *f,
 					int (*open)(struct inode *, struct file *),
 					const struct cred *cred)
 {
@@ -675,15 +674,16 @@ static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 	struct inode *inode;
 	int error;
 
+	path_get(path);
 	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
 				FMODE_PREAD | FMODE_PWRITE;
 
 	if (unlikely(f->f_flags & O_PATH))
 		f->f_mode = FMODE_PATH;
 
-	inode = dentry->d_inode;
+	inode = path->dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
-		error = __get_file_write_access(inode, mnt);
+		error = __get_file_write_access(inode, path->mnt);
 		if (error)
 			goto cleanup_file;
 		if (!special_file(inode->i_mode))
@@ -691,8 +691,7 @@ static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 	}
 
 	f->f_mapping = inode->i_mapping;
-	f->f_path.dentry = dentry;
-	f->f_path.mnt = mnt;
+	f->f_path = *path;
 	f->f_pos = 0;
 	file_sb_list_add(f, inode->i_sb);
 
@@ -745,7 +744,7 @@ cleanup_all:
 			 * here, so just reset the state.
 			 */
 			file_reset_write(f);
-			mnt_drop_write(mnt);
+			mnt_drop_write(path->mnt);
 		}
 	}
 	file_sb_list_del(f);
@@ -753,8 +752,7 @@ cleanup_all:
 	f->f_path.mnt = NULL;
 cleanup_file:
 	put_filp(f);
-	dput(dentry);
-	mntput(mnt);
+	path_put(path);
 	return ERR_PTR(error);
 }
 
@@ -780,14 +778,14 @@ cleanup_file:
 struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry *dentry,
 		int (*open)(struct inode *, struct file *))
 {
+	struct path path = { .dentry = dentry, .mnt = nd->path.mnt };
 	const struct cred *cred = current_cred();
 
 	if (IS_ERR(nd->intent.open.file))
 		goto out;
 	if (IS_ERR(dentry))
 		goto out_err;
-	nd->intent.open.file = __dentry_open(dget(dentry), mntget(nd->path.mnt),
-					     nd->intent.open.file,
+	nd->intent.open.file = __dentry_open(&path, nd->intent.open.file,
 					     open, cred);
 out:
 	return nd->intent.open.file;
@@ -816,10 +814,17 @@ struct file *nameidata_to_filp(struct nameidata *nd)
 
 	/* Has the filesystem initialised the file for us? */
 	if (filp->f_path.dentry == NULL) {
-		path_get(&nd->path);
-		filp = __dentry_open(nd->path.dentry, nd->path.mnt, filp,
-				     NULL, cred);
+		struct inode *inode = nd->path.dentry->d_inode;
+
+		if (inode->i_op->open) {
+			int flags = filp->f_flags;
+			put_filp(filp);
+			filp = inode->i_op->open(nd->path.dentry, flags, cred);
+		} else {
+			filp = __dentry_open(&nd->path, filp, NULL, cred);
+		}
 	}
+
 	return filp;
 }
 
@@ -830,26 +835,45 @@ struct file *nameidata_to_filp(struct nameidata *nd)
 struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags,
 			 const struct cred *cred)
 {
-	int error;
-	struct file *f;
-
-	validate_creds(cred);
+	struct path path = { .dentry = dentry, .mnt = mnt };
+	struct file *ret;
 
 	/* We must always pass in a valid mount pointer. */
 	BUG_ON(!mnt);
 
-	error = -ENFILE;
+	ret = vfs_open(&path, flags, cred);
+	path_put(&path);
+
+	return ret;
+}
+EXPORT_SYMBOL(dentry_open);
+
+/**
+ * vfs_open - open the file at the given path
+ * @path: path to open
+ * @flags: open flags
+ * @cred: credentials to use
+ *
+ * Open the file.  If successful, the returned file will have acquired
+ * an additional reference for path.
+ */
+struct file *vfs_open(struct path *path, int flags, const struct cred *cred)
+{
+	struct file *f;
+	struct inode *inode = path->dentry->d_inode;
+
+	validate_creds(cred);
+
+	if (inode->i_op->open)
+		return inode->i_op->open(path->dentry, flags, cred);
 	f = get_empty_filp();
-	if (f == NULL) {
-		dput(dentry);
-		mntput(mnt);
-		return ERR_PTR(error);
-	}
+	if (f == NULL)
+		return ERR_PTR(-ENFILE);
 
 	f->f_flags = flags;
-	return __dentry_open(dentry, mnt, f, NULL, cred);
+	return __dentry_open(path, f, NULL, cred);
 }
-EXPORT_SYMBOL(dentry_open);
+EXPORT_SYMBOL(vfs_open);
 
 static void __put_unused_fd(struct files_struct *files, unsigned int fd)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c55d6b7..04eef4e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1600,6 +1600,7 @@ struct inode_operations {
 	void (*truncate_range)(struct inode *, loff_t, loff_t);
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
 		      u64 len);
+	struct file *(*open)(struct dentry *, int flags, const struct cred *);
 } ____cacheline_aligned;
 
 struct seq_file;
@@ -1994,6 +1995,7 @@ extern long do_sys_open(int dfd, const char __user *filename, int flags,
 extern struct file *filp_open(const char *, int, int);
 extern struct file *file_open_root(struct dentry *, struct vfsmount *,
 				   const char *, int);
+extern struct file *vfs_open(struct path *, int flags, const struct cred *);
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
 				 const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
-- 
1.7.3.4


* [PATCH 2/7] vfs: export do_splice_direct() to modules
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 1/7] vfs: add i_op->open() Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 3/7] vfs: introduce clone_private_mount() Miklos Szeredi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Export do_splice_direct() to modules.  Needed by overlay filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index aa866d3..bd730eb 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1300,6 +1300,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 
 	return ret;
 }
+EXPORT_SYMBOL(do_splice_direct);
 
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
-- 
1.7.3.4



* [PATCH 3/7] vfs: introduce clone_private_mount()
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 1/7] vfs: add i_op->open() Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 2/7] vfs: export do_splice_direct() to modules Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 4/7] overlay filesystem Miklos Szeredi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Overlayfs needs a private clone of the mount, so create a function for
this and export to modules.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c        |   17 +++++++++++++++++
 include/linux/mount.h |    3 +++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index fe59bd1..79bc9a7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1494,6 +1494,23 @@ void drop_collected_mounts(struct vfsmount *mnt)
 	release_mounts(&umount_list);
 }
 
+struct vfsmount *clone_private_mount(struct path *path)
+{
+	struct vfsmount *mnt;
+
+	if (IS_MNT_UNBINDABLE(path->mnt))
+		return ERR_PTR(-EINVAL);
+
+	down_read(&namespace_sem);
+	mnt = clone_mnt(path->mnt, path->dentry, CL_PRIVATE);
+	up_read(&namespace_sem);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
+
+	return mnt;
+}
+EXPORT_SYMBOL_GPL(clone_private_mount);
+
 int iterate_mounts(int (*f)(struct vfsmount *, void *), void *arg,
 		   struct vfsmount *root)
 {
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 604f122..44e9bf4 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -100,6 +100,9 @@ extern void mnt_pin(struct vfsmount *mnt);
 extern void mnt_unpin(struct vfsmount *mnt);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
 
+struct path;
+extern struct vfsmount *clone_private_mount(struct path *path);
+
 extern struct vfsmount *do_kern_mount(const char *fstype, int flags,
 				      const char *name, void *data);
 
-- 
1.7.3.4



* [PATCH 4/7] overlay filesystem
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
                   ` (2 preceding siblings ...)
  2011-06-01 12:46 ` [PATCH 3/7] vfs: introduce clone_private_mount() Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 5/7] overlayfs: add statfs support Miklos Szeredi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Overlayfs allows one, usually read-write, directory tree to be
overlaid onto another, read-only directory tree.  All modifications
go to the upper, writable layer.
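
To make the layering concrete, here is a toy userspace model of how a
single name might resolve across the two layers.  It is a sketch only:
it assumes flat name lists, ignores directory merging and opaque
directories, and reduces a whiteout to a plain marker.  All toy_* names
are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_FILE, TOY_WHITEOUT };

struct toy_entry {
	const char *name;
	enum toy_kind kind;
};

struct toy_layer {
	const struct toy_entry *entries;
	size_t count;
};

enum toy_result { TOY_ABSENT, TOY_FROM_UPPER, TOY_FROM_LOWER };

/* Linear search for a name within one layer. */
static const struct toy_entry *toy_find(const struct toy_layer *layer,
					const char *name)
{
	size_t i;

	for (i = 0; i < layer->count; i++)
		if (strcmp(layer->entries[i].name, name) == 0)
			return &layer->entries[i];
	return NULL;
}

/*
 * The upper layer is consulted first.  A whiteout there hides any
 * lower entry of the same name; only if the upper layer has nothing
 * at all does the lower layer show through.
 */
enum toy_result toy_overlay_lookup(const struct toy_layer *upper,
				   const struct toy_layer *lower,
				   const char *name)
{
	const struct toy_entry *e;

	assert(upper != NULL && lower != NULL && name != NULL);

	e = toy_find(upper, name);
	if (e)
		return e->kind == TOY_WHITEOUT ? TOY_ABSENT : TOY_FROM_UPPER;
	return toy_find(lower, name) ? TOY_FROM_LOWER : TOY_ABSENT;
}
```

In this model a delete of a lower-only file would be recorded by adding
a whiteout entry to the upper layer, leaving the lower layer untouched.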

This type of mechanism is most often used for live CDs but there's a
wide variety of other uses.

The implementation differs from other "union filesystem"
implementations in that after a file is opened all operations go
directly to the underlying, lower or upper, filesystems.  This
simplifies the implementation and allows native performance in these
cases.

The dentry tree is duplicated from the underlying filesystems; this
enables fast cached lookups without adding special support into the
VFS.  This uses slightly more memory than union mounts, but dentries
are relatively small.

Currently inodes are duplicated as well, but it is a possible
optimization to share inodes for non-directories.

Opening non-directories results in the open being forwarded to the
underlying filesystem.  This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).

Usage:

  mount -t overlay -olowerdir=/lower,upperdir=/upper overlay /mnt
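
The same mount can be requested from a program through mount(2).  The
sketch below assumes the filesystem type name and option names shown in
the command above; the helpers toy_overlay_options() and
toy_overlay_mount() are hypothetical, and the mount call itself needs
sufficient privileges and a kernel with this filesystem available.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Build the option string "lowerdir=...,upperdir=...".
 * Returns 0 on success, -1 if the result would not fit or if a path
 * contains a comma (which would be read as an option separator).
 */
int toy_overlay_options(char *buf, size_t size,
			const char *lowerdir, const char *upperdir)
{
	int n;

	assert(buf != NULL && lowerdir != NULL && upperdir != NULL);

	if (strchr(lowerdir, ',') || strchr(upperdir, ','))
		return -1;

	n = snprintf(buf, size, "lowerdir=%s,upperdir=%s", lowerdir, upperdir);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}

/* Mount the overlay on target; returns 0 or -1 (errno set by mount). */
int toy_overlay_mount(const char *lowerdir, const char *upperdir,
		      const char *target)
{
	char opts[4096];

	if (toy_overlay_options(opts, sizeof(opts), lowerdir, upperdir) != 0)
		return -1;
	return mount("overlay", target, "overlay", 0, opts);
}
```

Calling toy_overlay_mount("/lower", "/upper", "/mnt") corresponds to
the command line above.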

Supported:

 - all operations

Missing:

 - ensure that filesystems that are part of the overlay are not
   modified outside the overlay

The following contributions have been folded into this patch:

Neil Brown <neilb@suse.de>:
 - minimal remount support
 - use correct seek function for directories
 - initialise is_real before use
 - rename ovl_fill_cache to ovl_dir_read

Felix Fietkau <nbd@openwrt.org>:
 - fix a deadlock in ovl_dir_read_merged
 - fix a deadlock in ovl_remove_whiteouts

Erez Zadok <ezk@fsl.cs.sunysb.edu>:
 - fix cleanup after WARN_ON

Also thanks to the following people for testing and reporting bugs:

  Jordi Pujol <jordipujolp@gmail.com>
  Andy Whitcroft <apw@canonical.com>
  Michal Suchanek <hramrach@centrum.cz>
  Felix Fietkau <nbd@openwrt.org>
  Erez Zadok <ezk@fsl.cs.sunysb.edu>
  Randy Dunlap <rdunlap@xenotime.net>

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/Kconfig               |    1 +
 fs/Makefile              |    1 +
 fs/overlayfs/Kconfig     |    4 +
 fs/overlayfs/Makefile    |    7 +
 fs/overlayfs/copy_up.c   |  383 +++++++++++++++++++++++++++++
 fs/overlayfs/dir.c       |  607 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     |  375 ++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |   62 +++++
 fs/overlayfs/readdir.c   |  558 ++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/super.c     |  582 ++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 2580 insertions(+), 0 deletions(-)
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 19891aa..3badf8f 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -63,6 +63,7 @@ source "fs/quota/Kconfig"
 
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
+source "fs/overlayfs/Kconfig"
 
 config CUSE
 	tristate "Character device in Userspace support"
diff --git a/fs/Makefile b/fs/Makefile
index fb68c2b..ac7402a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_QNX4FS_FS)		+= qnx4/
 obj-$(CONFIG_AUTOFS4_FS)	+= autofs4/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
+obj-$(CONFIG_OVERLAYFS_FS)	+= overlayfs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
 obj-$(CONFIG_OMFS_FS)		+= omfs/
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
new file mode 100644
index 0000000..c4517da
--- /dev/null
+++ b/fs/overlayfs/Kconfig
@@ -0,0 +1,4 @@
+config OVERLAYFS_FS
+	tristate "Overlay filesystem support"
+	help
+	  Add support for overlay filesystem.
diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
new file mode 100644
index 0000000..8f91889
--- /dev/null
+++ b/fs/overlayfs/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the overlay filesystem.
+#
+
+obj-$(CONFIG_OVERLAYFS_FS) += overlayfs.o
+
+overlayfs-objs := super.o inode.o dir.o readdir.o copy_up.o
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
new file mode 100644
index 0000000..308a80a
--- /dev/null
+++ b/fs/overlayfs/copy_up.c
@@ -0,0 +1,383 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/splice.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/uaccess.h>
+#include "overlayfs.h"
+
+#define OVL_COPY_UP_CHUNK_SIZE (1 << 20)
+
+static int ovl_copy_up_xattr(struct dentry *old, struct dentry *new)
+{
+	ssize_t list_size, size;
+	char *buf, *name, *value;
+	int error;
+
+	if (!old->d_inode->i_op->getxattr ||
+	    !new->d_inode->i_op->getxattr)
+		return 0;
+
+	list_size = vfs_listxattr(old, NULL, 0);
+	if (list_size <= 0) {
+		if (list_size == -EOPNOTSUPP)
+			return 0;
+		return list_size;
+	}
+
+	buf = kzalloc(list_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	error = -ENOMEM;
+	value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+	if (!value)
+		goto out;
+
+	list_size = vfs_listxattr(old, buf, list_size);
+	if (list_size <= 0) {
+		error = list_size;
+		goto out_free_value;
+	}
+
+	for (name = buf; name < (buf + list_size); name += strlen(name) + 1) {
+		size = vfs_getxattr(old, name, value, XATTR_SIZE_MAX);
+		if (size <= 0) {
+			error = size;
+			goto out_free_value;
+		}
+		error = vfs_setxattr(new, name, value, size, 0);
+		if (error)
+			goto out_free_value;
+	}
+
+out_free_value:
+	kfree(value);
+out:
+	kfree(buf);
+	return error;
+}
+
+static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
+{
+	struct file *old_file;
+	struct file *new_file;
+	int error = 0;
+
+	if (len == 0)
+		return 0;
+
+	old_file = vfs_open(old, O_RDONLY, current_cred());
+	if (IS_ERR(old_file))
+		return PTR_ERR(old_file);
+
+	new_file = vfs_open(new, O_WRONLY, current_cred());
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out_fput;
+	}
+
+	/* FIXME: copy up sparse files efficiently */
+	while (len) {
+		loff_t offset = new_file->f_pos;
+		size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
+		long bytes;
+
+		if (len < this_len)
+			this_len = len;
+
+		if (signal_pending_state(TASK_KILLABLE, current)) {
+			error = -EINTR;
+			break;
+		}
+
+		bytes = do_splice_direct(old_file, &offset, new_file, this_len,
+				 SPLICE_F_MOVE);
+		if (bytes <= 0) {
+			error = bytes;
+			break;
+		}
+
+		len -= bytes;
+	}
+
+	fput(new_file);
+out_fput:
+	fput(old_file);
+	return error;
+}
+
+static char *ovl_read_symlink(struct dentry *realdentry)
+{
+	int res;
+	char *buf;
+	struct inode *inode = realdentry->d_inode;
+	mm_segment_t old_fs;
+
+	res = -EINVAL;
+	if (!inode->i_op->readlink)
+		goto err;
+
+	res = -ENOMEM;
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto err;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	/* The cast to a user pointer is valid due to the set_fs() */
+	res = inode->i_op->readlink(realdentry,
+				    (char __user *)buf, PAGE_SIZE - 1);
+	set_fs(old_fs);
+	if (res < 0) {
+		free_page((unsigned long) buf);
+		goto err;
+	}
+	buf[res] = '\0';
+
+	return buf;
+
+err:
+	return ERR_PTR(res);
+}
+
+static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET,
+		.ia_atime = stat->atime,
+		.ia_mtime = stat->mtime,
+	};
+
+	return notify_change(upperdentry, &attr);
+}
+
+static int ovl_set_mode(struct dentry *upperdentry, umode_t mode)
+{
+	struct iattr attr = {
+		.ia_valid = ATTR_MODE,
+		.ia_mode = mode,
+	};
+
+	return notify_change(upperdentry, &attr);
+}
+
+static int ovl_copy_up_locked(struct dentry *upperdir, struct dentry *dentry,
+			      struct path *lowerpath, struct kstat *stat,
+			      const char *link)
+{
+	int err;
+	struct path newpath;
+	umode_t mode = stat->mode;
+
+	/* Can't properly set mode on creation because of the umask */
+	stat->mode &= S_IFMT;
+
+	ovl_path_upper(dentry, &newpath);
+	WARN_ON(newpath.dentry);
+	newpath.dentry = ovl_upper_create(upperdir, dentry, stat, link);
+	if (IS_ERR(newpath.dentry))
+		return PTR_ERR(newpath.dentry);
+
+	if (S_ISREG(stat->mode)) {
+		err = ovl_copy_up_data(lowerpath, &newpath, stat->size);
+		if (err)
+			goto err_remove;
+	}
+
+	err = ovl_copy_up_xattr(lowerpath->dentry, newpath.dentry);
+	if (err)
+		goto err_remove;
+
+	mutex_lock(&newpath.dentry->d_inode->i_mutex);
+	if (!S_ISLNK(stat->mode))
+		err = ovl_set_mode(newpath.dentry, mode);
+	if (!err)
+		err = ovl_set_timestamps(newpath.dentry, stat);
+	mutex_unlock(&newpath.dentry->d_inode->i_mutex);
+	if (err)
+		goto err_remove;
+
+	ovl_dentry_update(dentry, newpath.dentry);
+
+	/*
+	 * Easiest way to get rid of the lower dentry reference is to
+	 * drop this dentry.  This is neither needed nor possible for
+	 * directories.
+	 */
+	if (!S_ISDIR(stat->mode))
+		d_drop(dentry);
+
+	return 0;
+
+err_remove:
+	if (S_ISDIR(stat->mode))
+		vfs_rmdir(upperdir->d_inode, newpath.dentry);
+	else
+		vfs_unlink(upperdir->d_inode, newpath.dentry);
+
+	dput(newpath.dentry);
+
+	return err;
+}
+
+/*
+ * Copy up a single dentry
+ *
+ * Directory renames only allowed on "pure upper" (already created on
+ * upper filesystem, never copied up).  Directories which are on lower or
+ * are merged may not be renamed.  For these -EXDEV is returned and
+ * userspace has to deal with it.  This means, when copying up a
+ * directory we can rely on it and ancestors being stable.
+ *
+ * Non-directory renames start with copy up of source if necessary.  The
+ * actual rename will only proceed once the copy up was successful.  Copy
+ * up uses upper parent i_mutex for exclusion.  Since rename can change
+ * d_parent it is possible that the copy up will lock the old parent.  At
+ * that point the file will have already been copied up anyway.
+ */
+static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
+			   struct path *lowerpath, struct kstat *stat)
+{
+	int err;
+	struct kstat pstat;
+	struct path parentpath;
+	struct dentry *upperdir;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+	char *link = NULL;
+
+	ovl_path_upper(parent, &parentpath);
+	upperdir = parentpath.dentry;
+
+	err = vfs_getattr(parentpath.mnt, parentpath.dentry, &pstat);
+	if (err)
+		return err;
+
+	if (S_ISLNK(stat->mode)) {
+		link = ovl_read_symlink(lowerpath->dentry);
+		if (IS_ERR(link))
+			return PTR_ERR(link);
+	}
+
+	err = -ENOMEM;
+	override_cred = prepare_creds();
+	if (!override_cred)
+		goto out_free_link;
+
+	override_cred->fsuid = stat->uid;
+	override_cred->fsgid = stat->gid;
+	/*
+	 * CAP_SYS_ADMIN for copying up extended attributes
+	 * CAP_DAC_OVERRIDE for create
+	 * CAP_FOWNER for chmod, timestamp update
+	 * CAP_FSETID for chmod
+	 * CAP_MKNOD for mknod
+	 */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	cap_raise(override_cred->cap_effective, CAP_FOWNER);
+	cap_raise(override_cred->cap_effective, CAP_FSETID);
+	cap_raise(override_cred->cap_effective, CAP_MKNOD);
+	old_cred = override_creds(override_cred);
+
+	mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+	if (ovl_path_type(dentry) != OVL_PATH_LOWER) {
+		err = 0;
+	} else {
+		err = ovl_copy_up_locked(upperdir, dentry, lowerpath,
+					 stat, link);
+		if (!err) {
+			/* Restore timestamps on parent (best effort) */
+			ovl_set_timestamps(upperdir, &pstat);
+		}
+	}
+
+	mutex_unlock(&upperdir->d_inode->i_mutex);
+
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+out_free_link:
+	if (link)
+		free_page((unsigned long) link);
+
+	return err;
+}
+
+int ovl_copy_up(struct dentry *dentry)
+{
+	int err;
+
+	err = 0;
+	while (!err) {
+		struct dentry *next;
+		struct dentry *parent;
+		struct path lowerpath;
+		struct kstat stat;
+		enum ovl_path_type type = ovl_path_type(dentry);
+
+		if (type != OVL_PATH_LOWER)
+			break;
+
+		next = dget(dentry);
+		/* find the topmost dentry not yet copied up */
+		for (;;) {
+			parent = dget_parent(next);
+
+			type = ovl_path_type(parent);
+			if (type != OVL_PATH_LOWER)
+				break;
+
+			dput(next);
+			next = parent;
+		}
+
+		ovl_path_lower(next, &lowerpath);
+		err = vfs_getattr(lowerpath.mnt, lowerpath.dentry, &stat);
+		if (!err)
+			err = ovl_copy_up_one(parent, next, &lowerpath, &stat);
+
+		dput(parent);
+		dput(next);
+	}
+
+	return err;
+}
+
+/* Optimize by not copying up the file first and truncating later */
+int ovl_copy_up_truncate(struct dentry *dentry, loff_t size)
+{
+	int err;
+	struct kstat stat;
+	struct path lowerpath;
+	struct dentry *parent = dget_parent(dentry);
+
+	err = ovl_copy_up(parent);
+	if (err)
+		goto out_dput_parent;
+
+	ovl_path_lower(dentry, &lowerpath);
+	err = vfs_getattr(lowerpath.mnt, lowerpath.dentry, &stat);
+	if (err)
+		goto out_dput_parent;
+
+	if (size < stat.size)
+		stat.size = size;
+
+	err = ovl_copy_up_one(parent, dentry, &lowerpath, &stat);
+
+out_dput_parent:
+	dput(parent);
+	return err;
+}
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
new file mode 100644
index 0000000..966db6b
--- /dev/null
+++ b/fs/overlayfs/dir.c
@@ -0,0 +1,607 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include "overlayfs.h"
+
+static const char *ovl_whiteout_symlink = "(overlay-whiteout)";
+
+static struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
+				 struct nameidata *nd)
+{
+	int err = ovl_do_lookup(dentry);
+
+	if (err)
+		return ERR_PTR(err);
+
+	return NULL;
+}
+
+static int ovl_whiteout(struct dentry *upperdir, struct dentry *dentry)
+{
+	int err;
+	struct dentry *newdentry;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	/* FIXME: recheck lower dentry to see if whiteout is really needed */
+
+	err = -ENOMEM;
+	override_cred = prepare_creds();
+	if (!override_cred)
+		goto out;
+
+	/*
+	 * CAP_SYS_ADMIN for setxattr
+	 * CAP_DAC_OVERRIDE for symlink creation
+	 * CAP_FOWNER for unlink in sticky directory
+	 */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	cap_raise(override_cred->cap_effective, CAP_FOWNER);
+	override_cred->fsuid = 0;
+	override_cred->fsgid = 0;
+	old_cred = override_creds(override_cred);
+
+	newdentry = lookup_one_len(dentry->d_name.name, upperdir,
+				   dentry->d_name.len);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_put_cred;
+
+	/* The entry was just removed within the same locked region */
+	WARN_ON(newdentry->d_inode);
+
+	err = vfs_symlink(upperdir->d_inode, newdentry, ovl_whiteout_symlink);
+	if (err)
+		goto out_dput;
+
+	ovl_dentry_version_inc(dentry->d_parent);
+
+	err = vfs_setxattr(newdentry, ovl_whiteout_xattr, "y", 1, 0);
+	if (err)
+		vfs_unlink(upperdir->d_inode, newdentry);
+
+out_dput:
+	dput(newdentry);
+out_put_cred:
+	revert_creds(old_cred);
+	put_cred(override_cred);
+out:
+	if (err) {
+		/*
+		 * There's no way to recover from failure to whiteout.
+		 * What should we do?  Log a big fat error and... ?
+		 */
+		printk(KERN_ERR "overlayfs: ERROR - failed to whiteout '%s'\n",
+		       dentry->d_name.name);
+	}
+
+	return err;
+}
+
+static struct dentry *ovl_lookup_create(struct dentry *upperdir,
+					struct dentry *template)
+{
+	int err;
+	struct dentry *newdentry;
+	struct qstr *name = &template->d_name;
+
+	newdentry = lookup_one_len(name->name, upperdir, name->len);
+	if (IS_ERR(newdentry))
+		return newdentry;
+
+	if (newdentry->d_inode) {
+		const struct cred *old_cred;
+		struct cred *override_cred;
+
+		/* No need to check whiteout if lower parent is non-existent */
+		err = -EEXIST;
+		if (!ovl_dentry_lower(template->d_parent))
+			goto out_dput;
+
+		if (!S_ISLNK(newdentry->d_inode->i_mode))
+			goto out_dput;
+
+		err = -ENOMEM;
+		override_cred = prepare_creds();
+		if (!override_cred)
+			goto out_dput;
+
+		/*
+		 * CAP_SYS_ADMIN for getxattr
+		 * CAP_FOWNER for unlink in sticky directory
+		 */
+		cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+		cap_raise(override_cred->cap_effective, CAP_FOWNER);
+		old_cred = override_creds(override_cred);
+
+		err = -EEXIST;
+		if (ovl_is_whiteout(newdentry))
+			err = vfs_unlink(upperdir->d_inode, newdentry);
+
+		revert_creds(old_cred);
+		put_cred(override_cred);
+		if (err)
+			goto out_dput;
+
+		dput(newdentry);
+		newdentry = lookup_one_len(name->name, upperdir, name->len);
+		if (IS_ERR(newdentry)) {
+			ovl_whiteout(upperdir, template);
+			return newdentry;
+		}
+
+		/*
+		 * The whiteout has just been successfully removed and
+		 * the parent i_mutex is still held, so the lookup
+		 * cannot return a positive dentry.
+		 */
+		WARN_ON(newdentry->d_inode);
+	}
+
+	return newdentry;
+
+out_dput:
+	dput(newdentry);
+	return ERR_PTR(err);
+}
+
+struct dentry *ovl_upper_create(struct dentry *upperdir, struct dentry *dentry,
+				struct kstat *stat, const char *link)
+{
+	int err;
+	struct dentry *newdentry;
+	struct inode *dir = upperdir->d_inode;
+
+	newdentry = ovl_lookup_create(upperdir, dentry);
+	if (IS_ERR(newdentry))
+		goto out;
+
+	switch (stat->mode & S_IFMT) {
+	case S_IFREG:
+		err = vfs_create(dir, newdentry, stat->mode, NULL);
+		break;
+
+	case S_IFDIR:
+		err = vfs_mkdir(dir, newdentry, stat->mode);
+		break;
+
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFIFO:
+	case S_IFSOCK:
+		err = vfs_mknod(dir, newdentry, stat->mode, stat->rdev);
+		break;
+
+	case S_IFLNK:
+		err = vfs_symlink(dir, newdentry, link);
+		break;
+
+	default:
+		err = -EPERM;
+	}
+	if (err) {
+		if (ovl_dentry_is_opaque(dentry))
+			ovl_whiteout(upperdir, dentry);
+		dput(newdentry);
+		newdentry = ERR_PTR(err);
+	} else if (WARN_ON(!newdentry->d_inode)) {
+		/*
+		 * Not quite sure if non-instantiated dentry is legal or not.
+		 * VFS doesn't seem to care so check and warn here.
+		 */
+		dput(newdentry);
+		newdentry = ERR_PTR(-ENOENT);
+	}
+
+out:
+	return newdentry;
+
+}
+
+static int ovl_set_opaque(struct dentry *upperdentry)
+{
+	int err;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return -ENOMEM;
+
+	/* CAP_SYS_ADMIN for setxattr of "trusted" namespace */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	old_cred = override_creds(override_cred);
+	err = vfs_setxattr(upperdentry, ovl_opaque_xattr, "y", 1, 0);
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return err;
+}
+
+static int ovl_remove_opaque(struct dentry *upperdentry)
+{
+	int err;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return -ENOMEM;
+
+	/* CAP_SYS_ADMIN for removexattr of "trusted" namespace */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	old_cred = override_creds(override_cred);
+	err = vfs_removexattr(upperdentry, ovl_opaque_xattr);
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return err;
+}
+
+static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	int err;
+	enum ovl_path_type type;
+	struct path realpath;
+
+	type = ovl_path_real(dentry, &realpath);
+	err = vfs_getattr(realpath.mnt, realpath.dentry, stat);
+	if (err)
+		return err;
+
+	stat->dev = dentry->d_sb->s_dev;
+	stat->ino = dentry->d_inode->i_ino;
+
+	/*
+	 * It's probably not worth it to count subdirs to get the
+	 * correct link count.  nlink=1 seems to pacify 'find' and
+	 * other utilities.
+	 */
+	if (type == OVL_PATH_MERGE)
+		stat->nlink = 1;
+
+	return 0;
+}
+
+static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
+			     const char *link)
+{
+	int err;
+	struct dentry *newdentry;
+	struct dentry *upperdir;
+	struct inode *inode;
+	struct kstat stat = {
+		.mode = mode,
+		.rdev = rdev,
+	};
+
+	err = -ENOMEM;
+	inode = ovl_new_inode(dentry->d_sb, mode, dentry->d_fsdata);
+	if (!inode)
+		goto out;
+
+	err = ovl_copy_up(dentry->d_parent);
+	if (err)
+		goto out_iput;
+
+	upperdir = ovl_dentry_upper(dentry->d_parent);
+	mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+
+	newdentry = ovl_upper_create(upperdir, dentry, &stat, link);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_unlock;
+
+	ovl_dentry_version_inc(dentry->d_parent);
+	if (ovl_dentry_is_opaque(dentry) && S_ISDIR(mode)) {
+		err = ovl_set_opaque(newdentry);
+		if (err) {
+			vfs_rmdir(upperdir->d_inode, newdentry);
+			ovl_whiteout(upperdir, dentry);
+			goto out_dput;
+		}
+	}
+	ovl_dentry_update(dentry, newdentry);
+	d_instantiate(dentry, inode);
+	inode = NULL;
+	newdentry = NULL;
+	err = 0;
+
+out_dput:
+	dput(newdentry);
+out_unlock:
+	mutex_unlock(&upperdir->d_inode->i_mutex);
+out_iput:
+	iput(inode);
+out:
+	return err;
+}
+
+static int ovl_create(struct inode *dir, struct dentry *dentry, int mode,
+			struct nameidata *nd)
+{
+	return ovl_create_object(dentry, (mode & 07777) | S_IFREG, 0, NULL);
+}
+
+static int ovl_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+	return ovl_create_object(dentry, (mode & 07777) | S_IFDIR, 0, NULL);
+}
+
+static int ovl_mknod(struct inode *dir, struct dentry *dentry, int mode,
+		       dev_t rdev)
+{
+	return ovl_create_object(dentry, mode, rdev, NULL);
+}
+
+static int ovl_symlink(struct inode *dir, struct dentry *dentry,
+			 const char *link)
+{
+	return ovl_create_object(dentry, S_IFLNK, 0, link);
+}
+
+static int ovl_do_remove(struct dentry *dentry, bool is_dir)
+{
+	int err;
+	enum ovl_path_type type;
+	struct path realpath;
+	struct dentry *upperdir;
+
+	err = ovl_copy_up(dentry->d_parent);
+	if (err)
+		return err;
+
+	upperdir = ovl_dentry_upper(dentry->d_parent);
+	mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+	type = ovl_path_real(dentry, &realpath);
+	if (type != OVL_PATH_LOWER) {
+		err = -ESTALE;
+		if (realpath.dentry->d_parent != upperdir)
+			goto out_d_drop;
+
+		/* FIXME: create whiteout up front and rename to target */
+
+		if (is_dir)
+			err = vfs_rmdir(upperdir->d_inode, realpath.dentry);
+		else
+			err = vfs_unlink(upperdir->d_inode, realpath.dentry);
+		if (err)
+			goto out_d_drop;
+
+		ovl_dentry_version_inc(dentry->d_parent);
+	}
+
+	if (type != OVL_PATH_UPPER || ovl_dentry_is_opaque(dentry))
+		err = ovl_whiteout(upperdir, dentry);
+
+	/*
+	 * Keeping this dentry hashed would mean having to release
+	 * upperpath/lowerpath, which could only be done if we are the
+	 * sole user of this dentry.  Too tricky...  Just unhash for
+	 * now.
+	 */
+out_d_drop:
+	d_drop(dentry);
+	mutex_unlock(&upperdir->d_inode->i_mutex);
+
+	return err;
+}
+
+static int ovl_unlink(struct inode *dir, struct dentry *dentry)
+{
+	return ovl_do_remove(dentry, false);
+}
+
+
+static int ovl_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	enum ovl_path_type type;
+
+	type = ovl_path_type(dentry);
+	if (type != OVL_PATH_UPPER) {
+		err = ovl_check_empty_and_clear(dentry, type);
+		if (err)
+			return err;
+	}
+
+	return ovl_do_remove(dentry, true);
+}
+
+static int ovl_link(struct dentry *old, struct inode *newdir,
+		    struct dentry *new)
+{
+	int err;
+	struct dentry *olddentry;
+	struct dentry *newdentry;
+	struct dentry *upperdir;
+
+	err = ovl_copy_up(old);
+	if (err)
+		goto out;
+
+	err = ovl_copy_up(new->d_parent);
+	if (err)
+		goto out;
+
+	upperdir = ovl_dentry_upper(new->d_parent);
+	mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+	newdentry = ovl_lookup_create(upperdir, new);
+	err = PTR_ERR(newdentry);
+	if (IS_ERR(newdentry))
+		goto out_unlock;
+
+	olddentry = ovl_dentry_upper(old);
+	err = vfs_link(olddentry, upperdir->d_inode, newdentry);
+	if (!err) {
+		if (WARN_ON(!newdentry->d_inode)) {
+			dput(newdentry);
+			err = -ENOENT;
+			goto out_unlock;
+		}
+
+		ovl_dentry_version_inc(new->d_parent);
+		ovl_dentry_update(new, newdentry);
+
+		ihold(old->d_inode);
+		d_instantiate(new, old->d_inode);
+	} else {
+		if (ovl_dentry_is_opaque(new))
+			ovl_whiteout(upperdir, new);
+		dput(newdentry);
+	}
+out_unlock:
+	mutex_unlock(&upperdir->d_inode->i_mutex);
+out:
+	return err;
+
+}
+
+static int ovl_rename(struct inode *olddir, struct dentry *old,
+			struct inode *newdir, struct dentry *new)
+{
+	int err;
+	enum ovl_path_type old_type;
+	enum ovl_path_type new_type;
+	struct dentry *old_upperdir;
+	struct dentry *new_upperdir;
+	struct dentry *olddentry;
+	struct dentry *newdentry;
+	struct dentry *trap;
+	bool old_opaque;
+	bool new_opaque;
+	bool new_create = false;
+	bool is_dir = S_ISDIR(old->d_inode->i_mode);
+
+	/* Don't copy up directory trees */
+	old_type = ovl_path_type(old);
+	if (old_type != OVL_PATH_UPPER && is_dir)
+		return -EXDEV;
+
+	if (new->d_inode) {
+		new_type = ovl_path_type(new);
+
+		if (new_type == OVL_PATH_LOWER && old_type == OVL_PATH_LOWER) {
+			if (ovl_dentry_lower(old)->d_inode ==
+			    ovl_dentry_lower(new)->d_inode)
+				return 0;
+		}
+		if (new_type != OVL_PATH_LOWER && old_type != OVL_PATH_LOWER) {
+			if (ovl_dentry_upper(old)->d_inode ==
+			    ovl_dentry_upper(new)->d_inode)
+				return 0;
+		}
+
+		if (new_type != OVL_PATH_UPPER &&
+		    S_ISDIR(new->d_inode->i_mode)) {
+			err = ovl_check_empty_and_clear(new, new_type);
+			if (err)
+				return err;
+		}
+	} else {
+		new_type = OVL_PATH_UPPER;
+	}
+
+	err = ovl_copy_up(old);
+	if (err)
+		return err;
+
+	err = ovl_copy_up(new->d_parent);
+	if (err)
+		return err;
+
+	old_upperdir = ovl_dentry_upper(old->d_parent);
+	new_upperdir = ovl_dentry_upper(new->d_parent);
+
+	trap = lock_rename(new_upperdir, old_upperdir);
+
+	olddentry = ovl_dentry_upper(old);
+	newdentry = ovl_dentry_upper(new);
+	if (newdentry) {
+		dget(newdentry);
+	} else {
+		new_create = true;
+		newdentry = ovl_lookup_create(new_upperdir, new);
+		err = PTR_ERR(newdentry);
+		if (IS_ERR(newdentry))
+			goto out_unlock;
+	}
+
+	err = -ESTALE;
+	if (olddentry->d_parent != old_upperdir)
+		goto out_dput;
+	if (newdentry->d_parent != new_upperdir)
+		goto out_dput;
+	if (olddentry == trap)
+		goto out_dput;
+	if (newdentry == trap)
+		goto out_dput;
+
+	old_opaque = ovl_dentry_is_opaque(old);
+	new_opaque = ovl_dentry_is_opaque(new) || new_type != OVL_PATH_UPPER;
+
+	if (is_dir && !old_opaque && new_opaque) {
+		err = ovl_set_opaque(olddentry);
+		if (err)
+			goto out_dput;
+	}
+
+	err = vfs_rename(old_upperdir->d_inode, olddentry,
+			 new_upperdir->d_inode, newdentry);
+
+	if (err) {
+		if (new_create && ovl_dentry_is_opaque(new))
+			ovl_whiteout(new_upperdir, new);
+		if (is_dir && !old_opaque && new_opaque)
+			ovl_remove_opaque(olddentry);
+		goto out_dput;
+	}
+
+	if (old_type != OVL_PATH_UPPER || old_opaque)
+		err = ovl_whiteout(old_upperdir, old);
+	if (is_dir && old_opaque && !new_opaque)
+		ovl_remove_opaque(olddentry);
+
+	if (old_opaque != new_opaque)
+		ovl_dentry_set_opaque(old, new_opaque);
+
+	ovl_dentry_version_inc(old->d_parent);
+	ovl_dentry_version_inc(new->d_parent);
+
+out_dput:
+	dput(newdentry);
+out_unlock:
+	unlock_rename(new_upperdir, old_upperdir);
+	return err;
+}
+
+const struct inode_operations ovl_dir_inode_operations = {
+	.lookup		= ovl_lookup,
+	.mkdir		= ovl_mkdir,
+	.symlink	= ovl_symlink,
+	.unlink		= ovl_unlink,
+	.rmdir		= ovl_rmdir,
+	.rename		= ovl_rename,
+	.link		= ovl_link,
+	.setattr	= ovl_setattr,
+	.create		= ovl_create,
+	.mknod		= ovl_mknod,
+	.permission	= ovl_permission,
+	.getattr	= ovl_dir_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+};
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
new file mode 100644
index 0000000..289006e
--- /dev/null
+++ b/fs/overlayfs/inode.c
@@ -0,0 +1,375 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/xattr.h>
+#include "overlayfs.h"
+
+int ovl_setattr(struct dentry *dentry, struct iattr *attr)
+{
+	struct dentry *upperdentry;
+	int err;
+
+	if ((attr->ia_valid & ATTR_SIZE) && !ovl_dentry_upper(dentry))
+		err = ovl_copy_up_truncate(dentry, attr->ia_size);
+	else
+		err = ovl_copy_up(dentry);
+	if (err)
+		return err;
+
+	upperdentry = ovl_dentry_upper(dentry);
+
+	if (attr->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID))
+		attr->ia_valid &= ~ATTR_MODE;
+
+	mutex_lock(&upperdentry->d_inode->i_mutex);
+	err = notify_change(upperdentry, attr);
+	mutex_unlock(&upperdentry->d_inode->i_mutex);
+
+	return err;
+}
+
+static int ovl_getattr(struct vfsmount *mnt, struct dentry *dentry,
+			 struct kstat *stat)
+{
+	struct path realpath;
+
+	ovl_path_real(dentry, &realpath);
+	return vfs_getattr(realpath.mnt, realpath.dentry, stat);
+}
+
+int ovl_permission(struct inode *inode, int mask, unsigned int flags)
+{
+	struct ovl_entry *oe;
+	struct dentry *alias = NULL;
+	struct inode *realinode;
+	struct dentry *realdentry;
+	bool is_upper;
+	int err;
+
+	if (S_ISDIR(inode->i_mode)) {
+		oe = inode->i_private;
+	} else if (flags & IPERM_FLAG_RCU) {
+		return -ECHILD;
+	} else {
+		/*
+		 * For non-directories find an alias and get the info
+		 * from there.
+		 */
+		spin_lock(&inode->i_lock);
+		if (WARN_ON(list_empty(&inode->i_dentry))) {
+			spin_unlock(&inode->i_lock);
+			return -ENOENT;
+		}
+		alias = list_entry(inode->i_dentry.next, struct dentry, d_alias);
+		dget(alias);
+		spin_unlock(&inode->i_lock);
+		oe = alias->d_fsdata;
+	}
+
+	realdentry = ovl_entry_real(oe, &is_upper);
+
+	/* Careful in RCU walk mode */
+	realinode = ACCESS_ONCE(realdentry->d_inode);
+	if (!realinode) {
+		WARN_ON(!(flags & IPERM_FLAG_RCU));
+		err = -ENOENT;
+		goto out_dput;
+	}
+
+	if (mask & MAY_WRITE) {
+		umode_t mode = realinode->i_mode;
+
+		/*
+		 * Writes will always be redirected to upper layer, so
+		 * ignore lower layer being read-only.
+		 */
+		err = -EROFS;
+		if (is_upper && IS_RDONLY(realinode) &&
+		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
+			goto out_dput;
+
+		/*
+		 * Nobody gets write access to an immutable file.
+		 */
+		err = -EACCES;
+		if (IS_IMMUTABLE(realinode))
+			goto out_dput;
+	}
+
+	if (realinode->i_op->permission)
+		err = realinode->i_op->permission(realinode, mask, flags);
+	else
+		err = generic_permission(realinode, mask, flags,
+					 realinode->i_op->check_acl);
+out_dput:
+	dput(alias);
+	return err;
+}
+
+
+struct ovl_link_data {
+	struct dentry *realdentry;
+	void *cookie;
+};
+
+static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	void *ret;
+	struct dentry *realdentry;
+	struct inode *realinode;
+
+	realdentry = ovl_dentry_real(dentry);
+	realinode = realdentry->d_inode;
+
+	if (WARN_ON(!realinode->i_op->follow_link))
+		return ERR_PTR(-EPERM);
+
+	ret = realinode->i_op->follow_link(realdentry, nd);
+	if (IS_ERR(ret))
+		return ret;
+
+	if (realinode->i_op->put_link) {
+		struct ovl_link_data *data;
+
+		data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
+		if (!data) {
+			realinode->i_op->put_link(realdentry, nd, ret);
+			return ERR_PTR(-ENOMEM);
+		}
+		data->realdentry = realdentry;
+		data->cookie = ret;
+
+		return data;
+	} else {
+		return NULL;
+	}
+}
+
+static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+{
+	struct inode *realinode;
+	struct ovl_link_data *data = c;
+
+	if (!data)
+		return;
+
+	realinode = data->realdentry->d_inode;
+	realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+	kfree(data);
+}
+
+static int ovl_readlink(struct dentry *dentry, char __user *buf, int bufsiz)
+{
+	struct path realpath;
+	struct inode *realinode;
+
+	ovl_path_real(dentry, &realpath);
+	realinode = realpath.dentry->d_inode;
+
+	if (!realinode->i_op->readlink)
+		return -EINVAL;
+
+	touch_atime(realpath.mnt, realpath.dentry);
+
+	return realinode->i_op->readlink(realpath.dentry, buf, bufsiz);
+}
+
+
+static bool ovl_is_private_xattr(const char *name)
+{
+	return strncmp(name, "trusted.overlay.", 16) == 0;
+}
+
+int ovl_setxattr(struct dentry *dentry, const char *name,
+		 const void *value, size_t size, int flags)
+{
+	int err;
+	struct dentry *upperdentry;
+
+	if (ovl_is_private_xattr(name))
+		return -EPERM;
+
+	err = ovl_copy_up(dentry);
+	if (err)
+		return err;
+
+	upperdentry = ovl_dentry_upper(dentry);
+	return vfs_setxattr(upperdentry, name, value, size, flags);
+}
+
+ssize_t ovl_getxattr(struct dentry *dentry, const char *name,
+		     void *value, size_t size)
+{
+	if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE &&
+	    ovl_is_private_xattr(name))
+		return -ENODATA;
+
+	return vfs_getxattr(ovl_dentry_real(dentry), name, value, size);
+}
+
+ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
+{
+	ssize_t res;
+	int off;
+
+	res = vfs_listxattr(ovl_dentry_real(dentry), list, size);
+	if (res <= 0 || size == 0)
+		return res;
+
+	if (ovl_path_type(dentry->d_parent) != OVL_PATH_MERGE)
+		return res;
+
+	/* filter out private xattrs */
+	for (off = 0; off < res;) {
+		char *s = list + off;
+		size_t slen = strlen(s) + 1;
+
+		BUG_ON(off + slen > res);
+
+		if (ovl_is_private_xattr(s)) {
+			res -= slen;
+			memmove(s, s + slen, res - off);
+		} else {
+			off += slen;
+		}
+	}
+
+	return res;
+}
+
+int ovl_removexattr(struct dentry *dentry, const char *name)
+{
+	int err;
+	struct path realpath;
+	enum ovl_path_type type;
+
+	if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE &&
+	    ovl_is_private_xattr(name))
+		return -ENODATA;
+
+	type = ovl_path_real(dentry, &realpath);
+	if (type == OVL_PATH_LOWER) {
+		err = vfs_getxattr(realpath.dentry, name, NULL, 0);
+		if (err < 0)
+			return err;
+
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+
+		ovl_path_upper(dentry, &realpath);
+	}
+
+	return vfs_removexattr(realpath.dentry, name);
+}
+
+static bool ovl_open_need_copy_up(int flags, enum ovl_path_type type,
+				  struct dentry *realdentry)
+{
+	if (type != OVL_PATH_LOWER)
+		return false;
+
+	if (special_file(realdentry->d_inode->i_mode))
+		return false;
+
+	if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC))
+		return false;
+
+	return true;
+}
+
+static struct file *ovl_open(struct dentry *dentry, int flags,
+			     const struct cred *cred)
+{
+	int err;
+	struct path realpath;
+	enum ovl_path_type type;
+
+	type = ovl_path_real(dentry, &realpath);
+	if (ovl_open_need_copy_up(flags, type, realpath.dentry)) {
+		if (flags & O_TRUNC)
+			err = ovl_copy_up_truncate(dentry, 0);
+		else
+			err = ovl_copy_up(dentry);
+		if (err)
+			return ERR_PTR(err);
+
+		ovl_path_upper(dentry, &realpath);
+	}
+
+	return vfs_open(&realpath, flags, cred);
+}
+
+static const struct inode_operations ovl_file_inode_operations = {
+	.setattr	= ovl_setattr,
+	.permission	= ovl_permission,
+	.getattr	= ovl_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+	.open		= ovl_open,
+};
+
+static const struct inode_operations ovl_symlink_inode_operations = {
+	.setattr	= ovl_setattr,
+	.follow_link	= ovl_follow_link,
+	.put_link	= ovl_put_link,
+	.readlink	= ovl_readlink,
+	.getattr	= ovl_getattr,
+	.setxattr	= ovl_setxattr,
+	.getxattr	= ovl_getxattr,
+	.listxattr	= ovl_listxattr,
+	.removexattr	= ovl_removexattr,
+};
+
+struct inode *ovl_new_inode(struct super_block *sb, umode_t mode,
+			    struct ovl_entry *oe)
+{
+	struct inode *inode;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return NULL;
+
+	mode &= S_IFMT;
+
+	inode->i_ino = get_next_ino();
+	inode->i_mode = mode;
+	inode->i_flags |= S_NOATIME | S_NOCMTIME;
+
+	switch (mode) {
+	case S_IFDIR:
+		inode->i_private = oe;
+		inode->i_op = &ovl_dir_inode_operations;
+		inode->i_fop = &ovl_dir_operations;
+		break;
+
+	case S_IFLNK:
+		inode->i_op = &ovl_symlink_inode_operations;
+		break;
+
+	case S_IFREG:
+	case S_IFSOCK:
+	case S_IFBLK:
+	case S_IFCHR:
+	case S_IFIFO:
+		inode->i_op = &ovl_file_inode_operations;
+		break;
+
+	default:
+		WARN(1, "illegal file type: %i\n", mode);
+		inode = NULL;
+	}
+
+	return inode;
+
+}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
new file mode 100644
index 0000000..10c4c36
--- /dev/null
+++ b/fs/overlayfs/overlayfs.h
@@ -0,0 +1,62 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+struct ovl_entry;
+
+enum ovl_path_type {
+	OVL_PATH_UPPER,
+	OVL_PATH_MERGE,
+	OVL_PATH_LOWER,
+};
+
+extern const char *ovl_opaque_xattr;
+extern const char *ovl_whiteout_xattr;
+extern const struct dentry_operations ovl_dentry_operations;
+
+enum ovl_path_type ovl_path_type(struct dentry *dentry);
+u64 ovl_dentry_version_get(struct dentry *dentry);
+void ovl_dentry_version_inc(struct dentry *dentry);
+void ovl_path_upper(struct dentry *dentry, struct path *path);
+void ovl_path_lower(struct dentry *dentry, struct path *path);
+enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
+struct dentry *ovl_dentry_upper(struct dentry *dentry);
+struct dentry *ovl_dentry_lower(struct dentry *dentry);
+struct dentry *ovl_dentry_real(struct dentry *dentry);
+struct dentry *ovl_entry_real(struct ovl_entry *oe, bool *is_upper);
+bool ovl_dentry_is_opaque(struct dentry *dentry);
+void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque);
+bool ovl_is_whiteout(struct dentry *dentry);
+void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
+int ovl_do_lookup(struct dentry *dentry);
+
+struct dentry *ovl_upper_create(struct dentry *upperdir, struct dentry *dentry,
+				struct kstat *stat, const char *link);
+
+/* readdir.c */
+extern const struct file_operations ovl_dir_operations;
+int ovl_check_empty_and_clear(struct dentry *dentry, enum ovl_path_type type);
+
+/* inode.c */
+int ovl_setattr(struct dentry *dentry, struct iattr *attr);
+int ovl_permission(struct inode *inode, int mask, unsigned int flags);
+int ovl_setxattr(struct dentry *dentry, const char *name,
+		 const void *value, size_t size, int flags);
+ssize_t ovl_getxattr(struct dentry *dentry, const char *name,
+		     void *value, size_t size);
+ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size);
+int ovl_removexattr(struct dentry *dentry, const char *name);
+
+struct inode *ovl_new_inode(struct super_block *sb, umode_t mode,
+			    struct ovl_entry *oe);
+/* dir.c */
+extern const struct inode_operations ovl_dir_inode_operations;
+
+/* copy_up.c */
+int ovl_copy_up(struct dentry *dentry);
+int ovl_copy_up_truncate(struct dentry *dentry, loff_t size);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
new file mode 100644
index 0000000..4f0a51c
--- /dev/null
+++ b/fs/overlayfs/readdir.c
@@ -0,0 +1,558 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/namei.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include <linux/rbtree.h>
+#include <linux/security.h>
+#include "overlayfs.h"
+
+struct ovl_cache_entry {
+	const char *name;
+	unsigned int len;
+	unsigned int type;
+	u64 ino;
+	bool is_whiteout;
+	struct list_head l_node;
+	struct rb_node node;
+};
+
+struct ovl_readdir_data {
+	struct rb_root *root;
+	struct list_head *list;
+	struct list_head *middle;
+	struct dentry *dir;
+	int count;
+	int err;
+};
+
+struct ovl_dir_file {
+	bool is_real;
+	bool is_cached;
+	struct list_head cursor;
+	u64 cache_version;
+	struct list_head cache;
+	struct file *realfile;
+};
+
+static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
+{
+	return container_of(n, struct ovl_cache_entry, node);
+}
+
+static struct ovl_cache_entry *ovl_cache_entry_find(struct rb_root *root,
+						    const char *name, int len)
+{
+	struct rb_node *node = root->rb_node;
+	int cmp;
+
+	while (node) {
+		struct ovl_cache_entry *p = ovl_cache_entry_from_node(node);
+
+		cmp = strncmp(name, p->name, len);
+		if (cmp > 0)
+			node = p->node.rb_right;
+		else if (cmp < 0 || len < p->len)
+			node = p->node.rb_left;
+		else
+			return p;
+	}
+
+	return NULL;
+}
+
+static struct ovl_cache_entry *ovl_cache_entry_new(const char *name, int len,
+						   u64 ino, unsigned int d_type)
+{
+	struct ovl_cache_entry *p;
+
+	p = kmalloc(sizeof(*p) + len + 1, GFP_KERNEL);
+	if (p) {
+		char *name_copy = (char *) (p + 1);
+		memcpy(name_copy, name, len);
+		name_copy[len] = '\0';
+		p->name = name_copy;
+		p->len = len;
+		p->type = d_type;
+		p->ino = ino;
+		p->is_whiteout = false;
+	}
+
+	return p;
+}
+
+static int ovl_cache_entry_add_rb(struct ovl_readdir_data *rdd,
+				  const char *name, int len, u64 ino,
+				  unsigned int d_type)
+{
+	struct rb_node **newp = &rdd->root->rb_node;
+	struct rb_node *parent = NULL;
+	struct ovl_cache_entry *p;
+
+	while (*newp) {
+		int cmp;
+		struct ovl_cache_entry *tmp;
+
+		parent = *newp;
+		tmp = ovl_cache_entry_from_node(*newp);
+		cmp = strncmp(name, tmp->name, len);
+		if (cmp > 0)
+			newp = &tmp->node.rb_right;
+		else if (cmp < 0 || len < tmp->len)
+			newp = &tmp->node.rb_left;
+		else
+			return 0;
+	}
+
+	p = ovl_cache_entry_new(name, len, ino, d_type);
+	if (p == NULL)
+		return -ENOMEM;
+
+	list_add_tail(&p->l_node, rdd->list);
+	rb_link_node(&p->node, parent, newp);
+	rb_insert_color(&p->node, rdd->root);
+
+	return 0;
+}
+
+static int ovl_fill_lower(void *buf, const char *name, int namelen,
+			    loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct ovl_readdir_data *rdd = buf;
+	struct ovl_cache_entry *p;
+
+	rdd->count++;
+	p = ovl_cache_entry_find(rdd->root, name, namelen);
+	if (p) {
+		list_move_tail(&p->l_node, rdd->middle);
+	} else {
+		p = ovl_cache_entry_new(name, namelen, ino, d_type);
+		if (p == NULL)
+			rdd->err = -ENOMEM;
+		else
+			list_add_tail(&p->l_node, rdd->middle);
+	}
+
+	return rdd->err;
+}
+
+static void ovl_cache_free(struct list_head *list)
+{
+	struct ovl_cache_entry *p;
+	struct ovl_cache_entry *n;
+
+	list_for_each_entry_safe(p, n, list, l_node)
+		kfree(p);
+
+	INIT_LIST_HEAD(list);
+}
+
+static int ovl_fill_upper(void *buf, const char *name, int namelen,
+			  loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct ovl_readdir_data *rdd = buf;
+
+	rdd->count++;
+	return ovl_cache_entry_add_rb(rdd, name, namelen, ino, d_type);
+}
+
+static int ovl_dir_read(struct path *realpath, struct ovl_readdir_data *rdd,
+			  filldir_t filler)
+{
+	struct file *realfile;
+	int err;
+
+	realfile = vfs_open(realpath, O_RDONLY | O_DIRECTORY, current_cred());
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	do {
+		rdd->count = 0;
+		rdd->err = 0;
+		err = vfs_readdir(realfile, filler, rdd);
+		if (err >= 0)
+			err = rdd->err;
+	} while (!err && rdd->count);
+	fput(realfile);
+
+	return err;
+}
+
+static void ovl_dir_reset(struct file *file)
+{
+	struct ovl_dir_file *od = file->private_data;
+	enum ovl_path_type type = ovl_path_type(file->f_path.dentry);
+
+	if (ovl_dentry_version_get(file->f_path.dentry) != od->cache_version) {
+		list_del_init(&od->cursor);
+		ovl_cache_free(&od->cache);
+		od->is_cached = false;
+	}
+	WARN_ON(!od->is_real && type != OVL_PATH_MERGE);
+	if (od->is_real && type == OVL_PATH_MERGE) {
+		fput(od->realfile);
+		od->realfile = NULL;
+		od->is_real = false;
+	}
+}
+
+static int ovl_dir_mark_whiteouts(struct ovl_readdir_data *rdd)
+{
+	struct ovl_cache_entry *p;
+	struct dentry *dentry;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred) {
+		ovl_cache_free(rdd->list);
+		return -ENOMEM;
+	}
+
+	/*
+	 * CAP_SYS_ADMIN for getxattr
+	 * CAP_DAC_OVERRIDE for lookup
+	 */
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	old_cred = override_creds(override_cred);
+
+	mutex_lock(&rdd->dir->d_inode->i_mutex);
+	list_for_each_entry(p, rdd->list, l_node) {
+		if (p->type != DT_LNK)
+			continue;
+
+		dentry = lookup_one_len(p->name, rdd->dir, p->len);
+		if (IS_ERR(dentry))
+			continue;
+
+		p->is_whiteout = ovl_is_whiteout(dentry);
+		dput(dentry);
+	}
+	mutex_unlock(&rdd->dir->d_inode->i_mutex);
+
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return 0;
+}
+
+static int ovl_dir_read_merged(struct path *upperpath, struct path *lowerpath,
+			       struct ovl_readdir_data *rdd)
+{
+	int err;
+	struct rb_root root = RB_ROOT;
+	struct list_head middle;
+
+	rdd->root = &root;
+	if (upperpath->dentry) {
+		rdd->dir = upperpath->dentry;
+		err = ovl_dir_read(upperpath, rdd, ovl_fill_upper);
+		if (err)
+			goto out;
+
+		err = ovl_dir_mark_whiteouts(rdd);
+		if (err)
+			goto out;
+	}
+	/*
+	 * Insert lowerpath entries before upperpath ones, this allows
+	 * offsets to be reasonably constant
+	 */
+	list_add(&middle, rdd->list);
+	rdd->middle = &middle;
+	err = ovl_dir_read(lowerpath, rdd, ovl_fill_lower);
+	list_del(&middle);
+out:
+	rdd->root = NULL;
+
+	return err;
+}
+
+static void ovl_seek_cursor(struct ovl_dir_file *od, loff_t pos)
+{
+	struct list_head *l;
+	loff_t off;
+
+	l = od->cache.next;
+	for (off = 0; off < pos; off++) {
+		if (l == &od->cache)
+			break;
+		l = l->next;
+	}
+	list_move_tail(&od->cursor, l);
+}
+
+static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
+{
+	struct ovl_dir_file *od = file->private_data;
+	int res;
+
+	if (!file->f_pos)
+		ovl_dir_reset(file);
+
+	if (od->is_real) {
+		res = vfs_readdir(od->realfile, filler, buf);
+		file->f_pos = od->realfile->f_pos;
+
+		return res;
+	}
+
+	if (!od->is_cached) {
+		struct path lowerpath;
+		struct path upperpath;
+		struct ovl_readdir_data rdd = { .list = &od->cache };
+
+		ovl_path_lower(file->f_path.dentry, &lowerpath);
+		ovl_path_upper(file->f_path.dentry, &upperpath);
+
+		res = ovl_dir_read_merged(&upperpath, &lowerpath, &rdd);
+		if (res) {
+			ovl_cache_free(rdd.list);
+			return res;
+		}
+
+		od->cache_version = ovl_dentry_version_get(file->f_path.dentry);
+		od->is_cached = true;
+
+		ovl_seek_cursor(od, file->f_pos);
+	}
+
+	while (od->cursor.next != &od->cache) {
+		int over;
+		loff_t off;
+		struct ovl_cache_entry *p;
+
+		p = list_entry(od->cursor.next, struct ovl_cache_entry, l_node);
+		off = file->f_pos;
+		if (!p->is_whiteout) {
+			over = filler(buf, p->name, p->len, off, p->ino, p->type);
+			if (over)
+				break;
+		}
+		file->f_pos++;
+		list_move(&od->cursor, &p->l_node);
+	}
+
+	return 0;
+}
+
+static loff_t ovl_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+	loff_t res;
+	struct ovl_dir_file *od = file->private_data;
+
+	mutex_lock(&file->f_dentry->d_inode->i_mutex);
+	if (!file->f_pos)
+		ovl_dir_reset(file);
+
+	if (od->is_real) {
+		res = vfs_llseek(od->realfile, offset, origin);
+		file->f_pos = od->realfile->f_pos;
+	} else {
+		res = -EINVAL;
+
+		switch (origin) {
+		case SEEK_CUR:
+			offset += file->f_pos;
+			break;
+		case SEEK_SET:
+			break;
+		default:
+			goto out_unlock;
+		}
+		if (offset < 0)
+			goto out_unlock;
+
+		if (offset != file->f_pos) {
+			file->f_pos = offset;
+			if (od->is_cached)
+				ovl_seek_cursor(od, offset);
+		}
+		res = offset;
+	}
+out_unlock:
+	mutex_unlock(&file->f_dentry->d_inode->i_mutex);
+
+	return res;
+}
+
+static int ovl_dir_fsync(struct file *file, int datasync)
+{
+	struct ovl_dir_file *od = file->private_data;
+
+	/* May need to reopen directory if it got copied up */
+	if (!od->realfile) {
+		struct path upperpath;
+
+		ovl_path_upper(file->f_path.dentry, &upperpath);
+		od->realfile = vfs_open(&upperpath, O_RDONLY, current_cred());
+		if (IS_ERR(od->realfile))
+			return PTR_ERR(od->realfile);
+	}
+
+	return vfs_fsync(od->realfile, datasync);
+}
+
+static int ovl_dir_release(struct inode *inode, struct file *file)
+{
+	struct ovl_dir_file *od = file->private_data;
+
+	list_del(&od->cursor);
+	ovl_cache_free(&od->cache);
+	if (od->realfile)
+		fput(od->realfile);
+	kfree(od);
+
+	return 0;
+}
+
+static int ovl_dir_open(struct inode *inode, struct file *file)
+{
+	struct path realpath;
+	struct file *realfile;
+	struct ovl_dir_file *od;
+	enum ovl_path_type type;
+
+	od = kzalloc(sizeof(struct ovl_dir_file), GFP_KERNEL);
+	if (!od)
+		return -ENOMEM;
+
+	type = ovl_path_real(file->f_path.dentry, &realpath);
+	realfile = vfs_open(&realpath, file->f_flags, current_cred());
+	if (IS_ERR(realfile)) {
+		kfree(od);
+		return PTR_ERR(realfile);
+	}
+	INIT_LIST_HEAD(&od->cache);
+	INIT_LIST_HEAD(&od->cursor);
+	od->is_cached = false;
+	od->realfile = realfile;
+	od->is_real = (type != OVL_PATH_MERGE);
+	file->private_data = od;
+
+	return 0;
+}
+
+const struct file_operations ovl_dir_operations = {
+	.read		= generic_read_dir,
+	.open		= ovl_dir_open,
+	.readdir	= ovl_readdir,
+	.llseek		= ovl_dir_llseek,
+	.fsync		= ovl_dir_fsync,
+	.release	= ovl_dir_release,
+};
+
+static int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
+{
+	int err;
+	struct path lowerpath;
+	struct path upperpath;
+	struct ovl_cache_entry *p;
+	struct ovl_readdir_data rdd = { .list = list };
+
+	ovl_path_upper(dentry, &upperpath);
+	ovl_path_lower(dentry, &lowerpath);
+
+	err = ovl_dir_read_merged(&upperpath, &lowerpath, &rdd);
+	if (err)
+		return err;
+
+	err = 0;
+
+	list_for_each_entry(p, list, l_node) {
+		if (p->is_whiteout)
+			continue;
+
+		if (p->name[0] == '.') {
+			if (p->len == 1)
+				continue;
+			if (p->len == 2 && p->name[1] == '.')
+				continue;
+		}
+		err = -ENOTEMPTY;
+		break;
+	}
+
+	return err;
+}
+
+static int ovl_remove_whiteouts(struct dentry *dir, struct list_head *list)
+{
+	struct path upperpath;
+	struct dentry *upperdir;
+	struct ovl_cache_entry *p;
+	const struct cred *old_cred;
+	struct cred *override_cred;
+	int err;
+
+	ovl_path_upper(dir, &upperpath);
+	upperdir = upperpath.dentry;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return -ENOMEM;
+
+	/*
+	 * CAP_DAC_OVERRIDE for lookup and unlink
+	 * CAP_SYS_ADMIN for setxattr of "trusted" namespace
+	 * CAP_FOWNER for unlink in sticky directory
+	 */
+	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+	cap_raise(override_cred->cap_effective, CAP_FOWNER);
+	old_cred = override_creds(override_cred);
+
+	err = vfs_setxattr(upperdir, ovl_opaque_xattr, "y", 1, 0);
+	if (err)
+		goto out_revert_creds;
+
+	mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+	list_for_each_entry(p, list, l_node) {
+		struct dentry *dentry;
+		int ret;
+
+		if (!p->is_whiteout)
+			continue;
+
+		dentry = lookup_one_len(p->name, upperdir, p->len);
+		if (IS_ERR(dentry)) {
+			printk(KERN_WARNING "overlayfs: failed to lookup whiteout %.*s: %li\n", p->len, p->name, PTR_ERR(dentry));
+			continue;
+		}
+		ret = vfs_unlink(upperdir->d_inode, dentry);
+		dput(dentry);
+		if (ret)
+			printk(KERN_WARNING "overlayfs: failed to unlink whiteout %.*s: %i\n", p->len, p->name, ret);
+	}
+	mutex_unlock(&upperdir->d_inode->i_mutex);
+
+out_revert_creds:
+	revert_creds(old_cred);
+	put_cred(override_cred);
+
+	return err;
+}
+
+int ovl_check_empty_and_clear(struct dentry *dentry, enum ovl_path_type type)
+{
+	int err;
+	LIST_HEAD(list);
+
+	err = ovl_check_empty_dir(dentry, &list);
+	if (!err && type == OVL_PATH_MERGE)
+		err = ovl_remove_whiteouts(dentry, &list);
+
+	ovl_cache_free(&list);
+
+	return err;
+}
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
new file mode 100644
index 0000000..a344b42
--- /dev/null
+++ b/fs/overlayfs/super.c
@@ -0,0 +1,582 @@
+/*
+ *
+ * Copyright (C) 2011 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include "overlayfs.h"
+
+MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
+MODULE_DESCRIPTION("Overlay filesystem");
+MODULE_LICENSE("GPL");
+
+struct ovl_fs {
+	struct vfsmount *upper_mnt;
+	struct vfsmount *lower_mnt;
+};
+
+struct ovl_entry {
+	/*
+	 * Keep "double reference" on upper dentries, so that
+	 * d_delete() doesn't think it's OK to reset d_inode to NULL.
+	 */
+	struct dentry *__upperdentry;
+	struct dentry *lowerdentry;
+	union {
+		struct {
+			u64 version;
+			bool opaque;
+		};
+		struct rcu_head rcu;
+	};
+};
+
+const char *ovl_whiteout_xattr = "trusted.overlay.whiteout";
+const char *ovl_opaque_xattr = "trusted.overlay.opaque";
+
+
+enum ovl_path_type ovl_path_type(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	if (oe->__upperdentry) {
+		if (oe->lowerdentry && S_ISDIR(dentry->d_inode->i_mode))
+			return OVL_PATH_MERGE;
+		else
+			return OVL_PATH_UPPER;
+	} else {
+		return OVL_PATH_LOWER;
+	}
+}
+
+static struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe)
+{
+	struct dentry *upperdentry = ACCESS_ONCE(oe->__upperdentry);
+	smp_read_barrier_depends();
+	return upperdentry;
+}
+
+void ovl_path_upper(struct dentry *dentry, struct path *path)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	path->mnt = ofs->upper_mnt;
+	path->dentry = ovl_upperdentry_dereference(oe);
+}
+
+void ovl_path_lower(struct dentry *dentry, struct path *path)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	path->mnt = ofs->lower_mnt;
+	path->dentry = oe->lowerdentry;
+}
+
+enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path)
+{
+
+	enum ovl_path_type type = ovl_path_type(dentry);
+
+	if (type == OVL_PATH_LOWER)
+		ovl_path_lower(dentry, path);
+	else
+		ovl_path_upper(dentry, path);
+
+	return type;
+}
+
+struct dentry *ovl_dentry_upper(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return ovl_upperdentry_dereference(oe);
+}
+
+struct dentry *ovl_dentry_lower(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return oe->lowerdentry;
+}
+
+struct dentry *ovl_dentry_real(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	struct dentry *realdentry;
+
+	realdentry = ovl_upperdentry_dereference(oe);
+	if (!realdentry)
+		realdentry = oe->lowerdentry;
+
+	return realdentry;
+}
+
+struct dentry *ovl_entry_real(struct ovl_entry *oe, bool *is_upper)
+{
+	struct dentry *realdentry;
+
+	realdentry = ovl_upperdentry_dereference(oe);
+	if (realdentry) {
+		*is_upper = true;
+	} else {
+		realdentry = oe->lowerdentry;
+		*is_upper = false;
+	}
+	return realdentry;
+}
+
+bool ovl_dentry_is_opaque(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	return oe->opaque;
+}
+
+void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	oe->opaque = opaque;
+}
+
+void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	WARN_ON(!mutex_is_locked(&upperdentry->d_parent->d_inode->i_mutex));
+	WARN_ON(oe->__upperdentry);
+	BUG_ON(!upperdentry->d_inode);
+	smp_wmb();
+	oe->__upperdentry = dget(upperdentry);
+}
+
+void ovl_dentry_version_inc(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	WARN_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
+	oe->version++;
+}
+
+u64 ovl_dentry_version_get(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	WARN_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
+	return oe->version;
+}
+
+bool ovl_is_whiteout(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!dentry)
+		return false;
+	if (!dentry->d_inode)
+		return false;
+	if (!S_ISLNK(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, ovl_whiteout_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+static bool ovl_is_opaquedir(struct dentry *dentry)
+{
+	int res;
+	char val;
+
+	if (!S_ISDIR(dentry->d_inode->i_mode))
+		return false;
+
+	res = vfs_getxattr(dentry, ovl_opaque_xattr, &val, 1);
+	if (res == 1 && val == 'y')
+		return true;
+
+	return false;
+}
+
+static void ovl_entry_free(struct rcu_head *head)
+{
+	struct ovl_entry *oe = container_of(head, struct ovl_entry, rcu);
+	kfree(oe);
+}
+
+static void ovl_dentry_release(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	if (oe) {
+		dput(oe->__upperdentry);
+		dput(oe->__upperdentry);
+		dput(oe->lowerdentry);
+		call_rcu(&oe->rcu, ovl_entry_free);
+	}
+}
+
+const struct dentry_operations ovl_dentry_operations = {
+	.d_release = ovl_dentry_release,
+};
+
+static struct ovl_entry *ovl_alloc_entry(void)
+{
+	return kzalloc(sizeof(struct ovl_entry), GFP_KERNEL);
+}
+
+static struct dentry *ovl_lookup_real(struct dentry *dir, struct qstr *name)
+{
+	struct dentry *dentry;
+
+	mutex_lock(&dir->d_inode->i_mutex);
+	dentry = lookup_one_len(name->name, dir, name->len);
+	mutex_unlock(&dir->d_inode->i_mutex);
+
+	if (IS_ERR(dentry)) {
+		if (PTR_ERR(dentry) == -ENOENT)
+			dentry = NULL;
+	} else if (!dentry->d_inode) {
+		dput(dentry);
+		dentry = NULL;
+	}
+	return dentry;
+}
+
+int ovl_do_lookup(struct dentry *dentry)
+{
+	struct ovl_entry *oe;
+	struct dentry *upperdir;
+	struct dentry *lowerdir;
+	struct dentry *upperdentry = NULL;
+	struct dentry *lowerdentry = NULL;
+	struct inode *inode = NULL;
+	int err;
+
+	err = -ENOMEM;
+	oe = ovl_alloc_entry();
+	if (!oe)
+		goto out;
+
+	upperdir = ovl_dentry_upper(dentry->d_parent);
+	lowerdir = ovl_dentry_lower(dentry->d_parent);
+
+	if (upperdir) {
+		upperdentry = ovl_lookup_real(upperdir, &dentry->d_name);
+		err = PTR_ERR(upperdentry);
+		if (IS_ERR(upperdentry))
+			goto out_put_dir;
+
+		if (lowerdir && upperdentry &&
+		    (S_ISLNK(upperdentry->d_inode->i_mode) ||
+		     S_ISDIR(upperdentry->d_inode->i_mode))) {
+			const struct cred *old_cred;
+			struct cred *override_cred;
+
+			err = -ENOMEM;
+			override_cred = prepare_creds();
+			if (!override_cred)
+				goto out_dput_upper;
+
+			/* CAP_SYS_ADMIN needed for getxattr */
+			cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+			old_cred = override_creds(override_cred);
+
+			if (ovl_is_opaquedir(upperdentry)) {
+				oe->opaque = true;
+			} else if (ovl_is_whiteout(upperdentry)) {
+				dput(upperdentry);
+				upperdentry = NULL;
+				oe->opaque = true;
+			}
+			revert_creds(old_cred);
+			put_cred(override_cred);
+		}
+	}
+	if (lowerdir && !oe->opaque) {
+		lowerdentry = ovl_lookup_real(lowerdir, &dentry->d_name);
+		err = PTR_ERR(lowerdentry);
+		if (IS_ERR(lowerdentry))
+			goto out_dput_upper;
+	}
+
+	if (lowerdentry && upperdentry &&
+	    (!S_ISDIR(upperdentry->d_inode->i_mode) ||
+	     !S_ISDIR(lowerdentry->d_inode->i_mode))) {
+		dput(lowerdentry);
+		lowerdentry = NULL;
+		oe->opaque = true;
+	}
+
+	if (lowerdentry || upperdentry) {
+		struct dentry *realdentry;
+
+		realdentry = upperdentry ? upperdentry : lowerdentry;
+		err = -ENOMEM;
+		inode = ovl_new_inode(dentry->d_sb, realdentry->d_inode->i_mode, oe);
+		if (!inode)
+			goto out_dput;
+	}
+
+	if (upperdentry)
+		oe->__upperdentry = dget(upperdentry);
+
+	if (lowerdentry)
+		oe->lowerdentry = lowerdentry;
+
+	dentry->d_fsdata = oe;
+	dentry->d_op = &ovl_dentry_operations;
+	d_add(dentry, inode);
+
+	return 0;
+
+out_dput:
+	dput(lowerdentry);
+out_dput_upper:
+	dput(upperdentry);
+out_put_dir:
+	kfree(oe);
+out:
+	return err;
+}
+
+static void ovl_put_super(struct super_block *sb)
+{
+	struct ovl_fs *ufs = sb->s_fs_info;
+
+	if (!(sb->s_flags & MS_RDONLY))
+		mnt_drop_write(ufs->upper_mnt);
+
+	mntput(ufs->upper_mnt);
+	mntput(ufs->lower_mnt);
+
+	kfree(ufs);
+}
+
+static int ovl_remount_fs(struct super_block *sb, int *flagsp, char *data)
+{
+	int flags = *flagsp;
+	struct ovl_fs *ufs = sb->s_fs_info;
+
+	/* When remounting rw or ro, we need to adjust the write access to the
+	 * upper fs.
+	 */
+	if (((flags ^ sb->s_flags) & MS_RDONLY) == 0)
+		/* No change to readonly status */
+		return 0;
+
+	if (flags & MS_RDONLY) {
+		mnt_drop_write(ufs->upper_mnt);
+		return 0;
+	} else
+		return mnt_want_write(ufs->upper_mnt);
+}
+
+static const struct super_operations ovl_super_operations = {
+	.put_super	= ovl_put_super,
+	.remount_fs	= ovl_remount_fs,
+};
+
+struct ovl_config {
+	char *lowerdir;
+	char *upperdir;
+};
+
+enum {
+	Opt_lowerdir,
+	Opt_upperdir,
+	Opt_err,
+};
+
+static const match_table_t ovl_tokens = {
+	{Opt_lowerdir,			"lowerdir=%s"},
+	{Opt_upperdir,			"upperdir=%s"},
+	{Opt_err,			NULL}
+};
+
+static int ovl_parse_opt(char *opt, struct ovl_config *config)
+{
+	char *p;
+
+	config->upperdir = NULL;
+	config->lowerdir = NULL;
+
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		substring_t args[MAX_OPT_ARGS];
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, ovl_tokens, args);
+		switch (token) {
+		case Opt_upperdir:
+			kfree(config->upperdir);
+			config->upperdir = match_strdup(&args[0]);
+			if (!config->upperdir)
+				return -ENOMEM;
+			break;
+
+		case Opt_lowerdir:
+			kfree(config->lowerdir);
+			config->lowerdir = match_strdup(&args[0]);
+			if (!config->lowerdir)
+				return -ENOMEM;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct path lowerpath;
+	struct path upperpath;
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+	struct ovl_entry *oe;
+	struct ovl_fs *ufs;
+	struct ovl_config config;
+	int err;
+
+	err = ovl_parse_opt((char *) data, &config);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
+	if (!config.upperdir || !config.lowerdir) {
+		printk(KERN_ERR "overlayfs: missing upperdir or lowerdir\n");
+		goto out_free_config;
+	}
+
+	err = -ENOMEM;
+	ufs = kmalloc(sizeof(struct ovl_fs), GFP_KERNEL);
+	if (!ufs)
+		goto out_free_config;
+
+	oe = ovl_alloc_entry();
+	if (oe == NULL)
+		goto out_free_ufs;
+
+	root_inode = ovl_new_inode(sb, S_IFDIR, oe);
+	if (!root_inode)
+		goto out_free_oe;
+
+	err = kern_path(config.upperdir, LOOKUP_FOLLOW, &upperpath);
+	if (err)
+		goto out_put_root;
+
+	err = kern_path(config.lowerdir, LOOKUP_FOLLOW, &lowerpath);
+	if (err)
+		goto out_put_upperpath;
+
+	err = -ENOTDIR;
+	if (!S_ISDIR(upperpath.dentry->d_inode->i_mode) ||
+	    !S_ISDIR(lowerpath.dentry->d_inode->i_mode))
+		goto out_put_lowerpath;
+
+	ufs->upper_mnt = clone_private_mount(&upperpath);
+	err = PTR_ERR(ufs->upper_mnt);
+	if (IS_ERR(ufs->upper_mnt)) {
+		printk(KERN_ERR "overlayfs: failed to clone upperpath\n");
+		goto out_put_lowerpath;
+	}
+
+	ufs->lower_mnt = clone_private_mount(&lowerpath);
+	err = PTR_ERR(ufs->lower_mnt);
+	if (IS_ERR(ufs->lower_mnt)) {
+		printk(KERN_ERR "overlayfs: failed to clone lowerpath\n");
+		goto out_put_upper_mnt;
+	}
+
+	if (!(sb->s_flags & MS_RDONLY)) {
+		err = mnt_want_write(ufs->upper_mnt);
+		if (err)
+			goto out_put_lower_mnt;
+	}
+
+	err = -ENOMEM;
+	root_dentry = d_alloc_root(root_inode);
+	if (!root_dentry)
+		goto out_drop_write;
+
+	mntput(upperpath.mnt);
+	mntput(lowerpath.mnt);
+
+	oe->__upperdentry = dget(upperpath.dentry);
+	oe->lowerdentry = lowerpath.dentry;
+
+	root_dentry->d_fsdata = oe;
+	root_dentry->d_op = &ovl_dentry_operations;
+
+	sb->s_op = &ovl_super_operations;
+	sb->s_root = root_dentry;
+	sb->s_fs_info = ufs;
+
+	return 0;
+
+out_drop_write:
+	if (!(sb->s_flags & MS_RDONLY))
+		mnt_drop_write(ufs->upper_mnt);
+out_put_lower_mnt:
+	mntput(ufs->lower_mnt);
+out_put_upper_mnt:
+	mntput(ufs->upper_mnt);
+out_put_lowerpath:
+	path_put(&lowerpath);
+out_put_upperpath:
+	path_put(&upperpath);
+out_put_root:
+	iput(root_inode);
+out_free_oe:
+	kfree(oe);
+out_free_ufs:
+	kfree(ufs);
+out_free_config:
+	kfree(config.lowerdir);
+	kfree(config.upperdir);
+out:
+	return err;
+}
+
+static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags,
+				const char *dev_name, void *raw_data)
+{
+	return mount_nodev(fs_type, flags, raw_data, ovl_fill_super);
+}
+
+static struct file_system_type ovl_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "overlayfs",
+	.mount		= ovl_mount,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init ovl_init(void)
+{
+	return register_filesystem(&ovl_fs_type);
+}
+
+static void __exit ovl_exit(void)
+{
+	unregister_filesystem(&ovl_fs_type);
+}
+
+module_init(ovl_init);
+module_exit(ovl_exit);
-- 
1.7.3.4
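
A note for readers of the readdir code in this patch, not part of the patch itself: the ordering produced by ovl_dir_read_merged() can be sketched in user space as below. This is only an illustration under simplifying assumptions. struct entry, find() and merge_dir() are hypothetical names, and a linear search stands in for the rb-tree lookup done by ovl_cache_entry_find(). Lower names are emitted first, an upper whiteout hides the matching lower name, and upper-only names follow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct ovl_cache_entry. */
struct entry {
	const char *name;
	bool is_whiteout;	/* only meaningful for upper entries */
};

/* Linear search; the kernel code uses an rb-tree for this lookup. */
static const struct entry *find(const struct entry *v, size_t n,
				const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(v[i].name, name) == 0)
			return &v[i];
	return NULL;
}

/*
 * Store the visible names of the merged directory in out[] (at most
 * max of them) and return how many were stored.
 */
static size_t merge_dir(const struct entry *upper, size_t nupper,
			const char *const *lower, size_t nlower,
			const char **out, size_t max)
{
	size_t n = 0;

	/* Lower names come first, unless an upper whiteout hides them. */
	for (size_t i = 0; i < nlower && n < max; i++) {
		const struct entry *u = find(upper, nupper, lower[i]);

		if (u && u->is_whiteout)
			continue;
		out[n++] = lower[i];
	}

	/* Then the names that exist only in the upper directory. */
	for (size_t i = 0; i < nupper && n < max; i++) {
		bool in_lower = false;

		if (upper[i].is_whiteout)
			continue;
		for (size_t j = 0; j < nlower; j++) {
			if (strcmp(lower[j], upper[i].name) == 0) {
				in_lower = true;
				break;
			}
		}
		if (!in_lower)
			out[n++] = upper[i].name;
	}
	return n;
}
```

Keeping the lower names at the front matches the comment in ovl_dir_read_merged() about keeping offsets reasonably constant: changes in the upper directory only affect the tail of the list.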



* [PATCH 5/7] overlayfs: add statfs support
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
                   ` (3 preceding siblings ...)
  2011-06-01 12:46 ` [PATCH 4/7] overlay filesystem Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 6/7] overlayfs: implement show_options Miklos Szeredi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC (permalink / raw)
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Andy Whitcroft <apw@canonical.com>

Add support for statfs to the overlayfs filesystem.  As the upper layer
is the target of all write operations, assume that the space in that
filesystem is the space in the overlayfs.  There will be some inaccuracy,
as overwriting a file will copy it up and consume space we were not
expecting, but it is better than nothing.

Use the upper layer dentry and mount from the overlayfs root inode,
passing the statfs call to that filesystem.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/overlayfs/super.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)
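
A note for testers, not part of the patch: one way to check the behaviour described above from user space is to compare statvfs(3) results for the overlay mount point and the upper directory. The sketch below is an assumption about how one might test this, and same_fs_size() is a hypothetical helper. Only the total size is compared, because free-space counters can change between the two calls.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical test helper: return 1 if both paths report the same
 * total filesystem size, 0 if they differ, and -1 if statvfs() fails.
 */
static int same_fs_size(const char *a, const char *b)
{
	struct statvfs sa, sb;

	if (statvfs(a, &sa) != 0 || statvfs(b, &sb) != 0)
		return -1;

	return sa.f_frsize == sb.f_frsize && sa.f_blocks == sb.f_blocks;
}
```

With the mount example from the documentation patch, same_fs_size("/overlay", "/upper") would be expected to return 1 once this patch is applied.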

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a344b42..c9db954 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -385,9 +385,29 @@ static int ovl_remount_fs(struct super_block *sb, int *flagsp, char *data)
 		return mnt_want_write(ufs->upper_mnt);
 }
 
+/**
+ * ovl_statfs
+ * @dentry: A dentry in the overlayfs filesystem
+ * @buf: The struct kstatfs to fill in with stats
+ *
+ * Get the filesystem statistics.  As writes always target the upper layer
+ * filesystem, pass the statfs call on to that filesystem.
+ */
+static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	struct dentry *root_dentry = dentry->d_sb->s_root;
+	struct path path;
+	ovl_path_upper(root_dentry, &path);
+
+	if (!path.dentry->d_sb->s_op->statfs)
+		return -ENOSYS;
+	return path.dentry->d_sb->s_op->statfs(path.dentry, buf);
+}
+
 static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
 	.remount_fs	= ovl_remount_fs,
+	.statfs		= ovl_statfs,
 };
 
 struct ovl_config {
-- 
1.7.3.4


* [PATCH 6/7] overlayfs: implement show_options
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
                   ` (4 preceding siblings ...)
  2011-06-01 12:46 ` [PATCH 5/7] overlayfs: add statfs support Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-01 12:46 ` [PATCH 7/7] overlay: overlay filesystem documentation Miklos Szeredi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC (permalink / raw)
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi, Erez Zadok

From: Erez Zadok <ezk@fsl.cs.sunysb.edu>

This is useful because of the stacking nature of overlayfs.  Users like to
find out (via /proc/mounts) which lower and upper directories were used at
mount time.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/overlayfs/super.c |   63 ++++++++++++++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 20 deletions(-)
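
A note for users of this option output, not part of the patch: a tool reading the options field of /proc/mounts could pull out the two directories roughly as follows. get_mount_opt() is a hypothetical helper written for this illustration. It assumes plain comma-separated "key=value" options and does not handle paths that themselves contain commas.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key=" from a comma-separated
 * option string into out.  Returns 0 on success, -1 if the key is
 * absent or out is too small.
 */
static int get_mount_opt(const char *opts, const char *key,
			 char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* Match only at the start of an option, followed by '='. */
		if (len > klen && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			size_t vlen = len - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return -1;
}
```

For an options field such as "rw,relatime,lowerdir=/lower,upperdir=/upper", asking for "lowerdir" yields "/lower" and asking for "upperdir" yields "/upper".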

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index c9db954..7109b45 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -15,17 +15,27 @@
 #include <linux/slab.h>
 #include <linux/parser.h>
 #include <linux/module.h>
+#include <linux/seq_file.h>
 #include "overlayfs.h"
 
 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
 MODULE_DESCRIPTION("Overlay filesystem");
 MODULE_LICENSE("GPL");
 
+struct ovl_config {
+	char *lowerdir;
+	char *upperdir;
+};
+
+/* private information held for overlayfs's superblock */
 struct ovl_fs {
 	struct vfsmount *upper_mnt;
 	struct vfsmount *lower_mnt;
+	/* pathnames of lower and upper dirs, for show_options */
+	struct ovl_config config;
 };
 
+/* private information held for every overlayfs dentry */
 struct ovl_entry {
 	/*
 	 * Keep "double reference" on upper dentries, so that
@@ -363,6 +373,8 @@ static void ovl_put_super(struct super_block *sb)
 	mntput(ufs->upper_mnt);
 	mntput(ufs->lower_mnt);
 
+	kfree(ufs->config.lowerdir);
+	kfree(ufs->config.upperdir);
 	kfree(ufs);
 }
 
@@ -404,15 +416,27 @@ static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return path.dentry->d_sb->s_op->statfs(path.dentry, buf);
 }
 
+/**
+ * ovl_show_options
+ *
+ * Prints the mount options for a given superblock.
+ * Returns zero; does not fail.
+ */
+static int ovl_show_options(struct seq_file *m, struct vfsmount *mnt)
+{
+	struct super_block *sb = mnt->mnt_sb;
+	struct ovl_fs *ufs = sb->s_fs_info;
+
+	seq_printf(m, ",lowerdir=%s", ufs->config.lowerdir);
+	seq_printf(m, ",upperdir=%s", ufs->config.upperdir);
+	return 0;
+}
+
 static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
 	.remount_fs	= ovl_remount_fs,
 	.statfs		= ovl_statfs,
-};
-
-struct ovl_config {
-	char *lowerdir;
-	char *upperdir;
+	.show_options	= ovl_show_options,
 };
 
 enum {
@@ -472,37 +496,36 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	struct dentry *root_dentry;
 	struct ovl_entry *oe;
 	struct ovl_fs *ufs;
-	struct ovl_config config;
 	int err;
 
-	err = ovl_parse_opt((char *) data, &config);
-	if (err)
+	err = -ENOMEM;
+	ufs = kmalloc(sizeof(struct ovl_fs), GFP_KERNEL);
+	if (!ufs)
 		goto out;
 
+	err = ovl_parse_opt((char *) data, &ufs->config);
+	if (err)
+		goto out_free_ufs;
+
 	err = -EINVAL;
-	if (!config.upperdir || !config.lowerdir) {
+	if (!ufs->config.upperdir || !ufs->config.lowerdir) {
 		printk(KERN_ERR "overlayfs: missing upperdir or lowerdir\n");
 		goto out_free_config;
 	}
 
-	err = -ENOMEM;
-	ufs = kmalloc(sizeof(struct ovl_fs), GFP_KERNEL);
-	if (!ufs)
-		goto out_free_config;
-
 	oe = ovl_alloc_entry();
 	if (oe == NULL)
-		goto out_free_ufs;
+		goto out_free_config;
 
 	root_inode = ovl_new_inode(sb, S_IFDIR, oe);
 	if (!root_inode)
 		goto out_free_oe;
 
-	err = kern_path(config.upperdir, LOOKUP_FOLLOW, &upperpath);
+	err = kern_path(ufs->config.upperdir, LOOKUP_FOLLOW, &upperpath);
 	if (err)
 		goto out_put_root;
 
-	err = kern_path(config.lowerdir, LOOKUP_FOLLOW, &lowerpath);
+	err = kern_path(ufs->config.lowerdir, LOOKUP_FOLLOW, &lowerpath);
 	if (err)
 		goto out_put_upperpath;
 
@@ -566,11 +589,11 @@ out_put_root:
 	iput(root_inode);
 out_free_oe:
 	kfree(oe);
+out_free_config:
+	kfree(ufs->config.lowerdir);
+	kfree(ufs->config.upperdir);
 out_free_ufs:
 	kfree(ufs);
-out_free_config:
-	kfree(config.lowerdir);
-	kfree(config.upperdir);
 out:
 	return err;
 }
-- 
1.7.3.4


* [PATCH 7/7] overlay: overlay filesystem documentation
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
                   ` (5 preceding siblings ...)
  2011-06-01 12:46 ` [PATCH 6/7] overlayfs: implement show_options Miklos Szeredi
@ 2011-06-01 12:46 ` Miklos Szeredi
  2011-06-08 22:32 ` [PATCH 0/7] overlay filesystem: request for inclusion Andrew Morton
       [not found] ` <4540f7aa16724111bd792a1d577261c2@HUBCAS1.cs.stonybrook.edu>
  8 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-01 12:46 UTC (permalink / raw)
  To: viro, torvalds
  Cc: linux-fsdevel, linux-kernel, akpm, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, mszeredi

From: Neil Brown <neilb@suse.de>

Document the overlay filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 Documentation/filesystems/overlayfs.txt |  167 +++++++++++++++++++++++++++++++
 MAINTAINERS                             |    7 ++
 2 files changed, 174 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
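
A note for readers of the new document, not part of the patch: the name resolution rules it describes (upper hides lower, two directories merge, whiteouts and opaque directories suppress the lower object) can be summarised as a small decision function. This is a sketch of the documented rules only. The enum names and resolve() are hypothetical and do not appear in the overlayfs code.

```c
#include <stdbool.h>

/* What a layer holds under a given name; WHITEOUT applies to upper only. */
enum kind { NONE, FILE_OBJ, DIR_OBJ, WHITEOUT };

/* What the overlay shows for that name. */
enum result { R_ABSENT, R_UPPER, R_LOWER, R_MERGE };

/* Hypothetical summary of the lookup rules in the documentation. */
static enum result resolve(enum kind upper, bool upper_opaque,
			   enum kind lower)
{
	/* A whiteout hides the lower object and is itself hidden. */
	if (upper == WHITEOUT)
		return R_ABSENT;

	/* Nothing in upper: the lower object, if any, shows through. */
	if (upper == NONE)
		return lower == NONE ? R_ABSENT : R_LOWER;

	/* Two directories merge unless the upper one is opaque. */
	if (upper == DIR_OBJ && lower == DIR_OBJ && !upper_opaque)
		return R_MERGE;

	/* Otherwise the upper object hides whatever is below it. */
	return R_UPPER;
}
```

For example, an upper regular file over a lower directory resolves to the upper file alone, which is the "non-directory in either" case in the Directories section.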

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
new file mode 100644
index 0000000..4bc0b34
--- /dev/null
+++ b/Documentation/filesystems/overlayfs.txt
@@ -0,0 +1,167 @@
+Written by: Neil Brown <neilb@suse.de>
+
+Overlay Filesystem
+==================
+
+This document describes a prototype for a new approach to providing
+overlay-filesystem functionality in Linux (sometimes referred to as
+union-filesystems).  An overlay-filesystem tries to present a
+filesystem which is the result of overlaying one filesystem on top
+of the other.
+
+The result will inevitably fail to look exactly like a normal
+filesystem for various technical reasons.  The expectation is that
+many use cases will be able to ignore these differences.
+
+This approach is 'hybrid' because the objects that appear in the
+filesystem do not all appear to belong to that filesystem.  In many
+cases, accessing an object in the union will be indistinguishable
+from accessing the corresponding object in the original filesystem.
+This is most obvious from the 'st_dev' field returned by stat(2).
+
+While directories will report an st_dev from the overlay-filesystem,
+all non-directory objects will report an st_dev from the lower or
+upper filesystem that is providing the object.  Similarly st_ino will
+only be unique when combined with st_dev, and both of these can change
+over the lifetime of a non-directory object.  Many applications and
+tools ignore these values and will not be affected.
+
+Upper and Lower
+---------------
+
+An overlay filesystem combines two filesystems - an 'upper' filesystem
+and a 'lower' filesystem.  When a name exists in both filesystems, the
+object in the 'upper' filesystem is visible while the object in the
+'lower' filesystem is either hidden or, in the case of directories,
+merged with the 'upper' object.
+
+It would be more correct to refer to an upper and lower 'directory
+tree' rather than 'filesystem' as it is quite possible for both
+directory trees to be in the same filesystem and there is no
+requirement that the root of a filesystem be given for either upper or
+lower.
+
+The lower filesystem can be any filesystem supported by Linux and does
+not need to be writable.  The lower filesystem can even be another
+overlayfs.  The upper filesystem will normally be writable and, if it
+is, it must support the creation of trusted.* extended attributes and
+must provide a valid d_type in readdir responses, at least for
+symbolic links - so NFS is not suitable.
+
+A read-only overlay of two read-only filesystems may use any
+filesystem type.
+
+Directories
+-----------
+
+Overlaying mainly involves directories.  If a given name appears in both
+upper and lower filesystems and refers to a non-directory in either,
+then the lower object is hidden - the name refers only to the upper
+object.
+
+Where both upper and lower objects are directories, a merged directory
+is formed.
+
+At mount time, the two directories given as mount options are combined
+into a merged directory:
+
+  mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay
+
+Then whenever a lookup is requested in such a merged directory, the
+lookup is performed in each actual directory and the combined result
+is cached in the dentry belonging to the overlay filesystem.  If both
+actual lookups find directories, both are stored and a merged
+directory is created, otherwise only one is stored: the upper if it
+exists, else the lower.
+
+Only the lists of names from directories are merged.  Other content
+such as metadata and extended attributes are reported for the upper
+directory only.  These attributes of the lower directory are hidden.
+
+Whiteouts and opaque directories
+--------------------------------
+
+In order to support rm and rmdir without changing the lower
+filesystem, an overlay filesystem needs to record in the upper filesystem
+that files have been removed.  This is done using whiteouts and opaque
+directories (non-directories are always opaque).
+
+The overlay filesystem uses extended attributes with a
+"trusted.overlay."  prefix to record these details.
+
+A whiteout is created as a symbolic link with target
+"(overlay-whiteout)" and with xattr "trusted.overlay.whiteout" set to "y".
+When a whiteout is found in the upper level of a merged directory, any
+matching name in the lower level is ignored, and the whiteout itself
+is also hidden.
+
+A directory is made opaque by setting the xattr "trusted.overlay.opaque"
+to "y".  Where the upper filesystem contains an opaque directory, any
+directory in the lower filesystem with the same name is ignored.
+
+readdir
+-------
+
+When a 'readdir' request is made on a merged directory, the upper and
+lower directories are each read and the name lists merged in the
+obvious way (upper is read first, then lower - entries that already
+exist are not re-added).  This merged name list is cached in the
+'struct file' and so remains as long as the file is kept open.  If the
+directory is opened and read by two processes at the same time, they
+will each have separate caches.  A seekdir to the start of the
+directory (offset 0) followed by a readdir will cause the cache to be
+discarded and rebuilt.
+
+This means that changes to the merged directory do not appear while a
+directory is being read.  This is unlikely to be noticed by many
+programs.
+
+Seek offsets are assigned sequentially when the directories are read.
+Thus if a program
+  - reads part of a directory
+  - remembers an offset, and closes the directory
+  - re-opens the directory some time later
+  - seeks to the remembered offset
+
+there may be little correlation between the old and new locations in
+the list of filenames, particularly if anything has changed in the
+directory.
+
+Readdir on directories that are not merged is simply handled by the
+underlying directory (upper or lower).
+
+
+Non-directories
+---------------
+
+Objects that are not directories (files, symlinks, device-special
+files etc.) are presented either from the upper or lower filesystem as
+appropriate.  When a file in the lower filesystem is accessed in a way
+that requires write-access, such as opening for write access, changing
+some metadata etc., the file is first copied from the lower filesystem
+to the upper filesystem (copy_up).  Note that creating a hard-link
+also requires copy_up, though of course creation of a symlink does
+not.
+
+The copy_up process first makes sure that the containing directory
+exists in the upper filesystem - creating it and any parents as
+necessary.  It then creates the object with the same metadata (owner,
+mode, mtime, symlink-target etc.) and then if the object is a file, the
+data is copied from the lower to the upper filesystem.  Finally any
+extended attributes are copied up.
+
+Once the copy_up is complete, the overlay filesystem simply
+provides direct access to the newly created file in the upper
+filesystem - future operations on the file are barely noticed by the
+overlay filesystem (though an operation on the name of the file such as
+rename or unlink will of course be noticed and handled).
+
+Changes to underlying filesystems
+---------------------------------
+
+Offline changes, when the overlay is not mounted, are allowed to either
+the upper or the lower trees.
+
+Changes to the underlying filesystems while part of a mounted overlay
+filesystem are not allowed.  This is not yet enforced, but will be in
+the future.
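
The merging rules in the documentation above (upper names first, then
lower names not already present, whiteouts hiding both the lower name and
themselves, an opaque upper directory hiding the whole lower directory)
can be modelled in user space.  This is only an illustrative sketch, not
the kernel implementation: the WHITEOUT marker, the dict-based
directories and the function name merged_names are all assumptions made
for the example.

```python
# Hypothetical marker standing in for a whiteout entry in the upper directory.
WHITEOUT = object()


def merged_names(upper, lower, upper_opaque=False):
    """Return the names visible in a merged directory, upper entries first.

    `upper` and `lower` map names to arbitrary entries.  An upper entry that
    is WHITEOUT models a whiteout.  `upper_opaque` models an upper directory
    carrying the opaque xattr, in which case the lower directory is ignored.
    """
    names = []
    hidden = set()
    for name, entry in upper.items():
        if entry is WHITEOUT:
            hidden.add(name)  # hides the lower name; the whiteout is hidden too
        else:
            names.append(name)
    if not upper_opaque:
        seen = set(names)
        for name in lower:
            # entries that already exist are not re-added
            if name not in seen and name not in hidden:
                names.append(name)
    return names
```

With upper = {"a": 1, "b": WHITEOUT} and lower = {"b": 2, "c": 3, "a": 4}
the model lists "a" (from upper) and "c" (from lower); "b" is whited out.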
diff --git a/MAINTAINERS b/MAINTAINERS
index 29801f7..5ffbe08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4715,6 +4715,13 @@ F:	drivers/scsi/osd/
 F:	include/scsi/osd_*
 F:	fs/exofs/
 
+OVERLAYFS FILESYSTEM
+M:	Miklos Szeredi <miklos@szeredi.hu>
+L:	linux-fsdevel@vger.kernel.org
+S:	Supported
+F:	fs/overlayfs/*
+F:	Documentation/filesystems/overlayfs.txt
+
 P54 WIRELESS DRIVER
 M:	Christian Lamparter <chunkeey@googlemail.com>
 L:	linux-wireless@vger.kernel.org
-- 
1.7.3.4
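
The copy_up sequence described in the documentation patch above (ensure
the parent directories exist in the upper tree, create the object, copy
the data and metadata) can be approximated in user space as below.  This
is a rough sketch under stated assumptions, not what fs/overlayfs/copy_up.c
does: the function name and its arguments are hypothetical, only regular
files and symlinks are handled, parent directories do not inherit the
lower directory's metadata, and ownership is not preserved.

```python
import os
import shutil


def copy_up(rel_path, lower_root, upper_root):
    """Copy rel_path from the lower tree to the upper tree (user-space model)."""
    src = os.path.join(lower_root, rel_path)
    dst = os.path.join(upper_root, rel_path)
    if os.path.lexists(dst):
        return dst  # already present in the upper tree, nothing to do
    # Step 1: make sure the containing directory exists in the upper tree.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # Step 2: create the object; for a regular file also copy the data.
    if os.path.islink(src):
        os.symlink(os.readlink(src), dst)
    else:
        # copy2 copies data, mode and timestamps (and xattrs where supported)
        shutil.copy2(src, dst)
    return dst
```

After a call such as copy_up("etc/conf", "/lower", "/upper"), later writes
would go to /upper/etc/conf while /lower/etc/conf stays untouched.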

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-01 12:46 [PATCH 0/7] overlay filesystem: request for inclusion Miklos Szeredi
                   ` (6 preceding siblings ...)
  2011-06-01 12:46 ` [PATCH 7/7] overlay: overlay filesystem documentation Miklos Szeredi
@ 2011-06-08 22:32 ` Andrew Morton
  2011-06-09  1:59   ` NeilBrown
  2011-07-05 19:54   ` Hans-Peter Jansen
       [not found] ` <4540f7aa16724111bd792a1d577261c2@HUBCAS1.cs.stonybrook.edu>
  8 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2011-06-08 22:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp, ezk, mszeredi

On Wed,  1 Jun 2011 14:46:13 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> I'd like to ask for overlayfs to be merged into 3.1.

Dumb questions:

I've never really understood the need for fs overlaying.  Who wants it?
What are the use-cases?

This sort of thing could be implemented in userspace and wired up via
fuse, I assume.  Has that been attempted and why is it inadequate?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-08 22:32 ` [PATCH 0/7] overlay filesystem: request for inclusion Andrew Morton
@ 2011-06-09  1:59   ` NeilBrown
  2011-06-09  3:52     ` Andrew Morton
  2011-07-05 19:54   ` Hans-Peter Jansen
  1 sibling, 1 reply; 74+ messages in thread
From: NeilBrown @ 2011-06-09  1:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, hramrach, jordipujolp, ezk, mszeredi

On Wed, 8 Jun 2011 15:32:08 -0700 Andrew Morton <akpm@linux-foundation.org>
wrote:

> On Wed,  1 Jun 2011 14:46:13 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > I'd like to ask for overlayfs to be merged into 3.1.
> 
> Dumb questions:
> 
> I've never really understood the need for fs overlaying.  Who wants it?
> What are the use-cases?

https://lwn.net/Articles/324291/

I think the strongest use case is that LIVE-DVD's want it to have a write-able
root filesystem which is stored on the DVD.

> 
> This sort of thing could be implemented in userspace and wired up via
> fuse, I assume.  Has that been attempted and why is it inadequate?

I think that would be a valid question if the proposal was large and
complex.  But overlayfs is really quite small and self-contained.

NeilBrown

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09  1:59   ` NeilBrown
@ 2011-06-09  3:52     ` Andrew Morton
  2011-06-09 12:47       ` Miklos Szeredi
                         ` (3 more replies)
  0 siblings, 4 replies; 74+ messages in thread
From: Andrew Morton @ 2011-06-09  3:52 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, hramrach, jordipujolp, ezk, mszeredi

On Thu, 9 Jun 2011 11:59:34 +1000 NeilBrown <neilb@suse.de> wrote:

> On Wed, 8 Jun 2011 15:32:08 -0700 Andrew Morton <akpm@linux-foundation.org>
> wrote:
> 
> > On Wed,  1 Jun 2011 14:46:13 +0200
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > I'd like to ask for overlayfs to be merged into 3.1.
> > 
> > Dumb questions:
> > 
> > I've never really understood the need for fs overlaying.  Who wants it?
> > What are the use-cases?
> 
> https://lwn.net/Articles/324291/
> 
> I think the strongest use case is that LIVE-DVD's want it to have a write-able
> root filesystem which is stored on the DVD.

Well, these things have been around for over 20 years.  What motivated
the developers of other OS's to develop these things and how are their
users using them?

> > 
> > This sort of thing could be implemented in userspace and wired up via
> > fuse, I assume.  Has that been attempted and why is it inadequate?
> 
> I think that would be a valid question if the proposal was large and
> complex.  But overlayfs is really quite small and self-contained.

Not merging it would be even smaller and simpler.  If there is a
userspace alternative then that option should be evaluated and compared
in a rational manner.

Another issue: there have been numerous attempts at Linux overlay
filesystems from numerous parties.  Does (or will) this implementation
satisfy all their requirements?

Because if not, we're in a situation where the in-kernel code is
unfixably inadequate so we end up merging another similar-looking
thing, or the presence of this driver makes it harder for them to get
other drivers merged and the other parties' requirements remain
unsatisfied.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09  3:52     ` Andrew Morton
@ 2011-06-09 12:47       ` Miklos Szeredi
  2011-06-09 19:38         ` Andrew Morton
  2011-06-09 13:49       ` Andy Whitcroft
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-09 12:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	hramrach, jordipujolp, ezk

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 9 Jun 2011 11:59:34 +1000 NeilBrown <neilb@suse.de> wrote:
>
>> On Wed, 8 Jun 2011 15:32:08 -0700 Andrew Morton <akpm@linux-foundation.org>
>> wrote:
>> > I've never really understood the need for fs overlaying.  Who wants it?
>> > What are the use-cases?
>> 
>> https://lwn.net/Articles/324291/
>> 
>> I think the strongest use case is that LIVE-DVD's want it to have a
>> write-able root filesystem which is stored on the DVD.
>
> Well, these things have been around for over 20 years.  What motivated
> the developers of other OS's to develop these things and how are their
> users using them?

That's a good question, Erez might be able to answer that better.

We have customers who need this for the "common base + writable
configuration" case in a virtualized environment.

Since overlayfs's announcement several projects have tried it and have
been very good testers and bug reporters.  These include OpenWRT, Ubuntu
and other Debian based live systems.

>> > This sort of thing could be implemented in userspace and wired up via
>> > fuse, I assume.  Has that been attempted and why is it inadequate?

Yes, unionfs-fuse and deltafs (written by me) are two examples.

One issue that a customer had with deltafs was lack of XIP support.  The
other one (from the same customer) was the general yuck factor of
userspace filesystems.

There are also performance and resource use issues associated with
userspace filesystems.  These may or may not be a problem depending on the
actual use.  But it's a fact that out-of-kernel filesystems will never
be as efficient as in-kernel ones.

>> I think that would be a valid question if the proposal was large and
>> complex.  But overlayfs is really quite small and self-contained.
>
> Not merging it would be even smaller and simpler.  If there is a
> userspace alternative then that option should be evaluated and compared
> in a rational manner.
>
>
>
> Another issue: there have been numerous attempts at Linux overlay
> filesystems from numerous parties.  Does (or will) this implementation
> satisfy all their requirements?

Overlayfs aims to be the simplest possible but not simpler.

I think the reason why "aufs" never had a real chance at getting merged
is because of feature creep.

Of course I expect new features to be added to overlayfs after the
merge, but I believe some of the features in those other solutions are
simply unnecessary.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 12:47       ` Miklos Szeredi
@ 2011-06-09 19:38         ` Andrew Morton
  2011-06-09 19:49           ` Felix Fietkau
  2011-06-09 22:02           ` Miklos Szeredi
  0 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2011-06-09 19:38 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	hramrach, jordipujolp, ezk

On Thu, 09 Jun 2011 14:47:49 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > On Thu, 9 Jun 2011 11:59:34 +1000 NeilBrown <neilb@suse.de> wrote:
> >
> >> On Wed, 8 Jun 2011 15:32:08 -0700 Andrew Morton <akpm@linux-foundation.org>
> >> wrote:
> >> > I've never really understood the need for fs overlaying.  Who wants it?
> >> > What are the use-cases?
> >> 
> >> https://lwn.net/Articles/324291/
> >> 
> >> I think the strongest use case is that LIVE-DVD's want it to have a
> >> write-able root filesystem which is stored on the DVD.
> >
> > Well, these things have been around for over 20 years.  What motivated
> > the developers of other OS's to develop these things and how are their
> > users using them?
> 
> That's a good question, Erez might be able to answer that better.
> 
> We have customers who need this for the "common base + writable
> configuration" case in a virtualized environment.
> 
> Since overlayfs's announcement several projects have tried it and have
> been very good testers and bug reporters.  These include OpenWRT, Ubuntu
> and other Debian based live systems.

I assume that the live CD was your motivator for developing overlayfs?

> >> > This sort of thing could be implemented in userspace and wired up via
> >> > fuse, I assume.  Has that been attempted and why is it inadequate?
> 
> Yes, unionfs-fuse and deltafs (written by me) are two examples.
> 
> One issue that a customer had with deltafs was lack of XIP support.  The
> other one (from the same customer) was the general yuck factor of
> userspace filesystems.
> 
> There are also performance and resource use issues associated with
> > userspace filesystems.  These may or may not be a problem depending on the
> actual use.  But it's a fact that out-of-kernel filesystems will never
> be as efficient as in-kernel ones.

Yes, userspace filesystems have a good yuck factor.  In a way it's a
sad commentary on the concept of FUSE, but I guess one could look at it
another way: FUSE is good for prototypes and oddball small-volume stuff
but once a FUSE-based setup has proven useful and people are getting
benefit from it, it's time to look at an in-kernel implementation.

> > Another issue: there have been numerous attempts at Linux overlay
> > filesystems from numerous parties.  Does (or will) this implementation
> > satisfy all their requirements?
> 
> Overlayfs aims to be the simplest possible but not simpler.
> 
> I think the reason why "aufs" never had a real chance at getting merged
> is because of feature creep.
> 
> Of course I expect new features to be added to overlayfs after the
> > merge, but I believe some of the features in those other solutions are
> simply unnecessary.

This is my main worry.  If overlayfs doesn't appreciably decrease the
motivation to merge other unioned filesystems then we might end up with
two similar-looking things.  And, I assume, the later and more
fully-blown implementation might make overlayfs obsolete but by that
time it will be hard to remove.

So it would be interesting to hear the thoughts of the people who have
been working on the other implementations.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 19:38         ` Andrew Morton
@ 2011-06-09 19:49           ` Felix Fietkau
  2011-06-09 22:02           ` Miklos Szeredi
  1 sibling, 0 replies; 74+ messages in thread
From: Felix Fietkau @ 2011-06-09 19:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, hramrach, jordipujolp, ezk

On 2011-06-09 9:38 PM, Andrew Morton wrote:
>>  >>  >  This sort of thing could be implemented in userspace and wired up via
>>  >>  >  fuse, I assume.  Has that been attempted and why is it inadequate?
>>
>>  Yes, unionfs-fuse and deltafs (written by me) are two examples.
>>
>>  One issue that a customer had with deltafs was lack of XIP support.  The
>>  other one (from the same customer) was the general yuck factor of
>>  userspace filesystems.
>>
>>  There are also performance and resource use issues associated with
>>  userspace filesystems.  These may or may not be a problem depending on the
>>  actual use.  But it's a fact that out-of-kernel filesystems will never
>>  be as efficient as in-kernel ones.
>
> Yes, userspace filesystems have a good yuck factor.  In a way it's a
> sad commentary on the concept of FUSE, but I guess one could look at it
> another way: FUSE is good for prototypes and oddball small-volume stuff
> but once a FUSE-based setup has proven useful and people are getting
> benefit from it, it's time to look at an in-kernel implementation.
We're using overlayfs in OpenWrt for embedded systems with 4 MB flash 
and 16 MB RAM for using a jffs2 filesystem as an overlay to squashfs. 
FUSE would add quite a bit of unnecessary bloat to that.

overlayfs so far is the only overlay filesystem implementation that I've 
seen that works well for us with new kernels, and it has all the 
features that we need (which aren't that many).

- Felix

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 19:38         ` Andrew Morton
  2011-06-09 19:49           ` Felix Fietkau
@ 2011-06-09 22:02           ` Miklos Szeredi
  2011-06-10  3:48             ` J. R. Okajima
  1 sibling, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-09 22:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	hramrach, jordipujolp, ezk, hooanon05

Andrew Morton <akpm@linux-foundation.org> writes:

>> > Well, these things have been around for over 20 years.  What motivated
>> > the developers of other OS's to develop these things and how are their
>> > users using them?
>> 
>> That's a good question, Erez might be able to answer that better.
>> 
>> We have customers who need this for the "common base + writable
>> configuration" case in a virtualized environment.
>> 
>> Since overlayfs's announcement several projects have tried it and have
>> been very good testers and bug reporters.  These include OpenWRT, Ubuntu
>> and other Debian based live systems.
>
> I assume that the live CD was your motivator for developing overlayfs?

Actually no.  The main motivator was that I started reviewing
union-mounts and got thinking about how to do it better.

>> > Another issue: there have been numerous attempts at Linux overlay
>> > filesystems from numerous parties.  Does (or will) this implementation
>> > satisfy all their requirements?
>> 
>> Overlayfs aims to be the simplest possible but not simpler.
>> 
>> I think the reason why "aufs" never had a real chance at getting merged
>> is because of feature creep.
>> 
>> Of course I expect new features to be added to overlayfs after the
>> merge, but I believe some of the features in those other solutions are
>> simply unnecessary.
>
> This is my main worry.  If overlayfs doesn't appreciably decrease the
> motivation to merge other unioned filesystems then we might end up with
> two similar-looking things.  And, I assume, the later and more
> fully-blown implementation might make overlayfs obsolete but by that
> time it will be hard to remove.
>
> So it would be interesting to hear the thoughts of the people who have
> been working on the other implementations.

Added J. R. Okajima (aufs maintainer) to CC.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 22:02           ` Miklos Szeredi
@ 2011-06-10  3:48             ` J. R. Okajima
  2011-06-10  9:31               ` Francis Moreau
                                 ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-10  3:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

Miklos, thanks for forwarding.
Here I try replying after summarising several messages.

Feature sets:
----------------------------------------
If I remember correctly, Miklos has mentioned about it like this.
- overlayfs provides the same feature set as UnionMount.
- but its implementation is much smaller and simpler than UnionMount.

I agree with this argument (Oh, I have to confess that I don't test
overlayfs by myself). But it means that overlayfs doesn't provide some
features which UnionMount doesn't provide. I have posted about such
features before, but I list them up again here.
- the inode number may change silently.
- hardlinks may corrupt by copy-up.
- read(2) may get obsolete filedata (fstat(2) for metadata too).
- fcntl(F_SETLK) may be broken by copy-up.
- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
  open(O_RDWR).

Later I noticed one more thing. /proc/PID/{fd/,exe} may not work
correctly for overlayfs while they would work correctly for
UnionMount. In overlayfs, they refer to the file on the real filesystems
(upper or lower) instead of the one in overlayfs, don't they? If so, I
am afraid a few applications may not work correctly, particularly
start-stop-daemon in debian.

I agree that overlayfs is simpler than aufs, because aufs has many
features which Miklos thinks unnecessary. But most features in aufs are
popped out of many reports or requests from users for a loooong time. I
don't think they are unnecessary.

By the way how looong history does aufs have?
It is long enough to allow major distributors to make obsoleted and
not-maintained version of aufs distributed. I am tired of replying and
describing "your version is obsoleted. ask your distributor to update
aufs or you need to get latest version" or something. I hope you would
know aufs is released every Monday basically (currently I stopped
updating for a few months though).

Approaches in overlayfs:
----------------------------------------
This is also what I have posted, but I write again here since I don't
have any response.
I noticed overlayfs adopts overriding credentials for copy-up.
Although I didn't read about credentials in detail yet, is it safe?
For example, during copy-up other thread in the same process may gain
the higher capabilities unexpectedly? Signal handler in the process too?

Future of overlayfs:
----------------------------------------
I don't know what it will be after merging.
But to support the missing features above, I am afraid overlayfs will
grow by adding them. How large and complicated will it be? As current
aufs or much simpler? Nobody knows except those who have already thought
about how to implement these features.

I remember there was a post saying he can live without fully supported
hardlinks. There may exist people who says those missing features are
less important even if any problem arise. They may just restart their
system and forget everything. It is ok as long as he can be happy. They
might use overlayfs for LiveCD/DVD/Flash only.
But I believe there exists people who think those features important and
necessary. They might use layering for servers or long live systems to
provide their services to others.

Misc.
----------------------------------------
Miklos Szeredi:
> I think the reason why "aufs" never had a real chance at getting merged
> is because of feature creep.

What is "feature creep"?
Does it mean that aufs has many features and they make it much slower as
an insect creeps? If so, I'd suggest you read the aufs manual and try
some options to make it faster by skipping several features. If I
remember correctly, I have not received such report saying aufs is slow
from aufs users. So I'd request you to post some comparison test
results if you have them.

Aufs was rejected merging because LKML people decided that linux
mainline will never include the union-type-filesystem but will include
the union-type-mount (eg. UnionMount).
See http://marc.info/?l=linux-kernel&m=123938317421362&w=2

Michal Suchanek:
> No implementation will satisfy all needs. There is always some
> compromise between availability (userspace/in-tree/easy to patch in)
> feature completeness (eg. AuFS is not so easy to forward-port to new
> kernels but has numerous features) performance, reliability.

Not so easy?
While I stopped updating aufs2 just before 2.6.39 (because I simply have
no time), I think it is easy for aufs to support 2.6.39 or 3.0.
Would you tell me what is so difficult?

Sorry for long mail and broken English
J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  3:48             ` J. R. Okajima
@ 2011-06-10  9:31               ` Francis Moreau
  2011-06-16 18:27                 ` Ric Wheeler
  2011-06-10 10:19               ` Michal Suchanek
  2011-06-13 18:48               ` Miklos Szeredi
  2 siblings, 1 reply; 74+ messages in thread
From: Francis Moreau @ 2011-06-10  9:31 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Miklos Szeredi, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

[...]

> Aufs was rejected merging because LKML people decided that linux
> mainline will never include the union-type-filesystem but will include
> the union-type-mount (eg. UnionMount).

BTW, what is the current status of the union mount approach ?

Thanks
-- 
Francis

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  9:31               ` Francis Moreau
@ 2011-06-16 18:27                 ` Ric Wheeler
  0 siblings, 0 replies; 74+ messages in thread
From: Ric Wheeler @ 2011-06-16 18:27 UTC (permalink / raw)
  To: Francis Moreau, David Howells
  Cc: J. R. Okajima, Miklos Szeredi, Andrew Morton, NeilBrown, viro,
	torvalds, linux-fsdevel, linux-kernel, apw, nbd, hramrach,
	jordipujolp, ezk, Alexander Viro, Christoph Hellwig

On 06/10/2011 05:31 AM, Francis Moreau wrote:
> "J. R. Okajima"<hooanon05@yahoo.co.jp>  writes:
>
> [...]
>
>> Aufs was rejected merging because LKML people decided that linux
>> mainline will never include the union-type-filesystem but will include
>> the union-type-mount (eg. UnionMount).
> BTW, what is the current status of the union mount approach ?
>
> Thanks

Val has moved on to other projects, but David Howells has been working to 
refresh the union mount patches and post them upstream. Al and Christoph have 
been looking at the patches.

I think that it would be good to get those patches out so people can do a full 
comparison.

Thanks!

Ric

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  3:48             ` J. R. Okajima
  2011-06-10  9:31               ` Francis Moreau
@ 2011-06-10 10:19               ` Michal Suchanek
  2011-06-12  7:44                 ` J. R. Okajima
  2011-06-13 18:48               ` Miklos Szeredi
  2 siblings, 1 reply; 74+ messages in thread
From: Michal Suchanek @ 2011-06-10 10:19 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Miklos Szeredi, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, jordipujolp, ezk

On 10 June 2011 05:48, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:

> Michal Suchanek:
>> No implementation will satisfy all needs. There is always some
>> compromise between availability (userspace/in-tree/easy to patch in)
>> feature completeness (eg. AuFS is not so easy to forward-port to new
>> kernels but has numerous features) performance, reliability.
>
> Not so easy?
> While I stopped updating aufs2 just before 2.6.39 (because I simply have
> no time), I think it is easy for aufs to support 2.6.39 or 3.0.
> Would you tell me what is so difficult?
>
To be fair any out-of-tree in-kernel solution is going to be equally
hard to forward-port.

I am not a kernel VFS hacker so whenever there is a Linux VFS change
other than trivial changes like swapping headers and renaming stuff I
can't use an out-of-tree patch with the changed VFS.

Any solution that leverages the in-kernel interfaces, either hacking
them directly or calling functions not available from userspace is
going to have this issue unless merged into the kernel.

For me the current unionmount and overlayfs are sufficient in that I
can run a live filesystem on top of them reliably.

Others use overlayfs for small systems (eg. OpenWRT) where a solution
as large as aufs is likely not going to fit unless most features can
be compiled out.

Anyway, as I understand it aufs is not going to be merged because the
VFS maintainers don't want a filesystem (like aufs) but do accept only
mount (overlayfs or unionmount).

So overlayfs is the only way forward now since unionmount development
has stopped.

Thanks

Michal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10 10:19               ` Michal Suchanek
@ 2011-06-12  7:44                 ` J. R. Okajima
  0 siblings, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-12  7:44 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, jordipujolp, ezk


Michal Suchanek:
> Anyway, as I understand it aufs is not going to be merged because the
> VFS maintainers don't want a filesystem (like aufs) but do accept only
> mount (overlayfs or unionmount).
>
> So overlayfs is the only way forward now since unionmount development
> has stopped.

Actually overlayfs is union-type-filesystem as aufs, instead of
union-type-mount.
If the development of UnionMount is really stopped, then I'd ask people
to consider merging aufs as well as overlayfs.


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  3:48             ` J. R. Okajima
  2011-06-10  9:31               ` Francis Moreau
  2011-06-10 10:19               ` Michal Suchanek
@ 2011-06-13 18:48               ` Miklos Szeredi
  2011-07-08 14:44                 ` Miklos Szeredi
  2 siblings, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-13 18:48 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> Miklos, thanks forwarding.
> Here I try replying after summarising several messages.

Okajima-san, thanks for replying.

> Feature sets:
> ----------------------------------------
> If I remember correctly, Miklos has mentioned about it like this.
> - overlayfs provides the same feature set as UnionMount.
> - but its implementation is much smaller and simpler than UnionMount.
>
> I agree with this argument (Oh, I have to confess that I don't test
> overlayfs by myself). But it means that overlayfs doesn't provide some
> features which UnionMount doesn't provide. I have posted about such
> features before, but I list them up again here.
> - the inode number may change silently.
> - hardlinks may corrupt by copy-up.
> - read(2) may get obsolete filedata (fstat(2) for metadata too).
> - fcntl(F_SETLK) may be broken by copy-up.
> - unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
>   open(O_RDWR).

Good summary of the unPOSIXy behavior in overlayfs/union-mounts.  Some
of this is already described in Documentation/filesystems/overlayfs.txt,
and I'll add the rest.

> Later I noticed one more thing. /proc/PID/{fd/,exe} may not work
> correctly for overlayfs while they would work correctly for
> UnionMount. In overlayfs, they refer to the file on the real filesystems
> (upper or lower) instead of the one in overlayfs, don't they? If so, I
> am afraid a few applications may not work correctly, particularly
> start-stop-daemon in debian.

You are right, proc symlinks work in unexpected ways in overlayfs and
this is not documented yet either.

> Approaches in overlayfs:
> ----------------------------------------
> This is also what I have posted, but I write again here since I don't
> have any response.
> I noticed overlayfs adopts overriding credentials for copy-up.
> Although I didn't read about credentials in detail yet, is it safe?
> For example, during copy-up other thread in the same process may gain
> the higher capabilities unexpectedly? Signal handler in the process
> too?

The credentials are gained only by one task for the duration of the
operation.  It's not possible to use this to gain elevated privileges
for other tasks or by signal handlers.


> Misc.
> ----------------------------------------
> Miklos Szeredi:
>> I think the reason why "aufs" never had a real chance at getting merged
>> is because of feature creep.
>
> What is "feature creep"?

http://en.wikipedia.org/wiki/Feature_creep

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-13 18:48               ` Miklos Szeredi
@ 2011-07-08 14:44                 ` Miklos Szeredi
  2011-07-08 15:21                   ` Tomas M
  2011-07-09 12:22                   ` J. R. Okajima
  0 siblings, 2 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-08 14:44 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

Miklos Szeredi <miklos@szeredi.hu> writes:

> "J. R. Okajima" <hooanon05@yahoo.co.jp> writes:
>> If I remember correctly, Miklos has mentioned about it like this.
>> - overlayfs provides the same feature set as UnionMount.
>> - but its implementation is much smaller and simpler than UnionMount.
>>
>> I agree with this argument (Oh, I have to confess that I don't test
>> overlayfs by myself). But it means that overlayfs doesn't provide some
>> features which UnionMount doesn't provide. I have posted about such
>> features before, but I list them up again here.
>> - the inode number may change silently.
>> - hardlinks may corrupt by copy-up.
>> - read(2) may get obsolete filedata (fstat(2) for metadata too).
>> - fcntl(F_SETLK) may be broken by copy-up.
>> - unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
>>   open(O_RDWR).

Here's a patch to document these limitations.

Thanks,
Miklos
----

Subject: ovl: add limitation to documentation

From: Miklos Szeredi <mszeredi@suse.cz>

J. R. Okajima noted some examples where overlayfs behaves differently
from a standard filesystem.  Describe these cases in the documentation.

Reported-by: "J. R. Okajima" <hooanon05@yahoo.co.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 Documentation/filesystems/overlayfs.txt |   29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Index: linux-2.6/Documentation/filesystems/overlayfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/overlayfs.txt	2011-07-07 16:01:47.000000000 +0200
+++ linux-2.6/Documentation/filesystems/overlayfs.txt	2011-07-08 14:16:44.000000000 +0200
@@ -143,6 +143,9 @@ to the upper filesystem (copy_up).  Note
 also requires copy_up, though of course creation of a symlink does
 not.
 
+The copy_up may turn out to be unnecessary, for example if the file is
+opened for read-write but the data is not modified.
+
 The copy_up process first makes sure that the containing directory
 exists in the upper filesystem - creating it and any parents as
 necessary.  It then creates the object with the same metadata (owner,
@@ -156,6 +159,27 @@ filesystem - future operations on the fi
 overlay filesystem (though an operation on the name of the file such as
 rename or unlink will of course be noticed and handled).
 
+
+Non-standard behavior at copy_up
+--------------------------------
+
+The copy_up operation essentially creates a new, identical file and
+moves it over to the old name.  The new file may be on a different
+filesystem, so both st_dev and st_ino of the file may change.
+
+Any open files referring to this inode will access the old data and
+metadata.  Similarly any file locks obtained before copy_up will not
+apply to the copied up file.
+
+If a file with multiple hard links is copied up, then this will
+"break" the link.  Changes will not be propagated to other names
+referring to the same inode.
+
+Symlinks in /proc/PID/ and /proc/PID/fd which point to a non-directory
+object in overlayfs will not contain valid absolute paths, only
+relative paths leading up to the filesystem's root.  This will be
+fixed in the future.
+
 Changes to underlying filesystems
 ---------------------------------
 
@@ -163,5 +187,6 @@ Offline changes, when the overlay is not
 the upper or the lower trees.
 
 Changes to the underlying filesystems while part of a mounted overlay
-filesystem are not allowed.  This is not yet enforced, but will be in
-the future.
+filesystem are not allowed.  If the underlying filesystem is changed,
+the behavior of the overlay is undefined, though it will not result in
+a crash or deadlock.
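
An application that wants to know whether it is affected by the
st_dev/st_ino change documented above could compare stat(2) results
before and after opening a file for writing.  This is a minimal sketch
under that assumption; the helper name identity_changes_on_write_open
is hypothetical and not part of overlayfs.  On a conventional
filesystem, or for a file that already lives on the upper layer, it is
expected to report no change.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): report whether opening
 * `path` for writing changed the (st_dev, st_ino) pair seen by stat(2).
 * Returns 1 if the identity changed, 0 if it stayed the same, and
 * -1 on error.
 */
int identity_changes_on_write_open(const char *path)
{
	struct stat before, after;
	int fd;

	if (stat(path, &before) != 0)
		return -1;

	/* On an overlay, this open may trigger copy_up of a lower file. */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	close(fd);

	if (stat(path, &after) != 0)
		return -1;

	return before.st_dev != after.st_dev ||
	       before.st_ino != after.st_ino;
}
```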

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-08 14:44                 ` Miklos Szeredi
@ 2011-07-08 15:21                   ` Tomas M
  2011-07-09 12:22                   ` J. R. Okajima
  1 sibling, 0 replies; 74+ messages in thread
From: Tomas M @ 2011-07-08 15:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: J. R. Okajima, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

> Here's a patch to document these limitations.

Why would we need to 'document limitations' if we can use code which
DOESN'T IMPOSE the limitations? (read: aufs)

Believe me or not, OverlayFS will be pointless if there are any
limitations which make the final 'filesystem' work inconsistently from
the behavior expected by crazy applications, like KDE suite, for
example.

Inode number change (as mentioned in the 'non-standard' behavior
documentation) is a NO NO option, really, applications rely on that
more than you would expect!


Tomas M



On Fri, Jul 8, 2011 at 4:44 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
>
>> "J. R. Okajima" <hooanon05@yahoo.co.jp> writes:
>>> If I remember correctly, Miklos has mentioned about it like this.
>>> - overlayfs provides the same feature set as UnionMount.
>>> - but its implementation is much smaller and simpler than UnionMount.
>>>
>>> I agree with this argument (Oh, I have to confess that I don't test
>>> overlayfs by myself). But it means that overlayfs doesn't provide some
>>> features which UnionMount doesn't provide. I have posted about such
>>> features before, but I list them up again here.
>>> - the inode number may change silently.
>>> - hardlinks may corrupt by copy-up.
>>> - read(2) may get obsolete filedata (fstat(2) for metadata too).
>>> - fcntl(F_SETLK) may be broken by copy-up.
>>> - unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
>>>   open(O_RDWR).
>
> Here's a patch to document these limitations.
>
> Thanks,
> Miklos
> ----
>
> Subject: ovl: add limitation to documentation
>
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> J. R. Okajima noted some examples where overlayfs behaves differently
> from a standard filesystem.  Describe these cases in the documentation.
>
> Reported-by: "J. R. Okajima" <hooanon05@yahoo.co.jp>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  Documentation/filesystems/overlayfs.txt |   29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/Documentation/filesystems/overlayfs.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/filesystems/overlayfs.txt      2011-07-07 16:01:47.000000000 +0200
> +++ linux-2.6/Documentation/filesystems/overlayfs.txt   2011-07-08 14:16:44.000000000 +0200
> @@ -143,6 +143,9 @@ to the upper filesystem (copy_up).  Note
>  also requires copy_up, though of course creation of a symlink does
>  not.
>
> +The copy_up may turn out to be unnecessary, for example if the file is
> +opened for read-write but the data is not modified.
> +
>  The copy_up process first makes sure that the containing directory
>  exists in the upper filesystem - creating it and any parents as
>  necessary.  It then creates the object with the same metadata (owner,
> @@ -156,6 +159,27 @@ filesystem - future operations on the fi
>  overlay filesystem (though an operation on the name of the file such as
>  rename or unlink will of course be noticed and handled).
>
> +
> +Non-standard behavior at copy_up
> +--------------------------------
> +
> +The copy_up operation essentially creates a new, identical file and
> +moves it over to the old name.  The new file may be on a different
> +filesystem, so both st_dev and st_ino of the file may change.
> +
> +Any open files referring to this inode will access the old data and
> +metadata.  Similarly any file locks obtained before copy_up will not
> +apply to the copied up file.
> +
> +If a file with multiple hard links is copied up, then this will
> +"break" the link.  Changes will not be propagated to other names
> +referring to the same inode.
> +
> +Symlinks in /proc/PID/ and /proc/PID/fd which point to a non-directory
> +object in overlayfs will not contain vaid absolute paths, only
> +relative paths leading up to the filesystem's root.  This will be
> +fixed in the future.
> +
>  Changes to underlying filesystems
>  ---------------------------------
>
> @@ -163,5 +187,6 @@ Offline changes, when the overlay is not
>  the upper or the lower trees.
>
>  Changes to the underlying filesystems while part of a mounted overlay
> -filesystem are not allowed.  This is not yet enforced, but will be in
> -the future.
> +filesystem are not allowed.  If the underlying filesystem is changed,
> +the behavior of the overlay is undefined, though it will not result in
> +a crash or deadlock.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-08 14:44                 ` Miklos Szeredi
  2011-07-08 15:21                   ` Tomas M
@ 2011-07-09 12:22                   ` J. R. Okajima
  2011-07-15 12:33                     ` Miklos Szeredi
  1 sibling, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-07-09 12:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk


Miklos Szeredi:
> Here's a patch to document these limitations.

If you want covering limitations as possible, I'd suggest you to add
these.
- Some versions of GNU find(1) may produce a warning about the link
  count of a directory. When a sub dir exists, find(1) expects it
  increments the link count of the parent dir.
- If I remember correctly, Valerie Aurora has pointed out that
  open(O_RDONLY) + fchmod() will work correctly in UnionMount.
  It is true in overlayfs too?


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-09 12:22                   ` J. R. Okajima
@ 2011-07-15 12:33                     ` Miklos Szeredi
  2011-07-15 13:02                       ` J. R. Okajima
  0 siblings, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-15 12:33 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> Miklos Szeredi:
>> Here's a patch to document these limitations.
>
> If you want covering limitations as possible, I'd suggest you to add
> these.
> - Some versions of GNU find(1) may produce a warning about the link
>   count of a directory. When a sub dir exists, find(1) expects it
>   increments the link count of the parent dir.

st_nlink==1 for directories is widely accepted way of saying that the
number of subdirectories is unknown.  Various filesystems already do
this, and versions of GNU utils that I have come across accept it.

> - If I remember correctly, Valerie Aurora has pointed out that
>   open(O_RDONLY) + fchmod() will work correctly in UnionMount.
>   It is true in overlayfs too?

Neither union-mounts nor overlayfs can handle this case.

I hadn't thought about this case, so overlayfs would modify the lower
filesystem in that case, which is a no-no.

Following patch fixes this and return -EROFS for the above case.  I also
updated the non-standard section in the docs.

Thanks,
Miklos
----

Subject: ovl: make lower mount read-only

From: Miklos Szeredi <mszeredi@suse.cz>

If a file only existing on the lower fs is opened for O_RDONLY and
fchmod/fchown/etc is performed on the open file then this will modify
the lower fs, which is not what we want.

Copying up at this point is not possible.  The best solution is to
return an error for this corner case and hope applications are not
relying on it.

Reported-by: "J. R. Okajima" <hooanon05@yahoo.co.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/overlayfs/super.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-2.6/fs/overlayfs/super.c
===================================================================
--- linux-2.6.orig/fs/overlayfs/super.c	2011-07-15 12:48:03.000000000 +0200
+++ linux-2.6/fs/overlayfs/super.c	2011-07-15 13:47:35.000000000 +0200
@@ -569,6 +569,12 @@ static int ovl_fill_super(struct super_b
 		goto out_put_upper_mnt;
 	}
 
+	/*
+	 * Make lower_mnt R/O.  That way fchmod/fchown on lower file
+	 * will fail instead of modifying lower fs.
+	 */
+	ufs->lower_mnt->mnt_flags |= MNT_READONLY;
+
 	/* If the upper fs is r/o, we mark overlayfs r/o too */
 	if (ufs->upper_mnt->mnt_sb->s_flags & MS_RDONLY)
 		sb->s_flags |= MS_RDONLY;
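
Seen from user space, the corner case is an fchmod(2) through a
descriptor that was opened read-only.  The sketch below only shows that
call sequence; the helper name fchmod_via_rdonly_fd is hypothetical.
On a conventional writable filesystem it is expected to return 0, while
with the change above a file that exists only on the lower filesystem
is meant to yield EROFS instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): open `path` read-only, then
 * try to change its mode through the descriptor.  Returns 0 on success
 * or the errno value of the call that failed.
 */
int fchmod_via_rdonly_fd(const char *path, mode_t mode)
{
	int fd, err = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* No write access to the data is needed for fchmod(2). */
	if (fchmod(fd, mode) != 0)
		err = errno;

	close(fd);
	return err;
}
```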

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-15 12:33                     ` Miklos Szeredi
@ 2011-07-15 13:02                       ` J. R. Okajima
  2011-07-15 13:04                         ` J. R. Okajima
  2011-07-15 13:07                         ` Miklos Szeredi
  0 siblings, 2 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-07-15 13:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk


Miklos Szeredi:
> st_nlink==1 for directories is widely accepted way of saying that the
> number of subdirectories is unknown.  Various filesystems already do
> this, and versions of GNU utils that I have come across accept it.

When the upperdir is tmpfs, the link count of the directories in it will
not be 1, won't it?


> > - If I remember correctly, Valerie Aurora has pointed out that
> >   open(O_RDONLY) + fchmod() will work correctly in UnionMount.
> >   It is true in overlayfs too?
>
> Neither union-mounts nor overlayfs can handle this case.

Oh, I meant "will work NOT correctly". Sorry.


> I hadn't thought about this case, so overlayfs would modify the lower
> filesystem in that case, which is a no-no.
>
> Following patch fixes this and return -EROFS for the above case.  I also
> updated the non-standard section in the docs.

Hmm, such changes to mnt_flags looks slightly rude to me. Do we have to
consider about these?
- there may exist files opened as RW on the lower.
- when overlayfs is unmounted, it should restore the original mnt_flags.
- (there may exist more...)

If overlayfs doesn't expect the lower mounted as RW, then it might be
better to reject it simply at mounting.


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-15 13:02                       ` J. R. Okajima
@ 2011-07-15 13:04                         ` J. R. Okajima
  2011-07-15 13:07                         ` Miklos Szeredi
  1 sibling, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-07-15 13:04 UTC (permalink / raw)
  To: Miklos Szeredi, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel


> > st_nlink==1 for directories is widely accepted way of saying that the
> > number of subdirectories is unknown.  Various filesystems already do
> > this, and versions of GNU utils that I have come across accept it.
>
> When the upperdir is tmpfs, the link count of the directories in it will
> not be 1, won't it?

Ah, I've found overlayfs sets it to 1 now.

Thanks
J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-15 13:02                       ` J. R. Okajima
  2011-07-15 13:04                         ` J. R. Okajima
@ 2011-07-15 13:07                         ` Miklos Szeredi
  2011-07-15 13:33                           ` J. R. Okajima
  1 sibling, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-15 13:07 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> Miklos Szeredi:
>> st_nlink==1 for directories is widely accepted way of saying that the
>> number of subdirectories is unknown.  Various filesystems already do
>> this, and versions of GNU utils that I have come across accept it.
>
> When the upperdir is tmpfs, the link count of the directories in it will
> not be 1, won't it?

In overlayfs directory will have a link count of one if it's a "merged"
directory, regardless of the filesystem types used.  If it's not a
merged directory, then st_nlink will correctly reflect the number of
subdirs.

>> I hadn't thought about this case, so overlayfs would modify the lower
>> filesystem in that case, which is a no-no.
>>
>> Following patch fixes this and return -EROFS for the above case.  I also
>> updated the non-standard section in the docs.
>
> Hmm, such changes to mnt_flags looks slightly rude to me. Do we have to
> consider about these?
> - there may exist files opened as RW on the lower.
> - when overlayfs is unmounted, it should restore the original mnt_flags.
> - (there may exist more...)
>
> If overlayfs doesn't expect the lower mounted as RW, then it might be
> better to reject it simply at mounting.

Overlayfs will create a private clone of both the lower and upper
mounts.  The mnt_flags are only modified on the private clone, whose
sole user is overlayfs itself.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-15 13:07                         ` Miklos Szeredi
@ 2011-07-15 13:33                           ` J. R. Okajima
  2011-07-15 15:16                             ` Miklos Szeredi
  0 siblings, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-07-15 13:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

Miklos Szeredi:
> Overlayfs will create a private clone of both the lower and upper
> mounts.  The mnt_flags are only modified on the private clone, whose
> sole user is overlayfs itself.

Yes, I've found overlayfs creates private clone mounts. Sorry noise
again.
By cloning privately, users cannot change mount flags. But it will not
be a big problem.

By the way, you might remember that I have asked about overriding
credentials and signals.
When a process issues chmod, rename or something, then the internal
copyup happens. If RLIMIT_FSIZE is set to the process, then it may
fire SIGXFSZ and return EFBIG. It looks strange. But I don't know how
important is it since setting RLIMIT_FSIZE is less popular I am afraid.

J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-15 13:33                           ` J. R. Okajima
@ 2011-07-15 15:16                             ` Miklos Szeredi
  0 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-15 15:16 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Andrew Morton, NeilBrown, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> By the way, you might remember that I have asked about overriding
> credentials and signals.
> When a process issues chmod, rename or something, then the internal
> copyup happens. If RLIMIT_FSIZE is set to the process, then it may
> fire SIGXFSZ and return EFBIG. It looks strange.

Yeah, good observation.  Copy-up shouldn't fail on RLIMIT_FSIZE and it
doesn't because splice ignores RLIMIT_FSIZE.

So overlayfs works correctly because of a bug in splice.  Not very
satisfactory situation.
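
For comparison, this is roughly what the ordinary write(2) path does
when RLIMIT_FSIZE is hit, i.e. the behavior a copy-up would inherit if
it did not go through splice.  It is a user-space sketch only; the
helper name write_past_fsize_limit is hypothetical.  SIGXFSZ is ignored
while it runs so that the failure is visible as an errno value.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical demo (illustration only): lower the soft RLIMIT_FSIZE
 * to `limit` bytes, fill a scratch file at `path` up to that limit,
 * then try to write one more byte.  Returns 0 if the extra write
 * succeeded, otherwise the errno of the failing call.  The previous
 * limit and SIGXFSZ disposition are restored before returning.
 */
int write_past_fsize_limit(const char *path, rlim_t limit)
{
	struct rlimit old, lim;
	void (*old_handler)(int);
	char byte = 'x';
	rlim_t i;
	int fd, err = 0;

	if (getrlimit(RLIMIT_FSIZE, &old) != 0)
		return errno;

	/* Ignore SIGXFSZ so the process is not killed by the signal. */
	old_handler = signal(SIGXFSZ, SIG_IGN);

	lim = old;
	lim.rlim_cur = limit;	/* soft limit only, so it can be raised again */
	if (setrlimit(RLIMIT_FSIZE, &lim) != 0) {
		err = errno;
		goto out_signal;
	}

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		err = errno;
		goto out_limit;
	}

	/* Fill the file exactly up to the limit. */
	for (i = 0; i < limit; i++) {
		if (write(fd, &byte, 1) != 1) {
			err = errno;
			goto out_close;
		}
	}

	/* One byte past the limit. */
	if (write(fd, &byte, 1) != 1)
		err = errno;

out_close:
	close(fd);
out_limit:
	setrlimit(RLIMIT_FSIZE, &old);
out_signal:
	signal(SIGXFSZ, old_handler);
	return err;
}
```

With a small limit such as 16 bytes, the final write would be expected
to fail with EFBIG on a typical Linux filesystem.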

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09  3:52     ` Andrew Morton
  2011-06-09 12:47       ` Miklos Szeredi
@ 2011-06-09 13:49       ` Andy Whitcroft
  2011-06-09 19:32         ` Andrew Morton
  2011-06-09 13:57       ` Michal Suchanek
  2011-06-09 13:57       ` Andy Whitcroft
  3 siblings, 1 reply; 74+ messages in thread
From: Andy Whitcroft @ 2011-06-09 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, Miklos Szeredi, viro, torvalds, linux-fsdevel,
	linux-kernel, nbd, hramrach, jordipujolp, ezk, mszeredi

On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:

> > > This sort of thing could be implemented in userspace and wired up via
> > > fuse, I assume.  Has that been attempted and why is it inadequate?
> > 
> > I think that would be a valid question if the proposal was large and
> > complex.  But overlayfs is really quite small and self-contained.
> 
> Not merging it would be even smaller and simpler.  If there is a
> userspace alternative then that option should be evaluated and compared
> in a rational manner.

For the Ubuntu liveCD we have tried to use unions via fuse with a view
to dropping aufs2 as an external module.  The performance was atrocious
(IIRC of the order of 10x slower), to the point that most people assumed
it was broken and reset the machine.

The other use case I have seen here have been for package builders on which
a virgin chroot has a writable layer dropped on top, allowing simple undo
at the end of the build.  I have heard of people wanting to use this for
root filesystems for virtual machines as well.

We have done quite a bit of testing with liveCDs built to use overlayfs
with a view to switching over, and have been very impressed with its
stability.  It is also pleasing to see an implementation which is small
enough to actually understand.

-apw

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 13:49       ` Andy Whitcroft
@ 2011-06-09 19:32         ` Andrew Morton
  2011-06-09 19:40           ` Linus Torvalds
  2011-06-10 11:51           ` Bernd Schubert
  0 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2011-06-09 19:32 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: NeilBrown, Miklos Szeredi, viro, torvalds, linux-fsdevel,
	linux-kernel, nbd, hramrach, jordipujolp, ezk, mszeredi

On Thu, 9 Jun 2011 14:49:47 +0100
Andy Whitcroft <apw@canonical.com> wrote:

> On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:
> 
> > > > This sort of thing could be implemented in userspace and wired up via
> > > > fuse, I assume.  Has that been attempted and why is it inadequate?
> > > 
> > > I think that would be a valid question if the proposal was large and
> > > complex.  But overlayfs is really quite small and self-contained.
> > 
> > Not merging it would be even smaller and simpler.  If there is a
> > userspace alternative then that option should be evaluated and compared
> > in a rational manner.
> 
> For the Ubuntu liveCD we have tried to use unions via fuse with a view
> to dropping aufs2 as an external module.  The performance was atrocious
> (IIRC of the order of 10x slower), to the point that most people assumed
> it was broken and reset the machine.

On Thu, 9 Jun 2011 15:57:48 +0200
Michal Suchanek <hramrach@centrum.cz> wrote:

> The problem with the userspace alternative is that it does not work. I
> tried to run my live CD on top of unionfs-fuse and the filesystem
> would fail intermittently leading to random errors during boot.


If the implementation is slow or buggy then the appropriate action is
to speed it up and to fix the bugs, so these are just non-arguments,
IMO.

If it is demonstrated that the userspace implementation simply cannot
ever have acceptable performance then OK, we have an argument for a
kernel driver.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 19:32         ` Andrew Morton
@ 2011-06-09 19:40           ` Linus Torvalds
  2011-06-09 20:17             ` Miklos Szeredi
  2011-06-10 11:51           ` Bernd Schubert
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2011-06-09 19:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, NeilBrown, Miklos Szeredi, viro, linux-fsdevel,
	linux-kernel, nbd, hramrach, jordipujolp, ezk, mszeredi

On Thu, Jun 9, 2011 at 12:32 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> If the implementation is slow or buggy then the appropriate action is
> to speed it up and to fix the bugs, so these are just non-arguments,
> IMO.

Umm.

"userspace filesystem"?

The problem is right there. Always has been. People who think that
userspace filesystems are realistic for anything but toys are just
misguided.

fuse works fine if the thing being exported is some random low-use
interface to a fundamentally slow device. But for something like your
root filesystem? Nope. Not going to happen.

So Andrew, I think that arguing that something _can_ be done with
fuse, and thus _should_ be done with fuse is just ridiculous. That's
like saying you should do a microkernel - it may sound nice on paper,
but it's a damn stupid idea for people who care more about some idea
than they care about reality.

                          Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 19:40           ` Linus Torvalds
@ 2011-06-09 20:17             ` Miklos Szeredi
  2011-06-09 22:58               ` Anton Altaparmakov
  0 siblings, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-09 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Andy Whitcroft, NeilBrown, Miklos Szeredi, viro,
	linux-fsdevel, linux-kernel, nbd, hramrach, jordipujolp, ezk

On Thu, 2011-06-09 at 12:40 -0700, Linus Torvalds wrote:
> On Thu, Jun 9, 2011 at 12:32 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > If the implementation is slow or buggy then the appropriate action is
> > to speed it up and to fix the bugs, so these are just non-arguments,
> > IMO.
> 
> Umm.
> 
> "userspace filesystem"?
> 
> The problem is right there. Always has been. People who think that
> userspace filesystems are realistic for anything but toys are just
> misguided.
> 
> fuse works fine if the thing being exported is some random low-use
> interface to a fundamentally slow device. But for something like your
> root filesystem? Nope. Not going to happen.

It's a tradeoff between speed and ease of development.

NTFS has been doing nicely in userspace for almost half a decade.  It's
not as fast as a kernel driver _could_ be, but it's faster than _the_
kernel driver.

And there's room for improvement.  The fact is (and you know it) the
speed of filesystems mainly comes from caching not from the filesystem
itself, so whether it's in userspace or in kernelspace matters not all
that much in the end.

> So Andrew, I think that arguing that something _can_ be done with
> fuse, and thus _should_ be done with fuse is just ridiculous. That's
> like saying you should do a microkernel - it may sound nice on paper,
> but it's a damn stupid idea for people who care more about some idea
> than they care about reality.

I think it isn't ridiculous, but here the tradeoffs might be in favor of
a kernel based solution.  And I'm saying that after having done both.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 20:17             ` Miklos Szeredi
@ 2011-06-09 22:58               ` Anton Altaparmakov
  2011-06-11  2:39                 ` Greg KH
  0 siblings, 1 reply; 74+ messages in thread
From: Anton Altaparmakov @ 2011-06-09 22:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Andrew Morton, Andy Whitcroft, NeilBrown,
	Miklos Szeredi, viro, linux-fsdevel, linux-kernel, nbd, hramrach,
	jordipujolp, ezk

Hi,

On 9 Jun 2011, at 21:17, Miklos Szeredi wrote:
> On Thu, 2011-06-09 at 12:40 -0700, Linus Torvalds wrote:
>> On Thu, Jun 9, 2011 at 12:32 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>>> 
>>> If the implementation is slow or buggy then the appropriate action is
>>> to speed it up and to fix the bugs, so these are just non-arguments,
>>> IMO.
>> 
>> Umm.
>> 
>> "userspace filesystem"?
>> 
>> The problem is right there. Always has been. People who think that
>> userspace filesystems are realistic for anything but toys are just
>> misguided.
>> 
>> fuse works fine if the thing being exported is some random low-use
>> interface to a fundamentally slow device. But for something like your
>> root filesystem? Nope. Not going to happen.
> 
> It's a tradeoff between speed and ease of development.
> 
> NTFS has been doing nicely in userspace for almost half a decade.  It's
> not as fast as a kernel driver _could_ be, but it's faster than _the_
> kernel driver.

Er, sorry to disappoint but the Tuxera NTFS kernel driver is faster than any user space NTFS driver could ever be.  It is faster than ext3/4, too.  (-:  To give you a random example on an embedded system (800MHz, 512MB RAM, 64kiB write buffer size) where NTFS in user space achieves a maximum cached write throughput of ~15MiB/s, ext3 achieves ~75MiB/s, ext4 ~100MiB/s and Tuxera NTFS kernel driver achieves ~190MiB/s blowing ext4 out of the water by almost a factor of 2 and the user space code by more than a factor of 10.  File systems in user space have their applications but high performance is definitely not one of them...  You might say that ext3/4 are journalling so not a fair comparison so let me add that FAT32 achieves about 100MiB/s in the same hardware/test, still about half of NTFS.
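
As a quick sanity check on the ratios quoted above, the following Python sketch recomputes them from the approximate throughput figures given in this message.  The dictionary keys and variable names are purely illustrative and the figures are the rounded MiB/s values from the text, not new measurements.

```python
# Approximate cached write throughput in MiB/s, as quoted in the message above.
throughput = {
    "ntfs_userspace": 15,
    "ext3": 75,
    "ext4": 100,
    "fat32": 100,
    "tuxera_ntfs_kernel": 190,
}

kernel = throughput["tuxera_ntfs_kernel"]

# How many times faster the in-kernel NTFS figure is than each other figure.
ratios = {name: kernel / value for name, value in throughput.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

With these inputs, 190/100 = 1.9 matches "almost a factor of 2" against ext4 and FAT32, and 190/15 is roughly 12.7, matching "more than a factor of 10" against the user space driver.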

Only problem is that Tuxera NTFS is not open source.  )-:  Hopefully one day it will be!!!

The big advantage of user space drivers is that you can compile a binary and it will run on a lot of installs and also it is hugely easier to port to other architectures like the multitude of embedded OS out there that are not Linux based.  With the kernel driver it has to be compiled for each and every kernel and .config of each customer which is a lot of work (even though it is scripted) and the work to port to other kernels is immense...

> And there's room for improvement.  The fact is (and you know it) the
> speed of filesystems mainly comes from caching not from the filesystem
> itself, so whether it's in userspace or in kernelspace matters not all
> that much in the end.

It does matter if nothing else because you cannot cache everything in user space that you can cache in the kernel, not with any sort of reliability anyway...

PS. Please note I have nothing against FUSE and user space file systems in general.  I think FUSE is brilliant and makes writing weird file systems a pleasurable experience!  I have myself used it to solve all sorts of problems.  (-:

Best regards,

	Anton

>> So Andrew, I think that arguing that something _can_ be done with
>> fuse, and thus _should_ be done with fuse is just ridiculous. That's
>> like saying you should do a microkernel - it may sound nice on paper,
>> but it's a damn stupid idea for people who care more about some idea
>> than they care about reality.
> 
> I think it isn't ridiculous, but here the tradeoffs might be in favor of
> a kernel based solution.  And I'm saying that after having done both.
> 
> Thanks,
> Miklos

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 22:58               ` Anton Altaparmakov
@ 2011-06-11  2:39                 ` Greg KH
  2011-06-12 20:51                   ` Anton Altaparmakov
  0 siblings, 1 reply; 74+ messages in thread
From: Greg KH @ 2011-06-11  2:39 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Miklos Szeredi, Linus Torvalds, Andrew Morton, Andy Whitcroft,
	NeilBrown, Miklos Szeredi, viro, linux-fsdevel, linux-kernel, nbd,
	hramrach, jordipujolp, ezk

On Thu, Jun 09, 2011 at 11:58:37PM +0100, Anton Altaparmakov wrote:
> > NTFS has been doing nicely in userspace for almost half a decade.  It's
> > not as fast as a kernel driver _could_ be, but it's faster than _the_
> > kernel driver.
> 
> Er, sorry to disappoint but the Tuxera NTFS kernel driver is faster
> than any user space NTFS driver could ever be.  It is faster than
> ext3/4, too.  (-:  To give you a random example on an embedded system
> (800MHz, 512MB RAM, 64kiB write buffer size) where NTFS in user space
> achieves a maximum cached write throughput of ~15MiB/s, ext3 achieves
> ~75MiB/s, ext4 ~100MiB/s and Tuxera NTFS kernel driver achieves
> ~190MiB/s blowing ext4 out of the water by almost a factor of 2 and
> the user space code by more than a factor of 10.  File systems in user
> space have their applications but high performance is definitely not
> one of them...  You might say that ext3/4 are journalling so not a
> fair comparison so let me add that FAT32 achieves about 100MiB/s in
> the same hardware/test, still about half of NTFS.

Talk to Tuxera, they have a new version of their userspace FUSE version
that is _much_ faster than their public one, and it might be almost as
fast as their in-kernel version for some streaming loads (where caching
isn't necessary or needed.)

So it can be done, and done well, if you know what you are doing :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-11  2:39                 ` Greg KH
@ 2011-06-12 20:51                   ` Anton Altaparmakov
  0 siblings, 0 replies; 74+ messages in thread
From: Anton Altaparmakov @ 2011-06-12 20:51 UTC (permalink / raw)
  To: Greg KH
  Cc: Miklos Szeredi, Linus Torvalds, Andrew Morton, Andy Whitcroft,
	NeilBrown, Miklos Szeredi, viro, linux-fsdevel, linux-kernel, nbd,
	hramrach, jordipujolp, ezk

Hi Greg,

On 11 Jun 2011, at 03:39, Greg KH wrote:
> On Thu, Jun 09, 2011 at 11:58:37PM +0100, Anton Altaparmakov wrote:
>>> NTFS has been doing nicely in userspace for almost half a decade.  It's
>>> not as fast as a kernel driver _could_ be, but it's faster than _the_
>>> kernel driver.
>> 
>> Er, sorry to disappoint but the Tuxera NTFS kernel driver is faster
>> than any user space NTFS driver could ever be.  It is faster than
>> ext3/4, too.  (-:  To give you a random example on an embedded system
>> (800MHz, 512MB RAM, 64kiB write buffer size) where NTFS in user space
>> achieves a maximum cached write throughput of ~15MiB/s, ext3 achieves
>> ~75MiB/s, ext4 ~100MiB/s and Tuxera NTFS kernel driver achieves
>> ~190MiB/s blowing ext4 out of the water by almost a factor of 2 and
>> the user space code by more than a factor of 10.  File systems in user
>> space have their applications but high performance is definitely not
>> one of them...  You might say that ext3/4 are journalling so not a
>> fair comparison so let me add that FAT32 achieves about 100MiB/s in
>> the same hardware/test, still about half of NTFS.
> 
> Talk to Tuxera,

Look at my email address.  (-;

> they have a new version of their userspace FUSE version
> that is _much_ faster than their public one, and it might be almost as
> fast as their in-kernel version for some streaming loads (where caching
> isn't necessary or needed.)

That was some time ago (before Christmas) though I admit the numbers I quoted might well be from the opensource ntfs-3g (not sure, I didn't run the tests myself).  The in-kernel driver has since taken the lead since I implemented delayed metadata updates.  That is why the in-kernel Tuxera NTFS driver now outperforms all other file systems that have been tested (whether in kernel or in user space including ext*, XFS and FAT).  The only file system approaching Tuxera (kernel) NTFS is Tuxera (kernel) exFAT which I also wrote and where I first developed the delayed metadata write idea.  (-:

I can't wait to have the time at some point to implement delayed allocation as well and then NTFS will perhaps become the fastest file system driver on the planet if it isn't already...  (-:

But yes if you confine yourself to a single i/o stream with large i/os, with only one process doing it, and you use direct i/o obviously just about any optimized file system can achieve close to the actual device speed, no matter whether in kernel or in user space.  However the CPU utilization varies dramatically.  In kernel you easily get as little as 3-10% or even less whilst in user space you end up with much higher CPU utilization and on embedded hardware this sometimes goes to 100% CPU and the i/os are sometimes even CPU limited rather than device speed limited.  But I think it is more useful when talking about speed not to limit oneself to only a single use case and then user space file systems suffer in comparison to in-kernel ones.

On embedded the kernel driver is sometimes further optimized with custom kernel/hardware based optimizations.  For example, for some embedded chipsets the vendor has modified the kernel so that, in combination with a modified file system (I had to adapt NTFS to the kernel changes), data is received from the network directly into the page cache pages of the file (using splice of the network socket to the target file but with a lot of vendor chipset voodoo, so if I understand it correctly the network card is directly doing DMA into the mapped page cache pages).  I cannot even begin to imagine what hoops you would have to jump through in a user space file system to pull off such tricks, if it is even possible at all...

Best regards,

	Anton

> So it can be done, and done well, if you know what you are doing :)
> 
> thanks,
> 
> greg k-h

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09 19:32         ` Andrew Morton
  2011-06-09 19:40           ` Linus Torvalds
@ 2011-06-10 11:51           ` Bernd Schubert
  2011-06-10 12:45             ` Michal Suchanek
  1 sibling, 1 reply; 74+ messages in thread
From: Bernd Schubert @ 2011-06-10 11:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, NeilBrown, Miklos Szeredi, viro, torvalds,
	linux-fsdevel, linux-kernel, nbd, hramrach, jordipujolp, ezk,
	mszeredi

On 06/09/2011 09:32 PM, Andrew Morton wrote:
> On Thu, 9 Jun 2011 14:49:47 +0100
> Andy Whitcroft<apw@canonical.com>  wrote:
>
>> On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:
>>
>> The problem with the userspace alternative is that it does not work. I
>> tried to run my live CD on top of unionfs-fuse and the filesystem
>> would fail intermittently leading to random errors during boot.
>
>
> If the implementation is slow or buggy then the appropriate action is
> to speed it up and to fix the bugs, so these are just non-arguments,
> IMO.

Exactly. It is rather sad that people never bothered to file bug reports 
about slow performance issues. I'm one of the upstream authors of 
unionfs-fuse and also use it on my own for live-USB sticks and 
NFS-booted systems and do not have such problems.

>
> If it is demonstrated that the userspace implementation simply cannot
> ever have acceptable performance then OK, we have an argument for a
> kernel driver.

Well, I know rather well where most of the performance issues in
unionfs-fuse come from (at least the normal issues, not those above 
where it seems to fail at all). And those issues are mostly unrelated to 
user-space, but just due to very simple approaches to get a working 
union implementation at all (neither kernel solution actually does what 
I need...). The main problem is that I barely find spare time to further 
improve unionfs-fuse...

Another real difficulty with user-space implementations is init scripts
- while it works perfectly with Debian, I always have the feeling that
RedHat does everything they can to make it impossible to use fuse as root
file system (eventually I always get it working, till the next minor
release, which breaks all workarounds again).

Anyway, given my lack of time and the resulting slow development of 
unionfs-fuse, distros probably need another solution for their live CDs.

Cheers,
Bernd

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10 11:51           ` Bernd Schubert
@ 2011-06-10 12:45             ` Michal Suchanek
  2011-06-10 12:54               ` Bernd Schubert
  0 siblings, 1 reply; 74+ messages in thread
From: Michal Suchanek @ 2011-06-10 12:45 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Andrew Morton, Andy Whitcroft, NeilBrown, Miklos Szeredi, viro,
	torvalds, linux-fsdevel, linux-kernel, nbd, jordipujolp, ezk,
	mszeredi

On 10 June 2011 13:51, Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> wrote:
> On 06/09/2011 09:32 PM, Andrew Morton wrote:
>>
>> On Thu, 9 Jun 2011 14:49:47 +0100
>> Andy Whitcroft<apw@canonical.com>  wrote:
>>
>>> On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:
>>>
>>> The problem with the userspace alternative is that it does not work. I
>>> tried to run my live CD on top of unionfs-fuse and the filesystem
>>> would fail intermittently leading to random errors during boot.
>>
>>
>> If the implementation is slow or buggy then the appropriate action is
>> to speed it up and to fix the bugs, so these are just non-arguments,
>> IMO.
>
> Exactly. It is rather sad that people never bothered to file bug reports
> about slow performance issues. I'm one of the upstream authors of
> unionfs-fuse and also use it on my own for live-USB sticks and NFS-booted
> systems and do not have such problems.

The issue is that while I can pause the boot process in initramfs and
the filesystem appears all well and running if I run init off the
filesystem some filesystem operations just fail at random leading to
files seemingly missing intermittently and the live CD failing to
boot.

I realize this is tremendously useful information but that's all I can
say about the issue which is why I did not bother to report it
anywhere.

I used whatever was packaged in Debian Squeeze.

Thanks

Michal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10 12:45             ` Michal Suchanek
@ 2011-06-10 12:54               ` Bernd Schubert
  0 siblings, 0 replies; 74+ messages in thread
From: Bernd Schubert @ 2011-06-10 12:54 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Andrew Morton, Andy Whitcroft, NeilBrown, Miklos Szeredi, viro,
	torvalds, linux-fsdevel, linux-kernel, nbd, jordipujolp, ezk,
	mszeredi

On 06/10/2011 02:45 PM, Michal Suchanek wrote:
> On 10 June 2011 13:51, Bernd Schubert<bernd.schubert@itwm.fraunhofer.de>  wrote:
>> On 06/09/2011 09:32 PM, Andrew Morton wrote:
>>>
>>> On Thu, 9 Jun 2011 14:49:47 +0100
>>> Andy Whitcroft<apw@canonical.com>    wrote:
>>>
>>>> On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:
>>>>
>>>> The problem with the userspace alternative is that it does not work. I
>>>> tried to run my live CD on top of unionfs-fuse and the filesystem
>>>> would fail intermittently leading to random errors during boot.
>>>
>>>
>>> If the implementation is slow or buggy then the appropriate action is
>>> to speed it up and to fix the bugs, so these are just non-arguments,
>>> IMO.
>>
>> Exactly. It is rather sad that people never bothered to file bug reports
>> about slow performance issues. I'm one of the upstream authors of
>> unionfs-fuse and also use it on my own for live-USB sticks and NFS-booted
>> systems and do not have such problems.
>
> The issue is that while I can pause the boot process in initramfs and
> the filesystem appears all well and running if I run init off the
> filesystem some filesystem operations just fail at random leading to
> files seemingly missing intermittently and the live CD failing to
> boot.
>
> I realize this is tremendously useful information but that's all I can
> say about the issue which is why I did not bother to report it
> anywhere.
>
> I used whatever was packaged in Debian Squeeze.

Any chance you can describe in more detail how you start
unionfs-fuse? Directly within the initramfs (if so, could you please
tell me exactly how)? Or using
/usr/share/doc/unionfs-fuse/examples/S01a-unionfs-fuse-live-cd.sh as
a link in rcS.d? We just have a 3 day weekend ahead and there might be a good
chance I can fix whatever your problems are... But it would be good if I
could reproduce it somehow. I think for the following mails we should
also drop most CCs here, as it is kernel unrelated.

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09  3:52     ` Andrew Morton
  2011-06-09 12:47       ` Miklos Szeredi
  2011-06-09 13:49       ` Andy Whitcroft
@ 2011-06-09 13:57       ` Michal Suchanek
  2011-06-09 13:57       ` Andy Whitcroft
  3 siblings, 0 replies; 74+ messages in thread
From: Michal Suchanek @ 2011-06-09 13:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, Miklos Szeredi, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, jordipujolp, ezk, mszeredi

On 9 June 2011 05:52, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 9 Jun 2011 11:59:34 +1000 NeilBrown <neilb@suse.de> wrote:
>
>> On Wed, 8 Jun 2011 15:32:08 -0700 Andrew Morton <akpm@linux-foundation.org>
>> wrote:
>>
>> > On Wed,  1 Jun 2011 14:46:13 +0200
>> > Miklos Szeredi <miklos@szeredi.hu> wrote:
>> >
>> > > I'd like to ask for overlayfs to be merged into 3.1.
>> >
>> > Dumb questions:
>> >
>> > I've never really understood the need for fs overlaying.  Who wants it?
>> > What are the use-cases?
>>
>> https://lwn.net/Articles/324291/
>>
>> I think the strongest use case is that LIVE-DVD's want it to have a write-able
>> root filesystem which is stored on the DVD.
>
> Well, these things have been around for over 20 years.  What motivated
> the developers of other OS's to develop these things and how are their
> users using them?

FWIW there is a union solution in NetBSD. I am not sure it is used in
the LiveCD but you can definitely use it to build a piece of software
without actually touching the source directory.

>
>> >
>> > This sort of thing could be implemented in userspace and wired up via
>> > fuse, I assume.  Has that been attempted and why is it inadequate?
>>
>> I think that would be a valid question if the proposal was large and
>> complex.  But overlayfs is really quite small and self-contained.
>
> Not merging it would be even smaller and simpler.  If there is a
> userspace alternative then that option should be evaluated and compared
> in a rational manner.

The problem with the userspace alternative is that it does not work. I
tried to run my live CD on top of unionfs-fuse and the filesystem
would fail intermittently leading to random errors during boot.

>
>
>
> Another issue: there have been numerous attempts at Linux overlay
> filesystems from numerous parties.  Does (or will) this implementation
> satisfy all their requirements?

No implementation will satisfy all needs. There is always some
compromise between availability (userspace/in-tree/easy to patch in)
feature completeness (eg. AuFS is not so easy to forward-port to new
kernels but has numerous features) performance, reliability.

>
> Because if not, we're in a situation where the in-kernel code is
> unfixably inadequate so we end up merging another similar-looking
> thing, or the presence of this driver makes it harder for them to get
> other drivers merged and the other parties' requirements remain
> unsatisfied.

One of the major use cases is building live CDs.

That and other things can be done with overlayfs.

Thanks

Michal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-09  3:52     ` Andrew Morton
                         ` (2 preceding siblings ...)
  2011-06-09 13:57       ` Michal Suchanek
@ 2011-06-09 13:57       ` Andy Whitcroft
  3 siblings, 0 replies; 74+ messages in thread
From: Andy Whitcroft @ 2011-06-09 13:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, Miklos Szeredi, viro, torvalds, linux-fsdevel,
	linux-kernel, nbd, hramrach, jordipujolp, ezk, mszeredi

On Wed, Jun 08, 2011 at 08:52:33PM -0700, Andrew Morton wrote:

> Another issue: there have been numerous attempts at Linux overlay
> filesystems from numerous parties.  Does (or will) this implementation
> satisfy all their requirements?
> 
> Because if not, we're in a situation where the in-kernel code is
> unfixably inadequate so we end up merging another similar-looking
> thing, or the presence of this driver makes it harder for them to get
> other drivers merged and the other parties' requirements remain
> unsatisfied.

From what I have seen the main advantage of the overlayfs implementation
is its simplicity.  It allows you to layer exactly two things.  That said,
in testing overlayfs seems perfectly happy to take its own mounts and
further union them, providing the flexibility that other union mount
implementations provide.
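
The "exactly two layers, but stackable" idea can be illustrated with a toy model.  This is only a sketch in Python under loud assumptions: the class names `Plain` and `Overlay` are hypothetical, files are modelled as a dictionary, and whiteouts, copy-up and directory merging are ignored entirely.  It is not how the kernel code is structured.

```python
from typing import Dict, Optional


class Plain:
    """A plain directory tree, modelled as a mapping of path -> content."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)

    def lookup(self, path: str) -> Optional[str]:
        return self.files.get(path)


class Overlay:
    """Exactly two layers: entries in the upper layer shadow the lower layer."""

    def __init__(self, upper, lower) -> None:
        self.upper = upper
        self.lower = lower

    def lookup(self, path: str) -> Optional[str]:
        found = self.upper.lookup(path)
        return found if found is not None else self.lower.lookup(path)


base = Plain({"/etc/motd": "base", "/bin/sh": "base"})
middle = Plain({"/etc/motd": "middle"})
top = Plain({"/bin/sh": "top"})

# Stacking: the lower layer of the second overlay is itself an overlay,
# which gives three effective layers out of a two-layer primitive.
stacked = Overlay(top, Overlay(middle, base))

print(stacked.lookup("/etc/motd"))  # content comes from "middle"
print(stacked.lookup("/bin/sh"))    # content comes from "top"
```

The point of the sketch is only that a two-layer primitive composes: because an overlay answers lookups the same way a plain tree does, it can serve as a layer of another overlay.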

-apw

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-08 22:32 ` [PATCH 0/7] overlay filesystem: request for inclusion Andrew Morton
  2011-06-09  1:59   ` NeilBrown
@ 2011-07-05 19:54   ` Hans-Peter Jansen
  2011-07-08 12:57     ` Miklos Szeredi
  1 sibling, 1 reply; 74+ messages in thread
From: Hans-Peter Jansen @ 2011-07-05 19:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, neilb, hramrach, jordipujolp, ezk, mszeredi, hooanon05

Dear Andrew, dear kernel developers,

I'm sorry for chiming in that late, but I had a motorbike accident that 
resulted in a 3 week stay in a hospital, and I still depend on a 
wheelchair for the next few weeks..

On Thursday 09 June 2011, 00:32:08 Andrew Morton wrote:
> On Wed,  1 Jun 2011 14:46:13 +0200
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> > I'd like to ask for overlayfs to be merged into 3.1.

All kudos to you, Miklos. While I'm still missing a major feature from
overlayfs, that is NFS as upper layer, it provides a fairly good
start. A commitment from you, that such an extension is considered for 
inclusion - given, that it appears one day - is appreciated. Also, 
since xattr support is available for NFS, it would be nice to outline, 
what is missing for such an implementation from overlayfs's POV.

> Dumb questions:
>
> I've never really understood the need for fs overlaying.  Who wants
> it? What are the use-cases?

I do use it for diskless NFS installations in production environments.
Please note that this isn't the usual thin client approach, which runs
on specialized, expensive, but dumb hardware, and scales badly on the
server side (you find this setup typically in the medical center next
door..).

Let's call it fat diskless client approach. I'm up to 3 * 24" heads on 
fairly capable hardware for some clients. Besides the usual office 
stuff, those systems mostly run a VMware based XP setup unfortunately, 
diskless due to its very nature, but at acceptable speed, BTW.

Thanks to aufs, the setup of the linux diskless clients boils down to 
install a distribution into a single folder, add a bit of boot mimic (I 
use pxelinux and kiwi), and get it to mount NFS root with aufs in 
initrd and an empty upper layer. Now you have a simple, but handy 
persistent setup, that can be used from a hundred systems easily.

NFS on switched gigabit ethernet is fast enough (even without playing 
SSD games on the server) to be nearly on eye level with single local 
disks, but the advantage of a single installation instance for all 
clients is paying off manifoldly.

Let me put it this way: administration effort for ONE XP instance (even 
for the emulation driven one) is greater than for ALL linux clients 
combined (although the number of applications used under XP is limited 
to the absolute minimum necessary to get the work done).

Specializing some systems is pretty easy in this setup, backup is a 
piece of cake, and moving systems around is a child's play.

And this is a fairly trivial way of using stacked file systems. There
are many creative use cases that remain unexploited because the feature is
missing from the standard kernel. People will start using this feature
when it is available without additional effort. Want to see which files
an arbitrary application changes, and in what ways? Sure, you can trace it
down to its bones, or run it on top of a layered filesystem, and
diff/cmp/whatever the files between the upper and lower FS.
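
The "diff the upper against the lower layer" idea might be sketched roughly as follows in Python.  This is an illustration under assumptions, not a tool from this thread: the function name `changed_files` is hypothetical, only regular files are examined, and deletions are not reported because the way a union filesystem records them (whiteouts and similar markers) is implementation specific.

```python
import filecmp
import os
from typing import Dict


def changed_files(upper: str, lower: str) -> Dict[str, str]:
    """Classify regular files under `upper` as 'added' or 'modified'
    relative to the same relative path under `lower`."""
    result: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(upper):
        for name in filenames:
            upper_path = os.path.join(dirpath, name)
            # Skip symlinks and special files; a union filesystem may use
            # such entries for its own bookkeeping (assumption).
            if os.path.islink(upper_path) or not os.path.isfile(upper_path):
                continue
            rel = os.path.relpath(upper_path, upper)
            lower_path = os.path.join(lower, rel)
            if not os.path.isfile(lower_path):
                result[rel] = "added"
            elif not filecmp.cmp(upper_path, lower_path, shallow=False):
                # Byte-for-byte comparison, so copies that were never
                # actually modified are not reported.
                result[rel] = "modified"
    return result
```

Calling `changed_files("/path/to/upper", "/path/to/lower")` (both paths are placeholders) after running the application on top of the layered mount would return a mapping of relative paths to "added" or "modified".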

My favorite use case is build farms, where several basic setups for all
kinds of usual distribution versions are maintained as lower layers of
stackable file systems. The builder checks for typical packages and
selects the matching layer, e.g. "kernel module", where the layer has
all kernel-devel packages installed. With similar layers
for "x11", "kde" and "gnome", I expect a typical build farm to speed up
by a factor of 10-20.

When the first wheels were invented, their use cases were pretty
limited, but today... Okay, stacked, unioned, layered, or overlayed
filesystems might not be as universally useful as wheels in the end, but I
bet that your linux based smartphone will use it by the end of next
year, if it gets merged in 3.1.

Pete

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-05 19:54   ` Hans-Peter Jansen
@ 2011-07-08 12:57     ` Miklos Szeredi
  2011-07-10  8:23       ` Ric Wheeler
  2011-07-10 11:16       ` Hans-Peter Jansen
  0 siblings, 2 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-08 12:57 UTC (permalink / raw)
  To: Hans-Peter Jansen
  Cc: Andrew Morton, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, neilb, hramrach, jordipujolp, ezk, hooanon05

"Hans-Peter Jansen" <hpj@urpla.net> writes:

> All kudos to you, Miklos. While I'm still missing a major feature from
> overlayfs that is a NFS as upper layer, it provides a fairly good 
> start. A commitment from you, that such an extension is considered for 
> inclusion - given, that it appears one day - is appreciated. Also, 
> since xattr support is available for NFS,

AFAIK development of generic xattr support on NFS stopped some time ago.

> it would be nice to outline, what is missing for such an
> implementation from overlayfs's POV.

Allow using namespace polluting xattr replacements, as aufs is
doing.

But why?  Why is it better to do the overlaying on the client instead of
the server?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-08 12:57     ` Miklos Szeredi
@ 2011-07-10  8:23       ` Ric Wheeler
  2011-07-10 13:55         ` Sorin Faibish
  2011-07-10 11:16       ` Hans-Peter Jansen
  1 sibling, 1 reply; 74+ messages in thread
From: Ric Wheeler @ 2011-07-10  8:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Hans-Peter Jansen, Andrew Morton, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, neilb, hramrach, jordipujolp, ezk,
	hooanon05, James Morris, Bruce Fields, Steve Dickson,
	Trond Myklebust

On 07/08/2011 01:57 PM, Miklos Szeredi wrote:
> "Hans-Peter Jansen"<hpj@urpla.net>  writes:
>
>> All kudos to you, Miklos. While I'm still missing a major feature from
>> overlayfs that is a NFS as upper layer, it provides a fairly good
>> start. A commitment from you, that such an extension is considered for
>> inclusion - given, that it appears one day - is appreciated. Also,
>> since xattr support is available for NFS,
> AFAIK development of generic xattr support on NFS stopped some time ago.

Hi Miklos,

There is a proposed (at the IETF) standard called "labelled NFS" that would 
allow the protocol to handle xattrs.

It has not set the world on fire in terms of enthusiasm, but has been making 
some progress. We have patches from Dave Quigley that did work, but need to 
resolve the standards issues I suspect before it could make progress upstream...

Ric

>> it would be nice to outline, what is missing for such an
>> implementation from overlayfs's POV.
> Allow using namespace polluting xattr replacements, such as aufs is
> doing.
>
> But why?  Why is it better to do the overlaying on the client instead of
> the server?
>
> Thanks,
> Miklos


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-10  8:23       ` Ric Wheeler
@ 2011-07-10 13:55         ` Sorin Faibish
  2011-07-12 15:59           ` Miklos Szeredi
  0 siblings, 1 reply; 74+ messages in thread
From: Sorin Faibish @ 2011-07-10 13:55 UTC (permalink / raw)
  To: Ric Wheeler, Miklos Szeredi
  Cc: Hans-Peter Jansen, Andrew Morton, viro, torvalds, linux-fsdevel,
	linux-kernel, apw, nbd, neilb, hramrach, jordipujolp, ezk,
	hooanon05, James Morris, Bruce Fields, Steve Dickson,
	Trond Myklebust

On Sun, 10 Jul 2011 04:23:17 -0400, Ric Wheeler <ricwheeler@gmail.com>
wrote:

> On 07/08/2011 01:57 PM, Miklos Szeredi wrote:
>> "Hans-Peter Jansen"<hpj@urpla.net>  writes:
>>
>>> All kudos to you, Miklos. While I'm still missing a major feature from
>>> overlayfs that is a NFS as upper layer, it provides a fairly good
>>> start. A commitment from you, that such an extension is considered for
>>> inclusion - given, that it appears one day - is appreciated. Also,
>>> since xattr support is available for NFS,
>> AFAIK development of generic xattr support on NFS stopped some time ago.
>
> Hi Miklos,
>
> There is a proposed (at the IETF) standard called "labelled NFS" that  
> would allow the protocol to handle xattrs.
Will be included in NFSv4.2. And we are already very close to a good I-D.
Not sure that xattr change mentioned here will be included. You can look
at the current I-D at:
http://datatracker.ietf.org/doc/draft-quigley-nfsv4-labeled/

/Sorin

>
> It has not set the world on fire in terms of enthusiasm, but has been  
> making some progress. We have patches from Dave Quigley that did work,  
> but need to resolve the standards issues I suspect before it could make  
> progress upstream...
>
> Ric
>
>>> it would be nice to outline, what is missing for such an
>>> implementation from overlayfs's POV.
>> Allow using namespace polluting xattr replacements, such as aufs is
>> doing.
>>
>> But why?  Why is it better to do the overlaying on the client instead of
>> the server?
>>
>> Thanks,
>> Miklos



-- 
Best Regards
Sorin Faibish
Corporate Distinguished Engineer
Unified Storage Division

         EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : sfaibish@emc.com

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-10 13:55         ` Sorin Faibish
@ 2011-07-12 15:59           ` Miklos Szeredi
  0 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-12 15:59 UTC (permalink / raw)
  To: Sorin Faibish
  Cc: Ric Wheeler, Hans-Peter Jansen, Andrew Morton, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, neilb, hramrach,
	jordipujolp, ezk, hooanon05, James Morris, Bruce Fields,
	Steve Dickson, Trond Myklebust

"Sorin Faibish" <sfaibish@emc.com> writes:

> On Sun, 10 Jul 2011 04:23:17 -0400, Ric Wheeler <ricwheeler@gmail.com>
> wrote:
>> There is a proposed (at the IETF) standard called "labelled NFS"
>> that would allow the protocol to handle xattrs.
> Will be included in NFSv4.2. And we are already very close to a good I-D.
> Not sure that xattr change mentioned here will be included. You can look
> at the current I-D at:
> http://datatracker.ietf.org/doc/draft-quigley-nfsv4-labeled/

I skimmed the draft, and it looks mostly generic enough to support any
xattr, not just security labels.

But the naming and some of the requirements (such as notifying clients
on label change) are very much security-label specific, so forcing
generic xattr support into this protocol might not be a good idea.

I see that NFSv4 also has named attributes, which are conceptually
similar to Linux xattrs, but the APIs are not easily synchronized.

Doing xattr as a new protocol extension would be much easier and
cleaner, IMO.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-08 12:57     ` Miklos Szeredi
  2011-07-10  8:23       ` Ric Wheeler
@ 2011-07-10 11:16       ` Hans-Peter Jansen
  2011-07-12 16:15         ` Miklos Szeredi
  1 sibling, 1 reply; 74+ messages in thread
From: Hans-Peter Jansen @ 2011-07-10 11:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, neilb, hramrach, jordipujolp, ezk, hooanon05

On Friday 08 July 2011, 14:57:09 Miklos Szeredi wrote:
> "Hans-Peter Jansen" <hpj@urpla.net> writes:
> > All kudos to you, Miklos. While I'm still missing a major feature
> > from overlayfs, namely NFS as an upper layer, it provides a fairly
> > good start. A commitment from you that such an extension would be
> > considered for inclusion, should it appear one day, would be
> > appreciated. Also, since xattr support is available for NFS,
>
> AFAIK development of generic xattr support on NFS stopped some time
> ago.
>
> > it would be nice to outline, what is missing for such an
> > implementation from overlayfs's POV.
>
> Allow using namespace-polluting xattr replacements, such as aufs is
> doing.
>
> But why?  Why is it better to do the overlaying on the client instead
> of the server?

Exporting layered filesystems via NFS has traditionally suffered from
many problems, because it compounds the NFS export issues of the server
FS in use (say xfs) with the FS layering issues. Since I have been doing
diskless computing for more than two decades now, I have always aimed to
lower complexity and/or localize it. Layering on the client is done with
the latter in mind. While the basic concept of a layered FS is sound,
things like mmapping and splicing in particular cause
hard-to-track-down problems that are even harder to solve properly.

Do you already have experience with NFS-exported overlay FSs? If that
proves stable, scales, and lets a client survive a server reboot,
layering on the server is a sexy approach of course (I hate being
forced to maintain my own kernel flavors for diskless clients, while I
love to track Linux kernel progress in general).

Does an openSUSE build service kernel project with overlayfs included
exist? If I read the patch correctly, it's not possible to just build
overlayfs as a standalone KMP ATM.

Let's-get-it-in-for-3.1-please'ly yours,
Pete

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-10 11:16       ` Hans-Peter Jansen
@ 2011-07-12 16:15         ` Miklos Szeredi
  0 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-12 16:15 UTC (permalink / raw)
  To: Hans-Peter Jansen
  Cc: Andrew Morton, viro, torvalds, linux-fsdevel, linux-kernel, apw,
	nbd, neilb, hramrach, jordipujolp, ezk, hooanon05

"Hans-Peter Jansen" <hpj@urpla.net> writes:

> On Friday 08 July 2011, 14:57:09 Miklos Szeredi wrote:
>> Allow using namespace-polluting xattr replacements, such as aufs is
>> doing.
>>
>> But why?  Why is it better to do the overlaying on the client instead
>> of the server?
>
> Exporting layered filesystems via NFS has traditionally suffered from
> many problems, because it compounds the NFS export issues of the server
> FS in use (say xfs) with the FS layering issues. Since I have been doing
> diskless computing for more than two decades now, I have always aimed to
> lower complexity and/or localize it. Layering on the client is done with
> the latter in mind. While the basic concept of a layered FS is sound,
> things like mmapping and splicing in particular cause
> hard-to-track-down problems that are even harder to solve properly.
>
> Do you already have experience with NFS-exported overlay FSs? If that
> proves stable, scales, and lets a client survive a server reboot,
> layering on the server is a sexy approach of course (I hate being
> forced to maintain my own kernel flavors for diskless clients, while I
> love to track Linux kernel progress in general).

Well, you are right, exporting an overlay has its own complexities.

Abstracting the whiteout/opaque flags behind an implementation that can
use xattr or plain files sounds pretty easy to do in comparison.  I'll
look into that.  But this is again a feature that needs to go in
later.

> Does an openSUSE build service kernel project with overlayfs included
> exist? If I read the patch correctly, it's not possible to just build
> overlayfs as a standalone KMP ATM.

No, it needs small VFS modifications, so a standalone module doesn't
work.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

[parent not found: <4540f7aa16724111bd792a1d577261c2@HUBCAS1.cs.stonybrook.edu>]

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
       [not found] ` <4540f7aa16724111bd792a1d577261c2@HUBCAS1.cs.stonybrook.edu>
@ 2011-06-16  6:51   ` Erez Zadok
  2011-06-16  9:45     ` Michal Suchanek
                       ` (3 more replies)
  0 siblings, 4 replies; 74+ messages in thread
From: Erez Zadok @ 2011-06-16  6:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, viro@ZenIV.linux.org.uk Viro, Linus Torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, neilb, hramrach,
	jordipujolp, mszeredi, J. R. Okajima

On Jun 8, 2011, at 3:32 PM, Andrew Morton wrote:

> On Wed,  1 Jun 2011 14:46:13 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
>> I'd like to ask for overlayfs to be merged into 3.1.
> 
> Dumb questions:
> 
> I've never really understood the need for fs overlaying.  Who wants it?
> What are the use-cases?

A fair question, Andrew.  I've read the entire thread and will try to address multiple issues in this one reply.

(A) USEFULNESS OF UNIONING:

Unioning is a very useful feature, used by many people worldwide.  The largest users are live CDs.  Then there are people who like to use unioning for its snapshotting/recovery abilities (e.g., to revert a bad package installation).  Some diskless NFS people also like to use unioning (one shared read-only image, CoW on clients).  And there are more (the unionfs web page lists 40 projects that use or have used unionfs at one point or another).  When we started unionfs back in 2003, we were very quickly surprised at the many creative ways people found to use it.  Bottom line: this kind of file-based CoW is very useful.

If anyone is still not convinced that unioning is useful, consider all the many different implementations of it that had been attempted over the years: in-kernel file-system based, VFS-based, fuse-based, and various variants.  And consider how much debate this topic has generated too. :-)

One often mentioned metric asked of people who want to merge their code into linux is "do you have users and how long have they used it?"  By this metric, unioning should have been in mainline 7+ years ago.  It's way overdue.

> This sort of thing could be implemented in userspace and wired up via
> fuse, I assume.  Has that been attempted and why is it inadequate?

(B) APPROACHES TO UNIONING

Userspace file systems are very useful for many purposes, esp. for prototyping, but not for unioning, for all the reasons people have already outlined.  So the question is which in-kernel approach is The Right One[™].  I'm afraid that's the million-dollar question no one in this community has been able to answer yet.

My group, Junjiro and his team, and I have spent a huge amount of time over the years developing a standalone stackable file system based approach.  These approaches were rejected largely due to their complexity and large size (often over 10KLoC), which made them hard to review (we all know that code reviewers on Linux forums are in very short supply).  That complexity and size were necessary due to the various features users have asked for over the years.  Aside from the large, complex code base, some people plain didn't like unioning as a stackable file system; those people often suggested the VFS as the best location for this functionality.  There is some merit to a VFS based approach: unioning performs a fair amount of namespace manipulation (merging directories, eliminating duplications, whiteouts and opaques, etc.), and the VFS is often best suited for complex namespace operations.

Val, Jan, Bharata, and others have spent untold amounts of time trying to develop a VFS-based approach.  In lieu of stackable file system approaches, I was hoping to see those VFS-based approaches get the support needed to get merged, and yet they have not.  It appears that development of the VFS-based approaches has stalled, sadly.  To be fair, I have argued before that adding a lot of code to the VFS "just" to support unioning was a bad idea (see http://lkml.org/lkml/2007/12/13/242).  And I also felt that it was going to be hard to support approaches which required changes (however small) to many individual file systems (e.g., to add native whiteout support). But I'd have been happy to see a VFS-based approach get merged, if the Powers That Be[™] sanctioned it.

Then there were suggestions to support unioning inside certain file systems like cramfs/tmpfs (see http://lkml.org/lkml/2008/6/2/6). I argued that these approaches were not good either, because they required changes to existing stable file systems and would not have provided enough useful unioning features for users to switch to.

And now we have overlayfs, a small stackable file system which requires only small VFS changes and very small changes to other file systems (e.g., tmpfs xattr support, which was merged recently).  At this point in time, overlayfs seems to be the running favorite for merging.  How ironic it is that we've come full circle to the standalone in-kernel stackable file system approach.

(C) ABOUT OVERLAYFS

I've reviewed overlayfs's code.  I found it easy enough to follow that I was able to fix a few bugs and add a feature or two.  It's small enough to be easily reviewed.  I therefore argue that we should NOT try and add a ton of features to overlayfs now, but rather review it as is, consider merging it soon, and gradually add features over time (BTW, I just counted, and ecryptfs, another stackable f/s, grew by over 55% in LoC since first merged).

I also tested overlayfs and it's remarkably stable (and fast) given the features it offers and the relatively short amount of time it's been out compared to other solutions.  I ran LTP-full on overlayfs for several days and couldn't get it to crash (to be fair, some tests didn't pass — which I plan to report).  Either way, kudos to Miklos for such stable and functional code.

Overlayfs is already being used by others who find it useful.  I think the features it already has are good enough for a large user base of unioning solutions.

Andrew, you've asked another good question:

> Another issue: there have been numerous attempts at Linux overlay
> filesystems from numerous parties.  Does (or will) this implementation
> satisfy all their requirements?
> 
> Because if not, we're in a situation where the in-kernel code is
> unfixably inadequate so we end up merging another similar-looking
> thing, or the presence of this driver makes it harder for them to get
> other drivers merged and the other parties' requirements remain
> unsatisfied.

Because this debate has taken so long (8 years now), there is a large user base of unioning already, and several solutions.  By and large, most of these solutions offer the same basic set of features — only that implementations and approaches differ.  Many users couldn't care less how it was implemented as long as it gave them the basic features they needed.  For that reason, I believe that if this community (and the VFS Gods :-) finally decide WHICH APPROACH they liked best and finally just merge something, this'll have two very positive effects.  First, users who need the basic features will start migrating to the new in-kernel unioning solution.  It won't be an overnight migration, but that's fine. Second, and more important, I think the collective resources of this community can finally focus their attention on ONE solution and help make it better, add features, etc.  Frankly, unioning is the kind of problem for which I don't see a need for more than one in-kernel solution, because the basic features users need are all the same.  Sure, some people would continue to use the out-of-mainline solutions because of extra features they offer that the in-kernel solution doesn't, but over time, as newer features are carefully added to the in-kernel solution, more people will migrate to it.

Some might argue that it's good to have many competing solutions. Generally I agree. But in this case I think we have too many competing unioning solutions now, and the community is too splintered in its efforts; instead we should get behind one sanctioned solution and help make it best.  Recently some people "dared" to suggest we consider removing ext2/3 from the code b/c they're old and less used, in order to make the Linux code base smaller and more easily maintainable.  If that's the case, then why have more than one unioning solution in Linux?

Cheers,
Erez.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16  6:51   ` Erez Zadok
@ 2011-06-16  9:45     ` Michal Suchanek
  2011-06-16 10:45     ` Jordi Pujol
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 74+ messages in thread
From: Michal Suchanek @ 2011-06-16  9:45 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, Miklos Szeredi, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	jordipujolp, mszeredi, J. R. Okajima

On 16 June 2011 08:51, Erez Zadok <ezk@fsl.cs.sunysb.edu> wrote:
> On Jun 8, 2011, at 3:32 PM, Andrew Morton wrote:

>
> Val, Jan, Bharata, and others have spent untold amounts of time trying to develop a VFS-based approach.  In lieu of stackable file system approaches, I was hoping to see those VFS-based approaches get the support needed to get merged, and yet they have not.  It appears that development of the VFS-based approaches has stalled, sadly.  To be fair, I have argued before that adding a lot of code to the VFS "just" to support unioning was a bad idea (see http://lkml.org/lkml/2007/12/13/242).  And I also felt that it was going to be hard to support approaches which required changes (however small) to many individual file systems (e.g., to add native whiteout support). But I'd have been happy to see a VFS-based approach get merged, if the Powers That Be[™] sanctioned it.

Is there any reason why unionmount could not use the xattrs that most
filesystems already support natively, as overlayfs does? That would
cut down on the sheer number of patches required to get it working.

Thanks

Michal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16  6:51   ` Erez Zadok
  2011-06-16  9:45     ` Michal Suchanek
@ 2011-06-16 10:45     ` Jordi Pujol
  2011-06-16 15:15     ` J. R. Okajima
       [not found]     ` <b624059d70d546d4a4ecb940613235ab@HUBCAS2.cs.stonybrook.edu>
  3 siblings, 0 replies; 74+ messages in thread
From: Jordi Pujol @ 2011-06-16 10:45 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, Miklos Szeredi, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, mszeredi, J. R. Okajima

On Thursday, 16 June 2011 08:51:39, Erez Zadok wrote:
> (A) USEFULNESS OF UNIONING:
> 
> Unioning is a very useful feature,

As a user of union filesystem modules, I completely agree with Erez's
opinions.

I would say that Overlayfs meets the needs of a Live system, both for
daily operation and for making a new remaster.

It is true that Overlayfs is a young program, and over the last few days
some small problems have been found, which Miklos has solved promptly.

According to the developers, there are other known issues, such as
links or modification of the lower branches of filesystems; but they
do not appear in a Live system that has been properly prepared with
those already documented limitations in mind.

Overlayfs now works correctly; I use it on some computers on a daily
basis and have not found any problems.

Currently I install all my Linux computers to hard drive by copying a
file that contains the compressed operating system; the computer then
uses a union file system to boot. That has many advantages and some
drawbacks. One drawback is that the programmer must wait for the
corresponding version of the union module when a new version of the
kernel has already been released. Overlayfs has greatly simplified
source code, and an average user can adapt it to a new kernel version
with small changes.

For this type of installation no more features are needed in Overlayfs;
the next step would be to include it in the kernel as a staging module.
That way people can refine some details of its operation and perhaps add
other functionality later.

Thanks,

Jordi Pujol

Live never ending Tale
GNU/Linux Live forever!
http://livenet.selfip.com

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16  6:51   ` Erez Zadok
  2011-06-16  9:45     ` Michal Suchanek
  2011-06-16 10:45     ` Jordi Pujol
@ 2011-06-16 15:15     ` J. R. Okajima
  2011-06-16 16:09       ` Miklos Szeredi
       [not found]     ` <b624059d70d546d4a4ecb940613235ab@HUBCAS2.cs.stonybrook.edu>
  3 siblings, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16 15:15 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, Miklos Szeredi, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp, mszeredi

Erez Zadok:
> (B) APPROACHES TO UNIONING
	:::
> My group, Junjiro and his team, and I have spent a huge amount of time

Oh, I have no team, no co-worker.

> over the years developing a standalone stackable file system based
> approach.  These approaches were rejected largely due to their
	:::
> location for this functionality.  There is some merit to a VFS based
> approach: unioning performs a fair amount of namespace manipulation
> (merging directories, eliminating duplications, whiteouts and opaques,
> etc.), and the VFS is often best suited for complex namespace
> operations.

Exactly.
I understand that everybody likes a simpler patch, and I have no
objection to merging UnionMount into mainline. But this union-type-mount
approach has some drawbacks, which I have posted before. Those are
inherited by overlayfs too, and Miklos called them "unPOSIXy behavior". I
think most of these behaviours stem from its design or architecture.
That is also one reason I chose the union-type-filesystem approach. In
other words, there surely exist several issues that are hard to solve
unless we adopt a union-type filesystem (I won't say it is impossible,
since someone else may come up with a new idea someday).

> (C) ABOUT OVERLAYFS
>
> I've reviewed overlayfs's code.  I found it easy enough to follow that I
> was able to fix a few bugs and add a feature or two.  It's small enough
> to be easily reviewed.  I therefore argue that we should NOT try and add
> a ton of features to overlayfs now, but rather review it as is, consider
> merging it soon, and gradually add features over time (BTW, I just

I agree that is one good way among several possible ways.
But I think those missing features or "unPOSIXy behavior" are important
and essentially necessary. To me, the current feature set of overlayfs
looks like that of aufs many years ago, when I started thinking about
unioning. Aufs spent years turning that unPOSIXy behavior into correct
behaviour, and this was achieved in the middle of the aufs1 era.
I don't know how the next few years of overlayfs will go. They may be
similar to the history of aufs, or totally different.
The priority of a feature to support direct modification of a member is
not so high. Correct behaviour is the most important thing, I think.

Additionally, the number of members may be important too. Overlayfs
currently supports only two members. When a user wants more layers,
he has to mount another overlayfs over overlayfs. Since that is
essentially equivalent to a recursive function call internally, and the
stack size in kernel space is of course limited, I don't think it is
good.

Also, Miklos replied that modifying the credentials internally does
no harm to other threads. But I am still afraid it is a security hole,
since the credentials are shared among threads. If I had time, I would
test it myself.

J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16 15:15     ` J. R. Okajima
@ 2011-06-16 16:09       ` Miklos Szeredi
  2011-06-16 22:59         ` J. R. Okajima
  2011-07-08 14:40         ` Miklos Szeredi
  0 siblings, 2 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-16 16:09 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Erez Zadok, Andrew Morton, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

>> over the years developing a standalone stackable file system based
>> approach.  These approaches were rejected largely due to their
> 	:::
>> location for this functionality.  There is some merit to a VFS based
>> approach: unioning performs a fair amount of namespace manipulation
>> (merging directories, eliminating duplications, whiteouts and opaques,
>> etc.), and the VFS is often best suited for complex namespace
>> operations.
>
> Exactly.
> I understand that everybody likes a simpler patch, and I have no
> objection to merging UnionMount into mainline. But this union-type-mount
> approach has some drawbacks, which I have posted before. Those are
> inherited by overlayfs too, and Miklos called them "unPOSIXy behavior". I
> think most of these behaviours stem from its design or architecture.

Yes, overlayfs shares some of the basic architecture of union-mounts.
The most important such property is that when a file is copied up, it's
like replacing the file with a new one:

   cp /foo/bar /tmp/ttt
   mv -f /tmp/ttt /foo/bar

Which is exactly the thing that some editors do when saving a modified
file, so most applications should handle this behavior fine.  The truth
is a bit more complicated and the effect of the copy-up is more like
this:

   cp /foo/bar /tmp/ttt
   mount --bind /tmp/ttt /foo/bar

> Additionally, the number of members may be important too. Overlayfs
> currently supports only two members. When a user wants more layers,
> he has to mount another overlayfs over overlayfs. Since that is
> essentially equivalent to a recursive function call internally, and the
> stack size in kernel space is of course limited, I don't think it is
> good.

Good point about stack space.

Adding multiple read-only layers should be really easy, and could be one
of the first extensions after the merge.

> Also, Miklos replied that modifying the credentials internally does
> no harm to other threads. But I am still afraid it is a security hole,
> since the credentials are shared among threads. If I had time, I would
> test it myself.

The credentials of the current task are not modified but replaced by
new, temporary credentials.  This will only have an effect on a single
thread.
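
The difference between modifying and replacing can be shown with a
small userspace model in C.  This is not kernel code: struct model_cred,
struct model_task and the model_* helpers are invented for this sketch.
The in-kernel counterparts would presumably be prepare_creds(),
override_creds() and revert_creds().  The point of the model is that the
credentials object may be shared, while the pointer to it is per task.

```c
/* Userspace model; not the kernel's struct cred or struct task_struct. */
struct model_cred {
	unsigned int fsuid;
	int can_override_dac;	/* stands in for a capability bit */
};

struct model_task {
	const struct model_cred *cred;	/* may point at a shared object */
};

/* Return a private copy of the task's current credentials. */
static struct model_cred model_prepare(const struct model_task *t)
{
	return *t->cred;
}

/* Install "repl" on task "t" only and return the previous pointer. */
static const struct model_cred *model_override(struct model_task *t,
					       const struct model_cred *repl)
{
	const struct model_cred *old = t->cred;

	t->cred = repl;
	return old;
}

/* Put back the pointer returned by model_override(). */
static void model_revert(struct model_task *t, const struct model_cred *old)
{
	t->cred = old;
}
```

With two tasks pointing at the same model_cred, overriding one of them
with a raised copy changes what that task sees and nothing else: the
shared object is never written, and the other task's pointer is never
touched.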

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16 16:09       ` Miklos Szeredi
@ 2011-06-16 22:59         ` J. R. Okajima
  2011-07-08 14:40         ` Miklos Szeredi
  1 sibling, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16 22:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Erez Zadok, Andrew Morton, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp


Miklos Szeredi:
> file, so most applications should handle this behavior fine.  The truth
> is a bit more complicated and the effect of the copy-up is more like
> this:
>
>    cp /foo/bar /tmp/ttt
>    mount --bind /tmp/ttt /foo/bar

Good example.
'mount' instead of a simple 'cp' means that st_dev in struct stat
differs. That will confuse some applications, such as find -x, du -x,
df, chmod/chown -R, rm -r, etc.


> The credentials of the current task are not modified but replaced by
> new, temporary credentials.  This will only have an effect on a single
> thread.

I see.
I think I understand the multi-thread case: other threads will not be
affected. I will test the signal handler case someday.


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16 16:09       ` Miklos Szeredi
  2011-06-16 22:59         ` J. R. Okajima
@ 2011-07-08 14:40         ` Miklos Szeredi
  2011-07-09 12:18           ` J. R. Okajima
  1 sibling, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-08 14:40 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Erez Zadok, Andrew Morton, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp

Miklos Szeredi <miklos@szeredi.hu> writes:

> "J. R. Okajima" <hooanon05@yahoo.co.jp> writes:
>
>> Additionally, the number of members may be important too. Overlayfs
>> currently supports only two members. When a user wants more layers,
>> he has to mount another overlayfs over overlayfs. Since that is
>> essentially equivalent to a recursive function call internally, and the
>> stack size in kernel space is of course limited, I don't think it is
>> good.
>
> Good point about stack space.

Here's a patch to limit stacking overlayfs instances on top of each
other and on ecryptfs to prevent kernel stack overflow.

Thanks,
Miklos


Subject: fs: limit filesystem stacking depth

From: Miklos Szeredi <mszeredi@suse.cz>

Add a simple read-only counter to super_block that indicates how deep
this is in the stack of filesystems.  Previously ecryptfs was the only
stackable filesystem and it explicitly disallowed multiple layers of
itself.

Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.

To limit the kernel stack usage we must limit the depth of the
filesystem stack.  Initially the limit is set to 2.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ecryptfs/main.c   |    7 +++++++
 fs/overlayfs/super.c |   10 ++++++++++
 include/linux/fs.h   |   11 +++++++++++
 3 files changed, 28 insertions(+)

Index: linux-2.6/fs/ecryptfs/main.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/main.c	2011-07-08 12:45:21.000000000 +0200
+++ linux-2.6/fs/ecryptfs/main.c	2011-07-08 12:45:27.000000000 +0200
@@ -525,6 +525,13 @@ static struct dentry *ecryptfs_mount(str
 	s->s_maxbytes = path.dentry->d_sb->s_maxbytes;
 	s->s_blocksize = path.dentry->d_sb->s_blocksize;
 	s->s_magic = ECRYPTFS_SUPER_MAGIC;
+	s->s_stack_depth = path.dentry->d_sb->s_stack_depth + 1;
+
+	rc = -EINVAL;
+	if (s->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+		printk(KERN_ERR "eCryptfs: maximum fs stacking depth exceeded\n");
+		goto out_free;
+	}
 
 	inode = ecryptfs_get_inode(path.dentry->d_inode, s);
 	rc = PTR_ERR(inode);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2011-07-08 12:45:21.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2011-07-08 12:45:27.000000000 +0200
@@ -480,6 +480,12 @@ struct iattr {
  */
 #include <linux/quota.h>
 
+/*
+ * Maximum number of layers of fs stack.  Needs to be limited to
+ * prevent kernel stack overflow
+ */
+#define FILESYSTEM_MAX_STACK_DEPTH 2
+
 /** 
  * enum positive_aop_returns - aop return codes with specific semantics
  *
@@ -1438,6 +1444,11 @@ struct super_block {
 	 * Saved pool identifier for cleancache (-1 means none)
 	 */
 	int cleancache_poolid;
+
+	/*
+	 * Indicates how deep in a filesystem stack this SB is
+	 */
+	int s_stack_depth;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
Index: linux-2.6/fs/overlayfs/super.c
===================================================================
--- linux-2.6.orig/fs/overlayfs/super.c	2011-07-07 16:01:47.000000000 +0200
+++ linux-2.6/fs/overlayfs/super.c	2011-07-08 12:51:29.000000000 +0200
@@ -545,6 +545,16 @@ static int ovl_fill_super(struct super_b
 	    !S_ISDIR(lowerpath.dentry->d_inode->i_mode))
 		goto out_put_lowerpath;
 
+	sb->s_stack_depth = max(upperpath.mnt->mnt_sb->s_stack_depth,
+				lowerpath.mnt->mnt_sb->s_stack_depth) + 1;
+
+	err = -EINVAL;
+	if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+		printk(KERN_ERR "overlayfs: maximum fs stacking depth exceeded\n");
+		goto out_put_lowerpath;
+	}
+
+
 	ufs->upper_mnt = clone_private_mount(&upperpath);
 	err = PTR_ERR(ufs->upper_mnt);
 	if (IS_ERR(ufs->upper_mnt)) {

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-08 14:40         ` Miklos Szeredi
@ 2011-07-09 12:18           ` J. R. Okajima
  2011-07-15 10:59             ` Miklos Szeredi
  0 siblings, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-07-09 12:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Erez Zadok, Andrew Morton, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp


Miklos Szeredi:
> Here's a patch to limit stacking overlayfs instances on top of each
> other and on ecryptfs to prevent kernel stack overflow.

I don't think it is a good idea to add such a new member to the generic
struct super_block.
- the new member is unrelated to most other filesystems.
- ecryptfs already rejects such nesting by checking
  (sb->s_type == &ecryptfs_fs_type).
Instead I'd suggest introducing a small new test function,
something like
static int test_nested(struct super_block *sb)
{
	return sb->s_magic == ECRYPTFS_SUPER_MAGIC
		|| sb->s_type == &ovl_fs_type;
}
Of course "#ifdef CONFIG_ECRYPT_FS" or something similar should be added
too.

If overlayfs had its own SUPER_MAGIC number, it might be better to test
that instead of s_type. But there is no such magic number currently, and
I am afraid introducing one may affect stat/statfs for overlayfs, which
you might dislike.


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-07-09 12:18           ` J. R. Okajima
@ 2011-07-15 10:59             ` Miklos Szeredi
  0 siblings, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-07-15 10:59 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Erez Zadok, Andrew Morton, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel, linux-kernel, apw, nbd, neilb,
	hramrach, jordipujolp

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> Miklos Szeredi:
>> Here's a patch to limit stacking overlayfs instances on top of each
>> other and on ecryptfs to prevent kernel stack overflow.
>
> I don't think it a good idea to introduce such new member to generic
> struct super_block.
> - the new member is unrelated to most of other fs.
> - ecryptfs already rejects such nests by checking
>   (sb->s_type == &ecryptfs_fs_type).
> Instead I'd suggest you to introduce a new small test function,
> something like
> int test_nested(sb)
> {
> 	return sb->s_magic == ECRYPTFS_SUPER_MAGIC
> 		|| sb->s_type == &ovl_fs_type;
> }

I don't want to prevent stacking completely, only limit it.  And the
only sane way to do that is with a counter in super_block.
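
The counter idea can be sketched as a small user-space model. This is
only an illustration under stated assumptions, not the kernel code:
struct sb_model, stack_on() and the limit value MAX_STACK_DEPTH are
hypothetical stand-ins (the patch uses FILESYSTEM_MAX_STACK_DEPTH, whose
value is not shown in this thread), and unlike the patch the model
stores the new depth only after the check passes.

```c
#include <assert.h>

/* Hypothetical limit, chosen for illustration only. */
#define MAX_STACK_DEPTH 2

/* Stand-in for struct super_block; only the depth counter is modelled. */
struct sb_model {
	int s_stack_depth;	/* 0 for an ordinary, non-stacking filesystem */
};

/*
 * Compute the depth of a new stacking filesystem from its two layers.
 * Returns 0 and records the depth, or -1 if the limit would be exceeded.
 */
int stack_on(struct sb_model *sb, const struct sb_model *upper,
	     const struct sb_model *lower)
{
	int deepest = upper->s_stack_depth > lower->s_stack_depth ?
		      upper->s_stack_depth : lower->s_stack_depth;
	int depth = deepest + 1;

	if (depth > MAX_STACK_DEPTH)
		return -1;	/* too deep: refuse the mount */

	sb->s_stack_depth = depth;
	return 0;
}
```

With this model, stacking is limited rather than forbidden: an overlay
on top of an overlay is accepted, and only the next level is refused.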

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

[parent not found: <b624059d70d546d4a4ecb940613235ab@HUBCAS2.cs.stonybrook.edu>]

[parent not found: <BF42D8D9-B947-448A-8818-BCA786E75325@fsl.cs.sunysb.edu>]

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
       [not found]       ` <BF42D8D9-B947-448A-8818-BCA786E75325@fsl.cs.sunysb.edu>
@ 2011-06-16 23:41         ` J. R. Okajima
       [not found]         ` <ab75a25c918145569b721dea9aea5506@HUBCAS2.cs.stonybrook.edu>
  1 sibling, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16 23:41 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, Miklos Szeredi, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, apw@canonical.com, nbd@openwrt.org,
	neilb@suse.de, hramrach@centrum.cz, jordipujolp@gmail.com,
	mszeredi@suse.cz

Erez Zadok:
> My point is that Overlayfs has ENOUGH useful features NOW to be merged.
> What stops it from going in?! More freeping creaturisms? Why do we need
	:::

As I wrote before, I have no objection to merging overlayfs or
UnionMount. My point is that they have the unioning feature but lack some
essential filesystem features. I don't think it is a trade-off or
something like that.
As you and other people wrote, many years have been spent on unioning. The
very basic features were already achieved at a very early stage. The point
is how normal filesystem features are designed and implemented.
I am discussing the design and features of unioning, not trying to stop
the merging of overlayfs.

> We cannot ask Overlayfs to support all of the features that other
> solutions have, b/c it may take a very long time to get those in when

Agreed, particularly union-specific extra features.
Actually I am not asking overlayfs to support all features aufs has. You
may think of what I am doing as a design review.

> The vast majority of unioning users want 2 layers, one readonly, one
> read-write. Those who really want 3+ layers can use stack Overlayfs
> multiple times: yes it'd be less efficient, but so what? First we want

I don't think consuming stack space is an efficiency issue.

> We all have to accept a solution that's pretty good NOW but less than
> perfect.  Otherwise we'll continue to have these debates and discussions
> for years on end.

If you think merging overlayfs means the end of discussion, then I won't
agree. It may be a beginning.

J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

[parent not found: <ab75a25c918145569b721dea9aea5506@HUBCAS2.cs.stonybrook.edu>]

[parent not found: <BF19F4F8-9E0F-4983-87C1-BB1B0A11D011@fsl.cs.sunysb.edu>]

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
       [not found]           ` <BF19F4F8-9E0F-4983-87C1-BB1B0A11D011@fsl.cs.sunysb.edu>
@ 2011-06-17  1:49             ` J. R. Okajima
  0 siblings, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-17  1:49 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, Miklos Szeredi, viro@ZenIV.linux.org.uk Viro,
	Linus Torvalds, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, apw@canonical.com, nbd@openwrt.org,
	neilb@suse.de, hramrach@centrum.cz, jordipujolp@gmail.com,
	mszeredi@suse.cz

Erez Zadok:
> OK. Then I believe you and I are in agreement:
>
> - Overlayfs has useful features to be merged now; no objections here.
> - Other features can be added later on.

Yes, that is what I wrote.

> If, however, you feel that Overlayfs has some fundamental design flaws
> that prevent important future features from being added easily after a
> merge, then please outline such design flaws.

I have actually made some suggestions and pointed out several issues.
If my English is poor enough to have caused a misunderstanding, then
please point it out with a quotation.

But Erez, don't you remember that you requested me to promise not to
submit aufs for inclusion into mainline until unionfs gets accepted and
I agreed? Do you really think I am bothering overlayfs?

Here is a suggestion about your patch for ovl_show_options().
It is better to call d_path() or something instead of copying and
holding the paths, since overlayfs already has mnt and dentry. And with
d_path(), show_options will be able to follow even if the upper/lower
mount is moved.

J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

[parent not found: <20110609125114.8dff08da.akpm@linux-foundation.org>]

* Re: Fw: Re: [PATCH 0/7] overlay filesystem: request for inclusion
       [not found] <20110609125114.8dff08da.akpm@linux-foundation.org>
@ 2011-06-10  6:57 ` Valerie Aurora
  2011-06-10  9:01   ` Alan Cox
  0 siblings, 1 reply; 74+ messages in thread
From: Valerie Aurora @ 2011-06-10  6:57 UTC (permalink / raw)
  To: Andrew Morton, Miklos Szeredi
  Cc: NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	hramrach, jordipujolp, ezk

Andrew Morton <akpm@linux-foundation.org> wrote:
> Subject: Re: [PATCH 0/7] overlay filesystem: request for inclusion
>
>
> On Thu, 09 Jun 2011 14:47:49 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>> > Another issue: there have been numerous attempts at Linux overlay
>> > filesystems from numerous parties.  Does (or will) this implementation
>> > satisfy all their requirements?
>>
>> Overlayfs aims to be the simplest possible but not simpler.
>>
>> I think the reason why "aufs" never had a real chance at getting merged
>> is because of feature creep.
>>
>> Of course I expect new features to be added to overlayfs after the
>> merge, but I believe some of the features in those other solutions are
>> simply unnecessary.
>
> This is my main worry.  If overlayfs doesn't appreciably decrease the
> motivation to merge other unioned filesystems then we might end up with
> two similar-looking things.  And, I assume, the later and more
> fully-blown implementation might make overlayfs obsolete but by that
> time it will be hard to remove.
>
> So it would be interesting to hear the thoughts of the people who have
> been working on the other implementations.

1. Al Viro and Christoph Hellwig bring up the same locking problems
*every single time* someone proposes a copy-up in-kernel file system,
and *every single time* they are dismissed or hand-waved.  I'd like to
see the thread in which one of them says, "Why yes, you have
understood and solved that problem to my satisfaction," before even
considering merging something.

2. Overlayfs is not the simplest possible solution at present.  For
example, it currently does not prevent modification of the underlying
file system directories, which is absolutely required to prevent bugs
according to Al.  Al proposed a solution he was happy with (read-only
superblocks), I implemented it for union mounts, and I believe it can
be ported to overlayfs.  But that should happen *before* merging.

3. To my knowledge, Al is not currently able to reply to email on a
regular basis and Christoph doesn't frequently comment on the subject
these days.  Don't take their silence as approval.

I have many more possible comments but I don't think they should be
relevant to the discussion about merging.  Al and Christoph's
judgements should be sufficient.

-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  6:57 ` Fw: " Valerie Aurora
@ 2011-06-10  9:01   ` Alan Cox
  2011-06-15 11:19     ` Miklos Szeredi
  0 siblings, 1 reply; 74+ messages in thread
From: Alan Cox @ 2011-06-10  9:01 UTC (permalink / raw)
  To: Valerie Aurora
  Cc: Andrew Morton, Miklos Szeredi, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

> 1. Al Viro and Christoph Hellwig bring up the same locking problems
> *every single time* someone proposes a copy-up in-kernel file system,
> and *every single time* they are dismissed or hand-waved.  

Perhaps you can detail the locking problem in question ?

Alan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-10  9:01   ` Alan Cox
@ 2011-06-15 11:19     ` Miklos Szeredi
  2011-06-15 14:32       ` J. R. Okajima
  0 siblings, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-15 11:19 UTC (permalink / raw)
  To: Alan Cox
  Cc: Valerie Aurora, Andrew Morton, NeilBrown, viro, torvalds,
	linux-fsdevel, linux-kernel, apw, nbd, hramrach, jordipujolp, ezk

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

>> 1. Al Viro and Christoph Hellwig bring up the same locking problems
>> *every single time* someone proposes a copy-up in-kernel file system,
>> and *every single time* they are dismissed or hand-waved.  
>
> Perhaps you can detail the locking problem in question ?

Pretty bizarre things can happen when the topology of the underlying
layers changes after overlayfs has acquired refs to underlying dentries.
I think this is the case Val is talking about.

Example:

# mount -toverlayfs x -oupperdir=/upper,lowerdir=/lower /ovl
# mkdir -p /upper/a/b
# ls /ovl/a/b
# mv /upper/a/b /upper/
# mv /upper/a /upper/b/
# ls  /ovl/a/b
a

Apparently "a" became its own ancestor.

Overlayfs is careful not to assume anything about child/parent
relationships of underlying dentries.

For example "rmdir /ovl/a/b" will do the following:

  1. find the underlying dentry for "a" -> upper-a
  2. lock upper-a
  3. find the underlying dentry for "b" -> upper-b
  4. verify that upper-b is a child of upper-a
  5. remove upper-b

With the above example it will fail on step 4.  Changes to the
underlying filesystems are not supported and result in undefined
behavior.  But it should never result in BUGs or deadlocks.
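
The check in step 4 can be sketched as a user-space model. This is an
assumption-laden illustration, not the overlayfs implementation: struct
node stands in for a dentry, the "locked" flag stands in for i_mutex in
a single-threaded model, and all names (node_lock, rmdir_checked) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for a dentry; every name here is hypothetical. */
struct node {
	struct node *parent;
	int locked;	/* models i_mutex; single-threaded model only */
	int removed;
};

void node_lock(struct node *n)   { assert(!n->locked); n->locked = 1; }
void node_unlock(struct node *n) { assert(n->locked);  n->locked = 0; }

/*
 * Steps 2, 4 and 5: lock the parent, re-check that the child still
 * belongs to it, and only then remove the child.
 * Returns 0 on success, -1 if the relationship turned out to be stale.
 */
int rmdir_checked(struct node *upper_a, struct node *upper_b)
{
	int err = -1;

	node_lock(upper_a);
	if (upper_b->parent == upper_a) {
		upper_b->parent = NULL;
		upper_b->removed = 1;
		err = 0;
	}
	node_unlock(upper_a);
	return err;
}
```

If "b" has been moved out from under "a" behind the overlay's back, the
model refuses the removal and leaves both nodes untouched, which is the
"fail on step 4" outcome described above.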

The overall locking order is:

-> overlayfs locks
  -> upper fs locks
  -> lower fs locks

Within each filesystem the usual locking rules apply.

One more difficulty is copy-up.  This happens without being protected by
i_mutex on overlayfs.  The rules here are: 

 A. directory renames only succeed if both source and destination are
    only on the upper fs (never copied up)
 B. non-directory renames start with copy-up of source if necessary
 C. copy-up takes i_mutex on upper parent

During copy-up no ancestor will be renamed because of A.

The file being copied up may be moved concurrently, however.  If this
happens then copy-up will acquire i_mutex for either the old or the new
upper parent.  In the latter case the file has already been copied up.
In the former case the file may or may not have been copied up.  The
state of the file is checked after having locked the upper parent and
the copy-up skipped if it has already succeeded.
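
The "check the state after locking" rule can be sketched as follows.
This is a single-threaded user-space model under stated assumptions:
struct upper_dir, struct ovl_file and copy_up_once() are hypothetical
names, and the "locked" flag merely stands in for i_mutex on the upper
parent.

```c
#include <assert.h>

struct upper_dir {
	int locked;		/* models i_mutex on the upper parent */
};

struct ovl_file {
	int copied_up;		/* set once the file exists on the upper fs */
	int copies_made;	/* counts how often the data was really copied */
};

/*
 * Take the parent lock first, then look at the file state, so that a
 * copy-up which already succeeded elsewhere is skipped.
 * Returns 1 if this call performed the copy, 0 if it was skipped.
 */
int copy_up_once(struct upper_dir *parent, struct ovl_file *f)
{
	int did_copy = 0;

	assert(!parent->locked);
	parent->locked = 1;
	if (!f->copied_up) {
		f->copies_made++;	/* the actual data copy would go here */
		f->copied_up = 1;
		did_copy = 1;
	}
	parent->locked = 0;
	return did_copy;
}
```

Whichever parent ends up being locked, the data is copied at most once.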

There may be flaws in the above reasoning or the implementation and
reviews are very welcome.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 11:19     ` Miklos Szeredi
@ 2011-06-15 14:32       ` J. R. Okajima
  2011-06-15 15:49         ` Miklos Szeredi
       [not found]         ` <803fd88dc28748428861b75afdee3575@HUBCAS1.cs.stonybrook.edu>
  0 siblings, 2 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-15 14:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alan Cox, Valerie Aurora, Andrew Morton, NeilBrown, viro,
	torvalds, linux-fsdevel, linux-kernel, apw, nbd, hramrach,
	jordipujolp, ezk


Miklos Szeredi:
> For example "rmdir /ovl/a/b" will do the following:
>
>   1. find the underlying dentry for "a" -> upper-a
>   2. lock upper-a
>   3. find the underlying dentry for "b" -> upper-b
>   4. verify that upper-b is a child of upper-a
>   5. remove upper-b

It is good to verify in step 4.
Essentially (or ideally) this verification should be equivalent to all
of what VFS does before vfs_rmdir(). I know overlayfs calls
mnt_want_write() on the upper mount at an early stage and keeps it. So it
might be better to look up again (as in steps 3 and 4) instead of simply
comparing d_parent. If you think the lookup is unnecessary here, then I'd
suggest making it an option (selectable by the user).

I see that ovl_rmdir() does the following:
- look up and unlink all whiteouts
- rmdir the target dir
- create a whiteout for the target
Right?
But I am afraid that an error can happen at any step on the upper
dir. If it does, ovl_rmdir() returns the error but the dir is
left in an incomplete state. It may be one of these:
- some whiteouts are unlinked but others are left
- all whiteouts are gone but the target dir remains
- the target dir is removed but the whiteout is not created
Of course, this is bad and really confuses users, since it will show
them things which should not be there. At the same time, I don't know
how likely it is to happen.

Anyway, if you have read aufs, then you know how aufs solves these
problems. I don't think the approaches in aufs are the best or the only
ones. I just could not come up with another good idea.


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 14:32       ` J. R. Okajima
@ 2011-06-15 15:49         ` Miklos Szeredi
  2011-06-15 16:14           ` J. R. Okajima
       [not found]         ` <803fd88dc28748428861b75afdee3575@HUBCAS1.cs.stonybrook.edu>
  1 sibling, 1 reply; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-15 15:49 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Alan Cox, Valerie Aurora, Andrew Morton, NeilBrown, viro,
	torvalds, linux-fsdevel, linux-kernel, apw, nbd, hramrach,
	jordipujolp, ezk

"J. R. Okajima" <hooanon05@yahoo.co.jp> writes:

> Miklos Szeredi:
>> For example "rmdir /ovl/a/b" will do the following:
>>
>>   1. find the underlying dentry for "a" -> upper-a
>>   2. lock upper-a
>>   3. find the underlying dentry for "b" -> upper-b
>>   4. verify that upper-b is a child of upper-a
>>   5. remove upper-b
>
> It is good to verify in step 4.
> Essentially (or ideally) this verification should be equivalent to all
> of what VFS does before vfs_rmdir(). I know overlayfs makes the upper
> mnt_want_write()-ed in early stage and keeps it. So it might be better
> to lookup again (as step 3 and 4) instead of comparing d_parent
> simply. If you think it is unnecessary to lookup here, then I'd suggest
> you to make it option (choosable by user).

The parent verification is only to make sure the locking is correct.
It's not to make sure that modifications of underlying filesystems will
have sane semantics.

Until someone comes up with a sane use case for allowing modification of
underlying filesystem I won't bother with that.

> I see ovl_rmdir() does,
> - lookup and unlink all whiteouts
> - rmdir the target dir
> - create a whiteout for the target
> Right?

Not quite.

 - checks if directory is empty (all lower entries are whiteouted)
 - marks directory opaque
 - unlinks whiteouts
 - rmdir
 - create whiteout

> But I am afraid that any error can happen in every step on the upper
> dir. And if it happens, then ovl_rmdir() returns the error but the dir
> left in incomplete status. It may be one of these.
> - some whiteouts are unlinked but others are left
> - all whiteouts are gone but the target dir remains
> - the target dir is removed but the whiteout is not created
> Of course, it is bad and makes users really confused, since it will show
> users things which should not be. At the same time, I don't know how
> possible it can happen.

Atomic whiteout and atomic copy-up would be nice, that's one feature I'm
willing to think about.

> Anyway if you have read aufs, then you would know how aufs solves these
> problems. I don't think the approaches in aufs is best or one and
> only. I just could not get another good idea.

Rollback on failure is an incomplete solution, rollback itself can fail.
And it doesn't protect against machine crashing in the middle of
operation.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 15:49         ` Miklos Szeredi
@ 2011-06-15 16:14           ` J. R. Okajima
  2011-06-15 17:20             ` Michal Suchanek
  0 siblings, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-06-15 16:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alan Cox, Valerie Aurora, Andrew Morton, NeilBrown, viro,
	torvalds, linux-fsdevel, linux-kernel, apw, nbd, hramrach,
	jordipujolp, ezk


Miklos Szeredi:
> Rollback on failure is an incomplete solution, rollback itself can fail.
> And it doesn't protect against machine crashing in the middle of
> operation.

Maybe you are right.
But do you think rollback is unnecessary since it is an incomplete
solution?

And you might not have read about the approach in aufs, which tries
to reduce the operations needed in rollback.

(from '[RFC 2/8] Aufs2: structure' in 2009
      <http://marc.info/?l=linux-kernel&m=123537453514896&w=2>)
----------------------------------------
In aufs, rmdir(2) and rename(2) for dir uses whiteout alternatively.
In order to make several functions in a single systemcall to be
revertible, aufs adopts an approach to rename a directory to a temporary
unique whiteouted name.
For example, in rename(2) dir where the target dir already existed, aufs
renames the target dir to a temporary unique whiteouted name before the
actual rename on a branch and then handles other actions (make it opaque,
update the attributes, etc). If an error happens in these actions, aufs
simply renames the whiteouted name back and returns an error. If all are
succeeded, aufs registers a function to remove the whiteouted unique
temporary name completely and asynchronously to the system global
workqueue.
----------------------------------------
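
The "rename aside, then finish or rename back" pattern quoted above can
be sketched in user space with rename(2). This is only an illustration
under stated assumptions, not aufs code: remove_dir_revertibly() and the
".parked.<pid>" temporary name are hypothetical (aufs uses a unique
whiteouted name), the later_steps_fail flag simulates an error in the
follow-up actions, and the final cleanup is done inline where aufs
defers it to a workqueue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Move `dir` aside to a temporary name, run the remaining steps, then
 * either roll back with a single rename or remove the parked directory.
 * Returns 0 on success, -1 if the operation failed and was rolled back
 * (or never started), -2 if the rollback itself failed.
 */
int remove_dir_revertibly(const char *dir, int later_steps_fail)
{
	char parked[4096];
	int n = snprintf(parked, sizeof(parked), "%s.parked.%ld",
			 dir, (long)getpid());

	if (n < 0 || (size_t)n >= sizeof(parked))
		return -1;
	if (rename(dir, parked) != 0)
		return -1;		/* nothing changed yet */

	if (later_steps_fail) {
		/* Rollback: one rename restores the original state. */
		if (rename(parked, dir) != 0)
			return -2;	/* rollback can fail too */
		return -1;
	}

	/* Cleanup; a failure here leaves only the parked leftover. */
	(void)rmdir(parked);
	return 0;
}
```

Note that the -2 path is exactly the objection raised earlier in this
thread: the rollback is itself an operation that can fail, and the
sketch does nothing about a crash between the two renames.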


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 16:14           ` J. R. Okajima
@ 2011-06-15 17:20             ` Michal Suchanek
  2011-06-15 18:12               ` Miklos Szeredi
  2011-06-16  2:43               ` J. R. Okajima
  0 siblings, 2 replies; 74+ messages in thread
From: Michal Suchanek @ 2011-06-15 17:20 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk

On 15 June 2011 18:14, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Miklos Szeredi:
>> Rollback on failure is an incomplete solution, rollback itself can fail.
>> And it doesn't protect against machine crashing in the middle of
>> operation.
>
> Maybe you are right.
> But do you think rollback is unnecessary since it is an incomplete
> solution?
>
> And you might not have read about the approach in aufs, which tries
> reducing the operations in rollback.
>
> (from '[RFC 2/8] Aufs2: structure' in 2009
>      <http://marc.info/?l=linux-kernel&m=123537453514896&w=2>)
> ----------------------------------------
> In aufs, rmdir(2) and rename(2) for dir uses whiteout alternatively.
> In order to make several functions in a single systemcall to be
> revertible, aufs adopts an approach to rename a directory to a temporary
> unique whiteouted name.
> For example, in rename(2) dir where the target dir already existed, aufs
> renames the target dir to a temporary unique whiteouted name before the

This is generally not possible in solutions that don't reserve any filenames.

However, it should be possible to create whiteout of a non-existent
entry in a directory while it is locked without affecting userspace.

> actual rename on a branch and then handles other actions (make it opaque,
> update the attributes, etc). If an error happens in these actions, aufs
> simply renames the whiteouted name back and returns an error. If all are
> succeeded, aufs registers a function to remove the whiteouted unique
> temporary name completely and asynchronously to the system global
> workqueue.

Removing the whiteout asynchronously does not seem like a good idea.
It should be gone before the directory containing the whiteout is
unlocked. Otherwise there might be an entry created which conflicts
with this whiteout that did not exist when the operation started. Also
if you unlock the directory while the artificial whiteout exists an
asynchronous process might replace the whiteout and the rollback would
fail.

As an alternative way to perform atomic renames I would suggest
"fallthrough symlinks". If you want to rename an entry which is
"fallthrough" (ie pointing to the entry with the same name in the
lower layer in the same directory) you can replace it with a
"fallthrough symlink" which points to the lower layer and does not
just implicitly say "here" but specifies a path relative to the
mountpoint instead. This can then be moved like any other entry. It is
in no way special anymore. Moving a directory tree which is partially
in the upper layer is still time-consuming but can be performed with
reasonable semantics imho. You perform a preparation step during which
nothing seems to change from the user's point of view and at the very
end you just move the directory.

Thanks

Michal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 17:20             ` Michal Suchanek
@ 2011-06-15 18:12               ` Miklos Szeredi
  2011-06-16  2:43               ` J. R. Okajima
  1 sibling, 0 replies; 74+ messages in thread
From: Miklos Szeredi @ 2011-06-15 18:12 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: J. R. Okajima, Alan Cox, Valerie Aurora, Andrew Morton, NeilBrown,
	viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk

Michal Suchanek <hramrach@centrum.cz> writes:

> On 15 June 2011 18:14, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>> For example, in rename(2) dir where the target dir already existed, aufs
>> renames the target dir to a temporary unique whiteouted name before the
>
> This is generally not possible in solutions that don't reserve any filenames.
>
> However, it should be possible to create whiteout of a non-existent
> entry in a directory while it is locked without affecting userspace.

Yes, creation of whiteout and renaming it to target or vice versa works
if target is non-directory.

Cases where this trick could make operations atomic:

 - create/mknod/symlink/link over whiteout
 - rename non-directory to whiteout
 - remove of non-directory with whiteout creation
 - copy up

Cases where atomicity is not possible with this:

 - mkdir over whiteout
 - rename directory to whiteout
 - rename where source needs whiteout
 - rmdir with whiteout creation
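
The split between the two lists can be illustrated in user space with
rename(2), which replaces its target in one step only when both sides
are of the same kind. This is a sketch under stated assumptions, not
overlayfs code: make_placeholder() and swap_into_place() are
hypothetical helpers, and an empty regular file stands in for a
whiteout, which is a non-directory object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty regular file standing in for a whiteout. */
int make_placeholder(const char *path)
{
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}

/*
 * Swap a fully prepared entry into place in a single step.
 * Returns 0 on success or a negative errno value.
 */
int swap_into_place(const char *prepared, const char *target)
{
	return rename(prepared, target) == 0 ? 0 : -errno;
}
```

A non-directory over a non-directory succeeds atomically; a
non-directory over a directory is rejected with EISDIR and a directory
over a non-directory with ENOTDIR, which matches the directory cases
listed as not atomic above.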


>> actual rename on a branch and then handles other actions (make it opaque,
>> update the attributes, etc). If an error happens in these actions, aufs
>> simply renames the whiteouted name back and returns an error. If all are
>> succeeded, aufs registers a function to remove the whiteouted unique
>> temporary name completely and asynchronously to the system global
>> workqueue.
>
> Removing the whiteout asynchronously does not seem like a good idea.
> It should be gone before the directory containing the whiteout is
> unlocked. Otherwise there might be an entry created which conflicts
> with this whiteout that did not exist when the operation started. Also
> if you unlock the directory while the artificial whiteout exists an
> asynchronous process might replace the whiteout and the rollback would
> fail.
>
> As an alternative way to perform atomic renames I would suggest
> "fallthrough symlinks". If you want to rename an entry which is
> "fallthrough" (ie pointing to the entry with the same name in the
> lower layer in the same directory) you can replace it with a
> "fallthrough symlink" which points to the lower layer and does not
> just implicitly say "here" but specifies a path relative to the
> mountpoint instead. This can then be moved like any other entry. It is
> in no way special anymore.

This is a nice idea, but doesn't have a lot to do with atomicity.  It
allows rename of non-pure upper directory (they return EXDEV currently).

> Moving a directory tree which is partially
> in the upper layer is still time-consuming but can be performed with
> reasonable semantics imho.

Shouldn't be time consuming, really.  The upper, mixed directory is
renamed and given a "trusted.overlay.redirect" attribute to show where
its lower directory resides.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-15 17:20             ` Michal Suchanek
  2011-06-15 18:12               ` Miklos Szeredi
@ 2011-06-16  2:43               ` J. R. Okajima
  2011-06-16 10:35                 ` Michal Suchanek
  1 sibling, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16  2:43 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk


Michal Suchanek:
> This is generally not possible in solutions that don't reserve any filenames.
>
> However, it should be possible to create whiteout of a non-existent
> entry in a directory while it is locked without affecting userspace.

Actually aufs generates a doubly whiteouted unique name dynamically for
the target dir. For instance, when rmdir("dirA") aufs does,
- lock i_mutex of the parent dir of dirA on the real fs
- some verifications for the parent-child relationship
- some tests whether we can do rmdir
- create whiteout for dirA
- rename dirA to .wh..wh.XXXXXXXX (random value in hex), after making
  sure the name doesn't exist
- unlock the parent dir
- return to VFS
And then the async workqueue removes the .wh..wh.XXXXXXXX dir with some
whiteouts under it.

It means the temporary whiteout name is,
- always unique
- always hidden (from users), even if it remains accidentally
So even if an error happens in the async work, it doesn't matter.
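
Picking the temporary name can be sketched in user space as below. This
is an illustration only, not the aufs implementation:
pick_temp_whiteout_name() is a hypothetical helper, rand() stands in
for whatever source of randomness aufs uses, and the existence check is
only race-free if the caller holds the parent directory lock, as in the
steps listed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Write "<parent>/.wh..wh.XXXXXXXX" (8 hex digits) into `out`, choosing
 * a value for which no such entry exists yet.
 * Returns 0 on success, -1 if the buffer is too small or no free name
 * was found within a few attempts.
 */
int pick_temp_whiteout_name(const char *parent, char *out, size_t outlen)
{
	struct stat st;
	int attempt;

	for (attempt = 0; attempt < 16; attempt++) {
		unsigned int v = (unsigned int)rand();
		int n = snprintf(out, outlen, "%s/.wh..wh.%08x", parent, v);

		if (n < 0 || (size_t)n >= outlen)
			return -1;
		if (lstat(out, &st) != 0 && errno == ENOENT)
			return 0;	/* name is unused */
	}
	return -1;
}
```

Because the name always carries the whiteout prefix, a leftover entry
stays hidden from users of the union even if the later cleanup fails.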

Additionally there is a userspace script called "auchk" which is like
fsck for a real fs. The auchk script checks the logical consistency of the
(writable) real fs, and removes illegal whiteouts, leftover
pseudo-links, and leftover temp files.


> As an alternative way to perform atomic renames I would suggest
> "fallthrough symlinks". If you want to rename an entry which is

Symlink?
Is it a different thing from DCACHE_FALLTHRU in UnionMount?
I am afraid a special symlink is fragile or dangerous.
Its special meaning is valid in inner union world only, is it? If
something in outer world gets changed, we may not follow the symlink
anymore or follow something different unexpectedly. Is it acceptable?


J. R. Okajima

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16  2:43               ` J. R. Okajima
@ 2011-06-16 10:35                 ` Michal Suchanek
  2011-06-16 15:15                   ` J. R. Okajima
  0 siblings, 1 reply; 74+ messages in thread
From: Michal Suchanek @ 2011-06-16 10:35 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk

On 16 June 2011 04:43, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Michal Suchanek:
>> This is generally not possible in solutions that don't reserve any filenames.
>>
>> However, it should be possible to create whiteout of a non-existent
>> entry in a directory while it is locked without affecting userspace.
>
> Actually aufs generates a doubly whiteouted unique name dynamically for
> the target dir. For instance, when rmdir("dirA") aufs does,
> - lock i_mutex of the parent dir of dirA on the real fs
> - some verifications for the parent-child relationship
> - some tests whether we can do rmdir
> - create whiteout for dirA
> - rename dirA to .wh..wh.XXXXXXXX (random value in hex), after making

Probably swap the two above, you can't make a whiteout in presence of
the directory, right?
Anyway, you could just mark dirA as whiteout and remove any whiteouts
contained in it asynchronously, and only jump through these hoops when
trying to create a new entry in place of non-empty whiteout, or sync
on emptying the old whiteout before making a new entry.

>  sure the name doesn't exist
> - unlock the parent dir
> - return to VFS
> And then the async workqueue removes the .wh..wh.XXXXXXXX dir with some
> whiteouts under it.
>
> It means the temporary whiteout name is,
> - always unique
> - always hidden (from users), even if it remains accidentally
> So even if an error happens in the async work, it doesn't matter.

Yes, it can only cause pollution with whiteouts unrelated to any files
that ever existed which is not too much of an issue unless people want
to add random stuff to the lower layer and see it in the union when
they reconstruct it again.

>
> Additionally there is a userspace script called "auchk" which is like
> fsck for real fs. auchk script checks the logical consistency on the
> (writable) real fs, and removes the illegal whiteouts, remained
> pseudo-links, and remained temp files.
>
>
>> As an alternative way to perform atomic renames I would suggest
>> "fallthrough symlinks". If you want to rename an entry which is
>
> Symlink?
> Is it a different thing from DCACHE_FALLTHRU in UnionMount?

Yes, the fallthru in unionmount only says "look below here", it cannot
point to a  different place in the filesystem.

> I am afraid a special symlink is fragile or dangerous.
> Its special meaning is valid in inner union world only, is it? If

It is only valid when in the upper layer of a union. However, so is
whiteout, and so are files that were visible in the union but are not
visible in the top layer if examined separately, outside of the union.

It must be accepted that the top layer is different from the union,
otherwise you want a copy, not a union.

> something in outer world gets changed, we may not follow the symlink
> anymore or follow something different unexpectedly. Is it acceptable?

That's the whole idea behind symlinks, and also behind unions, which
implicitly link the lower layer into the upper to present the result as
a single directory tree.

Anyway, the motivation behind the "fallthru symlink" is that you need
not copy-up on seemingly trivial operations like rename, touch, etc.
which both makes them more efficient and easier to get atomic. As I
understand it copy-up is the operation that causes the most issues and
with "fallthru symlinks" you need it only for operations that are
expected to modify something non-trivially.
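
The rename-without-copy-up idea above can be sketched as a toy model.
This is only an illustration under stated assumptions, not code from any
of the filesystems discussed: layers are plain dicts, and the entry kinds
"file", "whiteout" and "fallthru" as well as the lookup() and rename()
helpers are hypothetical names invented for this sketch.

```python
# Toy model: upper maps name -> (kind, payload); lower maps name -> data.
# All names here are hypothetical and exist only for illustration.
WHITEOUT = ("whiteout", None)


def lookup(upper, lower, name):
    """Resolve a name through the union; return its data or None."""
    entry = upper.get(name)
    if entry is None:
        return lower.get(name)      # nothing in upper: fall through
    kind, payload = entry
    if kind == "whiteout":
        return None                 # name was deleted in the union
    if kind == "fallthru":
        return lower.get(payload)   # payload names a lower-layer entry
    return payload                  # regular file stored in upper


def rename(upper, lower, old, new):
    """Rename old to new without copying any lower-layer data up."""
    entry = upper.get(old)
    if entry is not None and entry[0] != "whiteout":
        upper[new] = entry              # move the upper entry as-is
    elif entry is None and old in lower:
        upper[new] = ("fallthru", old)  # point at the lower entry
    else:
        raise FileNotFoundError(old)
    if old in lower:
        upper[old] = WHITEOUT           # hide the old lower name
    else:
        upper.pop(old, None)


lower = {"a": "data"}
upper = {}
rename(upper, lower, "a", "b")
print(upper["b"], lookup(upper, lower, "b"), lookup(upper, lower, "a"))
```

In this model the rename only writes two small upper-layer entries, and
the file contents stay in the lower layer until something modifies them.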

Obviously, this is not so nice for zero-sized files, but they should be
handled the same way for consistency, I guess. Also, metadata that can be
conveniently recorded on the fallthru entry would make touch fast but
would hide possible later updates to the lower layer, so it might not
be a good solution for all use cases. For a throwaway tmpfs, however, any
optimization counts.

Seriously, the overlayfs documentation says that it can have opaque
directories, but I don't see what they would be used for. There is no way
to turn a directory opaque with normal userspace operations afaict.
It has no explicit fallthrus, at least none documented, so to have any
level of consistency it should always check the lower layer, because it
can grow some new directories when the union is deconstructed, modified
offline, and reconstructed (which is a supported use case according to
the docs).

Thanks

Michal

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16 10:35                 ` Michal Suchanek
@ 2011-06-16 15:15                   ` J. R. Okajima
  2011-06-17  7:38                     ` Michal Suchanek
  0 siblings, 1 reply; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16 15:15 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk


Michal Suchanek:
> Probably swap the two above, you can't make a whiteout in presence of
> the directory, right?
> Anyway, you could just mark dirA as whiteout and remove any whiteouts
> contained in it asynchronously, and only jump through these hoops when
> trying to create a new entry in place of non-empty whiteout, or sync
> on emptying the old whiteout before making a new entry.

Unfortunately I cannot understand what you wrote.

First, the order of
> - create whiteout for dirA
> - rename dirA to .wh..wh.XXXXXXXX
is correct and I think it should be, in order to make a little help for
fsck/auchk.
And what is "non-empty whiteout" and "emptying the old whiteout"?
The whiteout is a size zero-ed and hardlinked regular file in aufs.


> Yes, it can only cause pollution with whiteouts unrelated to any files
> that ever existed which is not too much of an issue unless people want
> to add random stuff to the lower layer and see it in the union when
> they reconstruct it again.

??
Do you think that the .wh..wh.XXXXXXXX hides something on the lower
layer? If so, it is wrong. Such doubly whiteout hides nothing except
itself.


> It is only valid when in the upper layer of a union. However, so is
> whiteout, and so are files that were visible in the union but are not
> visible in the top layer if examined separately, outside of the union.

Do you mean that your special symlink has totally different file-type
from a symlink?
Anyway what I want to say is, what such symlink refers may differ
from what users originally expect. But I may misunderstand what you call
"fallthru symlink".


J. R. Okajima

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16 15:15                   ` J. R. Okajima
@ 2011-06-17  7:38                     ` Michal Suchanek
  2011-06-20  0:43                       ` J. R. Okajima
  0 siblings, 1 reply; 74+ messages in thread
From: Michal Suchanek @ 2011-06-17  7:38 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk

On 16 June 2011 17:15, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Michal Suchanek:
>> Probably swap the two above, you can't make a whiteout in presence of
>> the directory, right?
>> Anyway, you could just mark dirA as whiteout and remove any whiteouts
>> contained in it asynchronously, and only jump through these hoops when
>> trying to create a new entry in place of non-empty whiteout, or sync
>> on emptying the old whiteout before making a new entry.
>
> Unfortunately I cannot understand what you wrote.
>
> First, the order of
>> - create whiteout for dirA
>> - rename dirA to .wh..wh.XXXXXXXX
> is correct and I think it should be, in order to make a little help for

Yes, it's correct for aufs, which uses reserved file names for whiteouts.

Filesystems that don't reserve filenames cannot make a whiteout for an
existing entry, but aufs can.

> fsck/auchk.
> And what is "non-empty whiteout" and "emptying the old whiteout"?
> The whiteout is a size zero-ed and hardlinked regular file in aufs.

Is there any reason why a directory cannot be whiteout?

>
>
>> Yes, it can only cause pollution with whiteouts unrelated to any files
>> that ever existed which is not too much of an issue unless people want
>> to add random stuff to the lower layer and see it in the union when
>> they reconstruct it again.
>
> ??
> Do you think that the .wh..wh.XXXXXXXX hides something on the lower
> layer? If so, it is wrong. Such doubly whiteout hides nothing except
> itself.

It may possibly hide a XXXXXXXX file if it is later added to the lower layer.

But if .wh.XXXXXXXX is in itself a reserved filename that is never
brought up from the lower layer then this is a non-issue, it works
consistently regardless of existence of the superfluous whiteout.

>
>
>> It is only valid when in the upper layer of a union. However, so is
>> whiteout, and so are files that were visible in the union but are not
>> visible in the top layer if examined separately, outside of the union.
>
> Do you mean that your special symlink has totally different file-type
> from a symlink?

Just as whiteout has totally different file-type from a file. It's
specific to the union.

> Anyway what I want to say is, what such symlink refers may differ
> from what users originally expect. But I may misunderstand what you call
> "fallthru symlink".

How is this different from other files that are taken from the lower
layer and not copied into the upper layer?

If you are concerned about that, you want a full copy, not a union.

Thanks

Michal

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-17  7:38                     ` Michal Suchanek
@ 2011-06-20  0:43                       ` J. R. Okajima
  0 siblings, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-20  0:43 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro, torvalds, linux-fsdevel, linux-kernel, apw, nbd,
	jordipujolp, ezk


Michal Suchanek:
> Is there any reason why a directory cannot be whiteout?

Just to reduce inode consumption.


> It may possibly hide a XXXXXXXX file if it is later added to the lower layer.

No, because it is "doubly" whiteouted.
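
A simplified model of the reserved-prefix scheme may make the "doubly
whiteouted" point concrete. It is a sketch under assumptions taken from
this thread only: a whiteout is an upper-layer name of the form
".wh.<name>", and any name starting with ".wh." is reserved and never
shown to users. The function union_listing() is a hypothetical helper,
not an aufs interface.

```python
WH_PREFIX = ".wh."


def union_listing(upper, lower):
    """Return the names visible in a two-layer union (toy model)."""
    # Each upper whiteout hides exactly the name left after stripping
    # the prefix once.
    hidden = {n[len(WH_PREFIX):] for n in upper if n.startswith(WH_PREFIX)}
    visible = set()
    for name in upper:
        if not name.startswith(WH_PREFIX):
            visible.add(name)
    for name in lower:
        if name.startswith(WH_PREFIX):
            continue                # reserved names never surface
        if name not in hidden:
            visible.add(name)
    return sorted(visible)


# A leftover temporary name only "hides" another reserved name, so a
# lower-layer file called 12345678 stays visible.
print(union_listing({".wh..wh.12345678"}, {"12345678"}))
print(union_listing({".wh.a", "b"}, {"a", "c"}))
```

Under these assumptions the leftover ".wh..wh.XXXXXXXX" entry strips to
".wh.XXXXXXXX", which is itself reserved, so nothing a user could see is
affected.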


> Just as whiteout has totally different file-type from a file. It's
> specific to the union.

Ok, we are talking about different whiteouts.


J. R. Okajima

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
       [not found]         ` <803fd88dc28748428861b75afdee3575@HUBCAS1.cs.stonybrook.edu>
@ 2011-06-16  0:44           ` Erez Zadok
  2011-06-16  3:07             ` J. R. Okajima
  0 siblings, 1 reply; 74+ messages in thread
From: Erez Zadok @ 2011-06-16  0:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: J. R. Okajima, Alan Cox, Valerie Aurora, Andrew Morton, NeilBrown,
	viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	apw@canonical.com, nbd@openwrt.org, hramrach@centrum.cz,
	jordipujolp@gmail.com

On Jun 15, 2011, at 8:49 AM, Miklos Szeredi wrote:

> It's not to make sure that modifications of underlying filesystems will
> have sane semantics.

Miklos, I agree with you. I think it makes perfect sense for overlayfs at this point not to bother with users who modify lower files directly, and expect sane semantics at the upper layer.  Most unioning users do NOT do that anyway, but the few who do cause unioning code to be much more complex.

That said, if users do go and add/del files below overlayfs, it shouldn't oops...

I've often been bothered by people who suggested that stackable file systems must solve the "cache coherency" problem and must perfectly detect lower-level changes consistently.  A lot of code in Unionfs is spent on cache coherency.

Ecryptfs has been around for several years, and I've yet to see the masses scream for upper/lower layer consistency.  NFS has worked just fine for years and no one expects changes to the server-side disk-based file system to be reflected immediately and correctly on all clients.  Asking overlayfs or other stackable file systems to solve this multi-layer coherency perfectly is somewhat ridiculous: we don't expect file systems like ext3 to detect and correctly handle changes to lower devices -- i.e., if someone hand-edited direct blocks in /dev/sda1, do we?

The union mount team also tried to handle this issue (the so-called "really really NFS readonly" idea).  And it's really hard — and unnecessary for v1 of a unioning solution.

Skip this can of worms, Miklos.  And you'll be happier for it. :-)  I think your approach of keeping Overlayfs small for now is absolutely the right way.

Cheers,
Erez.

* Re: [PATCH 0/7] overlay filesystem: request for inclusion
  2011-06-16  0:44           ` Erez Zadok
@ 2011-06-16  3:07             ` J. R. Okajima
  0 siblings, 0 replies; 74+ messages in thread
From: J. R. Okajima @ 2011-06-16  3:07 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Miklos Szeredi, Alan Cox, Valerie Aurora, Andrew Morton,
	NeilBrown, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	apw@canonical.com, nbd@openwrt.org, hramrach@centrum.cz,
	jordipujolp@gmail.com


Erez Zadok:
> ...  Asking
> overlayfs or other stackable file systems to solve this multi-layer
> coherency perfectly is somewhat ridiculous: we don't expect file systems
> like ext3 to detect and correctly handle changes to lower devices --
> i.e., if someone hand-edited direct blocks in /dev/sda1, do we?

I agree with you if we are discussing a union-type mount, which handles a
block device as its member. But as long as the layered fs handles a
directory (a mounted filesystem) as its member, users are obviously right
to expect that modifications made directly to the member fs (bypassing
the union) become visible.

Of course I agree it brings complication to us, and I'd suggest three
levels of options for handling this issue.
- detect the direct changes and reflect them in the union (hardest option)
- skip the detection, but at least verify the parent-child relationship
  or more. (this is something like what overlayfs is trying to do)
- skip both the detection and the verification (lowest option)
  this option depends on how the user sets up the union and its members.
  if the user hides the members totally by over-mounting an empty dir on
  each member (or something), then he can specify this option. otherwise,
  this option is dangerous. also some symlinks may not work.
  # mkdir /hide
  # mount -o upper=/rw,lower=/ro none /union
  # mount -o bind /hide /rw
  # mount -o bind /hide /ro


J. R. Okajima
