[GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups
@ 2008-04-01 21:06 Erez Zadok
  2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
                   ` (14 more replies)
  0 siblings, 15 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch

The following is a series of patchsets related to Unionfs.  Two of the
patches are VFS changes (and one of those two is rather minor).  There are
two major changes to Unionfs in this release:

1. Based on recommendations we've received while at LSF'08, we've changed
   Unionfs to use the vm_operations->fault method, instead of
   address_space_operations.  This makes unionfs's code smaller, runs faster
   (reduce memory pressure), and removes a number of potential races.  This
   change is significant, but is only an internal implementation change: it
   shouldn't change any user-visible behavior.  (Note: this change was
   inspired in part by Junjiro Okajima's work and suggestions -- thank you.)

   This change required some workarounds to VFS deficiencies for filesystems
   that want to use vm_ops->fault instead of address_space_operations.  In
   the next few weeks we'll be submitting patches that try to address these
   small deficiencies more cleanly.  One of those changes was necessary now
   (patch 02 of this patchset which exports vfs_splice* helpers).

2. Unionfs now implements a "delete-all" policy.  This means that if an
   object (e.g., a regular file) exists in multiple writable branches,
   unionfs will attempt to delete all instances of the same object, not just
   the first one it finds.  This helps to reduce the total number of
   whiteouts that unionfs has to create, and saves on the number of inodes
   consumed.  Of course, if a branch is marked or mounted read-only, then
   unionfs won't delete an object there, and only then will it fall back to
   creating a whiteout.  Since most users use only one writable branch in
   the unionfs, this change will probably affect only a small subset of
   people.  (Note: unionfs-odf already implements this delete-all policy, so
   this change here helps to synchronize the behavior of the two releases.)

These patches were tested (where appropriate) on 2.6.25-rc7, MM (mmotm
stamp-2008-04-01-02-39), as well as the backports to
2.6.{24,23,22,21,20,19,18,9} on ext2/3/4, xfs, reiserfs, nfs2/3/4, jffs2,
ramfs, tmpfs, cramfs, and squashfs (where available).  Also tested with
LTP-full-20080229 and with a continuous parallel kernel compile (while
forcing cache flushing, manipulating lower branches, etc.).  See
http://unionfs.filesystems.org/ to download back-ported unionfs code.

Please pull from the 'master' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git

to receive the following:

Erez Zadok (14):
      VFS: export release_open_intent as GPL symbol, not regular symbol
      VFS: rename do_splice_to/from to vfs_splice_* and export symbols
      Unionfs: implement splice_read/write methods directly
      Unionfs: implement vm_operations->fault
      Unionfs: lock our dentry in file operations
      Unionfs: reduce number of whiteouts by deleting all instances of files
      Unionfs: don't copy parent inode times in setattr
      Unionfs: use __func__ instead of __FUNCTION__
      Unionfs: use noinline_for_stack
      Unionfs: document reasons for opaque directories
      Unionfs: display mount point name along with generation number
      Unionfs: do not over-decrement lower superblock refs on remount
      Unionfs: don't purge lower sb data on remount
      Unionfs: update lower mnts on rmdir with copyup

 Documentation/filesystems/unionfs/concepts.txt |   61 ++++
 fs/namei.c                                     |    2 
 fs/splice.c                                    |   20 -
 fs/unionfs/commonfops.c                        |   32 --
 fs/unionfs/dentry.c                            |   13 
 fs/unionfs/dirfops.c                           |   18 -
 fs/unionfs/file.c                              |  203 +++++++++++++--
 fs/unionfs/inode.c                             |    3 
 fs/unionfs/lookup.c                            |   10 
 fs/unionfs/main.c                              |    7 
 fs/unionfs/mmap.c                              |  338 +++----------------------
 fs/unionfs/super.c                             |   38 +-
 fs/unionfs/union.h                             |   25 +
 fs/unionfs/unlink.c                            |  109 +++++---
 include/linux/splice.h                         |    5 
 15 files changed, 447 insertions(+), 437 deletions(-)

Thanks.
---
Erez Zadok
ezk@cs.sunysb.edu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/namei.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 00078e4..e421fb9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -399,7 +399,7 @@ void release_open_intent(struct nameidata *nd)
 	else
 		fput(nd->intent.open.file);
 }
-EXPORT_SYMBOL(release_open_intent);
+EXPORT_SYMBOL_GPL(release_open_intent);
 
 static inline struct dentry *
 do_revalidate(struct dentry *dentry, struct nameidata *nd)
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
  2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

A stackable file system which uses vm_ops->fault, and does not implement
address_space_operations, cannot use generic_file_splice_read/write, but has
to implement ->splice_read/write itself.  These two helper functions are
very useful to such a module.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/splice.c            |   20 +++++++++++---------
 include/linux/splice.h |    5 +++++
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 0670c91..62b0db2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -899,8 +899,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
 /*
  * Attempt to initiate a splice from pipe to file.
  */
-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
-			   loff_t *ppos, size_t len, unsigned int flags)
+long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
+		     loff_t *ppos, size_t len, unsigned int flags)
 {
 	int ret;
 
@@ -916,13 +916,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
 
 	return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_from);
 
 /*
  * Attempt to initiate a splice from a file to a pipe.
  */
-static long do_splice_to(struct file *in, loff_t *ppos,
-			 struct pipe_inode_info *pipe, size_t len,
-			 unsigned int flags)
+long vfs_splice_to(struct file *in, loff_t *ppos,
+		   struct pipe_inode_info *pipe, size_t len,
+		   unsigned int flags)
 {
 	int ret;
 
@@ -938,6 +939,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 
 	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_to);
 
 /**
  * splice_direct_to_actor - splices data directly between two non-pipes
@@ -1007,7 +1009,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		size_t read_len;
 		loff_t pos = sd->pos;
 
-		ret = do_splice_to(in, &pos, pipe, len, flags);
+		ret = vfs_splice_to(in, &pos, pipe, len, flags);
 		if (unlikely(ret <= 0))
 			goto out_release;
 
@@ -1062,7 +1064,7 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
 {
 	struct file *file = sd->u.file;
 
-	return do_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
+	return vfs_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
 }
 
 /**
@@ -1136,7 +1138,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 		} else
 			off = &out->f_pos;
 
-		ret = do_splice_from(pipe, out, off, len, flags);
+		ret = vfs_splice_from(pipe, out, off, len, flags);
 
 		if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
 			ret = -EFAULT;
@@ -1157,7 +1159,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 		} else
 			off = &in->f_pos;
 
-		ret = do_splice_to(in, off, pipe, len, flags);
+		ret = vfs_splice_to(in, off, pipe, len, flags);
 
 		if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
 			ret = -EFAULT;
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 528dcb9..4b5727c 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -70,5 +70,10 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
+extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
+			    loff_t *ppos, size_t len, unsigned int flags);
+extern long vfs_splice_to(struct file *in, loff_t *ppos,
+			  struct pipe_inode_info *pipe, size_t len,
+			  unsigned int flags);
 
 #endif
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 03/14] Unionfs: implement splice_read/write methods directly
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
  2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
  2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Must implement splice_read/write directly, using VFS helpers, because we can
no longer rely on generic_file_splice_read/write: they need
address_space_operations implemented, which we no longer have.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/file.c  |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/unionfs/union.h |    1 +
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 0c424f6..1fcaf8e 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -165,6 +165,61 @@ out:
 	return err;
 }
 
+static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
+				   struct pipe_inode_info *pipe, size_t len,
+				   unsigned int flags)
+{
+	ssize_t err;
+	struct file *lower_file;
+
+	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	err = unionfs_file_revalidate(file, false);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
+	/* update our inode atime upon a successful lower splice-read */
+	if (err >= 0) {
+		fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_check_file(file);
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
+static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
+				    struct file *file, loff_t *ppos,
+				    size_t len, unsigned int flags)
+{
+	ssize_t err = 0;
+	struct file *lower_file;
+
+	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	err = unionfs_file_revalidate(file, true);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
+	/* update our inode times+sizes upon a successful lower write */
+	if (err >= 0) {
+		fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
 struct file_operations unionfs_main_fops = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -179,6 +234,6 @@ struct file_operations unionfs_main_fops = {
 	.release	= unionfs_file_release,
 	.fsync		= unionfs_fsync,
 	.fasync		= unionfs_fasync,
-	.splice_read	= generic_file_splice_read,
-	.splice_write	= generic_file_splice_write,
+	.splice_read	= unionfs_splice_read,
+	.splice_write	= unionfs_splice_write,
 };
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 533806c..7b55e33 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -46,6 +46,7 @@
 #include <linux/poison.h>
 #include <linux/mman.h>
 #include <linux/backing-dev.h>
+#include <linux/splice.h>
 
 #include <asm/system.h>
 
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/14] Unionfs: implement vm_operations->fault
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (2 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

As per recommendations made at LSF'08, a stackable file system which does
not change the data across layers, should try to use vm_operations instead
of address_space_operations.  This means we now have to implement out own
->read and ->write methods because we cannot rely on VFS helpers which
require us to have a ->readpage method.  Either way there are two caveats:

(1) It's not possible currently to set inode->i_mapping->a_ops to NULL,
because too many code paths expect i_mapping to be non-NULL.

(2) a small/dummy ->readpage is still needed because generic_file_mmap,
which we used in unionfs_mmap, still check for the existence of the
->readpage method.  These code paths may have to be changed to remove the
need for readpage().

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/file.c  |  109 +++++++++++++++--
 fs/unionfs/mmap.c  |  338 +++++++---------------------------------------------
 fs/unionfs/union.h |    4 +-
 3 files changed, 147 insertions(+), 304 deletions(-)

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 1fcaf8e..9397ff3 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -18,17 +18,83 @@
 
 #include "union.h"
 
+static ssize_t unionfs_read(struct file *file, char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	int err;
+	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
+
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+	err = unionfs_file_revalidate_locked(file, false);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_read(lower_file, buf, count, ppos);
+	/* update our inode atime upon a successful lower read */
+	if (err >= 0) {
+		fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
+static ssize_t unionfs_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	int err = 0;
+	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
+
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+	err = unionfs_file_revalidate_locked(file, true);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_write(lower_file, buf, count, ppos);
+	/* update our inode times+sizes upon a successful lower write */
+	if (err >= 0) {
+		fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
+
 static int unionfs_file_readdir(struct file *file, void *dirent,
 				filldir_t filldir)
 {
 	return -ENOTDIR;
 }
 
+int unionfs_readpage_dummy(struct file *file, struct page *page)
+{
+	BUG();
+	return -EINVAL;
+}
+
 static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	int err = 0;
 	bool willwrite;
 	struct file *lower_file;
+	struct vm_operations_struct *saved_vm_ops = NULL;
 
 	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
 
@@ -54,12 +120,39 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
 		err = -EINVAL;
 		printk(KERN_ERR "unionfs: branch %d file system does not "
 		       "support writeable mmap\n", fbstart(file));
-	} else {
-		err = generic_file_mmap(file, vma);
-		if (err)
-			printk(KERN_ERR
-			       "unionfs: generic_file_mmap failed %d\n", err);
+		goto out;
+	}
+
+	/*
+	 * find and save lower vm_ops.
+	 *
+	 * XXX: the VFS should have a cleaner way of finding the lower vm_ops
+	 */
+	if (!UNIONFS_F(file)->lower_vm_ops) {
+		err = lower_file->f_op->mmap(lower_file, vma);
+		if (err) {
+			printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
+			goto out;
+		}
+		saved_vm_ops = vma->vm_ops;
+		err = do_munmap(current->mm, vma->vm_start,
+				vma->vm_end - vma->vm_start);
+		if (err) {
+			printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
+			goto out;
+		}
+	}
+
+	file->f_mapping->a_ops = &unionfs_dummy_aops;
+	err = generic_file_mmap(file, vma);
+	file->f_mapping->a_ops = &unionfs_aops;
+	if (err) {
+		printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
+		goto out;
 	}
+	vma->vm_ops = &unionfs_vm_ops;
+	if (!UNIONFS_F(file)->lower_vm_ops)
+		UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
 
 out:
 	if (!err) {
@@ -222,10 +315,8 @@ out:
 
 struct file_operations unionfs_main_fops = {
 	.llseek		= generic_file_llseek,
-	.read		= do_sync_read,
-	.aio_read	= generic_file_aio_read,
-	.write		= do_sync_write,
-	.aio_write	= generic_file_aio_write,
+	.read		= unionfs_read,
+	.write		= unionfs_write,
 	.readdir	= unionfs_file_readdir,
 	.unlocked_ioctl	= unionfs_ioctl,
 	.mmap		= unionfs_mmap,
diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index d6ac61e..07db5b0 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -19,316 +19,66 @@
 
 #include "union.h"
 
-static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
-{
-	int err = -EIO;
-	struct inode *inode;
-	struct inode *lower_inode;
-	struct page *lower_page;
-	struct address_space *lower_mapping; /* lower inode mapping */
-	gfp_t mask;
-
-	BUG_ON(!PageUptodate(page));
-	inode = page->mapping->host;
-	/* if no lower inode, nothing to do */
-	if (!inode || !UNIONFS_I(inode) || UNIONFS_I(inode)->lower_inodes) {
-		err = 0;
-		goto out;
-	}
-	lower_inode = unionfs_lower_inode(inode);
-	lower_mapping = lower_inode->i_mapping;
-
-	/*
-	 * find lower page (returns a locked page)
-	 *
-	 * We turn off __GFP_FS while we look for or create a new lower
-	 * page.  This prevents a recursion into the file system code, which
-	 * under memory pressure conditions could lead to a deadlock.  This
-	 * is similar to how the loop driver behaves (see loop_set_fd in
-	 * drivers/block/loop.c).  If we can't find the lower page, we
-	 * redirty our page and return "success" so that the VM will call us
-	 * again in the (hopefully near) future.
-	 */
-	mask = mapping_gfp_mask(lower_mapping) & ~(__GFP_FS);
-	lower_page = find_or_create_page(lower_mapping, page->index, mask);
-	if (!lower_page) {
-		err = 0;
-		set_page_dirty(page);
-		goto out;
-	}
-
-	/* copy page data from our upper page to the lower page */
-	copy_highpage(lower_page, page);
-	flush_dcache_page(lower_page);
-	SetPageUptodate(lower_page);
-	set_page_dirty(lower_page);
-
-	/*
-	 * Call lower writepage (expects locked page).  However, if we are
-	 * called with wbc->for_reclaim, then the VFS/VM just wants to
-	 * reclaim our page.  Therefore, we don't need to call the lower
-	 * ->writepage: just copy our data to the lower page (already done
-	 * above), then mark the lower page dirty and unlock it, and return
-	 * success.
-	 */
-	if (wbc->for_reclaim) {
-		unlock_page(lower_page);
-		goto out_release;
-	}
-
-	BUG_ON(!lower_mapping->a_ops->writepage);
-	wait_on_page_writeback(lower_page); /* prevent multiple writers */
-	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
-	err = lower_mapping->a_ops->writepage(lower_page, wbc);
-	if (err < 0)
-		goto out_release;
-
-	/*
-	 * Lower file systems such as ramfs and tmpfs, may return
-	 * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
-	 * write the page again for a while.  But those lower file systems
-	 * also set the page dirty bit back again.  Since we successfully
-	 * copied our page data to the lower page, then the VM will come
-	 * back to the lower page (directly) and try to flush it.  So we can
-	 * save the VM the hassle of coming back to our page and trying to
-	 * flush too.  Therefore, we don't re-dirty our own page, and we
-	 * never return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider
-	 * this a success).
-	 *
-	 * We also unlock the lower page if the lower ->writepage returned
-	 * AOP_WRITEPAGE_ACTIVATE.  (This "anomalous" behaviour may be
-	 * addressed in future shmem/VM code.)
-	 */
-	if (err == AOP_WRITEPAGE_ACTIVATE) {
-		err = 0;
-		unlock_page(lower_page);
-	}
-
-	/* all is well */
-
-	/* lower mtimes have changed: update ours */
-	unionfs_copy_attr_times(inode);
-
-out_release:
-	/* b/c find_or_create_page increased refcnt */
-	page_cache_release(lower_page);
-out:
-	/*
-	 * We unlock our page unconditionally, because we never return
-	 * AOP_WRITEPAGE_ACTIVATE.
-	 */
-	unlock_page(page);
-	return err;
-}
 
-static int unionfs_writepages(struct address_space *mapping,
-			      struct writeback_control *wbc)
+/*
+ * XXX: we need a dummy readpage handler because generic_file_mmap (which we
+ * use in unionfs_mmap) checks for the existence of
+ * mapping->a_ops->readpage, else it returns -ENOEXEC.  The VFS will need to
+ * be fixed to allow a file system to define vm_ops->fault without any
+ * address_space_ops whatsoever.
+ *
+ * Otherwise, we don't want to use our readpage method at all.
+ */
+static int unionfs_readpage(struct file *file, struct page *page)
 {
-	int err = 0;
-	struct inode *lower_inode;
-	struct inode *inode;
-
-	inode = mapping->host;
-	if (ibstart(inode) < 0 && ibend(inode) < 0)
-		goto out;
-	lower_inode = unionfs_lower_inode(inode);
-	if (!lower_inode)
-		goto out;
-
-	err = generic_writepages(mapping, wbc);
-	if (!err)
-		unionfs_copy_attr_times(inode);
-out:
-	return err;
+	BUG();
+	return -EINVAL;
 }
 
-/* Readpage expects a locked page, and must unlock it */
-static int unionfs_readpage(struct file *file, struct page *page)
+static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	int err;
-	struct file *lower_file;
-	struct inode *inode;
-	mm_segment_t old_fs;
-	char *page_data = NULL;
-	mode_t orig_mode;
+	struct file *file, *lower_file;
+	struct vm_operations_struct *lower_vm_ops;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
-	err = unionfs_file_revalidate(file, false);
-	if (unlikely(err))
-		goto out;
-	unionfs_check_file(file);
-
-	if (!UNIONFS_F(file)) {
-		err = -ENOENT;
-		goto out;
-	}
+	BUG_ON(!vma);
+	file = vma->vm_file;
+	lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
+	BUG_ON(!lower_vm_ops);
 
 	lower_file = unionfs_lower_file(file);
-	/* FIXME: is this assertion right here? */
-	BUG_ON(lower_file == NULL);
-
-	inode = file->f_path.dentry->d_inode;
-
-	page_data = (char *) kmap(page);
-	/*
-	 * Use vfs_read because some lower file systems don't have a
-	 * readpage method, and some file systems (esp. distributed ones)
-	 * don't like their pages to be accessed directly.  Using vfs_read
-	 * may be a little slower, but a lot safer, as the VFS does a lot of
-	 * the necessary magic for us.
-	 */
-	lower_file->f_pos = page_offset(page);
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/*
-	 * generic_file_splice_write may call us on a file not opened for
-	 * reading, so temporarily allow reading.
-	 */
-	orig_mode = lower_file->f_mode;
-	lower_file->f_mode |= FMODE_READ;
-	err = vfs_read(lower_file, page_data, PAGE_CACHE_SIZE,
-		       &lower_file->f_pos);
-	lower_file->f_mode = orig_mode;
-	set_fs(old_fs);
-	if (err >= 0 && err < PAGE_CACHE_SIZE)
-		memset(page_data + err, 0, PAGE_CACHE_SIZE - err);
-	kunmap(page);
-
-	if (err < 0)
-		goto out;
-	err = 0;
-
-	/* if vfs_read succeeded above, sync up our times */
-	unionfs_copy_attr_times(inode);
-
-	flush_dcache_page(page);
-
+	BUG_ON(!lower_file);
 	/*
-	 * we have to unlock our page, b/c we _might_ have gotten a locked
-	 * page.  but we no longer have to wakeup on our page here, b/c
-	 * UnlockPage does it
+	 * XXX: we set the vm_file to the lower_file, before calling the
+	 * lower ->fault op, then we restore the vm_file back to the upper
+	 * file.  Need to change the ->fault prototype to take an explicit
+	 * struct file, and fix all users accordingly.
 	 */
-out:
-	if (err == 0)
-		SetPageUptodate(page);
-	else
-		ClearPageUptodate(page);
-
-	unlock_page(page);
-	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
-
-	return err;
-}
-
-static int unionfs_prepare_write(struct file *file, struct page *page,
-				 unsigned from, unsigned to)
-{
-	int err;
-
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
-	err = unionfs_file_revalidate(file, true);
-	if (!err) {
-		unionfs_copy_attr_times(file->f_path.dentry->d_inode);
-		unionfs_check_file(file);
-	}
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
-
+	vma->vm_file = lower_file;
+	err = lower_vm_ops->fault(vma, vmf);
+	vma->vm_file = file;
 	return err;
 }
 
-static int unionfs_commit_write(struct file *file, struct page *page,
-				unsigned from, unsigned to)
-{
-	int err = -ENOMEM;
-	struct inode *inode, *lower_inode;
-	struct file *lower_file = NULL;
-	unsigned bytes = to - from;
-	char *page_data = NULL;
-	mm_segment_t old_fs;
-
-	BUG_ON(file == NULL);
-
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
-	err = unionfs_file_revalidate(file, true);
-	if (unlikely(err))
-		goto out;
-	unionfs_check_file(file);
-
-	inode = page->mapping->host;
-
-	if (UNIONFS_F(file) != NULL)
-		lower_file = unionfs_lower_file(file);
-
-	/* FIXME: is this assertion right here? */
-	BUG_ON(lower_file == NULL);
-
-	page_data = (char *)kmap(page);
-	lower_file->f_pos = page_offset(page) + from;
-
-	/*
-	 * We use vfs_write instead of copying page data and the
-	 * prepare_write/commit_write combo because file system's like
-	 * GFS/OCFS2 don't like things touching those directly,
-	 * calling the underlying write op, while a little bit slower, will
-	 * call all the FS specific code as well
-	 */
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	err = vfs_write(lower_file, page_data + from, bytes,
-			&lower_file->f_pos);
-	set_fs(old_fs);
-
-	kunmap(page);
-
-	if (err < 0)
-		goto out;
-
-	/* if vfs_write succeeded above, sync up our times/sizes */
-	lower_inode = lower_file->f_path.dentry->d_inode;
-	if (!lower_inode)
-		lower_inode = unionfs_lower_inode(inode);
-	BUG_ON(!lower_inode);
-	fsstack_copy_inode_size(inode, lower_inode);
-	unionfs_copy_attr_times(inode);
-	mark_inode_dirty_sync(inode);
-
-out:
-	if (err < 0)
-		ClearPageUptodate(page);
-
-	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
-	return err;		/* assume all is ok */
-}
-
 /*
- * Although unionfs isn't a block-based file system, it may stack on one.
- * ->bmap is needed, for example, to swapon(2) files.
+ * XXX: the default address_space_ops for unionfs is empty.  We cannot set
+ * our inode->i_mapping->a_ops to NULL because too many code paths expect
+ * the a_ops vector to be non-NULL.
  */
-sector_t unionfs_bmap(struct address_space *mapping, sector_t block)
-{
-	int err = -EINVAL;
-	struct inode *inode, *lower_inode;
-	sector_t (*bmap)(struct address_space *, sector_t);
-
-	inode = (struct inode *)mapping->host;
-	lower_inode = unionfs_lower_inode(inode);
-	if (!lower_inode)
-		goto out;
-	bmap = lower_inode->i_mapping->a_ops->bmap;
-	if (bmap)
-		err = bmap(lower_inode->i_mapping, block);
-out:
-	return err;
-}
-
-
 struct address_space_operations unionfs_aops = {
-	.writepage	= unionfs_writepage,
-	.writepages	= unionfs_writepages,
+	/* empty on purpose */
+};
+
+/*
+ * XXX: we need a second, dummy address_space_ops vector, to be used
+ * temporarily during unionfs_mmap, because the latter calls
+ * generic_file_mmap, which checks if ->readpage exists, else returns
+ * -ENOEXEC.
+ */
+struct address_space_operations unionfs_dummy_aops = {
 	.readpage	= unionfs_readpage,
-	.prepare_write	= unionfs_prepare_write,
-	.commit_write	= unionfs_commit_write,
-	.bmap		= unionfs_bmap,
+};
+
+struct vm_operations_struct unionfs_vm_ops = {
+	.fault		= unionfs_fault,
 };
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 7b55e33..f6fabf8 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -75,7 +75,8 @@ extern struct inode_operations unionfs_dir_iops;
 extern struct inode_operations unionfs_symlink_iops;
 extern struct super_operations unionfs_sops;
 extern struct dentry_operations unionfs_dops;
-extern struct address_space_operations unionfs_aops;
+extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
+extern struct vm_operations_struct unionfs_vm_ops;
 
 /* How long should an entry be allowed to persist */
 #define RDCACHE_JIFFIES	(5*HZ)
@@ -89,6 +90,7 @@ struct unionfs_file_info {
 	struct unionfs_dir_state *rdstate;
 	struct file **lower_files;
 	int *saved_branch_ids; /* IDs of branches when file was opened */
+	struct vm_operations_struct *lower_vm_ops;
 };
 
 /* unionfs inode data in memory */
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 05/14] Unionfs: lock our dentry in file operations
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (3 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files Erez Zadok
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/commonfops.c |   32 ++++++++++-----------------
 fs/unionfs/dirfops.c    |   18 ++++++++++-----
 fs/unionfs/file.c       |   55 +++++++++++++++++++++++++++++------------------
 fs/unionfs/union.h      |    1 -
 4 files changed, 58 insertions(+), 48 deletions(-)

diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
index 2add167..50f4eda 100644
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -302,7 +302,7 @@ out:
  * @willwrite: true if caller may cause changes to the file; false otherwise.
  * Caller must lock/unlock dentry's branch configuration.
  */
-int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
+int unionfs_file_revalidate(struct file *file, bool willwrite)
 {
 	struct super_block *sb;
 	struct dentry *dentry;
@@ -312,6 +312,7 @@ int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
 	int err = 0;
 
 	dentry = file->f_path.dentry;
+	verify_locked(dentry);
 	sb = dentry->d_sb;
 
 	/*
@@ -419,17 +420,6 @@ out_nofree:
 	return err;
 }
 
-int unionfs_file_revalidate(struct file *file, bool willwrite)
-{
-	int err;
-
-	unionfs_lock_dentry(file->f_path.dentry, UNIONFS_DMUTEX_CHILD);
-	err = unionfs_file_revalidate_locked(file, willwrite);
-	unionfs_unlock_dentry(file->f_path.dentry);
-
-	return err;
-}
-
 /* unionfs_open helper function: open a directory */
 static int __open_dir(struct inode *inode, struct file *file)
 {
@@ -632,6 +622,8 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 	int fgen, err = 0;
 
 	unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+
 	/*
 	 * Yes, we have to revalidate this file even if it's being released.
 	 * This is important for open-but-unlinked files, as well as mmap
@@ -650,7 +642,6 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 	bstart = fbstart(file);
 	bend = fbend(file);
 
-	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	for (bindex = bstart; bindex <= bend; bindex++) {
 		lower_file = unionfs_lower_file_idx(file, bindex);
 
@@ -665,7 +656,6 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 			unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
 		}
 	}
-	unionfs_unlock_dentry(dentry);
 
 	kfree(fileinfo->lower_files);
 	kfree(fileinfo->saved_branch_ids);
@@ -683,6 +673,7 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 	kfree(fileinfo);
 
 out:
+	unionfs_unlock_dentry(dentry);
 	unionfs_read_unlock(sb);
 	return err;
 }
@@ -729,7 +720,6 @@ static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
 	struct vfsmount *mnt;
 
 	dentry = file->f_path.dentry;
-	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	orig_bstart = dbstart(dentry);
 	orig_bend = dbend(dentry);
 	err = unionfs_partial_lookup(dentry);
@@ -771,15 +761,16 @@ static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
 		err = -EFAULT;
 
 out:
-	unionfs_unlock_dentry(dentry);
 	return err < 0 ? err : bend;
 }
 
 long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long err;
+	struct dentry *dentry = file->f_path.dentry;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 
 	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
@@ -807,7 +798,8 @@ long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 out:
 	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -821,7 +813,7 @@ int unionfs_flush(struct file *file, fl_owner_t id)
 	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
 	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 
-	err = unionfs_file_revalidate_locked(file, true);
+	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
 		goto out;
 	unionfs_check_file(file);
@@ -843,7 +835,7 @@ int unionfs_flush(struct file *file, fl_owner_t id)
 out:
 	if (!err)
 		unionfs_check_file(file);
-	unionfs_unlock_dentry(file->f_path.dentry);
+	unionfs_unlock_dentry(dentry);
 	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
index a613862..8272fb6 100644
--- a/fs/unionfs/dirfops.c
+++ b/fs/unionfs/dirfops.c
@@ -97,19 +97,21 @@ static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
 {
 	int err = 0;
 	struct file *lower_file = NULL;
+	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = NULL;
 	struct unionfs_getdents_callback buf;
 	struct unionfs_dir_state *uds;
 	int bend;
 	loff_t offset;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 
 	err = unionfs_file_revalidate(file, false);
 	if (unlikely(err))
 		goto out;
 
-	inode = file->f_path.dentry->d_inode;
+	inode = dentry->d_inode;
 
 	uds = UNIONFS_F(file)->rdstate;
 	if (!uds) {
@@ -189,7 +191,8 @@ static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
 	}
 
 out:
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -206,9 +209,11 @@ out:
 static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
 {
 	struct unionfs_dir_state *rdstate;
+	struct dentry *dentry = file->f_path.dentry;
 	loff_t err;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 
 	err = unionfs_file_revalidate(file, false);
 	if (unlikely(err))
@@ -250,7 +255,7 @@ static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
 					err = -EINVAL;
 			} else {
 				struct inode *inode;
-				inode = file->f_path.dentry->d_inode;
+				inode = dentry->d_inode;
 				rdstate = find_rdstate(inode, offset);
 				if (rdstate) {
 					UNIONFS_F(file)->rdstate = rdstate;
@@ -269,7 +274,8 @@ static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
 	}
 
 out:
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 9397ff3..0a39387 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -27,7 +27,7 @@ static ssize_t unionfs_read(struct file *file, char __user *buf,
 
 	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
 	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
-	err = unionfs_file_revalidate_locked(file, false);
+	err = unionfs_file_revalidate(file, false);
 	if (unlikely(err))
 		goto out;
 
@@ -55,7 +55,7 @@ static ssize_t unionfs_write(struct file *file, const char __user *buf,
 
 	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
 	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
-	err = unionfs_file_revalidate_locked(file, true);
+	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
 		goto out;
 
@@ -94,9 +94,11 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
 	int err = 0;
 	bool willwrite;
 	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
 	struct vm_operations_struct *saved_vm_ops = NULL;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 
 	/* This might be deferred to mmap's writepage */
 	willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
@@ -157,10 +159,11 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
 out:
 	if (!err) {
 		/* copyup could cause parent dir times to change */
-		unionfs_copy_attr_times(file->f_path.dentry->d_parent->d_inode);
+		unionfs_copy_attr_times(dentry->d_parent->d_inode);
 		unionfs_check_file(file);
 	}
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -172,7 +175,8 @@ int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	struct inode *lower_inode, *inode;
 	int err = -EINVAL;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
 		goto out;
@@ -207,8 +211,10 @@ int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	unionfs_copy_attr_times(inode);
 
 out:
-	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	if (!err)
+		unionfs_check_file(file);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -216,11 +222,12 @@ int unionfs_fasync(int fd, struct file *file, int flag)
 {
 	int bindex, bstart, bend;
 	struct file *lower_file;
-	struct dentry *dentry;
+	struct dentry *dentry = file->f_path.dentry;
 	struct inode *lower_inode, *inode;
 	int err = 0;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
 		goto out;
@@ -231,7 +238,6 @@ int unionfs_fasync(int fd, struct file *file, int flag)
 	if (bstart < 0 || bend < 0)
 		goto out;
 
-	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
 	if (unlikely(!inode)) {
 		printk(KERN_ERR
@@ -253,8 +259,10 @@ int unionfs_fasync(int fd, struct file *file, int flag)
 	unionfs_copy_attr_times(inode);
 
 out:
-	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	if (!err)
+		unionfs_check_file(file);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -264,8 +272,10 @@ static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
 {
 	ssize_t err;
 	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	err = unionfs_file_revalidate(file, false);
 	if (unlikely(err))
 		goto out;
@@ -274,14 +284,14 @@ static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
 	err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
 	/* update our inode atime upon a successful lower splice-read */
 	if (err >= 0) {
-		fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+		fsstack_copy_attr_atime(dentry->d_inode,
 					lower_file->f_path.dentry->d_inode);
 		unionfs_check_file(file);
 	}
 
 out:
-	unionfs_check_file(file);
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
@@ -291,8 +301,10 @@ static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
 {
 	ssize_t err = 0;
 	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
 
-	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
 	err = unionfs_file_revalidate(file, true);
 	if (unlikely(err))
 		goto out;
@@ -301,15 +313,16 @@ static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
 	err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
 	/* update our inode times+sizes upon a successful lower write */
 	if (err >= 0) {
-		fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+		fsstack_copy_inode_size(dentry->d_inode,
 					lower_file->f_path.dentry->d_inode);
-		fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+		fsstack_copy_attr_times(dentry->d_inode,
 					lower_file->f_path.dentry->d_inode);
 		unionfs_check_file(file);
 	}
 
 out:
-	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(dentry->d_sb);
 	return err;
 }
 
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index f6fabf8..63f3b21 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -355,7 +355,6 @@ extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
 extern int unionfs_getlk(struct file *file, struct file_lock *fl);
 
 /* Common file operations. */
-extern int unionfs_file_revalidate_locked(struct file *file, bool willwrite);
 extern int unionfs_file_revalidate(struct file *file, bool willwrite);
 extern int unionfs_open(struct inode *inode, struct file *file);
 extern int unionfs_file_release(struct inode *inode, struct file *file);
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (4 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok,
	Himanshu Kanda

Optimize the unlinking of non-dir objects in unionfs by deleting all
possible lower inode objects from all writable lower branches.  This may
consume a bit more processing, but on average reduces overall inode
consumption and further saves a lot by reducing the total number of
whiteouts needed.  We create a whiteout now only if we could not delete all
lower objects, or if one of the lower branches was explicitly marked
read-only.

Signed-off-by: Himanshu Kanda <hkanda@cs.sunysb.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 Documentation/filesystems/unionfs/concepts.txt |   25 ++++++
 fs/unionfs/lookup.c                            |    8 +-
 fs/unionfs/unlink.c                            |  107 +++++++++++++++---------
 3 files changed, 97 insertions(+), 43 deletions(-)

diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 8d9a1c5..0bc1a19 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -62,6 +62,31 @@ simplest solution is to take the instance from the highest priority
 (numerically lowest value) and "hide" the others.
 
 
+Unlinking:
+=========
+
+Unlink operation on non-directory instances is optimized to remove the
+maximum possible objects in case multiple underlying branches have the same
+file name.  The unlink operation will first try to delete file instances
+from highest priority branch and then move further to delete from remaining
+branches in order of their decreasing priority.  Consider a case (F..D..F),
+where F is a file and D is a directory of the same name; here, some
+intermediate branch could have an empty directory instance with the same
+name, so this operation also tries to delete this directory instance and
+proceed further to delete from next possible lower priority branch.  The
+unionfs unlink operation will smoothly delete the files with same name from
+all possible underlying branches.  In case if some error occurs, it creates
+whiteout in highest priority branch that will hide file instance in rest of
+the branches.  An error could occur either if an unlink operations in any of
+the underlying branch failed or if a branch has no write permission.
+
+This unlinking policy is known as "delete all" and it has the benefit of
+overall reducing the number of inodes used by duplicate files, and further
+reducing the total number of inodes consumed by whiteouts.  The cost is of
+extra processing, but testing shows this extra processing is well worth the
+savings.
+
+
 Copyup:
 =======
 
diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 755158e..7618716 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -264,7 +264,9 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
 		 * branches, but we have to skip non-dirs (to avoid, say,
 		 * calling readdir on a regular file).
 		 */
-		if (!S_ISDIR(lower_dentry->d_inode->i_mode) && dentry_count) {
+		if ((lookupmode != INTERPOSE_PARTIAL) &&
+		    !S_ISDIR(lower_dentry->d_inode->i_mode) &&
+		    dentry_count) {
 			dput(lower_dentry);
 			continue;
 		}
@@ -295,10 +297,6 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
 				continue;
 			if (dentry_count == 1)
 				goto out_positive;
-			/* This can only happen with mixed D-*-F-* */
-			BUG_ON(!S_ISDIR(unionfs_lower_dentry(dentry)->
-					d_inode->i_mode));
-			continue;
 		}
 
 		opaque = is_opaque_dir(dentry, bindex);
diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index 6e93da3..c66bb3e 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -18,7 +18,32 @@
 
 #include "union.h"
 
-/* unlink a file by creating a whiteout */
+/*
+ * Helper function for Unionfs's unlink operation.
+ *
+ * The main goal of this function is to optimize the unlinking of non-dir
+ * objects in unionfs by deleting all possible lower inode objects from the
+ * underlying branches having same dentry name as the non-dir dentry on
+ * which this unlink operation is called.  This way we delete as many lower
+ * inodes as possible, and save space.  Whiteouts need to be created in
+ * branch0 only if unlinking fails on any of the lower branch other than
+ * branch0, or if a lower branch is marked read-only.
+ *
+ * Also, while unlinking a file, if we encounter any dir type entry in any
+ * intermediate branch, then we remove the directory by calling vfs_rmdir.
+ * The following special cases are also handled:
+
+ * (1) If an error occurs in branch0 during vfs_unlink, then we return
+ *     appropriate error.
+ *
+ * (2) If we get an error during unlink in any of other lower branch other
+ *     than branch0, then we create a whiteout in branch0.
+ *
+ * (3) If a whiteout already exists in any intermediate branch, we delete
+ *     all possible inodes only up to that branch (this is an "opaqueness"
+ *     as as per Documentation/filesystems/unionfs/concepts.txt).
+ *
+ */
 static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
 {
 	struct dentry *lower_dentry;
@@ -30,51 +55,57 @@ static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
 	if (err)
 		goto out;
 
-	bindex = dbstart(dentry);
-
-	lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
-	if (!lower_dentry)
-		goto out;
+	/* trying to unlink all possible valid instances */
+	for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
+		lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+		if (!lower_dentry || !lower_dentry->d_inode)
+			continue;
+
+		lower_dir_dentry = lock_parent(lower_dentry);
+
+		/* avoid destroying the lower inode if the object is in use */
+		dget(lower_dentry);
+		err = is_robranch_super(dentry->d_sb, bindex);
+		if (!err) {
+			/* see Documentation/filesystems/unionfs/issues.txt */
+			lockdep_off();
+			if (!S_ISDIR(lower_dentry->d_inode->i_mode))
+				err = vfs_unlink(lower_dir_dentry->d_inode,
+								lower_dentry);
+			else
+				err = vfs_rmdir(lower_dir_dentry->d_inode,
+								lower_dentry);
+			lockdep_on();
+		}
 
-	lower_dir_dentry = lock_parent(lower_dentry);
+		/* if lower object deletion succeeds, update inode's times */
+		if (!err)
+			unionfs_copy_attr_times(dentry->d_inode);
+		dput(lower_dentry);
+		fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
+		unlock_dir(lower_dir_dentry);
 
-	/* avoid destroying the lower inode if the file is in use */
-	dget(lower_dentry);
-	err = is_robranch_super(dentry->d_sb, bindex);
-	if (!err) {
-		/* see Documentation/filesystems/unionfs/issues.txt */
-		lockdep_off();
-		err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
-		lockdep_on();
+		if (err)
+			break;
 	}
-	/* if vfs_unlink succeeded, update our inode's times */
-	if (!err)
-		unionfs_copy_attr_times(dentry->d_inode);
-	dput(lower_dentry);
-	fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
-	unlock_dir(lower_dir_dentry);
-
-	if (err && !IS_COPYUP_ERR(err))
-		goto out;
 
 	/*
-	 * We create whiteouts if (1) there was an error unlinking the main
-	 * file; (2) there is a lower priority file with the same name
-	 * (dbopaque); (3) the branch in which the file is not the last
-	 * (rightmost0 branch.  The last rule is an optimization to avoid
-	 * creating all those whiteouts if there's no chance they'd be
-	 * masking any lower-priority branch, as well as unionfs is used
-	 * with only one branch (using only one branch, while odd, is still
-	 * possible).
+	 * Create the whiteout in branch 0 (highest priority) only if (a)
+	 * there was an error in any intermediate branch other than branch 0
+	 * due to failure of vfs_unlink/vfs_rmdir or (b) a branch marked or
+	 * mounted read-only.
 	 */
 	if (err) {
-		if (dbstart(dentry) == 0)
+		if ((bindex == 0) ||
+		    ((bindex == dbstart(dentry)) &&
+		     (!IS_COPYUP_ERR(err))))
 			goto out;
-		err = create_whiteout(dentry, dbstart(dentry) - 1);
-	} else if (dbopaque(dentry) != -1) {
-		err = create_whiteout(dentry, dbopaque(dentry));
-	} else if (dbstart(dentry) < sbend(dentry->d_sb)) {
-		err = create_whiteout(dentry, dbstart(dentry));
+		else {
+			if (!IS_COPYUP_ERR(err))
+				pr_debug("unionfs: lower object deletion "
+					     "failed in branch:%d\n", bindex);
+			err = create_whiteout(dentry, sbstart(dentry->d_sb));
+		}
 	}
 
 out:
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 07/14] Unionfs: don't copy parent inode times in setattr
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (5 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/inode.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index 640969d..0dc07ec 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -1064,8 +1064,7 @@ static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
 	if (ia->ia_valid & ATTR_ATIME_SET)
 		inode->i_atime = lower_inode->i_atime;
 	fsstack_copy_inode_size(inode, lower_inode);
-	/* if setattr succeeded, then parent dir may have changed */
-	unionfs_copy_attr_times(dentry->d_parent->d_inode);
+
 out:
 	if (!err)
 		unionfs_check_dentry(dentry);
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (6 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/dentry.c |    2 +-
 fs/unionfs/union.h  |   18 +++++++++---------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index f8f65e1..5e498c2 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -406,7 +406,7 @@ bool __unionfs_d_revalidate_chain(struct dentry *dentry, struct nameidata *nd,
 	chain = kzalloc(chain_len * sizeof(struct dentry *), GFP_KERNEL);
 	if (unlikely(!chain)) {
 		printk(KERN_CRIT "unionfs: no more memory in %s\n",
-		       __FUNCTION__);
+		       __func__);
 		goto out;
 	}
 
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 63f3b21..e93b9ef 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -557,24 +557,24 @@ static inline void unionfs_mntput(struct dentry *dentry, int bindex)
 #ifdef CONFIG_UNION_FS_DEBUG
 
 /* useful for tracking code reachability */
-#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__)
+#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
 
 #define unionfs_check_inode(i)	__unionfs_check_inode((i),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_dentry(d)	__unionfs_check_dentry((d),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_file(f)	__unionfs_check_file((f),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_nd(n)	__unionfs_check_nd((n),		\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_branch_counts(sb)	__show_branch_counts((sb),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_inode_times(i)	__show_inode_times((i),		\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_dinode_times(d)	__show_dinode_times((d),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_inode_counts(i)	__show_inode_counts((i),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 
 extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
 				  const char *fxn, int line);
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 09/14] Unionfs: use noinline_for_stack
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (7 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/lookup.c |    2 +-
 fs/unionfs/super.c  |   24 ++++++++++++++----------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 7618716..7f512c2 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -32,7 +32,7 @@ static int is_validname(const char *name)
 }
 
 /* The rest of these are utility functions for lookup. */
-static noinline int is_opaque_dir(struct dentry *dentry, int bindex)
+static noinline_for_stack int is_opaque_dir(struct dentry *dentry, int bindex)
 {
 	int err = 0;
 	struct dentry *lower_dentry;
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index fba1598..2456654 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -189,9 +189,11 @@ out:
 }
 
 /* handle mode changing during remount */
-static noinline int do_remount_mode_option(char *optarg, int cur_branches,
-					   struct unionfs_data *new_data,
-					   struct path *new_lower_paths)
+static noinline_for_stack int do_remount_mode_option(
+					char *optarg,
+					int cur_branches,
+					struct unionfs_data *new_data,
+					struct path *new_lower_paths)
 {
 	int err = -EINVAL;
 	int perms, idx;
@@ -250,9 +252,10 @@ out:
 }
 
 /* handle branch deletion during remount */
-static noinline int do_remount_del_option(char *optarg, int cur_branches,
-					  struct unionfs_data *new_data,
-					  struct path *new_lower_paths)
+static noinline_for_stack int do_remount_del_option(
+					char *optarg, int cur_branches,
+					struct unionfs_data *new_data,
+					struct path *new_lower_paths)
 {
 	int err = -EINVAL;
 	int idx;
@@ -313,10 +316,11 @@ out:
 }
 
 /* handle branch insertion during remount */
-static noinline int do_remount_add_option(char *optarg, int cur_branches,
-					  struct unionfs_data *new_data,
-					  struct path *new_lower_paths,
-					  int *high_branch_id)
+static noinline_for_stack int do_remount_add_option(
+					char *optarg, int cur_branches,
+					struct unionfs_data *new_data,
+					struct path *new_lower_paths,
+					int *high_branch_id)
 {
 	int err = -EINVAL;
 	int perms;
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/14] Unionfs: document reasons for opaque directories
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (8 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 Documentation/filesystems/unionfs/concepts.txt |   36 ++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 0bc1a19..b853788 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -52,6 +52,42 @@ Later, when Unionfs traverses branches (due to lookup or readdir), it
 eliminate 'foo' from the namespace (as well as the whiteout itself.)
 
 
+Opaque Directories:
+===================
+
+Assume we have a unionfs mount comprising of two branches.  Branch 0 is
+empty; branch 1 has the directory /a and file /a/f.  Let's say we mount a
+union of branch 0 as read-write and branch 1 as read-only.  Now, let's say
+we try to perform the following operation in the union:
+
+	rm -fr a
+
+Because branch 1 is not writable, we cannot physically remove the file /a/f
+or the directory /a.  So instead, we will create a whiteout in branch 0
+named /.wh.a, masking out the name "a" from branch 1.  Next, let's say we
+try to create a directory named "a" as follows:
+
+	mkdir a
+
+Because we have a whiteout for "a" already, Unionfs behaves as if "a"
+doesn't exist, and thus will delete the whiteout and replace it with an
+actual directory named "a".
+
+The problem now is that if you try to "ls" in the union, Unionfs will
+perform is normal directory name unification, for *all* directories named
+"a" in all branches.  This will cause the file /a/f from branch 1 to
+re-appear in the union's namespace, which violates Unix semantics.
+
+To avoid this problem, we have a different form of whiteouts for
+directories, called "opaque directories" (same as BSD Union Mount does).
+Whenever we replace a whiteout with a directory, that directory is marked as
+opaque.  In Unionfs 2.x, it means that we create a file named
+/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
+When unionfs notices that a directory is opaque, it stops all namespace
+operations (including merging readdir contents) at that opaque directory.
+This prevents re-exposing names from masked out directories.
+
+
 Duplicate Elimination:
 ======================
 
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 11/14] Unionfs: display mount point name along with generation number
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (9 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok, Dave Miller

CC: Dave Miller <justdave@mozilla.com>

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/main.c  |    7 ++++++-
 fs/unionfs/super.c |    4 +++-
 fs/unionfs/union.h |    1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
index 8f59fb5..b76264a 100644
--- a/fs/unionfs/main.c
+++ b/fs/unionfs/main.c
@@ -736,7 +736,12 @@ static int unionfs_get_sb(struct file_system_type *fs_type,
 			  int flags, const char *dev_name,
 			  void *raw_data, struct vfsmount *mnt)
 {
-	return get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+	int err;
+	err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+	if (!err)
+		UNIONFS_SB(mnt->mnt_sb)->dev_name =
+			kstrdup(dev_name, GFP_KERNEL);
+	return err;
 }
 
 static struct file_system_type unionfs_fs_type = {
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 2456654..e5cb235 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -134,6 +134,7 @@ static void unionfs_put_super(struct super_block *sb)
 		atomic_dec(&s->s_active);
 	}
 
+	kfree(spd->dev_name);
 	kfree(spd->data);
 	kfree(spd);
 	sb->s_fs_info = NULL;
@@ -805,7 +806,8 @@ out_no_change:
 	atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
 	atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
 	if (!(*flags & MS_SILENT))
-		pr_info("unionfs: new generation number %d\n", i);
+		pr_info("unionfs: %s: new generation number %d\n",
+			UNIONFS_SB(sb)->dev_name, i);
 	/* finally, update the root dentry's times */
 	unionfs_copy_attr_times(sb->s_root->d_inode);
 	err = 0;		/* reset to success */
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index e93b9ef..4ca4e76 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -159,6 +159,7 @@ struct unionfs_sb_info {
 	struct rw_semaphore rwsem;
 	pid_t write_lock_owner;	/* PID of rw_sem owner (write lock) */
 	int high_branch_id;	/* last unique branch ID given */
+	char *dev_name;		/* to identify different unions in pr_debug */
 	struct unionfs_data *data;
 };
 
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (10 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/super.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index e5cb235..4cddc83 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -755,7 +755,7 @@ out_no_change:
 	/* grab new lower super references; release old ones */
 	for (i = 0; i < new_branches; i++)
 		atomic_inc(&new_data[i].sb->s_active);
-	for (i = 0; i < new_branches; i++)
+	for (i = 0; i < sbmax(sb); i++)
 		atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
 
 	/* copy new vectors into their correct place */
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 13/14] Unionfs: don't purge lower sb data on remount
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (11 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
  2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

This is no longer needed, as we don't have upper and lower pages.  Plus this
was racy, requiring the unexported inode_lock variable.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/dentry.c |   11 -----------
 fs/unionfs/super.c  |    8 +++++---
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index 5e498c2..ee0da4f 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -273,17 +273,6 @@ static inline void purge_inode_data(struct inode *inode)
 	 */
 }
 
-void purge_sb_data(struct super_block *sb)
-{
-	struct inode *inode;
-
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
-			continue;
-		purge_inode_data(inode);
-	}
-}
-
 /*
  * Revalidate a single file/symlink/special dentry.  Assume that info nodes
  * of the dentry and its parent are locked.  Assume that parent(s) are all
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 4cddc83..82b4045 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -747,10 +747,12 @@ out_no_change:
 	 * function).  This revalidation will happen lazily and
 	 * incrementally, as users perform operations on cached inodes.  We
 	 * would like to encourage this revalidation to happen sooner if
-	 * possible, so we try to invalidate as many other pages in our
-	 * superblock as we can.
+	 * possible, so we like to try to invalidate as many other pages in
+	 * our superblock as we can.  We used to call drop_pagecache_sb() or
+	 * a variant thereof, but either method was racy (drop_caches alone
+	 * is known to be racy).  So now we let the revalidation happen on a
+	 * per file basis in ->d_revalidate.
 	 */
-	purge_sb_data(sb);
 
 	/* grab new lower super references; release old ones */
 	for (i = 0; i < new_branches; i++)
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (12 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
  2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
  14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/unlink.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index c66bb3e..cad0386 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -280,6 +280,8 @@ out:
 		    !unionfs_lower_inode_idx(inode, dend))
 			ibstart(inode) = ibend(inode) = -1;
 		d_drop(dentry);
+		/* update our lower vfsmnts, in case a copyup took place */
+		unionfs_postcopyup_setmnt(dentry);
 	}
 
 	if (namelist)
-- 
1.5.2.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups
  2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
                   ` (13 preceding siblings ...)
  2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
@ 2008-04-02 13:48 ` Tomas M
  2008-04-02 16:54   ` Erez Zadok
  14 siblings, 1 reply; 17+ messages in thread
From: Tomas M @ 2008-04-02 13:48 UTC (permalink / raw)
  To: Erez Zadok; +Cc: akpm, linux-kernel, linux-fsdevel, viro, hch

> 1. Based on recommendations we've received while at LSF'08, we've changed
>    Unionfs to use the vm_operations->fault method, instead of
>    address_space_operations.  This makes unionfs's code smaller, runs faster
>    (reduce memory pressure), and removes a number of potential races.  This
>    change is significant ...

Interesting to see you're finally switching to the approach,
which is in AUFS for ages.

May I ask you, why did it took so long?
Mainly, why did you refused to make the changes last year?

Thank you


Tomas M

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups
  2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
@ 2008-04-02 16:54   ` Erez Zadok
  0 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-02 16:54 UTC (permalink / raw)
  To: Tomas M; +Cc: Erez Zadok, akpm, linux-kernel, linux-fsdevel, viro, hch

In message <47F38E90.20306@slax.org>, Tomas M writes:

> > 1. Based on recommendations we've received while at LSF'08, we've changed
> >    Unionfs to use the vm_operations->fault method, instead of
> >    address_space_operations.
[...]
> May I ask you, why did it took so long?
> Mainly, why did you refused to make the changes last year?
> 
> Thank you
> 
> Tomas M

The current ->fault API requires (IMHO) an ugly hack such as this:

1. get the upper file pointer from vma->vm_file
2. get the lower file from the upper file
3. *override* the vma->vm_file pointer with the lower file
4. call the lower ->fault op with the changed vma->vm_file pointer
5. upon return, *reset* the vma->vm_file pointer back to its original value
   (i.e., pointing to the upper file)

Steps 3 and 5 require this "pointer flipping" that I was sure was
unacceptable to kernel developers who might be reviewing the code.  I didn't
want to use this technique before I've had the sanctioning of the
appropriate kernel maintainers, along with an acceptable path to fix the
->fault API.  LSF was an excellent opportunity to discuss this and many
other related issues in person.

Cheers,
Erez.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-04-02 16:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
2008-04-01 21:06 ` [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files Erez Zadok
2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
2008-04-02 16:54   ` Erez Zadok

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).