* [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
` (13 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/namei.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 00078e4..e421fb9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -399,7 +399,7 @@ void release_open_intent(struct nameidata *nd)
else
fput(nd->intent.open.file);
}
-EXPORT_SYMBOL(release_open_intent);
+EXPORT_SYMBOL_GPL(release_open_intent);
static inline struct dentry *
do_revalidate(struct dentry *dentry, struct nameidata *nd)
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
` (12 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
A stackable file system which uses vm_ops->fault, and does not implement
address_space_operations, cannot use generic_file_splice_read/write, but has
to implement ->splice_read/write itself. These two helper functions are
very useful to such a module.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/splice.c | 20 +++++++++++---------
include/linux/splice.h | 5 +++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 0670c91..62b0db2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -899,8 +899,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
/*
* Attempt to initiate a splice from pipe to file.
*/
-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags)
+long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
{
int ret;
@@ -916,13 +916,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
return out->f_op->splice_write(pipe, out, ppos, len, flags);
}
+EXPORT_SYMBOL_GPL(vfs_splice_from);
/*
* Attempt to initiate a splice from a file to a pipe.
*/
-static long do_splice_to(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+long vfs_splice_to(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
{
int ret;
@@ -938,6 +939,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
+EXPORT_SYMBOL_GPL(vfs_splice_to);
/**
* splice_direct_to_actor - splices data directly between two non-pipes
@@ -1007,7 +1009,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
size_t read_len;
loff_t pos = sd->pos;
- ret = do_splice_to(in, &pos, pipe, len, flags);
+ ret = vfs_splice_to(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
goto out_release;
@@ -1062,7 +1064,7 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
{
struct file *file = sd->u.file;
- return do_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
+ return vfs_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
}
/**
@@ -1136,7 +1138,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
} else
off = &out->f_pos;
- ret = do_splice_from(pipe, out, off, len, flags);
+ ret = vfs_splice_from(pipe, out, off, len, flags);
if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
ret = -EFAULT;
@@ -1157,7 +1159,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
} else
off = &in->f_pos;
- ret = do_splice_to(in, off, pipe, len, flags);
+ ret = vfs_splice_to(in, off, pipe, len, flags);
if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
ret = -EFAULT;
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 528dcb9..4b5727c 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -70,5 +70,10 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
struct splice_pipe_desc *);
extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
splice_direct_actor *);
+extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags);
+extern long vfs_splice_to(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
#endif
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 03/14] Unionfs: implement splice_read/write methods directly
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
2008-04-01 21:06 ` [PATCH 01/14] VFS: export release_open_intent as GPL symbol, not regular symbol Erez Zadok
2008-04-01 21:06 ` [PATCH 02/14] VFS: rename do_splice_to/from to vfs_splice_* and export symbols Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
` (11 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Must implement splice_read/write directly, using VFS helpers, because we can
no longer rely on generic_file_splice_read/write: they need
address_space_operations implemented, which we no longer have.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/file.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/unionfs/union.h | 1 +
2 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 0c424f6..1fcaf8e 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -165,6 +165,61 @@ out:
return err;
}
+static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ ssize_t err;
+ struct file *lower_file;
+
+ unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ err = unionfs_file_revalidate(file, false);
+ if (unlikely(err))
+ goto out;
+
+ lower_file = unionfs_lower_file(file);
+ err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
+ /* update our inode atime upon a successful lower splice-read */
+ if (err >= 0) {
+ fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ unionfs_check_file(file);
+ }
+
+out:
+ unionfs_check_file(file);
+ unionfs_read_unlock(file->f_path.dentry->d_sb);
+ return err;
+}
+
+static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
+ struct file *file, loff_t *ppos,
+ size_t len, unsigned int flags)
+{
+ ssize_t err = 0;
+ struct file *lower_file;
+
+ unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ err = unionfs_file_revalidate(file, true);
+ if (unlikely(err))
+ goto out;
+
+ lower_file = unionfs_lower_file(file);
+ err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
+ /* update our inode times+sizes upon a successful lower write */
+ if (err >= 0) {
+ fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ unionfs_check_file(file);
+ }
+
+out:
+ unionfs_read_unlock(file->f_path.dentry->d_sb);
+ return err;
+}
+
struct file_operations unionfs_main_fops = {
.llseek = generic_file_llseek,
.read = do_sync_read,
@@ -179,6 +234,6 @@ struct file_operations unionfs_main_fops = {
.release = unionfs_file_release,
.fsync = unionfs_fsync,
.fasync = unionfs_fasync,
- .splice_read = generic_file_splice_read,
- .splice_write = generic_file_splice_write,
+ .splice_read = unionfs_splice_read,
+ .splice_write = unionfs_splice_write,
};
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 533806c..7b55e33 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -46,6 +46,7 @@
#include <linux/poison.h>
#include <linux/mman.h>
#include <linux/backing-dev.h>
+#include <linux/splice.h>
#include <asm/system.h>
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 04/14] Unionfs: implement vm_operations->fault
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (2 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 03/14] Unionfs: implement splice_read/write methods directly Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
` (10 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
As per recommendations made at LSF'08, a stackable file system which does
not change the data across layers, should try to use vm_operations instead
of address_space_operations. This means we now have to implement out own
->read and ->write methods because we cannot rely on VFS helpers which
require us to have a ->readpage method. Either way there are two caveats:
(1) It's not possible currently to set inode->i_mapping->a_ops to NULL,
because too many code paths expect i_mapping to be non-NULL.
(2) a small/dummy ->readpage is still needed because generic_file_mmap,
which we used in unionfs_mmap, still check for the existence of the
->readpage method. These code paths may have to be changed to remove the
need for readpage().
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/file.c | 109 +++++++++++++++--
fs/unionfs/mmap.c | 338 +++++++---------------------------------------------
fs/unionfs/union.h | 4 +-
3 files changed, 147 insertions(+), 304 deletions(-)
diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 1fcaf8e..9397ff3 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -18,17 +18,83 @@
#include "union.h"
+static ssize_t unionfs_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int err;
+ struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
+
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+ err = unionfs_file_revalidate_locked(file, false);
+ if (unlikely(err))
+ goto out;
+
+ lower_file = unionfs_lower_file(file);
+ err = vfs_read(lower_file, buf, count, ppos);
+ /* update our inode atime upon a successful lower read */
+ if (err >= 0) {
+ fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ unionfs_check_file(file);
+ }
+
+out:
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(file->f_path.dentry->d_sb);
+ return err;
+}
+
+static ssize_t unionfs_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int err = 0;
+ struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
+
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+ err = unionfs_file_revalidate_locked(file, true);
+ if (unlikely(err))
+ goto out;
+
+ lower_file = unionfs_lower_file(file);
+ err = vfs_write(lower_file, buf, count, ppos);
+ /* update our inode times+sizes upon a successful lower write */
+ if (err >= 0) {
+ fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+ lower_file->f_path.dentry->d_inode);
+ unionfs_check_file(file);
+ }
+
+out:
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(file->f_path.dentry->d_sb);
+ return err;
+}
+
+
static int unionfs_file_readdir(struct file *file, void *dirent,
filldir_t filldir)
{
return -ENOTDIR;
}
+int unionfs_readpage_dummy(struct file *file, struct page *page)
+{
+ BUG();
+ return -EINVAL;
+}
+
static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
{
int err = 0;
bool willwrite;
struct file *lower_file;
+ struct vm_operations_struct *saved_vm_ops = NULL;
unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
@@ -54,12 +120,39 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
err = -EINVAL;
printk(KERN_ERR "unionfs: branch %d file system does not "
"support writeable mmap\n", fbstart(file));
- } else {
- err = generic_file_mmap(file, vma);
- if (err)
- printk(KERN_ERR
- "unionfs: generic_file_mmap failed %d\n", err);
+ goto out;
+ }
+
+ /*
+ * find and save lower vm_ops.
+ *
+ * XXX: the VFS should have a cleaner way of finding the lower vm_ops
+ */
+ if (!UNIONFS_F(file)->lower_vm_ops) {
+ err = lower_file->f_op->mmap(lower_file, vma);
+ if (err) {
+ printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
+ goto out;
+ }
+ saved_vm_ops = vma->vm_ops;
+ err = do_munmap(current->mm, vma->vm_start,
+ vma->vm_end - vma->vm_start);
+ if (err) {
+ printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
+ goto out;
+ }
+ }
+
+ file->f_mapping->a_ops = &unionfs_dummy_aops;
+ err = generic_file_mmap(file, vma);
+ file->f_mapping->a_ops = &unionfs_aops;
+ if (err) {
+ printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
+ goto out;
}
+ vma->vm_ops = &unionfs_vm_ops;
+ if (!UNIONFS_F(file)->lower_vm_ops)
+ UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
out:
if (!err) {
@@ -222,10 +315,8 @@ out:
struct file_operations unionfs_main_fops = {
.llseek = generic_file_llseek,
- .read = do_sync_read,
- .aio_read = generic_file_aio_read,
- .write = do_sync_write,
- .aio_write = generic_file_aio_write,
+ .read = unionfs_read,
+ .write = unionfs_write,
.readdir = unionfs_file_readdir,
.unlocked_ioctl = unionfs_ioctl,
.mmap = unionfs_mmap,
diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index d6ac61e..07db5b0 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -19,316 +19,66 @@
#include "union.h"
-static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
-{
- int err = -EIO;
- struct inode *inode;
- struct inode *lower_inode;
- struct page *lower_page;
- struct address_space *lower_mapping; /* lower inode mapping */
- gfp_t mask;
-
- BUG_ON(!PageUptodate(page));
- inode = page->mapping->host;
- /* if no lower inode, nothing to do */
- if (!inode || !UNIONFS_I(inode) || UNIONFS_I(inode)->lower_inodes) {
- err = 0;
- goto out;
- }
- lower_inode = unionfs_lower_inode(inode);
- lower_mapping = lower_inode->i_mapping;
-
- /*
- * find lower page (returns a locked page)
- *
- * We turn off __GFP_FS while we look for or create a new lower
- * page. This prevents a recursion into the file system code, which
- * under memory pressure conditions could lead to a deadlock. This
- * is similar to how the loop driver behaves (see loop_set_fd in
- * drivers/block/loop.c). If we can't find the lower page, we
- * redirty our page and return "success" so that the VM will call us
- * again in the (hopefully near) future.
- */
- mask = mapping_gfp_mask(lower_mapping) & ~(__GFP_FS);
- lower_page = find_or_create_page(lower_mapping, page->index, mask);
- if (!lower_page) {
- err = 0;
- set_page_dirty(page);
- goto out;
- }
-
- /* copy page data from our upper page to the lower page */
- copy_highpage(lower_page, page);
- flush_dcache_page(lower_page);
- SetPageUptodate(lower_page);
- set_page_dirty(lower_page);
-
- /*
- * Call lower writepage (expects locked page). However, if we are
- * called with wbc->for_reclaim, then the VFS/VM just wants to
- * reclaim our page. Therefore, we don't need to call the lower
- * ->writepage: just copy our data to the lower page (already done
- * above), then mark the lower page dirty and unlock it, and return
- * success.
- */
- if (wbc->for_reclaim) {
- unlock_page(lower_page);
- goto out_release;
- }
-
- BUG_ON(!lower_mapping->a_ops->writepage);
- wait_on_page_writeback(lower_page); /* prevent multiple writers */
- clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
- err = lower_mapping->a_ops->writepage(lower_page, wbc);
- if (err < 0)
- goto out_release;
-
- /*
- * Lower file systems such as ramfs and tmpfs, may return
- * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
- * write the page again for a while. But those lower file systems
- * also set the page dirty bit back again. Since we successfully
- * copied our page data to the lower page, then the VM will come
- * back to the lower page (directly) and try to flush it. So we can
- * save the VM the hassle of coming back to our page and trying to
- * flush too. Therefore, we don't re-dirty our own page, and we
- * never return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider
- * this a success).
- *
- * We also unlock the lower page if the lower ->writepage returned
- * AOP_WRITEPAGE_ACTIVATE. (This "anomalous" behaviour may be
- * addressed in future shmem/VM code.)
- */
- if (err == AOP_WRITEPAGE_ACTIVATE) {
- err = 0;
- unlock_page(lower_page);
- }
-
- /* all is well */
-
- /* lower mtimes have changed: update ours */
- unionfs_copy_attr_times(inode);
-
-out_release:
- /* b/c find_or_create_page increased refcnt */
- page_cache_release(lower_page);
-out:
- /*
- * We unlock our page unconditionally, because we never return
- * AOP_WRITEPAGE_ACTIVATE.
- */
- unlock_page(page);
- return err;
-}
-static int unionfs_writepages(struct address_space *mapping,
- struct writeback_control *wbc)
+/*
+ * XXX: we need a dummy readpage handler because generic_file_mmap (which we
+ * use in unionfs_mmap) checks for the existence of
+ * mapping->a_ops->readpage, else it returns -ENOEXEC. The VFS will need to
+ * be fixed to allow a file system to define vm_ops->fault without any
+ * address_space_ops whatsoever.
+ *
+ * Otherwise, we don't want to use our readpage method at all.
+ */
+static int unionfs_readpage(struct file *file, struct page *page)
{
- int err = 0;
- struct inode *lower_inode;
- struct inode *inode;
-
- inode = mapping->host;
- if (ibstart(inode) < 0 && ibend(inode) < 0)
- goto out;
- lower_inode = unionfs_lower_inode(inode);
- if (!lower_inode)
- goto out;
-
- err = generic_writepages(mapping, wbc);
- if (!err)
- unionfs_copy_attr_times(inode);
-out:
- return err;
+ BUG();
+ return -EINVAL;
}
-/* Readpage expects a locked page, and must unlock it */
-static int unionfs_readpage(struct file *file, struct page *page)
+static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
int err;
- struct file *lower_file;
- struct inode *inode;
- mm_segment_t old_fs;
- char *page_data = NULL;
- mode_t orig_mode;
+ struct file *file, *lower_file;
+ struct vm_operations_struct *lower_vm_ops;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
- err = unionfs_file_revalidate(file, false);
- if (unlikely(err))
- goto out;
- unionfs_check_file(file);
-
- if (!UNIONFS_F(file)) {
- err = -ENOENT;
- goto out;
- }
+ BUG_ON(!vma);
+ file = vma->vm_file;
+ lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
+ BUG_ON(!lower_vm_ops);
lower_file = unionfs_lower_file(file);
- /* FIXME: is this assertion right here? */
- BUG_ON(lower_file == NULL);
-
- inode = file->f_path.dentry->d_inode;
-
- page_data = (char *) kmap(page);
- /*
- * Use vfs_read because some lower file systems don't have a
- * readpage method, and some file systems (esp. distributed ones)
- * don't like their pages to be accessed directly. Using vfs_read
- * may be a little slower, but a lot safer, as the VFS does a lot of
- * the necessary magic for us.
- */
- lower_file->f_pos = page_offset(page);
- old_fs = get_fs();
- set_fs(KERNEL_DS);
- /*
- * generic_file_splice_write may call us on a file not opened for
- * reading, so temporarily allow reading.
- */
- orig_mode = lower_file->f_mode;
- lower_file->f_mode |= FMODE_READ;
- err = vfs_read(lower_file, page_data, PAGE_CACHE_SIZE,
- &lower_file->f_pos);
- lower_file->f_mode = orig_mode;
- set_fs(old_fs);
- if (err >= 0 && err < PAGE_CACHE_SIZE)
- memset(page_data + err, 0, PAGE_CACHE_SIZE - err);
- kunmap(page);
-
- if (err < 0)
- goto out;
- err = 0;
-
- /* if vfs_read succeeded above, sync up our times */
- unionfs_copy_attr_times(inode);
-
- flush_dcache_page(page);
-
+ BUG_ON(!lower_file);
/*
- * we have to unlock our page, b/c we _might_ have gotten a locked
- * page. but we no longer have to wakeup on our page here, b/c
- * UnlockPage does it
+ * XXX: we set the vm_file to the lower_file, before calling the
+ * lower ->fault op, then we restore the vm_file back to the upper
+ * file. Need to change the ->fault prototype to take an explicit
+ * struct file, and fix all users accordingly.
*/
-out:
- if (err == 0)
- SetPageUptodate(page);
- else
- ClearPageUptodate(page);
-
- unlock_page(page);
- unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
-
- return err;
-}
-
-static int unionfs_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
-{
- int err;
-
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
- err = unionfs_file_revalidate(file, true);
- if (!err) {
- unionfs_copy_attr_times(file->f_path.dentry->d_inode);
- unionfs_check_file(file);
- }
- unionfs_read_unlock(file->f_path.dentry->d_sb);
-
+ vma->vm_file = lower_file;
+ err = lower_vm_ops->fault(vma, vmf);
+ vma->vm_file = file;
return err;
}
-static int unionfs_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
-{
- int err = -ENOMEM;
- struct inode *inode, *lower_inode;
- struct file *lower_file = NULL;
- unsigned bytes = to - from;
- char *page_data = NULL;
- mm_segment_t old_fs;
-
- BUG_ON(file == NULL);
-
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
- err = unionfs_file_revalidate(file, true);
- if (unlikely(err))
- goto out;
- unionfs_check_file(file);
-
- inode = page->mapping->host;
-
- if (UNIONFS_F(file) != NULL)
- lower_file = unionfs_lower_file(file);
-
- /* FIXME: is this assertion right here? */
- BUG_ON(lower_file == NULL);
-
- page_data = (char *)kmap(page);
- lower_file->f_pos = page_offset(page) + from;
-
- /*
- * We use vfs_write instead of copying page data and the
- * prepare_write/commit_write combo because file system's like
- * GFS/OCFS2 don't like things touching those directly,
- * calling the underlying write op, while a little bit slower, will
- * call all the FS specific code as well
- */
- old_fs = get_fs();
- set_fs(KERNEL_DS);
- err = vfs_write(lower_file, page_data + from, bytes,
- &lower_file->f_pos);
- set_fs(old_fs);
-
- kunmap(page);
-
- if (err < 0)
- goto out;
-
- /* if vfs_write succeeded above, sync up our times/sizes */
- lower_inode = lower_file->f_path.dentry->d_inode;
- if (!lower_inode)
- lower_inode = unionfs_lower_inode(inode);
- BUG_ON(!lower_inode);
- fsstack_copy_inode_size(inode, lower_inode);
- unionfs_copy_attr_times(inode);
- mark_inode_dirty_sync(inode);
-
-out:
- if (err < 0)
- ClearPageUptodate(page);
-
- unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
- return err; /* assume all is ok */
-}
-
/*
- * Although unionfs isn't a block-based file system, it may stack on one.
- * ->bmap is needed, for example, to swapon(2) files.
+ * XXX: the default address_space_ops for unionfs is empty. We cannot set
+ * our inode->i_mapping->a_ops to NULL because too many code paths expect
+ * the a_ops vector to be non-NULL.
*/
-sector_t unionfs_bmap(struct address_space *mapping, sector_t block)
-{
- int err = -EINVAL;
- struct inode *inode, *lower_inode;
- sector_t (*bmap)(struct address_space *, sector_t);
-
- inode = (struct inode *)mapping->host;
- lower_inode = unionfs_lower_inode(inode);
- if (!lower_inode)
- goto out;
- bmap = lower_inode->i_mapping->a_ops->bmap;
- if (bmap)
- err = bmap(lower_inode->i_mapping, block);
-out:
- return err;
-}
-
-
struct address_space_operations unionfs_aops = {
- .writepage = unionfs_writepage,
- .writepages = unionfs_writepages,
+ /* empty on purpose */
+};
+
+/*
+ * XXX: we need a second, dummy address_space_ops vector, to be used
+ * temporarily during unionfs_mmap, because the latter calls
+ * generic_file_mmap, which checks if ->readpage exists, else returns
+ * -ENOEXEC.
+ */
+struct address_space_operations unionfs_dummy_aops = {
.readpage = unionfs_readpage,
- .prepare_write = unionfs_prepare_write,
- .commit_write = unionfs_commit_write,
- .bmap = unionfs_bmap,
+};
+
+struct vm_operations_struct unionfs_vm_ops = {
+ .fault = unionfs_fault,
};
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 7b55e33..f6fabf8 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -75,7 +75,8 @@ extern struct inode_operations unionfs_dir_iops;
extern struct inode_operations unionfs_symlink_iops;
extern struct super_operations unionfs_sops;
extern struct dentry_operations unionfs_dops;
-extern struct address_space_operations unionfs_aops;
+extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
+extern struct vm_operations_struct unionfs_vm_ops;
/* How long should an entry be allowed to persist */
#define RDCACHE_JIFFIES (5*HZ)
@@ -89,6 +90,7 @@ struct unionfs_file_info {
struct unionfs_dir_state *rdstate;
struct file **lower_files;
int *saved_branch_ids; /* IDs of branches when file was opened */
+ struct vm_operations_struct *lower_vm_ops;
};
/* unionfs inode data in memory */
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 05/14] Unionfs: lock our dentry in file operations
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (3 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 04/14] Unionfs: implement vm_operations->fault Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files Erez Zadok
` (9 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/commonfops.c | 32 ++++++++++-----------------
fs/unionfs/dirfops.c | 18 ++++++++++-----
fs/unionfs/file.c | 55 +++++++++++++++++++++++++++++------------------
fs/unionfs/union.h | 1 -
4 files changed, 58 insertions(+), 48 deletions(-)
diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
index 2add167..50f4eda 100644
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -302,7 +302,7 @@ out:
* @willwrite: true if caller may cause changes to the file; false otherwise.
* Caller must lock/unlock dentry's branch configuration.
*/
-int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
+int unionfs_file_revalidate(struct file *file, bool willwrite)
{
struct super_block *sb;
struct dentry *dentry;
@@ -312,6 +312,7 @@ int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
int err = 0;
dentry = file->f_path.dentry;
+ verify_locked(dentry);
sb = dentry->d_sb;
/*
@@ -419,17 +420,6 @@ out_nofree:
return err;
}
-int unionfs_file_revalidate(struct file *file, bool willwrite)
-{
- int err;
-
- unionfs_lock_dentry(file->f_path.dentry, UNIONFS_DMUTEX_CHILD);
- err = unionfs_file_revalidate_locked(file, willwrite);
- unionfs_unlock_dentry(file->f_path.dentry);
-
- return err;
-}
-
/* unionfs_open helper function: open a directory */
static int __open_dir(struct inode *inode, struct file *file)
{
@@ -632,6 +622,8 @@ int unionfs_file_release(struct inode *inode, struct file *file)
int fgen, err = 0;
unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+
/*
* Yes, we have to revalidate this file even if it's being released.
* This is important for open-but-unlinked files, as well as mmap
@@ -650,7 +642,6 @@ int unionfs_file_release(struct inode *inode, struct file *file)
bstart = fbstart(file);
bend = fbend(file);
- unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
for (bindex = bstart; bindex <= bend; bindex++) {
lower_file = unionfs_lower_file_idx(file, bindex);
@@ -665,7 +656,6 @@ int unionfs_file_release(struct inode *inode, struct file *file)
unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
}
}
- unionfs_unlock_dentry(dentry);
kfree(fileinfo->lower_files);
kfree(fileinfo->saved_branch_ids);
@@ -683,6 +673,7 @@ int unionfs_file_release(struct inode *inode, struct file *file)
kfree(fileinfo);
out:
+ unionfs_unlock_dentry(dentry);
unionfs_read_unlock(sb);
return err;
}
@@ -729,7 +720,6 @@ static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
struct vfsmount *mnt;
dentry = file->f_path.dentry;
- unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
orig_bstart = dbstart(dentry);
orig_bend = dbend(dentry);
err = unionfs_partial_lookup(dentry);
@@ -771,15 +761,16 @@ static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd,
err = -EFAULT;
out:
- unionfs_unlock_dentry(dentry);
return err < 0 ? err : bend;
}
long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
long err;
+ struct dentry *dentry = file->f_path.dentry;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, true);
if (unlikely(err))
@@ -807,7 +798,8 @@ long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
out:
unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -821,7 +813,7 @@ int unionfs_flush(struct file *file, fl_owner_t id)
unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
- err = unionfs_file_revalidate_locked(file, true);
+ err = unionfs_file_revalidate(file, true);
if (unlikely(err))
goto out;
unionfs_check_file(file);
@@ -843,7 +835,7 @@ int unionfs_flush(struct file *file, fl_owner_t id)
out:
if (!err)
unionfs_check_file(file);
- unionfs_unlock_dentry(file->f_path.dentry);
+ unionfs_unlock_dentry(dentry);
unionfs_read_unlock(dentry->d_sb);
return err;
}
diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
index a613862..8272fb6 100644
--- a/fs/unionfs/dirfops.c
+++ b/fs/unionfs/dirfops.c
@@ -97,19 +97,21 @@ static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
{
int err = 0;
struct file *lower_file = NULL;
+ struct dentry *dentry = file->f_path.dentry;
struct inode *inode = NULL;
struct unionfs_getdents_callback buf;
struct unionfs_dir_state *uds;
int bend;
loff_t offset;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, false);
if (unlikely(err))
goto out;
- inode = file->f_path.dentry->d_inode;
+ inode = dentry->d_inode;
uds = UNIONFS_F(file)->rdstate;
if (!uds) {
@@ -189,7 +191,8 @@ static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
}
out:
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -206,9 +209,11 @@ out:
static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
{
struct unionfs_dir_state *rdstate;
+ struct dentry *dentry = file->f_path.dentry;
loff_t err;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, false);
if (unlikely(err))
@@ -250,7 +255,7 @@ static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
err = -EINVAL;
} else {
struct inode *inode;
- inode = file->f_path.dentry->d_inode;
+ inode = dentry->d_inode;
rdstate = find_rdstate(inode, offset);
if (rdstate) {
UNIONFS_F(file)->rdstate = rdstate;
@@ -269,7 +274,8 @@ static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
}
out:
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 9397ff3..0a39387 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -27,7 +27,7 @@ static ssize_t unionfs_read(struct file *file, char __user *buf,
unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
- err = unionfs_file_revalidate_locked(file, false);
+ err = unionfs_file_revalidate(file, false);
if (unlikely(err))
goto out;
@@ -55,7 +55,7 @@ static ssize_t unionfs_write(struct file *file, const char __user *buf,
unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
- err = unionfs_file_revalidate_locked(file, true);
+ err = unionfs_file_revalidate(file, true);
if (unlikely(err))
goto out;
@@ -94,9 +94,11 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
int err = 0;
bool willwrite;
struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
struct vm_operations_struct *saved_vm_ops = NULL;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
/* This might be deferred to mmap's writepage */
willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
@@ -157,10 +159,11 @@ static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
out:
if (!err) {
/* copyup could cause parent dir times to change */
- unionfs_copy_attr_times(file->f_path.dentry->d_parent->d_inode);
+ unionfs_copy_attr_times(dentry->d_parent->d_inode);
unionfs_check_file(file);
}
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -172,7 +175,8 @@ int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
struct inode *lower_inode, *inode;
int err = -EINVAL;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, true);
if (unlikely(err))
goto out;
@@ -207,8 +211,10 @@ int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
unionfs_copy_attr_times(inode);
out:
- unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ if (!err)
+ unionfs_check_file(file);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -216,11 +222,12 @@ int unionfs_fasync(int fd, struct file *file, int flag)
{
int bindex, bstart, bend;
struct file *lower_file;
- struct dentry *dentry;
+ struct dentry *dentry = file->f_path.dentry;
struct inode *lower_inode, *inode;
int err = 0;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, true);
if (unlikely(err))
goto out;
@@ -231,7 +238,6 @@ int unionfs_fasync(int fd, struct file *file, int flag)
if (bstart < 0 || bend < 0)
goto out;
- dentry = file->f_path.dentry;
inode = dentry->d_inode;
if (unlikely(!inode)) {
printk(KERN_ERR
@@ -253,8 +259,10 @@ int unionfs_fasync(int fd, struct file *file, int flag)
unionfs_copy_attr_times(inode);
out:
- unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ if (!err)
+ unionfs_check_file(file);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -264,8 +272,10 @@ static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
{
ssize_t err;
struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, false);
if (unlikely(err))
goto out;
@@ -274,14 +284,14 @@ static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
/* update our inode atime upon a successful lower splice-read */
if (err >= 0) {
- fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+ fsstack_copy_attr_atime(dentry->d_inode,
lower_file->f_path.dentry->d_inode);
unionfs_check_file(file);
}
out:
- unionfs_check_file(file);
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
@@ -291,8 +301,10 @@ static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
{
ssize_t err = 0;
struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
- unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+ unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
err = unionfs_file_revalidate(file, true);
if (unlikely(err))
goto out;
@@ -301,15 +313,16 @@ static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
/* update our inode times+sizes upon a successful lower write */
if (err >= 0) {
- fsstack_copy_inode_size(file->f_path.dentry->d_inode,
+ fsstack_copy_inode_size(dentry->d_inode,
lower_file->f_path.dentry->d_inode);
- fsstack_copy_attr_times(file->f_path.dentry->d_inode,
+ fsstack_copy_attr_times(dentry->d_inode,
lower_file->f_path.dentry->d_inode);
unionfs_check_file(file);
}
out:
- unionfs_read_unlock(file->f_path.dentry->d_sb);
+ unionfs_unlock_dentry(dentry);
+ unionfs_read_unlock(dentry->d_sb);
return err;
}
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index f6fabf8..63f3b21 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -355,7 +355,6 @@ extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
extern int unionfs_getlk(struct file *file, struct file_lock *fl);
/* Common file operations. */
-extern int unionfs_file_revalidate_locked(struct file *file, bool willwrite);
extern int unionfs_file_revalidate(struct file *file, bool willwrite);
extern int unionfs_open(struct inode *inode, struct file *file);
extern int unionfs_file_release(struct inode *inode, struct file *file);
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (4 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 05/14] Unionfs: lock our dentry in file operations Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
` (8 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok,
Himanshu Kanda
Optimize the unlinking of non-dir objects in unionfs by deleting all
possible lower inode objects from all writable lower branches. This may
consume a bit more processing, but on average reduces overall inode
consumption and further saves a lot by reducing the total number of
whiteouts needed. We create a whiteout now only if we could not delete all
lower objects, or if one of the lower branches was explicitly marked
read-only.
Signed-off-by: Himanshu Kanda <hkanda@cs.sunysb.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
Documentation/filesystems/unionfs/concepts.txt | 25 ++++++
fs/unionfs/lookup.c | 8 +-
fs/unionfs/unlink.c | 107 +++++++++++++++---------
3 files changed, 97 insertions(+), 43 deletions(-)
diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 8d9a1c5..0bc1a19 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -62,6 +62,31 @@ simplest solution is to take the instance from the highest priority
(numerically lowest value) and "hide" the others.
+Unlinking:
+=========
+
+Unlink operation on non-directory instances is optimized to remove the
+maximum possible objects in case multiple underlying branches have the same
+file name. The unlink operation will first try to delete file instances
+from highest priority branch and then move further to delete from remaining
+branches in order of their decreasing priority. Consider a case (F..D..F),
+where F is a file and D is a directory of the same name; here, some
+intermediate branch could have an empty directory instance with the same
+name, so this operation also tries to delete this directory instance and
+proceed further to delete from next possible lower priority branch. The
+unionfs unlink operation will smoothly delete the files with same name from
+all possible underlying branches. In case if some error occurs, it creates
+whiteout in highest priority branch that will hide file instance in rest of
+the branches. An error could occur either if an unlink operations in any of
+the underlying branch failed or if a branch has no write permission.
+
+This unlinking policy is known as "delete all" and it has the benefit of
+overall reducing the number of inodes used by duplicate files, and further
+reducing the total number of inodes consumed by whiteouts. The cost is of
+extra processing, but testing shows this extra processing is well worth the
+savings.
+
+
Copyup:
=======
diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 755158e..7618716 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -264,7 +264,9 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
* branches, but we have to skip non-dirs (to avoid, say,
* calling readdir on a regular file).
*/
- if (!S_ISDIR(lower_dentry->d_inode->i_mode) && dentry_count) {
+ if ((lookupmode != INTERPOSE_PARTIAL) &&
+ !S_ISDIR(lower_dentry->d_inode->i_mode) &&
+ dentry_count) {
dput(lower_dentry);
continue;
}
@@ -295,10 +297,6 @@ struct dentry *unionfs_lookup_backend(struct dentry *dentry,
continue;
if (dentry_count == 1)
goto out_positive;
- /* This can only happen with mixed D-*-F-* */
- BUG_ON(!S_ISDIR(unionfs_lower_dentry(dentry)->
- d_inode->i_mode));
- continue;
}
opaque = is_opaque_dir(dentry, bindex);
diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index 6e93da3..c66bb3e 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -18,7 +18,32 @@
#include "union.h"
-/* unlink a file by creating a whiteout */
+/*
+ * Helper function for Unionfs's unlink operation.
+ *
+ * The main goal of this function is to optimize the unlinking of non-dir
+ * objects in unionfs by deleting all possible lower inode objects from the
+ * underlying branches having same dentry name as the non-dir dentry on
+ * which this unlink operation is called. This way we delete as many lower
+ * inodes as possible, and save space. Whiteouts need to be created in
+ * branch0 only if unlinking fails on any of the lower branch other than
+ * branch0, or if a lower branch is marked read-only.
+ *
+ * Also, while unlinking a file, if we encounter any dir type entry in any
+ * intermediate branch, then we remove the directory by calling vfs_rmdir.
+ * The following special cases are also handled:
+
+ * (1) If an error occurs in branch0 during vfs_unlink, then we return
+ * appropriate error.
+ *
+ * (2) If we get an error during unlink in any of other lower branch other
+ * than branch0, then we create a whiteout in branch0.
+ *
+ * (3) If a whiteout already exists in any intermediate branch, we delete
+ * all possible inodes only up to that branch (this is an "opaqueness"
+ * as as per Documentation/filesystems/unionfs/concepts.txt).
+ *
+ */
static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
{
struct dentry *lower_dentry;
@@ -30,51 +55,57 @@ static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
if (err)
goto out;
- bindex = dbstart(dentry);
-
- lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
- if (!lower_dentry)
- goto out;
+ /* trying to unlink all possible valid instances */
+ for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
+ lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+ if (!lower_dentry || !lower_dentry->d_inode)
+ continue;
+
+ lower_dir_dentry = lock_parent(lower_dentry);
+
+ /* avoid destroying the lower inode if the object is in use */
+ dget(lower_dentry);
+ err = is_robranch_super(dentry->d_sb, bindex);
+ if (!err) {
+ /* see Documentation/filesystems/unionfs/issues.txt */
+ lockdep_off();
+ if (!S_ISDIR(lower_dentry->d_inode->i_mode))
+ err = vfs_unlink(lower_dir_dentry->d_inode,
+ lower_dentry);
+ else
+ err = vfs_rmdir(lower_dir_dentry->d_inode,
+ lower_dentry);
+ lockdep_on();
+ }
- lower_dir_dentry = lock_parent(lower_dentry);
+ /* if lower object deletion succeeds, update inode's times */
+ if (!err)
+ unionfs_copy_attr_times(dentry->d_inode);
+ dput(lower_dentry);
+ fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
+ unlock_dir(lower_dir_dentry);
- /* avoid destroying the lower inode if the file is in use */
- dget(lower_dentry);
- err = is_robranch_super(dentry->d_sb, bindex);
- if (!err) {
- /* see Documentation/filesystems/unionfs/issues.txt */
- lockdep_off();
- err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
- lockdep_on();
+ if (err)
+ break;
}
- /* if vfs_unlink succeeded, update our inode's times */
- if (!err)
- unionfs_copy_attr_times(dentry->d_inode);
- dput(lower_dentry);
- fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
- unlock_dir(lower_dir_dentry);
-
- if (err && !IS_COPYUP_ERR(err))
- goto out;
/*
- * We create whiteouts if (1) there was an error unlinking the main
- * file; (2) there is a lower priority file with the same name
- * (dbopaque); (3) the branch in which the file is not the last
- * (rightmost0 branch. The last rule is an optimization to avoid
- * creating all those whiteouts if there's no chance they'd be
- * masking any lower-priority branch, as well as unionfs is used
- * with only one branch (using only one branch, while odd, is still
- * possible).
+ * Create the whiteout in branch 0 (highest priority) only if (a)
+ * there was an error in any intermediate branch other than branch 0
+ * due to failure of vfs_unlink/vfs_rmdir or (b) a branch marked or
+ * mounted read-only.
*/
if (err) {
- if (dbstart(dentry) == 0)
+ if ((bindex == 0) ||
+ ((bindex == dbstart(dentry)) &&
+ (!IS_COPYUP_ERR(err))))
goto out;
- err = create_whiteout(dentry, dbstart(dentry) - 1);
- } else if (dbopaque(dentry) != -1) {
- err = create_whiteout(dentry, dbopaque(dentry));
- } else if (dbstart(dentry) < sbend(dentry->d_sb)) {
- err = create_whiteout(dentry, dbstart(dentry));
+ else {
+ if (!IS_COPYUP_ERR(err))
+ pr_debug("unionfs: lower object deletion "
+ "failed in branch:%d\n", bindex);
+ err = create_whiteout(dentry, sbstart(dentry->d_sb));
+ }
}
out:
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 07/14] Unionfs: don't copy parent inode times in setattr
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (5 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 06/14] Unionfs: reduce number of whiteouts by deleting all instances of files Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
` (7 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/inode.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index 640969d..0dc07ec 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -1064,8 +1064,7 @@ static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
if (ia->ia_valid & ATTR_ATIME_SET)
inode->i_atime = lower_inode->i_atime;
fsstack_copy_inode_size(inode, lower_inode);
- /* if setattr succeeded, then parent dir may have changed */
- unionfs_copy_attr_times(dentry->d_parent->d_inode);
+
out:
if (!err)
unionfs_check_dentry(dentry);
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (6 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 07/14] Unionfs: don't copy parent inode times in setattr Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
` (6 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/dentry.c | 2 +-
fs/unionfs/union.h | 18 +++++++++---------
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index f8f65e1..5e498c2 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -406,7 +406,7 @@ bool __unionfs_d_revalidate_chain(struct dentry *dentry, struct nameidata *nd,
chain = kzalloc(chain_len * sizeof(struct dentry *), GFP_KERNEL);
if (unlikely(!chain)) {
printk(KERN_CRIT "unionfs: no more memory in %s\n",
- __FUNCTION__);
+ __func__);
goto out;
}
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 63f3b21..e93b9ef 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -557,24 +557,24 @@ static inline void unionfs_mntput(struct dentry *dentry, int bindex)
#ifdef CONFIG_UNION_FS_DEBUG
/* useful for tracking code reachability */
-#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__)
+#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
#define unionfs_check_inode(i) __unionfs_check_inode((i), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define unionfs_check_file(f) __unionfs_check_file((f), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define unionfs_check_nd(n) __unionfs_check_nd((n), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define show_branch_counts(sb) __show_branch_counts((sb), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define show_inode_times(i) __show_inode_times((i), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define show_dinode_times(d) __show_dinode_times((d), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
#define show_inode_counts(i) __show_inode_counts((i), \
- __FILE__, __FUNCTION__, __LINE__)
+ __FILE__, __func__, __LINE__)
extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
const char *fxn, int line);
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 09/14] Unionfs: use noinline_for_stack
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (7 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 08/14] Unionfs: use __func__ instead of __FUNCTION__ Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
` (5 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/lookup.c | 2 +-
fs/unionfs/super.c | 24 ++++++++++++++----------
2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 7618716..7f512c2 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -32,7 +32,7 @@ static int is_validname(const char *name)
}
/* The rest of these are utility functions for lookup. */
-static noinline int is_opaque_dir(struct dentry *dentry, int bindex)
+static noinline_for_stack int is_opaque_dir(struct dentry *dentry, int bindex)
{
int err = 0;
struct dentry *lower_dentry;
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index fba1598..2456654 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -189,9 +189,11 @@ out:
}
/* handle mode changing during remount */
-static noinline int do_remount_mode_option(char *optarg, int cur_branches,
- struct unionfs_data *new_data,
- struct path *new_lower_paths)
+static noinline_for_stack int do_remount_mode_option(
+ char *optarg,
+ int cur_branches,
+ struct unionfs_data *new_data,
+ struct path *new_lower_paths)
{
int err = -EINVAL;
int perms, idx;
@@ -250,9 +252,10 @@ out:
}
/* handle branch deletion during remount */
-static noinline int do_remount_del_option(char *optarg, int cur_branches,
- struct unionfs_data *new_data,
- struct path *new_lower_paths)
+static noinline_for_stack int do_remount_del_option(
+ char *optarg, int cur_branches,
+ struct unionfs_data *new_data,
+ struct path *new_lower_paths)
{
int err = -EINVAL;
int idx;
@@ -313,10 +316,11 @@ out:
}
/* handle branch insertion during remount */
-static noinline int do_remount_add_option(char *optarg, int cur_branches,
- struct unionfs_data *new_data,
- struct path *new_lower_paths,
- int *high_branch_id)
+static noinline_for_stack int do_remount_add_option(
+ char *optarg, int cur_branches,
+ struct unionfs_data *new_data,
+ struct path *new_lower_paths,
+ int *high_branch_id)
{
int err = -EINVAL;
int perms;
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 10/14] Unionfs: document reasons for opaque directories
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (8 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 09/14] Unionfs: use noinline_for_stack Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
` (4 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
Documentation/filesystems/unionfs/concepts.txt | 36 ++++++++++++++++++++++++
1 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 0bc1a19..b853788 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -52,6 +52,42 @@ Later, when Unionfs traverses branches (due to lookup or readdir), it
eliminate 'foo' from the namespace (as well as the whiteout itself.)
+Opaque Directories:
+===================
+
+Assume we have a unionfs mount comprising of two branches. Branch 0 is
+empty; branch 1 has the directory /a and file /a/f. Let's say we mount a
+union of branch 0 as read-write and branch 1 as read-only. Now, let's say
+we try to perform the following operation in the union:
+
+ rm -fr a
+
+Because branch 1 is not writable, we cannot physically remove the file /a/f
+or the directory /a. So instead, we will create a whiteout in branch 0
+named /.wh.a, masking out the name "a" from branch 1. Next, let's say we
+try to create a directory named "a" as follows:
+
+ mkdir a
+
+Because we have a whiteout for "a" already, Unionfs behaves as if "a"
+doesn't exist, and thus will delete the whiteout and replace it with an
+actual directory named "a".
+
+The problem now is that if you try to "ls" in the union, Unionfs will
+perform is normal directory name unification, for *all* directories named
+"a" in all branches. This will cause the file /a/f from branch 1 to
+re-appear in the union's namespace, which violates Unix semantics.
+
+To avoid this problem, we have a different form of whiteouts for
+directories, called "opaque directories" (same as BSD Union Mount does).
+Whenever we replace a whiteout with a directory, that directory is marked as
+opaque. In Unionfs 2.x, it means that we create a file named
+/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
+When unionfs notices that a directory is opaque, it stops all namespace
+operations (including merging readdir contents) at that opaque directory.
+This prevents re-exposing names from masked out directories.
+
+
Duplicate Elimination:
======================
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 11/14] Unionfs: display mount point name along with generation number
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (9 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 10/14] Unionfs: document reasons for opaque directories Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
` (3 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok, Dave Miller
CC: Dave Miller <justdave@mozilla.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/main.c | 7 ++++++-
fs/unionfs/super.c | 4 +++-
fs/unionfs/union.h | 1 +
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
index 8f59fb5..b76264a 100644
--- a/fs/unionfs/main.c
+++ b/fs/unionfs/main.c
@@ -736,7 +736,12 @@ static int unionfs_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name,
void *raw_data, struct vfsmount *mnt)
{
- return get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+ int err;
+ err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+ if (!err)
+ UNIONFS_SB(mnt->mnt_sb)->dev_name =
+ kstrdup(dev_name, GFP_KERNEL);
+ return err;
}
static struct file_system_type unionfs_fs_type = {
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 2456654..e5cb235 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -134,6 +134,7 @@ static void unionfs_put_super(struct super_block *sb)
atomic_dec(&s->s_active);
}
+ kfree(spd->dev_name);
kfree(spd->data);
kfree(spd);
sb->s_fs_info = NULL;
@@ -805,7 +806,8 @@ out_no_change:
atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
if (!(*flags & MS_SILENT))
- pr_info("unionfs: new generation number %d\n", i);
+ pr_info("unionfs: %s: new generation number %d\n",
+ UNIONFS_SB(sb)->dev_name, i);
/* finally, update the root dentry's times */
unionfs_copy_attr_times(sb->s_root->d_inode);
err = 0; /* reset to success */
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index e93b9ef..4ca4e76 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -159,6 +159,7 @@ struct unionfs_sb_info {
struct rw_semaphore rwsem;
pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
int high_branch_id; /* last unique branch ID given */
+ char *dev_name; /* to identify different unions in pr_debug */
struct unionfs_data *data;
};
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (10 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 11/14] Unionfs: display mount point name along with generation number Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
` (2 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/super.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index e5cb235..4cddc83 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -755,7 +755,7 @@ out_no_change:
/* grab new lower super references; release old ones */
for (i = 0; i < new_branches; i++)
atomic_inc(&new_data[i].sb->s_active);
- for (i = 0; i < new_branches; i++)
+ for (i = 0; i < sbmax(sb); i++)
atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
/* copy new vectors into their correct place */
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 13/14] Unionfs: don't purge lower sb data on remount
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (11 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 12/14] Unionfs: do not over-decrement lower superblock refs on remount Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
This is no longer needed, as we don't have upper and lower pages. Plus this
was racy, requiring the unexported inode_lock variable.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/dentry.c | 11 -----------
fs/unionfs/super.c | 8 +++++---
2 files changed, 5 insertions(+), 14 deletions(-)
diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index 5e498c2..ee0da4f 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -273,17 +273,6 @@ static inline void purge_inode_data(struct inode *inode)
*/
}
-void purge_sb_data(struct super_block *sb)
-{
- struct inode *inode;
-
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- if (inode->i_state & (I_FREEING|I_WILL_FREE))
- continue;
- purge_inode_data(inode);
- }
-}
-
/*
* Revalidate a single file/symlink/special dentry. Assume that info nodes
* of the dentry and its parent are locked. Assume that parent(s) are all
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 4cddc83..82b4045 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -747,10 +747,12 @@ out_no_change:
* function). This revalidation will happen lazily and
* incrementally, as users perform operations on cached inodes. We
* would like to encourage this revalidation to happen sooner if
- * possible, so we try to invalidate as many other pages in our
- * superblock as we can.
+ * possible, so we like to try to invalidate as many other pages in
+ * our superblock as we can. We used to call drop_pagecache_sb() or
+ * a variant thereof, but either method was racy (drop_caches alone
+ * is known to be racy). So now we let the revalidation happen on a
+ * per file basis in ->d_revalidate.
*/
- purge_sb_data(sb);
/* grab new lower super references; release old ones */
for (i = 0; i < new_branches; i++)
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (12 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 13/14] Unionfs: don't purge lower sb data " Erez Zadok
@ 2008-04-01 21:06 ` Erez Zadok
2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
14 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-01 21:06 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, viro, hch, Erez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
fs/unionfs/unlink.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index c66bb3e..cad0386 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -280,6 +280,8 @@ out:
!unionfs_lower_inode_idx(inode, dend))
ibstart(inode) = ibend(inode) = -1;
d_drop(dentry);
+ /* update our lower vfsmnts, in case a copyup took place */
+ unionfs_postcopyup_setmnt(dentry);
}
if (namelist)
--
1.5.2.2
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups
2008-04-01 21:06 [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Erez Zadok
` (13 preceding siblings ...)
2008-04-01 21:06 ` [PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup Erez Zadok
@ 2008-04-02 13:48 ` Tomas M
2008-04-02 16:54 ` Erez Zadok
14 siblings, 1 reply; 17+ messages in thread
From: Tomas M @ 2008-04-02 13:48 UTC (permalink / raw)
To: Erez Zadok; +Cc: akpm, linux-kernel, linux-fsdevel, viro, hch
> 1. Based on recommendations we've received while at LSF'08, we've changed
> Unionfs to use the vm_operations->fault method, instead of
> address_space_operations. This makes unionfs's code smaller, runs faster
> (reduce memory pressure), and removes a number of potential races. This
> change is significant ...
Interesting to see you're finally switching to the approach,
which is in AUFS for ages.
May I ask you, why did it took so long?
Mainly, why did you refused to make the changes last year?
Thank you
Tomas M
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups
2008-04-02 13:48 ` [GIT PULL -mm] 00/14 VFS/Unionfs updates/fixes/cleanups Tomas M
@ 2008-04-02 16:54 ` Erez Zadok
0 siblings, 0 replies; 17+ messages in thread
From: Erez Zadok @ 2008-04-02 16:54 UTC (permalink / raw)
To: Tomas M; +Cc: Erez Zadok, akpm, linux-kernel, linux-fsdevel, viro, hch
In message <47F38E90.20306@slax.org>, Tomas M writes:
> > 1. Based on recommendations we've received while at LSF'08, we've changed
> > Unionfs to use the vm_operations->fault method, instead of
> > address_space_operations.
[...]
> May I ask you, why did it took so long?
> Mainly, why did you refused to make the changes last year?
>
> Thank you
>
> Tomas M
The current ->fault API requires (IMHO) an ugly hack such as this:
1. get the upper file pointer from vma->vm_file
2. get the lower file from the upper file
3. *override* the vma->vm_file pointer with the lower file
4. call the lower ->fault op with the changed vma->vm_file pointer
5. upon return, *reset* the vma->vm_file pointer back to its original value
(i.e., pointing to the upper file)
Steps 3 and 5 require this "pointer flipping" that I was sure was
unacceptable to kernel developers who might be reviewing the code. I didn't
want to use this technique before I've had the sanctioning of the
appropriate kernel maintainers, along with an acceptable path to fix the
->fault API. LSF was an excellent opportunity to discuss this and many
other related issues in person.
Cheers,
Erez.
^ permalink raw reply [flat|nested] 17+ messages in thread