* [PATCH 0/7 v3] overlay filesystem prototype
@ 2010-09-20 18:04 Miklos Szeredi
From: Miklos Szeredi @ 2010-09-20 18:04 UTC
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
Here's an updated patch series.
For now I reverted Neil's revalidation patch. Not requiring strict
read-only would make sense for just trying it out and experimenting.
But for real uses, I'm not sure...
Git tree is here:
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v3
Thanks,
Miklos
------------------------------------------------------------------------------
Changes from v2 to v3
- Minimal remount support. As overlayfs reflects the 'readonly'
mount status in write-access to the upper filesystem, we must
handle remount and either drop or take write access when the ro
status changes. (NeilBrown)
- Use correct seek function for directories. It is incorrect to call
generic_file_llseek on a file from a different filesystem. For
that we must use the seek function that the filesystem defines,
which is called by vfs_llseek. Also, we only want to seek the
realfile when is_real is true. Otherwise we just want to update
our own f_pos pointer, so use generic_file_llseek for
that. (NeilBrown)
- Initialise is_real before use. The previous patch can use
od->is_real before it is properly initialised if llseek is called
before readdir. So factor out the initialisation of is_real and
call it from both readdir and llseek when f_pos is 0. (NeilBrown)
- Rename ovl_fill_cache to ovl_dir_read (NeilBrown)
- Tiny optimisation in open_other handling (NeilBrown)
- Assorted updates to Documentation/filesystems/overlayfs.txt (NeilBrown)
- Make copy-up work for >=4G files, make it killable during copy-up.
Need to fix recovery after a failed/interrupted copy-up.
- Store and reference upper/lower dentries in overlay dentries.
Store and reference upper/lower vfsmounts in overlay superblock.
- Add necessary barriers for setting upper dentry in copyup and for
retrieving upper dentry locklessly.
- Make sure the right file is used for directory fsync() after
copy-up.
- Add locking to ovl_dir_llseek() to prevent concurrent call of
ovl_dir_reset() with ovl_dir_read().
- Get rid of ovl_dentry_iput(). The VFS doesn't provide enough
locking around this function for the contents of ->d_fsdata to be
safely updated.
- After copying up a non-directory, unhash the dentry. This way the
lower dentry ref, which is no longer necessary, can go away. This
revealed a use-after-free bug in truncate handling in
fs/namei.c:finish_open().
- Fix the case where a copy-up happens between the follow_link and
the put_link calls.
- Replace some WARN_ONs with BUG_ON. Some things just _really_
shouldn't happen.
- Extract common code from ovl_unlink and ovl_rmdir to a helper
function.
- After unlink and rmdir, unhash the dentry. This will get rid of the
lower and upper dentry references after there are no more users of
the deleted dentry. This is a safe replacement for the removed
->d_iput() functionality.
- Added checks to unlink, rmdir and rename to verify that the
parent-child relationship in the upper filesystem matches that of
the overlay. This is necessary to prevent crash and/or corruption
if the upper filesystem topology is being modified while part of
the overlay.
- Optimize checking whiteout and opaque attributes.
- Optimize copy-up on truncate: don't copy up whole file before
truncating
- Misc bug fixes
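
The barrier item above (setting the upper dentry in copy-up and
retrieving it locklessly) is the usual publish/consume pattern. Below
is a minimal userspace sketch of that pattern, assuming C11
release/acquire atomics as a stand-in for the kernel's barrier
primitives. struct ovl_entry_sketch and all helper names are
hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry; only carries an id for the demo. */
struct dentry {
	int ino;
};

/* Hypothetical per-dentry overlay data: upper is set once, by copy-up. */
struct ovl_entry_sketch {
	_Atomic(struct dentry *) upper;
	struct dentry *lower;
};

static void ovl_entry_init(struct ovl_entry_sketch *oe, struct dentry *lower)
{
	atomic_init(&oe->upper, NULL);
	oe->lower = lower;
}

/*
 * Publish the upper dentry. The release store orders every earlier
 * write that initialised *upper before the pointer becomes visible.
 */
static void set_upper(struct ovl_entry_sketch *oe, struct dentry *upper)
{
	assert(upper != NULL);
	atomic_store_explicit(&oe->upper, upper, memory_order_release);
}

/*
 * Lockless read. The acquire load pairs with the release store above,
 * so a reader that sees the pointer also sees the initialised object.
 */
static struct dentry *get_upper(struct ovl_entry_sketch *oe)
{
	return atomic_load_explicit(&oe->upper, memory_order_acquire);
}

/* Prefer the upper dentry once it exists, otherwise fall back to lower. */
static struct dentry *get_real(struct ovl_entry_sketch *oe)
{
	struct dentry *upper = get_upper(oe);

	return upper ? upper : oe->lower;
}
```

A caller would run ovl_entry_init() once, call set_upper() from the
copy-up path, and use get_real() anywhere the current backing dentry
is needed without taking a lock.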
------------------------------------------------------------------------------
Changes from v1 to v2
- rename "hybrid union filesystem" to "overlay filesystem" or overlayfs
- added documentation written by Neil
- correct st_dev for directories (reported by Neil)
- use getattr() to get attributes from the underlying filesystems,
this means that now an overlay filesystem itself can be the lower,
read-only layer of another overlay
- listxattr filters out private extended attributes
- get write ref on the upper layer on mount unless the overlay
itself is mounted read-only
- raise capabilities for copy up, dealing with whiteouts and opaque
directories. Now the overlay works for non-root users as well
- "rm -rf" didn't work correctly in all cases if the directory was
copied up between opendir and the first readdir, this is now fixed
(and the directory operations consolidated)
- simplified copy up, this broke optimization for truncate and
open(O_TRUNC) (now file is copied up to be immediately truncated,
will fix)
- st_nlink for merged directories set to 1, this is an "illegal"
value that normal filesystems never have but some use it to
indicate that the number of subdirectories is unknown. Utilities
(find, ...) seem to tolerate this well.
- misc fixes I forgot about
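
As an illustration of the listxattr item above, here is a small
userspace sketch of filtering private attributes out of a listxattr
style buffer (a sequence of NUL-terminated names). The prefix
"trusted.overlay." and the function name are assumptions made for the
example; the real filtering code lives in the overlayfs patch itself
and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed private namespace; chosen for illustration only. */
#define PRIVATE_PREFIX "trusted.overlay."

/*
 * Compact "list" in place, dropping every name that starts with
 * PRIVATE_PREFIX. "len" is the number of valid bytes, including the
 * NUL after each name. Returns the length of the filtered list.
 * A trailing name without a terminating NUL is dropped.
 */
static size_t filter_private_xattrs(char *list, size_t len)
{
	const size_t plen = strlen(PRIVATE_PREFIX);
	size_t in = 0, out = 0;

	assert(list != NULL || len == 0);
	while (in < len) {
		const char *end = memchr(list + in, '\0', len - in);
		size_t n;

		if (!end)
			break;	/* malformed tail, stop here */
		n = (size_t)(end - (list + in)) + 1;	/* name plus its NUL */
		if (strncmp(list + in, PRIVATE_PREFIX, plen) != 0) {
			/* Regions may overlap, so memmove rather than memcpy. */
			memmove(list + out, list + in, n);
			out += n;
		}
		in += n;
	}
	return out;
}
```

The return value plays the role of the byte count that listxattr would
hand back to the caller after the private names have been removed.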
* [PATCH 1/7 v3] vfs: implement open "forwarding"
@ 2010-09-20 18:04 ` Miklos Szeredi
From: Miklos Szeredi @ 2010-09-20 18:04 UTC
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: vfs-open-redirect.patch --]
[-- Type: text/plain, Size: 2737 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
Add a new file operation f_op->open_other(). This acts just like
f_op->open() except the return value can be another open struct file
pointer. In that case the original file is discarded and the
replacement file is used instead.
[NeilBrown]
If IS_ERR(ret), then ret != NULL, so if we are performing the second
test we don't need the first.
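
To make the return convention concrete, here is a userspace sketch of
the three outcomes (keep, fail, replace). ERR_PTR/PTR_ERR/IS_ERR are
re-implemented locally as simplified stand-ins for the kernel helpers,
and struct file, finish_open() and the three callbacks are
hypothetical; none of this is the code from fs/open.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical minimal file object, not the kernel's struct file. */
struct file {
	int id;
	struct file *(*open_other)(struct file *);
};

static struct file replacement = { 42, NULL };

/* Example callbacks, one per outcome. */
static struct file *open_keep(struct file *f) { (void)f; return NULL; }
static struct file *open_fail(struct file *f) { (void)f; return ERR_PTR(-ENOENT); }
static struct file *open_replace(struct file *f) { (void)f; return &replacement; }

/*
 * NULL          -> keep the original file
 * ERR_PTR(-E)   -> the open failed
 * anything else -> use the returned file instead of the original
 *
 * An error pointer is never NULL, so one non-NULL test covers both
 * the failure case and the replacement case.
 */
static struct file *finish_open(struct file *f)
{
	struct file *ret;

	assert(f != NULL);
	if (!f->open_other)
		return f;
	ret = f->open_other(f);
	if (ret)
		return ret;	/* error pointer or replacement file */
	return f;
}
```

In the real patch the non-NULL case also releases the original file
through the cleanup path; the sketch leaves that out to keep the
convention itself visible.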
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/open.c | 23 +++++++++++++++++------
include/linux/fs.h | 1 +
2 files changed, 18 insertions(+), 6 deletions(-)
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/open.c 2010-09-20 13:26:53.000000000 +0200
@@ -657,6 +657,7 @@ static struct file *__dentry_open(struct
const struct cred *cred)
{
struct inode *inode;
+ struct file *ret;
int error;
f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
@@ -664,6 +665,7 @@ static struct file *__dentry_open(struct
inode = dentry->d_inode;
if (f->f_mode & FMODE_WRITE) {
error = __get_file_write_access(inode, mnt);
+ ret = ERR_PTR(error);
if (error)
goto cleanup_file;
if (!special_file(inode->i_mode))
@@ -678,15 +680,24 @@ static struct file *__dentry_open(struct
file_sb_list_add(f, inode->i_sb);
error = security_dentry_open(f, cred);
+ ret = ERR_PTR(error);
if (error)
goto cleanup_all;
- if (!open && f->f_op)
- open = f->f_op->open;
- if (open) {
- error = open(inode, f);
- if (error)
+ if (!open && f->f_op && f->f_op->open_other) {
+ /* NULL means keep f, non-error non-null means replace */
+ ret = f->f_op->open_other(f);
+ if (ret)
goto cleanup_all;
+ } else {
+ if (!open && f->f_op)
+ open = f->f_op->open;
+ if (open) {
+ error = open(inode, f);
+ ret = ERR_PTR(error);
+ if (error)
+ goto cleanup_all;
+ }
}
ima_counts_get(f);
@@ -728,7 +739,7 @@ cleanup_file:
put_filp(f);
dput(dentry);
mntput(mnt);
- return ERR_PTR(error);
+ return ret;
}
/**
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/include/linux/fs.h 2010-09-20 13:26:34.000000000 +0200
@@ -1494,6 +1494,7 @@ struct file_operations {
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
+ struct file *(*open_other) (struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
int (*fsync) (struct file *, int datasync);
--
* [PATCH 2/7 v3] vfs: make i_op->permission take a dentry instead of an inode
@ 2010-09-20 18:04 ` Miklos Szeredi
From: Miklos Szeredi @ 2010-09-20 18:04 UTC
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: vfs-permission-dentry.patch --]
[-- Type: text/plain, Size: 35440 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
Like most other inode operations ->permission() should take a dentry
instead of an inode. This is necessary for filesystems which operate
on names not on inodes.
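
One way to see why a dentry is the more useful argument: a stacking
filesystem can keep per-name data (for example a pointer to the
underlying dentry) in ->d_fsdata and forward the check. The userspace
mock below sketches that idea. The structures are heavily simplified,
the "allowed" field and generic_check() are invented for the demo, and
stacked_permission() is a hypothetical handler, not code from this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

struct dentry;

struct inode_operations {
	int (*permission)(struct dentry *, int);
};

/* Mock inode: "allowed" is a bitmask of permitted MAY_* bits. */
struct inode {
	int allowed;
	const struct inode_operations *i_op;
};

/* Mock dentry: d_fsdata is free for the filesystem's own use. */
struct dentry {
	struct inode *d_inode;
	void *d_fsdata;
};

/* Invented fallback, standing in for the generic mode-bit check. */
static int generic_check(struct inode *inode, int mask)
{
	return (inode->allowed & mask) == mask ? 0 : -EACCES;
}

/* Simplified dispatch in the spirit of the new dentry-based helper. */
static int dentry_permission(struct dentry *dentry, int mask)
{
	struct inode *inode = dentry->d_inode;

	assert(inode != NULL);
	if (inode->i_op && inode->i_op->permission)
		return inode->i_op->permission(dentry, mask);
	return generic_check(inode, mask);
}

/* Hypothetical stacking handler: forward to the underlying dentry. */
static int stacked_permission(struct dentry *dentry, int mask)
{
	struct dentry *lower = dentry->d_fsdata;

	return dentry_permission(lower, mask);
}

static const struct inode_operations stacked_iops = {
	.permission = stacked_permission,
};
```

With only an inode in hand, the handler would have no direct route to
the per-name state stored in ->d_fsdata.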
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/afs/internal.h | 2 +-
fs/afs/security.c | 3 ++-
fs/bad_inode.c | 2 +-
fs/btrfs/inode.c | 4 +++-
fs/btrfs/ioctl.c | 8 ++++----
fs/ceph/inode.c | 3 ++-
fs/ceph/super.h | 2 +-
fs/cifs/cifsfs.c | 3 ++-
fs/coda/dir.c | 3 ++-
fs/coda/pioctl.c | 4 ++--
fs/ecryptfs/inode.c | 4 ++--
fs/fuse/dir.c | 3 ++-
fs/gfs2/ops_inode.c | 11 ++++++++---
fs/hostfs/hostfs_kern.c | 3 ++-
fs/logfs/dir.c | 6 ------
fs/namei.c | 37 ++++++++++++++++++++-----------------
fs/namespace.c | 2 +-
fs/nfs/dir.c | 3 ++-
fs/nfsd/nfsfh.c | 2 +-
fs/nfsd/vfs.c | 4 ++--
fs/nilfs2/nilfs.h | 2 +-
fs/notify/fanotify/fanotify_user.c | 2 +-
fs/notify/inotify/inotify_user.c | 2 +-
fs/ocfs2/file.c | 3 ++-
fs/ocfs2/file.h | 2 +-
fs/ocfs2/refcounttree.c | 4 ++--
fs/open.c | 10 +++++-----
fs/proc/base.c | 3 ++-
fs/proc/proc_sysctl.c | 3 ++-
fs/reiserfs/xattr.c | 4 +++-
fs/smbfs/file.c | 4 ++--
fs/sysfs/inode.c | 3 ++-
fs/sysfs/sysfs.h | 2 +-
fs/utimes.c | 2 +-
fs/xattr.c | 12 +++++++-----
include/linux/coda_linux.h | 2 +-
include/linux/fs.h | 4 ++--
include/linux/nfs_fs.h | 2 +-
include/linux/reiserfs_xattr.h | 2 +-
ipc/mqueue.c | 2 +-
net/unix/af_unix.c | 2 +-
41 files changed, 100 insertions(+), 81 deletions(-)
Index: linux-2.6/fs/btrfs/ioctl.c
===================================================================
--- linux-2.6.orig/fs/btrfs/ioctl.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/btrfs/ioctl.c 2010-09-20 13:28:59.000000000 +0200
@@ -396,13 +396,13 @@ fail:
}
/* copy of may_create in fs/namei.c() */
-static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
+static inline int btrfs_may_create(struct dentry *dir, struct dentry *child)
{
if (child->d_inode)
return -EEXIST;
- if (IS_DEADDIR(dir))
+ if (IS_DEADDIR(dir->d_inode))
return -ENOENT;
- return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ return dentry_permission(dir, MAY_WRITE | MAY_EXEC);
}
/*
@@ -433,7 +433,7 @@ static noinline int btrfs_mksubvol(struc
if (error)
goto out_dput;
- error = btrfs_may_create(dir, dentry);
+ error = btrfs_may_create(parent->dentry, dentry);
if (error)
goto out_drop_write;
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ecryptfs/inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -975,9 +975,9 @@ int ecryptfs_truncate(struct dentry *den
}
static int
-ecryptfs_permission(struct inode *inode, int mask)
+ecryptfs_permission(struct dentry *dentry, int mask)
{
- return inode_permission(ecryptfs_inode_to_lower(inode), mask);
+ return dentry_permission(ecryptfs_dentry_to_lower(dentry), mask);
}
/**
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/namei.c 2010-09-20 13:28:59.000000000 +0200
@@ -240,17 +240,18 @@ int generic_permission(struct inode *ino
}
/**
- * inode_permission - check for access rights to a given inode
- * @inode: inode to check permission on
+ * dentry_permission - check for access rights to a given dentry
+ * @dentry: dentry to check permission on
* @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
*
- * Used to check for read/write/execute permissions on an inode.
+ * Used to check for read/write/execute permissions on a dentry.
* We use "fsuid" for this, letting us set arbitrary permissions
* for filesystem access without changing the "normal" uids which
* are used for other things.
*/
-int inode_permission(struct inode *inode, int mask)
+int dentry_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
int retval;
if (mask & MAY_WRITE) {
@@ -271,7 +272,7 @@ int inode_permission(struct inode *inode
}
if (inode->i_op->permission)
- retval = inode->i_op->permission(inode, mask);
+ retval = inode->i_op->permission(dentry, mask);
else
retval = generic_permission(inode, mask, inode->i_op->check_acl);
@@ -295,11 +296,11 @@ int inode_permission(struct inode *inode
*
* Note:
* Do not use this function in new code. All access checks should
- * be done using inode_permission().
+ * be done using dentry_permission().
*/
int file_permission(struct file *file, int mask)
{
- return inode_permission(file->f_path.dentry->d_inode, mask);
+ return dentry_permission(file->f_path.dentry, mask);
}
/*
@@ -459,12 +460,13 @@ force_reval_path(struct path *path, stru
* short-cut DAC fails, then call ->permission() to do more
* complete permission check.
*/
-static int exec_permission(struct inode *inode)
+static int exec_permission(struct dentry *dentry)
{
int ret;
+ struct inode *inode = dentry->d_inode;
if (inode->i_op->permission) {
- ret = inode->i_op->permission(inode, MAY_EXEC);
+ ret = inode->i_op->permission(dentry, MAY_EXEC);
if (!ret)
goto ok;
return ret;
@@ -837,7 +839,7 @@ static int link_path_walk(const char *na
unsigned int c;
nd->flags |= LOOKUP_CONTINUE;
- err = exec_permission(inode);
+ err = exec_permission(nd->path.dentry);
if (err)
break;
@@ -1163,7 +1165,7 @@ static struct dentry *lookup_hash(struct
{
int err;
- err = exec_permission(nd->path.dentry->d_inode);
+ err = exec_permission(nd->path.dentry);
if (err)
return ERR_PTR(err);
return __lookup_hash(&nd->last, nd->path.dentry, nd);
@@ -1213,7 +1215,7 @@ struct dentry *lookup_one_len(const char
if (err)
return ERR_PTR(err);
- err = exec_permission(base->d_inode);
+ err = exec_permission(base);
if (err)
return ERR_PTR(err);
return __lookup_hash(&this, base, NULL);
@@ -1301,7 +1303,7 @@ static int may_delete(struct inode *dir,
BUG_ON(victim->d_parent->d_inode != dir);
audit_inode_child(victim, dir);
- error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ error = dentry_permission(victim->d_parent, MAY_WRITE | MAY_EXEC);
if (error)
return error;
if (IS_APPEND(dir))
@@ -1337,7 +1339,8 @@ static inline int may_create(struct inod
return -EEXIST;
if (IS_DEADDIR(dir))
return -ENOENT;
- return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ BUG_ON(child->d_parent->d_inode != dir);
+ return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
}
/*
@@ -1430,7 +1433,7 @@ int may_open(struct path *path, int acc_
break;
}
- error = inode_permission(inode, acc_mode);
+ error = dentry_permission(dentry, acc_mode);
if (error)
return error;
@@ -2545,7 +2548,7 @@ static int vfs_rename_dir(struct inode *
* we'll need to flip '..'.
*/
if (new_dir != old_dir) {
- error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+ error = dentry_permission(old_dentry, MAY_WRITE);
if (error)
return error;
}
@@ -2900,7 +2903,7 @@ EXPORT_SYMBOL(page_symlink_inode_operati
EXPORT_SYMBOL(path_lookup);
EXPORT_SYMBOL(kern_path);
EXPORT_SYMBOL(vfs_path_lookup);
-EXPORT_SYMBOL(inode_permission);
+EXPORT_SYMBOL(dentry_permission);
EXPORT_SYMBOL(file_permission);
EXPORT_SYMBOL(unlock_rename);
EXPORT_SYMBOL(vfs_create);
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/namespace.c 2010-09-20 13:28:59.000000000 +0200
@@ -1230,7 +1230,7 @@ static int mount_is_safe(struct path *pa
if (current_uid() != path->dentry->d_inode->i_uid)
return -EPERM;
}
- if (inode_permission(path->dentry->d_inode, MAY_WRITE))
+ if (dentry_permission(path->dentry, MAY_WRITE))
return -EPERM;
return 0;
#endif
Index: linux-2.6/fs/nfsd/nfsfh.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsfh.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/nfsd/nfsfh.c 2010-09-20 13:28:59.000000000 +0200
@@ -38,7 +38,7 @@ static int nfsd_acceptable(void *expv, s
/* make sure parents give x permission to user */
int err;
parent = dget_parent(tdentry);
- err = inode_permission(parent->d_inode, MAY_EXEC);
+ err = dentry_permission(parent, MAY_EXEC);
if (err < 0) {
dput(parent);
break;
Index: linux-2.6/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.orig/fs/nfsd/vfs.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/nfsd/vfs.c 2010-09-20 13:28:59.000000000 +0200
@@ -2126,12 +2126,12 @@ nfsd_permission(struct svc_rqst *rqstp,
return 0;
/* This assumes NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
- err = inode_permission(inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
+ err = dentry_permission(dentry, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
/* Allow read access to binaries even when mode 111 */
if (err == -EACCES && S_ISREG(inode->i_mode) &&
acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE))
- err = inode_permission(inode, MAY_EXEC);
+ err = dentry_permission(dentry, MAY_EXEC);
return err? nfserrno(err) : 0;
}
Index: linux-2.6/fs/notify/fanotify/fanotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/fanotify/fanotify_user.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/notify/fanotify/fanotify_user.c 2010-09-20 13:28:59.000000000 +0200
@@ -481,7 +481,7 @@ static int fanotify_find_path(int dfd, c
}
/* you can only watch an inode if you have read permissions on it */
- ret = inode_permission(path->dentry->d_inode, MAY_READ);
+ ret = dentry_permission(path->dentry, MAY_READ);
if (ret)
path_put(path);
out:
Index: linux-2.6/fs/notify/inotify/inotify_user.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify_user.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/notify/inotify/inotify_user.c 2010-09-20 13:28:59.000000000 +0200
@@ -358,7 +358,7 @@ static int inotify_find_inode(const char
if (error)
return error;
/* you can only watch an inode if you have read permissions on it */
- error = inode_permission(path->dentry->d_inode, MAY_READ);
+ error = dentry_permission(path->dentry, MAY_READ);
if (error)
path_put(path);
return error;
Index: linux-2.6/fs/ocfs2/refcounttree.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/refcounttree.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ocfs2/refcounttree.c 2010-09-20 13:28:59.000000000 +0200
@@ -4323,7 +4323,7 @@ static inline int ocfs2_may_create(struc
return -EEXIST;
if (IS_DEADDIR(dir))
return -ENOENT;
- return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ return dentry_permission(child->d_parent, MAY_WRITE | MAY_EXEC);
}
/* copied from user_path_parent. */
@@ -4396,7 +4396,7 @@ static int ocfs2_vfs_reflink(struct dent
* file.
*/
if (!preserve) {
- error = inode_permission(inode, MAY_READ);
+ error = dentry_permission(old_dentry, MAY_READ);
if (error)
return error;
}
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c 2010-09-20 13:26:53.000000000 +0200
+++ linux-2.6/fs/open.c 2010-09-20 13:28:59.000000000 +0200
@@ -89,7 +89,7 @@ static long do_sys_truncate(const char _
if (error)
goto dput_and_out;
- error = inode_permission(inode, MAY_WRITE);
+ error = dentry_permission(path.dentry, MAY_WRITE);
if (error)
goto mnt_drop_write_and_out;
@@ -328,7 +328,7 @@ SYSCALL_DEFINE3(faccessat, int, dfd, con
goto out_path_release;
}
- res = inode_permission(inode, mode | MAY_ACCESS);
+ res = dentry_permission(path.dentry, mode | MAY_ACCESS);
/* SuS v2 requires we report a read only fs too */
if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
goto out_path_release;
@@ -367,7 +367,7 @@ SYSCALL_DEFINE1(chdir, const char __user
if (error)
goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+ error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
if (error)
goto dput_and_out;
@@ -396,7 +396,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd
if (!S_ISDIR(inode->i_mode))
goto out_putf;
- error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
+ error = dentry_permission(file->f_path.dentry, MAY_EXEC | MAY_CHDIR);
if (!error)
set_fs_pwd(current->fs, &file->f_path);
out_putf:
@@ -414,7 +414,7 @@ SYSCALL_DEFINE1(chroot, const char __use
if (error)
goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+ error = dentry_permission(path.dentry, MAY_EXEC | MAY_CHDIR);
if (error)
goto dput_and_out;
Index: linux-2.6/fs/utimes.c
===================================================================
--- linux-2.6.orig/fs/utimes.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/utimes.c 2010-09-20 13:28:59.000000000 +0200
@@ -96,7 +96,7 @@ static int utimes_common(struct path *pa
goto mnt_drop_write_and_out;
if (!is_owner_or_cap(inode)) {
- error = inode_permission(inode, MAY_WRITE);
+ error = dentry_permission(path->dentry, MAY_WRITE);
if (error)
goto mnt_drop_write_and_out;
}
Index: linux-2.6/fs/xattr.c
===================================================================
--- linux-2.6.orig/fs/xattr.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/xattr.c 2010-09-20 13:28:59.000000000 +0200
@@ -26,8 +26,10 @@
* because different namespaces have very different rules.
*/
static int
-xattr_permission(struct inode *inode, const char *name, int mask)
+xattr_permission(struct dentry *dentry, const char *name, int mask)
{
+ struct inode *inode = dentry->d_inode;
+
/*
* We can never set or remove an extended attribute on a read-only
* filesystem or on an immutable / append-only inode.
@@ -63,7 +65,7 @@ xattr_permission(struct inode *inode, co
return -EPERM;
}
- return inode_permission(inode, mask);
+ return dentry_permission(dentry, mask);
}
/**
@@ -115,7 +117,7 @@ vfs_setxattr(struct dentry *dentry, cons
struct inode *inode = dentry->d_inode;
int error;
- error = xattr_permission(inode, name, MAY_WRITE);
+ error = xattr_permission(dentry, name, MAY_WRITE);
if (error)
return error;
@@ -165,7 +167,7 @@ vfs_getxattr(struct dentry *dentry, cons
struct inode *inode = dentry->d_inode;
int error;
- error = xattr_permission(inode, name, MAY_READ);
+ error = xattr_permission(dentry, name, MAY_READ);
if (error)
return error;
@@ -224,7 +226,7 @@ vfs_removexattr(struct dentry *dentry, c
if (!inode->i_op->removexattr)
return -EOPNOTSUPP;
- error = xattr_permission(inode, name, MAY_WRITE);
+ error = xattr_permission(dentry, name, MAY_WRITE);
if (error)
return error;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2010-09-20 13:26:34.000000000 +0200
+++ linux-2.6/include/linux/fs.h 2010-09-20 13:28:59.000000000 +0200
@@ -1525,7 +1525,7 @@ struct inode_operations {
void * (*follow_link) (struct dentry *, struct nameidata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
void (*truncate) (struct inode *);
- int (*permission) (struct inode *, int);
+ int (*permission) (struct dentry *, int);
int (*check_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
@@ -2111,7 +2111,7 @@ extern void emergency_remount(void);
extern sector_t bmap(struct inode *, sector_t);
#endif
extern int notify_change(struct dentry *, struct iattr *);
-extern int inode_permission(struct inode *, int);
+extern int dentry_permission(struct dentry *, int);
extern int generic_permission(struct inode *, int,
int (*check_acl)(struct inode *, int));
Index: linux-2.6/ipc/mqueue.c
===================================================================
--- linux-2.6.orig/ipc/mqueue.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/ipc/mqueue.c 2010-09-20 13:28:59.000000000 +0200
@@ -656,7 +656,7 @@ static struct file *do_open(struct ipc_n
goto err;
}
- if (inode_permission(dentry->d_inode, oflag2acc[oflag & O_ACCMODE])) {
+ if (dentry_permission(dentry, oflag2acc[oflag & O_ACCMODE])) {
ret = -EACCES;
goto err;
}
Index: linux-2.6/net/unix/af_unix.c
===================================================================
--- linux-2.6.orig/net/unix/af_unix.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/net/unix/af_unix.c 2010-09-20 13:28:59.000000000 +0200
@@ -757,7 +757,7 @@ static struct sock *unix_find_other(stru
if (err)
goto fail;
inode = path.dentry->d_inode;
- err = inode_permission(inode, MAY_WRITE);
+ err = dentry_permission(path.dentry, MAY_WRITE);
if (err)
goto put_fail;
Index: linux-2.6/fs/afs/internal.h
===================================================================
--- linux-2.6.orig/fs/afs/internal.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/afs/internal.h 2010-09-20 13:28:59.000000000 +0200
@@ -624,7 +624,7 @@ extern void afs_clear_permits(struct afs
extern void afs_cache_permit(struct afs_vnode *, struct key *, long);
extern void afs_zap_permits(struct rcu_head *);
extern struct key *afs_request_key(struct afs_cell *);
-extern int afs_permission(struct inode *, int);
+extern int afs_permission(struct dentry *, int);
/*
* server.c
Index: linux-2.6/fs/afs/security.c
===================================================================
--- linux-2.6.orig/fs/afs/security.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/afs/security.c 2010-09-20 13:28:59.000000000 +0200
@@ -285,8 +285,9 @@ static int afs_check_permit(struct afs_v
* - AFS ACLs are attached to directories only, and a file is controlled by its
* parent directory's ACL
*/
-int afs_permission(struct inode *inode, int mask)
+int afs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
struct afs_vnode *vnode = AFS_FS_I(inode);
afs_access_t uninitialized_var(access);
struct key *key;
Index: linux-2.6/fs/bad_inode.c
===================================================================
--- linux-2.6.orig/fs/bad_inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/bad_inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -229,7 +229,7 @@ static int bad_inode_readlink(struct den
return -EIO;
}
-static int bad_inode_permission(struct inode *inode, int mask)
+static int bad_inode_permission(struct dentry *dentry, int mask)
{
return -EIO;
}
Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/btrfs/inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -6922,8 +6922,10 @@ static int btrfs_set_page_dirty(struct p
return __set_page_dirty_nobuffers(page);
}
-static int btrfs_permission(struct inode *inode, int mask)
+static int btrfs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
+
if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & MAY_WRITE))
return -EACCES;
return generic_permission(inode, mask, btrfs_check_acl);
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ceph/inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -1758,8 +1758,9 @@ int ceph_do_getattr(struct inode *inode,
* Check inode permissions. We verify we have a valid value for
* the AUTH cap, then call the generic handler.
*/
-int ceph_permission(struct inode *inode, int mask)
+int ceph_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
int err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED);
if (!err)
Index: linux-2.6/fs/ceph/super.h
===================================================================
--- linux-2.6.orig/fs/ceph/super.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ceph/super.h 2010-09-20 13:28:59.000000000 +0200
@@ -779,7 +779,7 @@ extern void ceph_queue_invalidate(struct
extern void ceph_queue_writeback(struct inode *inode);
extern int ceph_do_getattr(struct inode *inode, int mask);
-extern int ceph_permission(struct inode *inode, int mask);
+extern int ceph_permission(struct dentry *dentry, int mask);
extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat);
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/cifs/cifsfs.c 2010-09-20 13:28:59.000000000 +0200
@@ -269,8 +269,9 @@ cifs_statfs(struct dentry *dentry, struc
return 0;
}
-static int cifs_permission(struct inode *inode, int mask)
+static int cifs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
struct cifs_sb_info *cifs_sb;
cifs_sb = CIFS_SB(inode->i_sb);
Index: linux-2.6/fs/coda/dir.c
===================================================================
--- linux-2.6.orig/fs/coda/dir.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/coda/dir.c 2010-09-20 13:28:59.000000000 +0200
@@ -138,8 +138,9 @@ exit:
}
-int coda_permission(struct inode *inode, int mask)
+int coda_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
int error = 0;
mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
Index: linux-2.6/fs/fuse/dir.c
===================================================================
--- linux-2.6.orig/fs/fuse/dir.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/fuse/dir.c 2010-09-20 13:28:59.000000000 +0200
@@ -981,8 +981,9 @@ static int fuse_access(struct inode *ino
* access request is sent. Execute permission is still checked
* locally based on file mode.
*/
-static int fuse_permission(struct inode *inode, int mask)
+static int fuse_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
struct fuse_conn *fc = get_fuse_conn(inode);
bool refreshed = false;
int err = 0;
Index: linux-2.6/fs/gfs2/ops_inode.c
===================================================================
--- linux-2.6.orig/fs/gfs2/ops_inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/gfs2/ops_inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -1071,6 +1071,11 @@ int gfs2_permission(struct inode *inode,
return error;
}
+static int gfs2_dentry_permission(struct dentry *dentry, int mask)
+{
+ return gfs2_permission(dentry->d_inode, mask);
+}
+
/*
* XXX(truncate): the truncate_setsize calls should be moved to the end.
*/
@@ -1344,7 +1349,7 @@ out:
}
const struct inode_operations gfs2_file_iops = {
- .permission = gfs2_permission,
+ .permission = gfs2_dentry_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
@@ -1364,7 +1369,7 @@ const struct inode_operations gfs2_dir_i
.rmdir = gfs2_rmdir,
.mknod = gfs2_mknod,
.rename = gfs2_rename,
- .permission = gfs2_permission,
+ .permission = gfs2_dentry_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
@@ -1378,7 +1383,7 @@ const struct inode_operations gfs2_symli
.readlink = generic_readlink,
.follow_link = gfs2_follow_link,
.put_link = gfs2_put_link,
- .permission = gfs2_permission,
+ .permission = gfs2_dentry_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
Index: linux-2.6/fs/coda/pioctl.c
===================================================================
--- linux-2.6.orig/fs/coda/pioctl.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/coda/pioctl.c 2010-09-20 13:28:59.000000000 +0200
@@ -26,7 +26,7 @@
#include <linux/smp_lock.h>
/* pioctl ops */
-static int coda_ioctl_permission(struct inode *inode, int mask);
+static int coda_ioctl_permission(struct dentry *dentry, int mask);
static long coda_pioctl(struct file *filp, unsigned int cmd,
unsigned long user_data);
@@ -42,7 +42,7 @@ const struct file_operations coda_ioctl_
};
/* the coda pioctl inode ops */
-static int coda_ioctl_permission(struct inode *inode, int mask)
+static int coda_ioctl_permission(struct dentry *dentry, int mask)
{
return (mask & MAY_EXEC) ? -EACCES : 0;
}
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/hostfs/hostfs_kern.c 2010-09-20 13:28:59.000000000 +0200
@@ -746,8 +746,9 @@ int hostfs_rename(struct inode *from_ino
return err;
}
-int hostfs_permission(struct inode *ino, int desired)
+static int hostfs_permission(struct dentry *dentry, int desired)
{
+ struct inode *ino = dentry->d_inode;
char *name;
int r = 0, w = 0, x = 0, err;
Index: linux-2.6/fs/logfs/dir.c
===================================================================
--- linux-2.6.orig/fs/logfs/dir.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/logfs/dir.c 2010-09-20 13:28:59.000000000 +0200
@@ -555,11 +555,6 @@ static int logfs_symlink(struct inode *d
return __logfs_create(dir, dentry, inode, target, destlen);
}
-static int logfs_permission(struct inode *inode, int mask)
-{
- return generic_permission(inode, mask, NULL);
-}
-
static int logfs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
@@ -818,7 +813,6 @@ const struct inode_operations logfs_dir_
.mknod = logfs_mknod,
.rename = logfs_rename,
.rmdir = logfs_rmdir,
- .permission = logfs_permission,
.symlink = logfs_symlink,
.unlink = logfs_unlink,
};
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/nfs/dir.c 2010-09-20 13:28:59.000000000 +0200
@@ -1941,8 +1941,9 @@ int nfs_may_open(struct inode *inode, st
return nfs_do_access(inode, cred, nfs_open_permission_mask(openflags));
}
-int nfs_permission(struct inode *inode, int mask)
+int nfs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
struct rpc_cred *cred;
int res = 0;
Index: linux-2.6/fs/nilfs2/nilfs.h
===================================================================
--- linux-2.6.orig/fs/nilfs2/nilfs.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/nilfs2/nilfs.h 2010-09-20 13:28:59.000000000 +0200
@@ -200,7 +200,7 @@ static inline struct inode *nilfs_dat_in
*/
#ifdef CONFIG_NILFS_POSIX_ACL
#error "NILFS: not yet supported POSIX ACL"
-extern int nilfs_permission(struct inode *, int, struct nameidata *);
+extern int nilfs_permission(struct dentry *, int);
extern int nilfs_acl_chmod(struct inode *);
extern int nilfs_init_acl(struct inode *, struct inode *);
#else
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.c 2010-09-20 13:28:59.000000000 +0200
@@ -1319,8 +1319,9 @@ bail:
return err;
}
-int ocfs2_permission(struct inode *inode, int mask)
+int ocfs2_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
int ret;
mlog_entry_void();
Index: linux-2.6/fs/ocfs2/file.h
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/ocfs2/file.h 2010-09-20 13:28:59.000000000 +0200
@@ -61,7 +61,7 @@ int ocfs2_zero_extend(struct inode *inod
int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat);
-int ocfs2_permission(struct inode *inode, int mask);
+int ocfs2_permission(struct dentry *dentry, int mask);
int ocfs2_should_update_atime(struct inode *inode,
struct vfsmount *vfsmnt);
Index: linux-2.6/fs/proc/base.c
===================================================================
--- linux-2.6.orig/fs/proc/base.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/proc/base.c 2010-09-20 13:28:59.000000000 +0200
@@ -2050,8 +2050,9 @@ static const struct file_operations proc
* /proc/pid/fd needs a special permission handler so that a process can still
* access /proc/self/fd after it has executed a setuid().
*/
-static int proc_fd_permission(struct inode *inode, int mask)
+static int proc_fd_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
int rv;
rv = generic_permission(inode, mask, NULL);
Index: linux-2.6/fs/proc/proc_sysctl.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_sysctl.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/proc/proc_sysctl.c 2010-09-20 13:28:59.000000000 +0200
@@ -292,12 +292,13 @@ out:
return ret;
}
-static int proc_sys_permission(struct inode *inode, int mask)
+static int proc_sys_permission(struct dentry *dentry, int mask)
{
/*
* sysctl entries that are not writeable,
* are _NOT_ writeable, capabilities or not.
*/
+ struct inode *inode = dentry->d_inode;
struct ctl_table_header *head;
struct ctl_table *table;
int error;
Index: linux-2.6/fs/reiserfs/xattr.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/xattr.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/reiserfs/xattr.c 2010-09-20 13:28:59.000000000 +0200
@@ -954,8 +954,10 @@ static int xattr_mount_check(struct supe
return 0;
}
-int reiserfs_permission(struct inode *inode, int mask)
+int reiserfs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
+
/*
* We don't do permission checks on the internal objects.
* Permissions are determined by the "owning" object.
Index: linux-2.6/fs/smbfs/file.c
===================================================================
--- linux-2.6.orig/fs/smbfs/file.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/smbfs/file.c 2010-09-20 13:28:59.000000000 +0200
@@ -408,9 +408,9 @@ smb_file_release(struct inode *inode, st
* privileges, so we need our own check for this.
*/
static int
-smb_file_permission(struct inode *inode, int mask)
+smb_file_permission(struct dentry *dentry, int mask)
{
- int mode = inode->i_mode;
+ int mode = dentry->d_inode->i_mode;
int error = 0;
VERBOSE("mode=%x, mask=%x\n", mode, mask);
Index: linux-2.6/fs/sysfs/inode.c
===================================================================
--- linux-2.6.orig/fs/sysfs/inode.c 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/sysfs/inode.c 2010-09-20 13:28:59.000000000 +0200
@@ -348,8 +348,9 @@ int sysfs_hash_and_remove(struct sysfs_d
return -ENOENT;
}
-int sysfs_permission(struct inode *inode, int mask)
+int sysfs_permission(struct dentry *dentry, int mask)
{
+ struct inode *inode = dentry->d_inode;
struct sysfs_dirent *sd = inode->i_private;
mutex_lock(&sysfs_mutex);
Index: linux-2.6/fs/sysfs/sysfs.h
===================================================================
--- linux-2.6.orig/fs/sysfs/sysfs.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/sysfs/sysfs.h 2010-09-20 13:28:59.000000000 +0200
@@ -200,7 +200,7 @@ static inline void __sysfs_put(struct sy
struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
void sysfs_evict_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
-int sysfs_permission(struct inode *inode, int mask);
+int sysfs_permission(struct dentry *dentry, int mask);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
Index: linux-2.6/include/linux/coda_linux.h
===================================================================
--- linux-2.6.orig/include/linux/coda_linux.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/include/linux/coda_linux.h 2010-09-20 13:28:59.000000000 +0200
@@ -37,7 +37,7 @@ extern const struct file_operations coda
/* operations shared over more than one file */
int coda_open(struct inode *i, struct file *f);
int coda_release(struct inode *i, struct file *f);
-int coda_permission(struct inode *inode, int mask);
+int coda_permission(struct dentry *dentry, int mask);
int coda_revalidate_inode(struct dentry *);
int coda_getattr(struct vfsmount *, struct dentry *, struct kstat *);
int coda_setattr(struct dentry *, struct iattr *);
Index: linux-2.6/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.orig/include/linux/nfs_fs.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/include/linux/nfs_fs.h 2010-09-20 13:28:59.000000000 +0200
@@ -348,7 +348,7 @@ extern int nfs_refresh_inode(struct inod
extern int nfs_post_op_update_inode(struct inode *inode, struct nfs_fattr *fattr);
extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr);
extern int nfs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
-extern int nfs_permission(struct inode *, int);
+extern int nfs_permission(struct dentry *, int);
extern int nfs_open(struct inode *, struct file *);
extern int nfs_release(struct inode *, struct file *);
extern int nfs_attribute_timeout(struct inode *inode);
Index: linux-2.6/include/linux/reiserfs_xattr.h
===================================================================
--- linux-2.6.orig/include/linux/reiserfs_xattr.h 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/include/linux/reiserfs_xattr.h 2010-09-20 13:28:59.000000000 +0200
@@ -41,7 +41,7 @@ int reiserfs_xattr_init(struct super_blo
int reiserfs_lookup_privroot(struct super_block *sb);
int reiserfs_delete_xattrs(struct inode *inode);
int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs);
-int reiserfs_permission(struct inode *inode, int mask);
+int reiserfs_permission(struct dentry *dentry, int mask);
#ifdef CONFIG_REISERFS_FS_XATTR
#define has_xattr_dir(inode) (REISERFS_I(inode)->i_flags & i_has_xattr_dir)
--
* [PATCH 3/7 v3] vfs: add flag to allow rename to same inode
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
2010-09-20 18:04 ` [PATCH 1/7 v3] vfs: implement open "forwarding" Miklos Szeredi
2010-09-20 18:04 ` [PATCH 2/7 v3] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
@ 2010-09-20 18:04 ` Miklos Szeredi
2010-09-23 22:04 ` Valerie Aurora
2010-09-20 18:04 ` [PATCH 4/7 v3] vfs: export do_splice_direct() to modules Miklos Szeredi
` (4 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-20 18:04 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: vfs-fs_rename_self_allow.patch --]
[-- Type: text/plain, Size: 1522 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
The overlay filesystem uses dummy inodes for non-directories. Allow
rename to work in this case despite the inode being the same.
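As a rough illustration of the check this patch changes, here is a minimal userspace sketch. The model_* types and rename_is_noop() are hypothetical stand-ins, not kernel definitions; only the flag value and the shape of the condition come from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag value as defined by this patch in include/linux/fs.h. */
#define FS_RENAME_SELF_ALLOW 65536

/* Hypothetical stand-ins for struct inode / struct dentry. */
struct model_inode { int ino; };
struct model_dentry { struct model_inode *d_inode; };

/*
 * Returns true when the rename would be short-circuited as a no-op:
 * both dentries point at the same inode and the filesystem has not
 * set FS_RENAME_SELF_ALLOW.
 */
bool rename_is_noop(const struct model_dentry *old_dentry,
                    const struct model_dentry *new_dentry,
                    int fs_flags)
{
	assert(old_dentry != NULL && new_dentry != NULL);
	return old_dentry->d_inode == new_dentry->d_inode &&
	       !(fs_flags & FS_RENAME_SELF_ALLOW);
}
```

With the flag clear, two names sharing one inode still take the early return; with the flag set, the rename proceeds even though the overlay's shared dummy inode makes both dentries look identical.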
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/namei.c | 4 +++-
include/linux/fs.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2010-08-25 14:19:34.000000000 +0200
+++ linux-2.6/include/linux/fs.h 2010-08-25 14:19:53.000000000 +0200
@@ -179,6 +179,7 @@ struct inodes_stat_t {
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
* during rename() internally.
*/
+#define FS_RENAME_SELF_ALLOW 65536 /* Allow rename to same inode */
/*
* These are the fs-independent mount-flags: up to 32 flags are supported
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2010-08-25 10:19:53.000000000 +0200
+++ linux-2.6/fs/namei.c 2010-08-25 14:22:56.000000000 +0200
@@ -2620,8 +2620,10 @@ int vfs_rename(struct inode *old_dir, st
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
const unsigned char *old_name;
- if (old_dentry->d_inode == new_dentry->d_inode)
+ if (old_dentry->d_inode == new_dentry->d_inode &&
+ !(old_dir->i_sb->s_type->fs_flags & FS_RENAME_SELF_ALLOW)) {
return 0;
+ }
error = may_delete(old_dir, old_dentry, is_dir);
if (error)
--
* [PATCH 4/7 v3] vfs: export do_splice_direct() to modules
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
` (2 preceding siblings ...)
2010-09-20 18:04 ` [PATCH 3/7 v3] vfs: add flag to allow rename to same inode Miklos Szeredi
@ 2010-09-20 18:04 ` Miklos Szeredi
2010-09-20 18:04 ` [PATCH 5/7 v3] vfs: fix possible use after free in finish_open() Miklos Szeredi
` (3 subsequent siblings)
7 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-20 18:04 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: vfs-export-do_splice_direct.patch --]
[-- Type: text/plain, Size: 673 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
Export do_splice_direct() to modules. Needed by overlay filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/splice.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c 2010-08-13 16:07:00.000000000 +0200
+++ linux-2.6/fs/splice.c 2010-08-25 18:59:08.000000000 +0200
@@ -1307,6 +1307,7 @@ long do_splice_direct(struct file *in, l
return ret;
}
+EXPORT_SYMBOL(do_splice_direct);
static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
struct pipe_inode_info *opipe,
--
* [PATCH 5/7 v3] vfs: fix possible use after free in finish_open()
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
` (3 preceding siblings ...)
2010-09-20 18:04 ` [PATCH 4/7 v3] vfs: export do_splice_direct() to modules Miklos Szeredi
@ 2010-09-20 18:04 ` Miklos Szeredi
2010-09-23 20:19 ` Valerie Aurora
2010-09-20 18:04 ` [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype Miklos Szeredi
` (2 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-20 18:04 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro, stable
[-- Attachment #1: vfs-open-truncate-fix.patch --]
[-- Type: text/plain, Size: 1243 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
If open(O_TRUNC) is called and the actual open fails, then nd->path
will be released by nameidata_to_filp(). If this races with an
unmount then mnt_drop_write() can Oops.
Fix by acquiring a ref to nd->path and releasing after
mnt_drop_write().
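The reference counting can be sketched in userspace as follows. This is a toy model under stated assumptions: model_path and the model_* helpers are hypothetical, and the counter only stands in for the references held on nd->path; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the references held on nd->path. */
struct model_path { int refs; };

void model_path_get(struct model_path *p)
{
	p->refs++;
}

void model_path_put(struct model_path *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* Models nameidata_to_filp() failing: it puts the caller's reference. */
void model_failed_open(struct model_path *p)
{
	model_path_put(p);
}

/*
 * Returns the reference count seen at the point where
 * mnt_drop_write(nd->path.mnt) would run after a failed open.
 */
int refs_at_drop_write(bool take_extra_ref)
{
	struct model_path path = { 1 };
	int seen;

	if (take_extra_ref)
		model_path_get(&path);	/* the path_get() added by the fix */

	model_failed_open(&path);
	seen = path.refs;		/* mnt_drop_write() would run here */

	if (take_extra_ref)
		model_path_put(&path);	/* the path_put() added by the fix */

	return seen;
}
```

In this model the count is already zero at the drop point without the extra reference, which is the window a racing unmount could exploit; with the extra reference it is still one.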
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
---
fs/namei.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2010-09-20 13:32:35.000000000 +0200
+++ linux-2.6/fs/namei.c 2010-09-20 13:33:14.000000000 +0200
@@ -1559,6 +1559,11 @@ static struct file *finish_open(struct n
mnt_drop_write(nd->path.mnt);
goto exit;
}
+ if (will_truncate) {
+ /* nameidata_to_filp() puts nd->path! */
+ path_get(&nd->path);
+ }
+
filp = nameidata_to_filp(nd);
if (!IS_ERR(filp)) {
error = ima_file_check(filp, acc_mode);
@@ -1581,8 +1586,10 @@ static struct file *finish_open(struct n
* because the filp has had a write taken
* on its behalf.
*/
- if (will_truncate)
+ if (will_truncate) {
mnt_drop_write(nd->path.mnt);
+ path_put(&nd->path);
+ }
return filp;
exit:
--
* [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
` (4 preceding siblings ...)
2010-09-20 18:04 ` [PATCH 5/7 v3] vfs: fix possible use after free in finish_open() Miklos Szeredi
@ 2010-09-20 18:04 ` Miklos Szeredi
2010-09-22 23:21 ` Valerie Aurora
2010-09-24 17:56 ` Valerie Aurora
2010-09-20 18:04 ` [PATCH 7/7 v3] overlay: overlay filesystem documentation Miklos Szeredi
2010-09-21 1:31 ` [PATCH 0/7 v3] overlay filesystem prototype Neil Brown
7 siblings, 2 replies; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-20 18:04 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: overlayfs.patch --]
[-- Type: text/plain, Size: 52385 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
This overlay filesystem is a hybrid of entirely filesystem-based
(unionfs, aufs) and entirely VFS-based (union mounts) solutions.
The dentry tree is duplicated from the underlying filesystems; this
enables fast cached lookups without adding special support to the
VFS. This uses slightly more memory than union mounts, but dentries
are relatively small.
Inode structures are only duplicated for directories. Regular files,
symlinks and special files each share a single inode per type. This
means that locking the victim for unlink is a quasi-filesystem lock,
which is suboptimal, but could be worked around in the VFS.
Opening a non-directory forwards the open to the underlying
filesystem. This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).
Usage:
mount -t overlay -olowerdir=/lower,upperdir=/upper overlay /mnt
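The same mount can presumably be requested from C through mount(2). The sketch below is an assumption-laden illustration: ovl_format_opts() and ovl_mount() are hypothetical helper names, the option string simply mirrors the command line above, and the call needs root plus a kernel carrying this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Formats "lowerdir=<lower>,upperdir=<upper>" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 */
int ovl_format_opts(char *buf, size_t size,
                    const char *lower, const char *upper)
{
	int n;

	assert(buf != NULL && lower != NULL && upper != NULL);
	n = snprintf(buf, size, "lowerdir=%s,upperdir=%s", lower, upper);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical wrapper: mounts an overlay of upper over lower at target.
 * Returns 0 on success, -1 on failure (errno set by mount(2) in that case).
 */
int ovl_mount(const char *lower, const char *upper, const char *target)
{
	char opts[512];

	if (ovl_format_opts(opts, sizeof(opts), lower, upper) < 0)
		return -1;

	/* source name "overlay" and fstype "overlay", as in the command above */
	return mount("overlay", target, "overlay", 0, opts);
}
```

Calling ovl_mount("/lower", "/upper", "/mnt") would correspond to the command line shown under Usage.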
Supported:
- all operations
Missing:
- ensure that filesystems that are part of the overlay are not modified
outside the overlay
- optimize directory merging and caching
[NeilBrown]
- minimal remount support
- use correct seek function for directories
- initialise is_real before use
- rename ovl_fill_cache to ovl_dir_read
- add initial revalidate support
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/Kconfig | 1
fs/Makefile | 1
fs/overlayfs/Kconfig | 4
fs/overlayfs/Makefile | 5
fs/overlayfs/overlayfs.c | 2154 +++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 2165 insertions(+)
Index: linux-2.6/fs/overlayfs/overlayfs.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/overlayfs.c 2010-09-20 15:15:50.000000000 +0200
@@ -0,0 +1,2154 @@
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/sched.h>
+#include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/splice.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
+MODULE_DESCRIPTION("Overlay filesystem");
+MODULE_LICENSE("GPL");
+
+#define OVL_COPY_UP_CHUNK_SIZE (1 << 20)
+
+struct ovl_fs {
+ struct inode *symlink_inode;
+ struct inode *regular_inode;
+ struct inode *special_inode;
+ struct vfsmount *upper_mnt;
+ struct vfsmount *lower_mnt;
+};
+
+struct ovl_entry {
+ struct dentry *__upperdentry;
+ struct dentry *lowerdentry;
+ bool opaque;
+};
+
+static const char *ovl_whiteout_xattr = "trusted.overlay.whiteout";
+static const char *ovl_opaque_xattr = "trusted.overlay.opaque";
+static const char *ovl_whiteout_symlink = "(overlay-whiteout)";
+
+enum ovl_path_type {
+ OVL_PATH_UPPER,
+ OVL_PATH_MERGE,
+ OVL_PATH_LOWER,
+};
+
+static enum ovl_path_type ovl_path_type(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ if (oe->__upperdentry) {
+ if (oe->lowerdentry && S_ISDIR(dentry->d_inode->i_mode))
+ return OVL_PATH_MERGE;
+ else
+ return OVL_PATH_UPPER;
+ } else {
+ return OVL_PATH_LOWER;
+ }
+}
+
+static struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe)
+{
+ struct dentry *upperdentry = ACCESS_ONCE(oe->__upperdentry);
+ smp_read_barrier_depends();
+ return upperdentry;
+}
+
+static void ovl_path_upper(struct dentry *dentry, struct path *path)
+{
+ struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ path->mnt = ofs->upper_mnt;
+ path->dentry = ovl_upperdentry_dereference(oe);
+}
+
+static void ovl_path_lower(struct dentry *dentry, struct path *path)
+{
+ struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ path->mnt = ofs->lower_mnt;
+ path->dentry = oe->lowerdentry;
+}
+
+static enum ovl_path_type ovl_path_real(struct dentry *dentry,
+ struct path *path)
+{
+
+ enum ovl_path_type type = ovl_path_type(dentry);
+
+ if (type == OVL_PATH_LOWER)
+ ovl_path_lower(dentry, path);
+ else
+ ovl_path_upper(dentry, path);
+
+ return type;
+}
+
+static struct dentry *ovl_dentry_upper(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ return ovl_upperdentry_dereference(oe);
+}
+
+static struct dentry *ovl_dentry_lower(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ return oe->lowerdentry;
+}
+
+static struct dentry *ovl_dentry_real(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+ struct dentry *realdentry;
+
+ realdentry = ovl_upperdentry_dereference(oe);
+ if (!realdentry)
+ realdentry = oe->lowerdentry;
+
+ return realdentry;
+}
+
+static struct file *path_open(struct path *path, int flags)
+{
+ const struct cred *cred = current_cred();
+
+ path_get(path);
+ return dentry_open(path->dentry, path->mnt, flags, cred);
+}
+
+static bool ovl_dentry_is_opaque(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+ return oe->opaque;
+}
+
+static void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+ oe->opaque = opaque;
+}
+
+static void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ WARN_ON(!mutex_is_locked(&upperdentry->d_parent->d_inode->i_mutex));
+ WARN_ON(oe->__upperdentry);
+ smp_wmb();
+ oe->__upperdentry = upperdentry;
+}
+
+static bool ovl_is_whiteout(struct dentry *dentry)
+{
+ int res;
+ char val;
+
+ if (!dentry)
+ return false;
+ if (!dentry->d_inode)
+ return false;
+ if (!S_ISLNK(dentry->d_inode->i_mode))
+ return false;
+
+ res = vfs_getxattr(dentry, ovl_whiteout_xattr, &val, 1);
+ if (res == 1 && val == 'y')
+ return true;
+
+ return false;
+}
+
+static bool ovl_is_opaquedir(struct dentry *dentry)
+{
+ int res;
+ char val;
+
+ if (!S_ISDIR(dentry->d_inode->i_mode))
+ return false;
+
+ res = vfs_getxattr(dentry, ovl_opaque_xattr, &val, 1);
+ if (res == 1 && val == 'y')
+ return true;
+
+ return false;
+}
+
+struct ovl_cache_entry {
+ struct ovl_cache_entry *next;
+ struct qstr name;
+ unsigned int type;
+ u64 ino;
+ bool is_whiteout;
+};
+
+struct ovl_readdir_data {
+ struct ovl_cache_entry *list;
+ struct ovl_cache_entry **endp;
+ struct dentry *dir;
+ int count;
+ int err;
+};
+
+struct ovl_dir_file {
+ bool is_real;
+ struct ovl_cache_entry *cache;
+ struct file *realfile;
+};
+
+static int ovl_cache_add_entry(struct ovl_readdir_data *rdd,
+ const char *name, int namelen, u64 ino,
+ unsigned int d_type, bool is_whiteout)
+{
+ struct ovl_cache_entry *p;
+
+ p = kmalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ return -ENOMEM;
+
+ p->name.name = kstrndup(name, namelen, GFP_KERNEL);
+ if (!p->name.name) {
+ kfree(p);
+ return -ENOMEM;
+ }
+ p->name.len = namelen;
+ p->name.hash = 0;
+ p->type = d_type;
+ p->ino = ino;
+ p->is_whiteout = is_whiteout;
+ p->next = NULL;
+ *rdd->endp = p;
+ rdd->endp = &p->next;
+
+ return 0;
+}
+
+static void ovl_cache_free(struct ovl_cache_entry *p)
+{
+ while (p) {
+ struct ovl_cache_entry *next = p->next;
+
+ kfree(p->name.name);
+ kfree(p);
+ p = next;
+ }
+}
+
+static int ovl_cache_find_entry(struct ovl_cache_entry *start,
+ const char *name, int namelen)
+{
+ struct ovl_cache_entry *p;
+ int ret = 0;
+
+ for (p = start; p; p = p->next) {
+ if (p->name.len != namelen)
+ continue;
+ if (strncmp(p->name.name, name, namelen) == 0) {
+ ret = 1;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static int ovl_fill_lower(void *buf, const char *name, int namlen,
+ loff_t offset, u64 ino, unsigned int d_type)
+{
+ struct ovl_readdir_data *rdd = buf;
+
+ rdd->count++;
+ if (!ovl_cache_find_entry(rdd->list, name, namlen))
+ rdd->err = ovl_cache_add_entry(rdd, name, namlen, ino, d_type, false);
+
+ return rdd->err;
+}
+
+static int ovl_fill_upper(void *buf, const char *name, int namlen,
+ loff_t offset, u64 ino, unsigned int d_type)
+{
+ struct ovl_readdir_data *rdd = buf;
+ bool is_whiteout = false;
+
+ rdd->count++;
+ if (d_type == DT_LNK) {
+ struct dentry *dentry;
+
+ dentry = lookup_one_len(name, rdd->dir, namlen);
+ if (IS_ERR(dentry)) {
+ rdd->err = PTR_ERR(dentry);
+ goto out;
+ }
+ is_whiteout = ovl_is_whiteout(dentry);
+ dput(dentry);
+ }
+
+ rdd->err = ovl_cache_add_entry(rdd, name, namlen, ino, d_type, is_whiteout);
+
+out:
+ return rdd->err;
+}
+
+static int ovl_dir_read(struct path *realpath, struct ovl_readdir_data *rdd,
+ filldir_t filler)
+{
+ const struct cred *old_cred;
+ struct cred *override_cred;
+ struct file *realfile;
+ int err;
+
+ realfile = path_open(realpath, O_RDONLY | O_DIRECTORY);
+ if (IS_ERR(realfile))
+ return PTR_ERR(realfile);
+
+ err = -ENOMEM;
+ override_cred = prepare_creds();
+ if (override_cred) {
+ /*
+ * CAP_SYS_ADMIN for getxattr
+ * CAP_DAC_OVERRIDE for lookup and unlink
+ */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+ old_cred = override_creds(override_cred);
+
+ do {
+ rdd->count = 0;
+ rdd->err = 0;
+ err = vfs_readdir(realfile, filler, rdd);
+ if (err >= 0)
+ err = rdd->err;
+ } while (!err && rdd->count);
+
+ revert_creds(old_cred);
+ put_cred(override_cred);
+ }
+ fput(realfile);
+
+ if (err) {
+ ovl_cache_free(rdd->list);
+ rdd->list = NULL;
+ return err;
+ }
+
+ return 0;
+}
+
+static void ovl_dir_reset(struct file *file)
+{
+ struct ovl_dir_file *od = file->private_data;
+ enum ovl_path_type type = ovl_path_type(file->f_path.dentry);
+
+ ovl_cache_free(od->cache);
+ od->cache = NULL;
+ WARN_ON(!od->is_real && type != OVL_PATH_MERGE);
+ if (od->is_real && type == OVL_PATH_MERGE) {
+ fput(od->realfile);
+ od->realfile = NULL;
+ od->is_real = false;
+ }
+}
+
+static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
+{
+ struct ovl_dir_file *od = file->private_data;
+ struct ovl_cache_entry *p;
+ loff_t off;
+ int res = 0;
+
+ if (!file->f_pos)
+ ovl_dir_reset(file);
+
+ if (od->is_real) {
+ res = vfs_readdir(od->realfile, filler, buf);
+ file->f_pos = od->realfile->f_pos;
+
+ return res;
+ }
+
+ if (!od->cache) {
+ struct path lowerpath;
+ struct path upperpath;
+ struct ovl_readdir_data rdd = {
+ .list = NULL,
+ .endp = &rdd.list,
+ };
+
+ ovl_path_lower(file->f_path.dentry, &lowerpath);
+ ovl_path_upper(file->f_path.dentry, &upperpath);
+
+ rdd.dir = upperpath.dentry;
+ res = ovl_dir_read(&upperpath, &rdd, ovl_fill_upper);
+ if (!res)
+ res = ovl_dir_read(&lowerpath, &rdd, ovl_fill_lower);
+
+ if (res)
+ return res;
+
+ od->cache = rdd.list;
+ }
+
+ off = 0;
+ for (p = od->cache; p; p = p->next) {
+ int over;
+
+ if (p->is_whiteout)
+ continue;
+
+ off++;
+ if (off <= file->f_pos)
+ continue;
+
+ over = filler(buf, p->name.name, p->name.len, off - 1,
+ p->ino, p->type);
+ if (over)
+ break;
+
+ file->f_pos = off;
+ }
+
+ return res;
+}
+
+static loff_t ovl_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+ loff_t res;
+ struct ovl_dir_file *od = file->private_data;
+
+ mutex_lock(&file->f_dentry->d_inode->i_mutex);
+ if (!file->f_pos)
+ ovl_dir_reset(file);
+
+ if (od->is_real) {
+ res = vfs_llseek(od->realfile, offset, origin);
+ file->f_pos = od->realfile->f_pos;
+ } else
+ res = generic_file_llseek_unlocked(file, offset, origin);
+ mutex_unlock(&file->f_dentry->d_inode->i_mutex);
+
+ return res;
+}
+
+static int ovl_dir_fsync(struct file *file, int datasync)
+{
+ struct ovl_dir_file *od = file->private_data;
+
+ /* May need to reopen directory if it got copied up */
+ if (!od->realfile) {
+ struct path upperpath;
+
+ ovl_path_upper(file->f_path.dentry, &upperpath);
+ od->realfile = path_open(&upperpath, O_RDONLY);
+ if (IS_ERR(od->realfile))
+ return PTR_ERR(od->realfile);
+ }
+
+ return vfs_fsync(od->realfile, datasync);
+}
+
+static int ovl_dir_release(struct inode *inode, struct file *file)
+{
+ struct ovl_dir_file *od = file->private_data;
+
+ ovl_cache_free(od->cache);
+ if (od->realfile)
+ fput(od->realfile);
+ kfree(od);
+
+ return 0;
+}
+
+static int ovl_dir_open(struct inode *inode, struct file *file)
+{
+ struct path realpath;
+ struct file *realfile;
+ struct ovl_dir_file *od;
+ enum ovl_path_type type;
+
+ od = kzalloc(sizeof(struct ovl_dir_file), GFP_KERNEL);
+ if (!od)
+ return -ENOMEM;
+
+ type = ovl_path_real(file->f_path.dentry, &realpath);
+ realfile = path_open(&realpath, file->f_flags);
+ if (IS_ERR(realfile)) {
+ kfree(od);
+ return PTR_ERR(realfile);
+ }
+ od->realfile = realfile;
+ od->is_real = (type != OVL_PATH_MERGE);
+ file->private_data = od;
+
+ return 0;
+}
+
+static const struct file_operations ovl_dir_operations = {
+ .read = generic_read_dir,
+ .open = ovl_dir_open,
+ .readdir = ovl_readdir,
+ .llseek = ovl_dir_llseek,
+ .fsync = ovl_dir_fsync,
+ .release = ovl_dir_release,
+};
+
+static const struct inode_operations ovl_dir_inode_operations;
+
+static void ovl_dentry_release(struct dentry *dentry)
+{
+ struct ovl_entry *oe = dentry->d_fsdata;
+
+ if (oe) {
+ dput(oe->__upperdentry);
+ dput(oe->lowerdentry);
+ kfree(oe);
+ }
+}
+
+static const struct dentry_operations ovl_dentry_operations = {
+ .d_release = ovl_dentry_release,
+};
+
+static struct inode *ovl_new_inode(struct super_block *sb, umode_t mode)
+{
+ struct ovl_fs *ufs = sb->s_fs_info;
+ struct inode *inode;
+
+ switch (mode & S_IFMT) {
+ case S_IFDIR:
+ inode = new_inode(sb);
+ inode->i_flags |= S_NOATIME|S_NOCMTIME;
+ inode->i_op = &ovl_dir_inode_operations;
+ inode->i_fop = &ovl_dir_operations;
+ inode->i_mode = S_IFDIR;
+ break;
+
+ case S_IFLNK:
+ inode = ufs->symlink_inode;
+ atomic_inc(&inode->i_count);
+ break;
+
+ case S_IFREG:
+ inode = ufs->regular_inode;
+ atomic_inc(&inode->i_count);
+ break;
+
+ case S_IFSOCK:
+ case S_IFBLK:
+ case S_IFCHR:
+ case S_IFIFO:
+ inode = ufs->special_inode;
+ atomic_inc(&inode->i_count);
+ break;
+
+ default:
+ WARN(1, "illegal file type: %i\n", mode & S_IFMT);
+ inode = NULL;
+ }
+
+ return inode;
+
+}
+
+static struct dentry *ovl_lookup_real(struct dentry *dir, struct qstr *name)
+{
+ struct dentry *dentry;
+
+ mutex_lock(&dir->d_inode->i_mutex);
+ dentry = lookup_one_len(name->name, dir, name->len);
+ mutex_unlock(&dir->d_inode->i_mutex);
+
+ if (IS_ERR(dentry)) {
+ if (PTR_ERR(dentry) == -ENOENT)
+ dentry = NULL;
+ } else if (!dentry->d_inode) {
+ dput(dentry);
+ dentry = NULL;
+ }
+ return dentry;
+}
+
+static struct ovl_entry *ovl_alloc_entry(void)
+{
+ return kzalloc(sizeof(struct ovl_entry), GFP_KERNEL);
+}
+
+static struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
+ struct nameidata *nd)
+{
+ struct ovl_entry *oe;
+ struct dentry *upperdir;
+ struct dentry *lowerdir;
+ struct dentry *upperdentry = NULL;
+ struct dentry *lowerdentry = NULL;
+ struct inode *inode = NULL;
+ int err;
+
+ err = -ENOMEM;
+ oe = ovl_alloc_entry();
+ if (!oe)
+ goto out;
+
+ upperdir = ovl_dentry_upper(dentry->d_parent);
+ lowerdir = ovl_dentry_lower(dentry->d_parent);
+
+ if (upperdir) {
+ upperdentry = ovl_lookup_real(upperdir, &dentry->d_name);
+ err = PTR_ERR(upperdentry);
+ if (IS_ERR(upperdentry))
+ goto out_put_dir;
+
+ if (lowerdir && upperdentry &&
+ (S_ISLNK(upperdentry->d_inode->i_mode) ||
+ S_ISDIR(upperdentry->d_inode->i_mode))) {
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ err = -ENOMEM;
+ override_cred = prepare_creds();
+ if (!override_cred)
+ goto out_dput_upper;
+
+ /* CAP_SYS_ADMIN needed for getxattr */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ old_cred = override_creds(override_cred);
+
+ if (ovl_is_opaquedir(upperdentry)) {
+ oe->opaque = true;
+ } else if (ovl_is_whiteout(upperdentry)) {
+ dput(upperdentry);
+ upperdentry = NULL;
+ oe->opaque = true;
+ }
+ revert_creds(old_cred);
+ put_cred(override_cred);
+ }
+ }
+ if (lowerdir && !oe->opaque) {
+ lowerdentry = ovl_lookup_real(lowerdir, &dentry->d_name);
+ err = PTR_ERR(lowerdentry);
+ if (IS_ERR(lowerdentry))
+ goto out_dput_upper;
+ }
+
+ if (lowerdentry && upperdentry &&
+ (!S_ISDIR(upperdentry->d_inode->i_mode) ||
+ !S_ISDIR(lowerdentry->d_inode->i_mode))) {
+ dput(lowerdentry);
+ lowerdentry = NULL;
+ oe->opaque = true;
+ }
+
+ if (lowerdentry || upperdentry) {
+ struct dentry *realdentry;
+
+ realdentry = upperdentry ? upperdentry : lowerdentry;
+ err = -ENOMEM;
+ inode = ovl_new_inode(dir->i_sb, realdentry->d_inode->i_mode);
+ if (!inode)
+ goto out_dput;
+ }
+
+ if (upperdentry)
+ oe->__upperdentry = upperdentry;
+
+ if (lowerdentry)
+ oe->lowerdentry = lowerdentry;
+
+ dentry->d_fsdata = oe;
+ dentry->d_op = &ovl_dentry_operations;
+ d_add(dentry, inode);
+
+ return NULL;
+
+out_dput:
+ dput(lowerdentry);
+out_dput_upper:
+ dput(upperdentry);
+out_put_dir:
+ kfree(oe);
+out:
+ return ERR_PTR(err);
+}
+
+static int ovl_copy_up_xattr(struct dentry *old, struct dentry *new)
+{
+ ssize_t list_size, size;
+ char *buf, *name, *value;
+ int error;
+
+ if (!old->d_inode->i_op->getxattr ||
+ !new->d_inode->i_op->getxattr)
+ return 0;
+
+ list_size = vfs_listxattr(old, NULL, 0);
+ if (list_size <= 0)
+ return list_size;
+
+ buf = kzalloc(list_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ error = -ENOMEM;
+ value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+ if (!value)
+ goto out;
+
+ list_size = vfs_listxattr(old, buf, list_size);
+ if (list_size <= 0) {
+ error = list_size;
+ goto out_free_value;
+ }
+
+ for (name = buf; name < (buf + list_size); name += strlen(name) + 1) {
+ size = vfs_getxattr(old, name, value, XATTR_SIZE_MAX);
+ if (size < 0) {
+ error = size;
+ goto out_free_value;
+ }
+ error = vfs_setxattr(new, name, value, size, 0);
+ if (error)
+ goto out_free_value;
+ }
+
+out_free_value:
+ kfree(value);
+out:
+ kfree(buf);
+ return error;
+}
+
+static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
+{
+ struct file *old_file;
+ struct file *new_file;
+ int error = 0;
+
+ if (len == 0)
+ return 0;
+
+ old_file = path_open(old, O_RDONLY);
+ if (IS_ERR(old_file))
+ return PTR_ERR(old_file);
+
+ new_file = path_open(new, O_WRONLY);
+ if (IS_ERR(new_file)) {
+ error = PTR_ERR(new_file);
+ goto out_fput;
+ }
+
+ /* FIXME: copy up sparse files efficiently */
+ while (len) {
+ loff_t offset = new_file->f_pos;
+ size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
+ long bytes;
+
+ if (len < this_len)
+ this_len = len;
+
+ if (signal_pending_state(TASK_KILLABLE, current)) {
+ error = -EINTR;
+ break;
+ }
+
+ bytes = do_splice_direct(old_file, &offset, new_file, this_len,
+ SPLICE_F_MOVE);
+ if (bytes <= 0) {
+ error = bytes;
+ break;
+ }
+
+ len -= bytes;
+ }
+
+ fput(new_file);
+out_fput:
+ fput(old_file);
+ return error;
+}
+
+static struct dentry *ovl_lookup_create(struct dentry *upperdir,
+ struct dentry *template)
+{
+ int err;
+ struct dentry *newdentry;
+ struct qstr *name = &template->d_name;
+
+ newdentry = lookup_one_len(name->name, upperdir, name->len);
+ if (IS_ERR(newdentry))
+ return newdentry;
+
+ if (newdentry->d_inode) {
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ /* No need to check whiteout if lower parent is non-existent */
+ err = -EEXIST;
+ if (!ovl_dentry_lower(template->d_parent))
+ goto out_dput;
+
+ if (!S_ISLNK(newdentry->d_inode->i_mode))
+ goto out_dput;
+
+ err = -ENOMEM;
+ override_cred = prepare_creds();
+ if (!override_cred)
+ goto out_dput;
+
+ /*
+ * CAP_SYS_ADMIN for getxattr
+ * CAP_FOWNER for unlink in sticky directory
+ */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ cap_raise(override_cred->cap_effective, CAP_FOWNER);
+ old_cred = override_creds(override_cred);
+
+ err = -EEXIST;
+ if (ovl_is_whiteout(newdentry))
+ err = vfs_unlink(upperdir->d_inode, newdentry);
+
+ revert_creds(old_cred);
+ put_cred(override_cred);
+ if (err)
+ goto out_dput;
+
+ dput(newdentry);
+ newdentry = lookup_one_len(name->name, upperdir, name->len);
+ if (IS_ERR(newdentry))
+ return newdentry;
+
+ /*
+ * The whiteout has just been successfully removed and the
+ * parent i_mutex is still held, so the lookup cannot
+ * return a positive dentry.
+ */
+ BUG_ON(newdentry->d_inode);
+ }
+
+ return newdentry;
+
+out_dput:
+ dput(newdentry);
+ return ERR_PTR(err);
+}
+
+static struct dentry *ovl_upper_create(struct dentry *upperdir,
+ struct dentry *dentry,
+ struct kstat *stat, const char *link)
+{
+ int err;
+ struct dentry *newdentry;
+ struct inode *dir = upperdir->d_inode;
+
+ newdentry = ovl_lookup_create(upperdir, dentry);
+ if (IS_ERR(newdentry))
+ goto out;
+
+ switch (stat->mode & S_IFMT) {
+ case S_IFREG:
+ err = vfs_create(dir, newdentry, stat->mode, NULL);
+ break;
+
+ case S_IFDIR:
+ err = vfs_mkdir(dir, newdentry, stat->mode);
+ break;
+
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ err = vfs_mknod(dir, newdentry, stat->mode, stat->rdev);
+ break;
+
+ case S_IFLNK:
+ err = vfs_symlink(dir, newdentry, link);
+ break;
+
+ default:
+ err = -EPERM;
+ }
+ if (err) {
+ dput(newdentry);
+ newdentry = ERR_PTR(err);
+ }
+
+out:
+ return newdentry;
+
+}
+
+static char *ovl_read_symlink(struct dentry *realdentry)
+{
+ int res;
+ char *buf;
+ struct inode *inode = realdentry->d_inode;
+ mm_segment_t old_fs;
+
+ res = -EINVAL;
+ if (!inode->i_op->readlink)
+ goto err;
+
+ res = -ENOMEM;
+ buf = (char *) __get_free_page(GFP_KERNEL);
+ if (!buf)
+ goto err;
+
+ old_fs = get_fs();
+ set_fs(get_ds());
+ /* The cast to a user pointer is valid due to the set_fs() */
+ res = inode->i_op->readlink(realdentry,
+ (char __user *)buf, PAGE_SIZE - 1);
+ set_fs(old_fs);
+ if (res < 0) {
+ free_page((unsigned long) buf);
+ goto err;
+ }
+ buf[res] = '\0';
+
+ return buf;
+
+err:
+ return ERR_PTR(res);
+}
+
+static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
+{
+ struct iattr attr = {
+ .ia_valid = ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET,
+ .ia_atime = stat->atime,
+ .ia_mtime = stat->mtime,
+ };
+
+ return notify_change(upperdentry, &attr);
+}
+
+static int ovl_set_mode(struct dentry *upperdentry, umode_t mode)
+{
+ struct iattr attr = {
+ .ia_valid = ATTR_MODE,
+ .ia_mode = mode,
+ };
+
+ return notify_change(upperdentry, &attr);
+}
+
+static int ovl_set_opaque(struct dentry *upperdentry)
+{
+ int err;
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ override_cred = prepare_creds();
+ if (!override_cred)
+ return -ENOMEM;
+
+ /* CAP_SYS_ADMIN for setxattr of "trusted" namespace */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ old_cred = override_creds(override_cred);
+ err = vfs_setxattr(upperdentry, ovl_opaque_xattr, "y", 1, 0);
+ revert_creds(old_cred);
+ put_cred(override_cred);
+
+ return err;
+}
+
+static int ovl_remove_opaque(struct dentry *upperdentry)
+{
+ int err;
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ override_cred = prepare_creds();
+ if (!override_cred)
+ return -ENOMEM;
+
+ /* CAP_SYS_ADMIN for removexattr of "trusted" namespace */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ old_cred = override_creds(override_cred);
+ err = vfs_removexattr(upperdentry, ovl_opaque_xattr);
+ revert_creds(old_cred);
+ put_cred(override_cred);
+
+ return err;
+}
+
+static int ovl_copy_up_locked(struct dentry *upperdir, struct dentry *dentry,
+ struct path *lowerpath, struct kstat *stat,
+ const char *link)
+{
+ int err;
+ struct path newpath;
+ umode_t mode = stat->mode;
+ struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
+ /* Can't properly set mode on creation because of the umask */
+ stat->mode &= S_IFMT;
+
+ newpath.mnt = ofs->upper_mnt;
+ newpath.dentry = ovl_upper_create(upperdir, dentry, stat, link);
+ if (IS_ERR(newpath.dentry)) {
+ err = PTR_ERR(newpath.dentry);
+
+ /* Already copied up? */
+ if (err == -EEXIST && ovl_path_type(dentry) != OVL_PATH_LOWER)
+ return 0;
+
+ return err;
+ }
+
+ /* FIXME: recovery from failure to copy up */
+
+ if (S_ISREG(stat->mode)) {
+ err = ovl_copy_up_data(lowerpath, &newpath, stat->size);
+ if (err)
+ return err;
+ }
+
+ err = ovl_copy_up_xattr(lowerpath->dentry, newpath.dentry);
+ if (err)
+ return err;
+
+ mutex_lock(&newpath.dentry->d_inode->i_mutex);
+ err = ovl_set_mode(newpath.dentry, mode);
+ if (!err)
+ err = ovl_set_timestamps(newpath.dentry, stat);
+ mutex_unlock(&newpath.dentry->d_inode->i_mutex);
+ if (err)
+ return err;
+
+ ovl_dentry_update(dentry, newpath.dentry);
+
+ /*
+ * Easiest way to get rid of the lower dentry reference is to
+ * drop this dentry. This is neither needed nor possible for
+ * directories.
+ */
+ if (!S_ISDIR(stat->mode))
+ d_drop(dentry);
+
+ return 0;
+}
+
+static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
+ struct path *lowerpath, struct kstat *stat)
+{
+ int err;
+ struct kstat pstat;
+ struct path parentpath;
+ struct dentry *upperdir;
+ const struct cred *old_cred;
+ struct cred *override_cred;
+ char *link = NULL;
+
+ ovl_path_upper(parent, &parentpath);
+ upperdir = parentpath.dentry;
+
+ err = vfs_getattr(parentpath.mnt, parentpath.dentry, &pstat);
+ if (err)
+ return err;
+
+ if (S_ISLNK(stat->mode)) {
+ link = ovl_read_symlink(lowerpath->dentry);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ }
+
+ err = -ENOMEM;
+ override_cred = prepare_creds();
+ if (!override_cred)
+ goto out_free_link;
+
+ override_cred->fsuid = stat->uid;
+ override_cred->fsgid = stat->gid;
+ /*
+ * CAP_SYS_ADMIN for copying up extended attributes
+ * CAP_DAC_OVERRIDE for create
+ * CAP_FOWNER for chmod, timestamp update
+ * CAP_FSETID for chmod
+ * CAP_MKNOD for mknod
+ */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+ cap_raise(override_cred->cap_effective, CAP_FOWNER);
+ cap_raise(override_cred->cap_effective, CAP_FSETID);
+ cap_raise(override_cred->cap_effective, CAP_MKNOD);
+ old_cred = override_creds(override_cred);
+
+ mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+ /*
+ * Using upper filesystem locking to protect against copy up
+ * racing with rename (rename means the copy up was already
+ * successful).
+ */
+ if (dentry->d_parent != parent) {
+ WARN_ON((ovl_path_type(dentry) == OVL_PATH_LOWER));
+ err = 0;
+ } else {
+ err = ovl_copy_up_locked(upperdir, dentry, lowerpath,
+ stat, link);
+ if (!err) {
+ /* Restore timestamps on parent (best effort) */
+ ovl_set_timestamps(upperdir, &pstat);
+ }
+ }
+
+ mutex_unlock(&upperdir->d_inode->i_mutex);
+
+ revert_creds(old_cred);
+ put_cred(override_cred);
+
+out_free_link:
+ if (link)
+ free_page((unsigned long) link);
+
+ return err;
+}
+
+static int ovl_copy_up(struct dentry *dentry)
+{
+ int err;
+
+ err = 0;
+ while (!err) {
+ struct dentry *next;
+ struct dentry *parent;
+ struct path lowerpath;
+ struct kstat stat;
+ enum ovl_path_type type = ovl_path_type(dentry);
+
+ if (type != OVL_PATH_LOWER)
+ break;
+
+ next = dget(dentry);
+ /* find the topmost dentry not yet copied up */
+ for (;;) {
+ parent = dget_parent(next);
+
+ type = ovl_path_type(parent);
+ if (type != OVL_PATH_LOWER)
+ break;
+
+ dput(next);
+ next = parent;
+ }
+
+ ovl_path_lower(next, &lowerpath);
+ err = vfs_getattr(lowerpath.mnt, lowerpath.dentry, &stat);
+ if (!err)
+ err = ovl_copy_up_one(parent, next, &lowerpath, &stat);
+
+ dput(parent);
+ dput(next);
+ }
+
+ return err;
+}
+
+/* Optimize by not copying up the file first and truncating later */
+static int ovl_copy_up_truncate(struct dentry *dentry, loff_t size)
+{
+ int err;
+ struct kstat stat;
+ struct path lowerpath;
+ struct dentry *parent = dget_parent(dentry);
+
+ err = ovl_copy_up(parent);
+ if (err)
+ goto out_dput_parent;
+
+ ovl_path_lower(dentry, &lowerpath);
+ err = vfs_getattr(lowerpath.mnt, lowerpath.dentry, &stat);
+ if (err)
+ goto out_dput_parent;
+
+ if (size < stat.size)
+ stat.size = size;
+
+ err = ovl_copy_up_one(parent, dentry, &lowerpath, &stat);
+
+out_dput_parent:
+ dput(parent);
+ return err;
+}
+
+static int ovl_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ struct dentry *upperdentry;
+ int err;
+
+ if ((attr->ia_valid & ATTR_SIZE) && !ovl_dentry_upper(dentry))
+ err = ovl_copy_up_truncate(dentry, attr->ia_size);
+ else
+ err = ovl_copy_up(dentry);
+ if (err)
+ return err;
+
+ upperdentry = ovl_dentry_upper(dentry);
+
+ mutex_lock(&upperdentry->d_inode->i_mutex);
+ err = notify_change(upperdentry, attr);
+ mutex_unlock(&upperdentry->d_inode->i_mutex);
+
+ return err;
+}
+
+static int ovl_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+ struct path realpath;
+
+ ovl_path_real(dentry, &realpath);
+ return vfs_getattr(realpath.mnt, realpath.dentry, stat);
+}
+
+static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+ int err;
+ enum ovl_path_type type;
+ struct path realpath;
+
+ type = ovl_path_real(dentry, &realpath);
+ err = vfs_getattr(realpath.mnt, realpath.dentry, stat);
+ if (err)
+ return err;
+
+ stat->dev = dentry->d_sb->s_dev;
+ stat->ino = dentry->d_inode->i_ino;
+
+ /*
+ * It's probably not worth it to count subdirs to get the
+ * correct link count. nlink=1 seems to pacify 'find' and
+ * other utilities.
+ */
+ if (type == OVL_PATH_MERGE)
+ stat->nlink = 1;
+
+ return 0;
+}
+
+static int ovl_permission(struct dentry *dentry, int mask)
+{
+ enum ovl_path_type type;
+ struct path realpath;
+ struct inode *inode;
+ int err;
+
+ type = ovl_path_real(dentry, &realpath);
+ if (type != OVL_PATH_LOWER)
+ return dentry_permission(realpath.dentry, mask);
+
+ inode = realpath.dentry->d_inode;
+ if (!(mask & MAY_WRITE) || special_file(inode->i_mode))
+ return dentry_permission(realpath.dentry, mask);
+
+ /* Don't check for read-only fs */
+ if (mask & MAY_WRITE) {
+ if (IS_IMMUTABLE(inode))
+ return -EACCES;
+ }
+
+ if (inode->i_op->permission)
+ err = inode->i_op->permission(realpath.dentry, mask);
+ else
+ err = generic_permission(inode, mask, inode->i_op->check_acl);
+
+ if (err)
+ return err;
+
+ return security_inode_permission(inode, mask);
+}
+
+static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
+ const char *link)
+{
+ int err;
+ struct dentry *newdentry;
+ struct dentry *upperdir;
+ struct inode *inode;
+ struct kstat stat = {
+ .mode = mode,
+ .rdev = rdev,
+ };
+
+ err = -ENOMEM;
+ inode = ovl_new_inode(dentry->d_sb, mode);
+ if (!inode)
+ goto out;
+
+ err = ovl_copy_up(dentry->d_parent);
+ if (err)
+ goto out_iput;
+
+ upperdir = ovl_dentry_upper(dentry->d_parent);
+ mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+
+ newdentry = ovl_upper_create(upperdir, dentry, &stat, link);
+ err = PTR_ERR(newdentry);
+ if (IS_ERR(newdentry))
+ goto out_unlock;
+
+ if (ovl_dentry_is_opaque(dentry) && S_ISDIR(mode)) {
+ err = ovl_set_opaque(newdentry);
+ if (err)
+ goto out_dput;
+ }
+ ovl_dentry_update(dentry, newdentry);
+ d_instantiate(dentry, inode);
+ inode = NULL;
+ newdentry = NULL;
+ err = 0;
+
+out_dput:
+ dput(newdentry);
+out_unlock:
+ mutex_unlock(&upperdir->d_inode->i_mutex);
+out_iput:
+ iput(inode);
+out:
+ return err;
+}
+
+static int ovl_create(struct inode *dir, struct dentry *dentry, int mode,
+ struct nameidata *nd)
+{
+ return ovl_create_object(dentry, (mode & 07777) | S_IFREG, 0, NULL);
+}
+
+static int ovl_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ return ovl_create_object(dentry, (mode & 07777) | S_IFDIR, 0, NULL);
+}
+
+static int ovl_mknod(struct inode *dir, struct dentry *dentry, int mode,
+ dev_t rdev)
+{
+ return ovl_create_object(dentry, mode, rdev, NULL);
+}
+
+static int ovl_symlink(struct inode *dir, struct dentry *dentry,
+ const char *link)
+{
+ return ovl_create_object(dentry, S_IFLNK, 0, link);
+}
+
+struct ovl_link_data {
+ struct dentry *realdentry;
+ void *cookie;
+};
+
+static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+ void *ret;
+ struct dentry *realdentry;
+ struct inode *realinode;
+
+ realdentry = ovl_dentry_real(dentry);
+ realinode = realdentry->d_inode;
+
+ if (WARN_ON(!realinode->i_op->follow_link))
+ return ERR_PTR(-EPERM);
+
+ ret = realinode->i_op->follow_link(realdentry, nd);
+ if (IS_ERR(ret))
+ return ret;
+
+ if (realinode->i_op->put_link) {
+ struct ovl_link_data *data;
+
+ data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
+ if (!data) {
+ realinode->i_op->put_link(realdentry, nd, ret);
+ return ERR_PTR(-ENOMEM);
+ }
+ data->realdentry = realdentry;
+ data->cookie = ret;
+
+ return data;
+ } else {
+ return NULL;
+ }
+}
+
+static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+{
+ struct inode *realinode;
+ struct ovl_link_data *data = c;
+
+ if (!data)
+ return;
+
+ realinode = data->realdentry->d_inode;
+ realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+ kfree(data);
+}
+
+static int ovl_readlink(struct dentry *dentry, char __user *buf, int bufsiz)
+{
+ struct path realpath;
+ struct inode *realinode;
+
+ ovl_path_real(dentry, &realpath);
+ realinode = realpath.dentry->d_inode;
+
+ if (!realinode->i_op->readlink)
+ return -EINVAL;
+
+ touch_atime(realpath.mnt, realpath.dentry);
+
+ return realinode->i_op->readlink(realpath.dentry, buf, bufsiz);
+}
+
+static int ovl_whiteout(struct dentry *upperdir, struct dentry *dentry)
+{
+ int err;
+ struct dentry *newdentry;
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ /* FIXME: recheck lower dentry to see if whiteout is really needed */
+
+ err = -ENOMEM;
+ override_cred = prepare_creds();
+ if (!override_cred)
+ goto out;
+
+ /*
+ * CAP_SYS_ADMIN for setxattr
+ * CAP_DAC_OVERRIDE for symlink creation
+ */
+ cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
+ cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
+ override_cred->fsuid = 0;
+ override_cred->fsgid = 0;
+ old_cred = override_creds(override_cred);
+
+ newdentry = lookup_one_len(dentry->d_name.name, upperdir,
+ dentry->d_name.len);
+ err = PTR_ERR(newdentry);
+ if (IS_ERR(newdentry))
+ goto out_put_cred;
+
+ /* Just been removed within the same locked region */
+ BUG_ON(newdentry->d_inode);
+
+ err = vfs_symlink(upperdir->d_inode, newdentry, ovl_whiteout_symlink);
+ if (err)
+ goto out_dput;
+
+ err = vfs_setxattr(newdentry, ovl_whiteout_xattr, "y", 1, 0);
+
+out_dput:
+ dput(newdentry);
+out_put_cred:
+ revert_creds(old_cred);
+ put_cred(override_cred);
+out:
+ return err;
+}
+
+static int ovl_do_remove(struct dentry *dentry, bool is_dir)
+{
+ int err;
+ enum ovl_path_type type;
+ struct path realpath;
+ struct dentry *upperdir;
+
+ err = ovl_copy_up(dentry->d_parent);
+ if (err)
+ return err;
+
+ upperdir = ovl_dentry_upper(dentry->d_parent);
+ mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+ type = ovl_path_real(dentry, &realpath);
+ if (type != OVL_PATH_LOWER) {
+ err = -ESTALE;
+ if (realpath.dentry->d_parent != upperdir)
+ goto out_d_drop;
+
+ if (is_dir)
+ err = vfs_rmdir(upperdir->d_inode, realpath.dentry);
+ else
+ err = vfs_unlink(upperdir->d_inode, realpath.dentry);
+ if (err)
+ goto out_d_drop;
+ }
+
+ if (type != OVL_PATH_UPPER || ovl_dentry_is_opaque(dentry))
+ err = ovl_whiteout(upperdir, dentry);
+
+ /*
+ * Keeping this dentry hashed would mean having to release
+ * upperpath/lowerpath, which could only be done if we are the
+ * sole user of this dentry. Too tricky... Just unhash for
+ * now.
+ */
+out_d_drop:
+ d_drop(dentry);
+ mutex_unlock(&upperdir->d_inode->i_mutex);
+
+ return err;
+}
+
+static int ovl_unlink(struct inode *dir, struct dentry *dentry)
+{
+ return ovl_do_remove(dentry, false);
+}
+
+static int ovl_check_empty_dir(struct dentry *dentry)
+{
+ int err;
+ struct ovl_cache_entry *p;
+ struct path lowerpath;
+ struct path upperpath;
+ struct ovl_readdir_data rdd = {
+ .list = NULL,
+ .endp = &rdd.list,
+ };
+
+ ovl_path_upper(dentry, &upperpath);
+ ovl_path_lower(dentry, &lowerpath);
+
+ if (upperpath.dentry) {
+ rdd.dir = upperpath.dentry;
+ err = ovl_dir_read(&upperpath, &rdd, ovl_fill_upper);
+ if (err)
+ return err;
+ }
+ err = ovl_dir_read(&lowerpath, &rdd, ovl_fill_lower);
+ if (err)
+ return err;
+
+ err = 0;
+ for (p = rdd.list; p; p = p->next) {
+ if (p->is_whiteout)
+ continue;
+
+ if (p->name.name[0] == '.') {
+ if (p->name.len == 1)
+ continue;
+ if (p->name.len == 2 && p->name.name[1] == '.')
+ continue;
+ }
+ err = -ENOTEMPTY;
+ break;
+ }
+
+ ovl_cache_free(rdd.list);
+
+ return err;
+}
+
+static int ovl_unlink_whiteout(void *buf, const char *name, int namlen,
+ loff_t offset, u64 ino, unsigned int d_type)
+{
+ struct ovl_readdir_data *rdd = buf;
+
+ rdd->count++;
+ /* check d_type to filter out "." and ".." */
+ if (d_type == DT_LNK) {
+ struct dentry *dentry;
+
+ dentry = lookup_one_len(name, rdd->dir, namlen);
+ if (IS_ERR(dentry)) {
+ rdd->err = PTR_ERR(dentry);
+ } else {
+ rdd->err = vfs_unlink(rdd->dir->d_inode, dentry);
+ dput(dentry);
+ }
+ }
+
+ return rdd->err;
+}
+
+static int ovl_remove_whiteouts(struct dentry *dentry)
+{
+ struct path upperpath;
+ struct ovl_readdir_data rdd = { .list = NULL };
+
+ ovl_path_upper(dentry, &upperpath);
+ rdd.dir = upperpath.dentry;
+
+ return ovl_dir_read(&upperpath, &rdd, ovl_unlink_whiteout);
+}
+
+static int ovl_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ int err;
+ enum ovl_path_type type;
+
+ type = ovl_path_type(dentry);
+ if (type != OVL_PATH_UPPER) {
+ err = ovl_check_empty_dir(dentry);
+ if (err)
+ return err;
+
+ if (type == OVL_PATH_MERGE) {
+ err = ovl_remove_whiteouts(dentry);
+ if (err)
+ return err;
+ }
+ }
+
+ return ovl_do_remove(dentry, true);
+}
+
+static int ovl_link(struct dentry *old, struct inode *newdir,
+ struct dentry *new)
+{
+ int err;
+ struct dentry *olddentry;
+ struct dentry *newdentry;
+ struct dentry *upperdir;
+
+ err = ovl_copy_up(old);
+ if (err)
+ goto out;
+
+ err = ovl_copy_up(new->d_parent);
+ if (err)
+ goto out;
+
+ upperdir = ovl_dentry_upper(new->d_parent);
+ mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
+ newdentry = ovl_lookup_create(upperdir, new);
+ err = PTR_ERR(newdentry);
+ if (IS_ERR(newdentry))
+ goto out_unlock;
+
+ olddentry = ovl_dentry_upper(old);
+ err = vfs_link(olddentry, upperdir->d_inode, newdentry);
+ if (!err) {
+ ovl_dentry_update(new, newdentry);
+
+ atomic_inc(&old->d_inode->i_count);
+ d_instantiate(new, old->d_inode);
+ } else {
+ dput(newdentry);
+ }
+out_unlock:
+ mutex_unlock(&upperdir->d_inode->i_mutex);
+out:
+ return err;
+
+}
+
+static int ovl_rename(struct inode *olddir, struct dentry *old,
+ struct inode *newdir, struct dentry *new)
+{
+ int err;
+ enum ovl_path_type type;
+ struct dentry *old_upperdir;
+ struct dentry *new_upperdir;
+ struct dentry *olddentry;
+ struct dentry *newdentry;
+ struct dentry *trap;
+ bool is_dir = S_ISDIR(old->d_inode->i_mode);
+
+ /* Don't copy up directory trees */
+ type = ovl_path_type(old);
+ if (type != OVL_PATH_UPPER && is_dir)
+ return -EXDEV;
+
+ if (new->d_inode) {
+ type = ovl_path_type(new);
+ if (type != OVL_PATH_UPPER && S_ISDIR(new->d_inode->i_mode)) {
+ err = ovl_check_empty_dir(new);
+ if (err)
+ return err;
+
+ if (type == OVL_PATH_MERGE) {
+ err = ovl_remove_whiteouts(new);
+ if (err)
+ return err;
+ }
+ }
+ }
+
+ err = ovl_copy_up(old);
+ if (err)
+ return err;
+
+ err = ovl_copy_up(new->d_parent);
+ if (err)
+ return err;
+
+ old_upperdir = ovl_dentry_upper(old->d_parent);
+ new_upperdir = ovl_dentry_upper(new->d_parent);
+
+ trap = lock_rename(new_upperdir, old_upperdir);
+
+ olddentry = ovl_dentry_upper(old);
+ newdentry = ovl_dentry_upper(new);
+ if (newdentry) {
+ dget(newdentry);
+ } else {
+ newdentry = ovl_lookup_create(new_upperdir, new);
+ err = PTR_ERR(newdentry);
+ if (IS_ERR(newdentry))
+ goto out_unlock;
+ }
+
+ err = -ESTALE;
+ if (olddentry->d_parent != old_upperdir)
+ goto out_dput;
+ if (newdentry->d_parent != new_upperdir)
+ goto out_dput;
+ if (olddentry == trap)
+ goto out_dput;
+ if (newdentry == trap)
+ goto out_dput;
+
+ err = vfs_rename(old_upperdir->d_inode, olddentry,
+ new_upperdir->d_inode, newdentry);
+
+ if (!err) {
+ bool old_opaque = ovl_dentry_is_opaque(old);
+ bool new_opaque = ovl_dentry_is_opaque(new);
+
+ if (ovl_path_type(new) != OVL_PATH_UPPER)
+ new_opaque = true;
+
+ if (old_opaque)
+ err = ovl_whiteout(old_upperdir, old);
+ if (!err && is_dir) {
+ if (old_opaque && !new_opaque) {
+ ovl_remove_opaque(olddentry);
+ ovl_dentry_set_opaque(old, false);
+ }
+ if (!old_opaque && new_opaque) {
+ err = ovl_set_opaque(olddentry);
+ ovl_dentry_set_opaque(old, true);
+ }
+ }
+ }
+
+out_dput:
+ dput(newdentry);
+out_unlock:
+ unlock_rename(new_upperdir, old_upperdir);
+ return err;
+}
+
+static bool ovl_is_private_xattr(const char *name)
+{
+ return strncmp(name, "trusted.overlay.", 16) == 0;
+}
+
+static int ovl_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
+{
+ int err;
+ struct dentry *upperdentry;
+
+ if (ovl_is_private_xattr(name))
+ return -EPERM;
+
+ err = ovl_copy_up(dentry);
+ if (err)
+ return err;
+
+ upperdentry = ovl_dentry_upper(dentry);
+ return vfs_setxattr(upperdentry, name, value, size, flags);
+}
+
+static ssize_t ovl_getxattr(struct dentry *dentry, const char *name,
+ void *value, size_t size)
+{
+ if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE &&
+ ovl_is_private_xattr(name))
+ return -ENODATA;
+
+ return vfs_getxattr(ovl_dentry_real(dentry), name, value, size);
+}
+
+static ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
+{
+ ssize_t res;
+ int off;
+
+ res = vfs_listxattr(ovl_dentry_real(dentry), list, size);
+ if (res <= 0 || size == 0)
+ return res;
+
+ if (ovl_path_type(dentry->d_parent) != OVL_PATH_MERGE)
+ return res;
+
+ /* filter out private xattrs */
+ for (off = 0; off < res;) {
+ char *s = list + off;
+ size_t slen = strlen(s) + 1;
+
+ BUG_ON(off + slen > res);
+
+ if (ovl_is_private_xattr(s)) {
+ res -= slen;
+ memmove(s, s + slen, res - off);
+ } else {
+ off += slen;
+ }
+ }
+
+ return res;
+}
+
+static int ovl_removexattr(struct dentry *dentry, const char *name)
+{
+ int err;
+ struct path realpath;
+ enum ovl_path_type type;
+
+ if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE &&
+ ovl_is_private_xattr(name))
+ return -ENODATA;
+
+ type = ovl_path_real(dentry, &realpath);
+ if (type == OVL_PATH_LOWER) {
+ err = vfs_getxattr(realpath.dentry, name, NULL, 0);
+ if (err < 0)
+ return err;
+
+ err = ovl_copy_up(dentry);
+ if (err)
+ return err;
+
+ ovl_path_upper(dentry, &realpath);
+ }
+
+ return vfs_removexattr(realpath.dentry, name);
+}
+
+static const struct inode_operations ovl_dir_inode_operations = {
+ .lookup = ovl_lookup,
+ .mkdir = ovl_mkdir,
+ .symlink = ovl_symlink,
+ .unlink = ovl_unlink,
+ .rmdir = ovl_rmdir,
+ .rename = ovl_rename,
+ .link = ovl_link,
+ .setattr = ovl_setattr,
+ .create = ovl_create,
+ .mknod = ovl_mknod,
+ .permission = ovl_permission,
+ .getattr = ovl_dir_getattr,
+ .setxattr = ovl_setxattr,
+ .getxattr = ovl_getxattr,
+ .listxattr = ovl_listxattr,
+ .removexattr = ovl_removexattr,
+};
+
+static const struct inode_operations ovl_file_inode_operations = {
+ .setattr = ovl_setattr,
+ .permission = ovl_permission,
+ .getattr = ovl_getattr,
+ .setxattr = ovl_setxattr,
+ .getxattr = ovl_getxattr,
+ .listxattr = ovl_listxattr,
+ .removexattr = ovl_removexattr,
+};
+
+static const struct inode_operations ovl_symlink_inode_operations = {
+ .setattr = ovl_setattr,
+ .follow_link = ovl_follow_link,
+ .put_link = ovl_put_link,
+ .readlink = ovl_readlink,
+ .getattr = ovl_getattr,
+ .setxattr = ovl_setxattr,
+ .getxattr = ovl_getxattr,
+ .listxattr = ovl_listxattr,
+ .removexattr = ovl_removexattr,
+};
+
+static bool ovl_open_need_copy_up(struct file *file, enum ovl_path_type type,
+ struct dentry *realdentry)
+{
+ if (type != OVL_PATH_LOWER)
+ return false;
+
+ if (special_file(realdentry->d_inode->i_mode))
+ return false;
+
+ if (!(file->f_mode & FMODE_WRITE) && !(file->f_flags & O_TRUNC))
+ return false;
+
+ return true;
+}
+
+static struct file *ovl_open(struct file *file)
+{
+ int err;
+ struct path realpath;
+ enum ovl_path_type type;
+ struct dentry *dentry = file->f_path.dentry;
+
+ type = ovl_path_real(dentry, &realpath);
+ if (ovl_open_need_copy_up(file, type, realpath.dentry)) {
+ if (file->f_flags & O_TRUNC)
+ err = ovl_copy_up_truncate(dentry, 0);
+ else
+ err = ovl_copy_up(dentry);
+ if (err)
+ return ERR_PTR(err);
+
+ ovl_path_upper(dentry, &realpath);
+ }
+
+ return path_open(&realpath, file->f_flags);
+}
+
+static const struct file_operations ovl_file_operations = {
+ .open_other = ovl_open,
+};
+
+static void ovl_put_super(struct super_block *sb)
+{
+ struct ovl_fs *ufs = sb->s_fs_info;
+
+ if (!(sb->s_flags & MS_RDONLY))
+ mnt_drop_write(ufs->upper_mnt);
+
+ mntput(ufs->upper_mnt);
+ mntput(ufs->lower_mnt);
+
+ iput(ufs->symlink_inode);
+ iput(ufs->regular_inode);
+ iput(ufs->special_inode);
+ kfree(ufs);
+}
+
+static int ovl_remount_fs(struct super_block *sb, int *flagsp, char *data)
+{
+ int flags = *flagsp;
+ struct ovl_fs *ufs = sb->s_fs_info;
+
+ /* When remounting rw or ro, we need to adjust the write access to the
+ * upper fs.
+ */
+ if (((flags ^ sb->s_flags) & MS_RDONLY) == 0)
+ /* No change to readonly status */
+ return 0;
+
+ if (flags & MS_RDONLY) {
+ mnt_drop_write(ufs->upper_mnt);
+ return 0;
+ } else
+ return mnt_want_write(ufs->upper_mnt);
+}
+
+static const struct super_operations ovl_super_operations = {
+ .put_super = ovl_put_super,
+ .remount_fs = ovl_remount_fs,
+};
+
+struct ovl_config {
+ char *lowerdir;
+ char *upperdir;
+};
+
+enum {
+ Opt_lowerdir,
+ Opt_upperdir,
+ Opt_err,
+};
+
+static const match_table_t ovl_tokens = {
+ {Opt_lowerdir, "lowerdir=%s"},
+ {Opt_upperdir, "upperdir=%s"},
+ {Opt_err, NULL}
+};
+
+static int ovl_parse_opt(char *opt, struct ovl_config *config)
+{
+ char *p;
+
+ config->upperdir = NULL;
+ config->lowerdir = NULL;
+
+ while ((p = strsep(&opt, ",")) != NULL) {
+ int token;
+ substring_t args[MAX_OPT_ARGS];
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, ovl_tokens, args);
+ switch (token) {
+ case Opt_upperdir:
+ kfree(config->upperdir);
+ config->upperdir = match_strdup(&args[0]);
+ if (!config->upperdir)
+ return -ENOMEM;
+ break;
+
+ case Opt_lowerdir:
+ kfree(config->lowerdir);
+ config->lowerdir = match_strdup(&args[0]);
+ if (!config->lowerdir)
+ return -ENOMEM;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct path lowerpath;
+ struct path upperpath;
+ struct inode *root_inode;
+ struct dentry *root_dentry;
+ struct ovl_entry *oe;
+ struct ovl_fs *ufs;
+ struct ovl_config config;
+ int err;
+
+ err = ovl_parse_opt((char *) data, &config);
+ if (err)
+ goto out_free_config;
+
+ err = -EINVAL;
+ if (!config.upperdir || !config.lowerdir)
+ goto out_free_config;
+
+ err = -ENOMEM;
+ ufs = kmalloc(sizeof(struct ovl_fs), GFP_KERNEL);
+ if (!ufs)
+ goto out_free_config;
+
+ ufs->symlink_inode = new_inode(sb);
+ if (!ufs->symlink_inode)
+ goto out_free_ufs;
+
+ ufs->regular_inode = new_inode(sb);
+ if (!ufs->regular_inode)
+ goto out_put_symlink_inode;
+
+ ufs->special_inode = new_inode(sb);
+ if (!ufs->special_inode)
+ goto out_put_regular_inode;
+
+ ufs->symlink_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+ ufs->symlink_inode->i_mode = S_IFLNK;
+ ufs->symlink_inode->i_op = &ovl_symlink_inode_operations;
+
+ ufs->regular_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+ ufs->regular_inode->i_mode = S_IFREG;
+ ufs->regular_inode->i_op = &ovl_file_inode_operations;
+ ufs->regular_inode->i_fop = &ovl_file_operations;
+
+ ufs->special_inode->i_flags |= S_NOATIME|S_NOCMTIME;
+ ufs->special_inode->i_mode = S_IFSOCK;
+ ufs->special_inode->i_op = &ovl_file_inode_operations;
+ ufs->special_inode->i_fop = &ovl_file_operations;
+
+ root_inode = ovl_new_inode(sb, S_IFDIR);
+ if (!root_inode)
+ goto out_put_special_inode;
+
+ oe = ovl_alloc_entry();
+ if (oe == NULL)
+ goto out_put_root;
+
+ err = kern_path(config.upperdir, LOOKUP_FOLLOW, &upperpath);
+ if (err)
+ goto out_free_oe;
+
+ err = kern_path(config.lowerdir, LOOKUP_FOLLOW, &lowerpath);
+ if (err)
+ goto out_put_upperpath;
+
+ err = -ENOTDIR;
+ if (!S_ISDIR(upperpath.dentry->d_inode->i_mode) ||
+ !S_ISDIR(lowerpath.dentry->d_inode->i_mode))
+ goto out_put_lowerpath;
+
+ if (!(sb->s_flags & MS_RDONLY)) {
+ err = mnt_want_write(upperpath.mnt);
+ if (err)
+ goto out_put_lowerpath;
+ }
+
+ err = -ENOMEM;
+ root_dentry = d_alloc_root(root_inode);
+ if (!root_dentry)
+ goto out_drop_write;
+
+ ufs->upper_mnt = upperpath.mnt;
+ ufs->lower_mnt = lowerpath.mnt;
+
+ oe->__upperdentry = upperpath.dentry;
+ oe->lowerdentry = lowerpath.dentry;
+
+ root_dentry->d_fsdata = oe;
+ root_dentry->d_op = &ovl_dentry_operations;
+
+ sb->s_op = &ovl_super_operations;
+ sb->s_root = root_dentry;
+ sb->s_fs_info = ufs;
+
+ return 0;
+
+out_drop_write:
+ if (!(sb->s_flags & MS_RDONLY))
+ mnt_drop_write(upperpath.mnt);
+out_put_lowerpath:
+ path_put(&lowerpath);
+out_put_upperpath:
+ path_put(&upperpath);
+out_free_oe:
+ kfree(oe);
+out_put_root:
+ iput(root_inode);
+out_put_special_inode:
+ iput(ufs->special_inode);
+out_put_regular_inode:
+ iput(ufs->regular_inode);
+out_put_symlink_inode:
+ iput(ufs->symlink_inode);
+out_free_ufs:
+ kfree(ufs);
+out_free_config:
+ kfree(config.lowerdir);
+ kfree(config.upperdir);
+out:
+ return err;
+}
+
+static int ovl_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *raw_data, struct vfsmount *mnt)
+{
+ return get_sb_nodev(fs_type, flags, raw_data, ovl_fill_super, mnt);
+}
+
+static struct file_system_type ovl_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "overlayfs",
+ .fs_flags = FS_RENAME_SELF_ALLOW,
+ .get_sb = ovl_get_sb,
+ .kill_sb = kill_anon_super,
+};
+
+static int __init ovl_init(void)
+{
+ return register_filesystem(&ovl_fs_type);
+}
+
+static void __exit ovl_exit(void)
+{
+ unregister_filesystem(&ovl_fs_type);
+}
+
+module_init(ovl_init);
+module_exit(ovl_exit);
Index: linux-2.6/fs/Kconfig
===================================================================
--- linux-2.6.orig/fs/Kconfig 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/Kconfig 2010-09-20 15:10:15.000000000 +0200
@@ -62,6 +62,7 @@ source "fs/quota/Kconfig"
source "fs/autofs/Kconfig"
source "fs/autofs4/Kconfig"
source "fs/fuse/Kconfig"
+source "fs/overlayfs/Kconfig"
config CUSE
tristate "Character device in Userspace support"
Index: linux-2.6/fs/Makefile
===================================================================
--- linux-2.6.orig/fs/Makefile 2010-09-20 12:33:25.000000000 +0200
+++ linux-2.6/fs/Makefile 2010-09-20 15:10:15.000000000 +0200
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS) += autofs/
obj-$(CONFIG_AUTOFS4_FS) += autofs4/
obj-$(CONFIG_ADFS_FS) += adfs/
obj-$(CONFIG_FUSE_FS) += fuse/
+obj-$(CONFIG_OVERLAYFS_FS) += overlayfs/
obj-$(CONFIG_UDF_FS) += udf/
obj-$(CONFIG_SUN_OPENPROMFS) += openpromfs/
obj-$(CONFIG_OMFS_FS) += omfs/
Index: linux-2.6/fs/overlayfs/Kconfig
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/Kconfig 2010-09-20 15:10:15.000000000 +0200
@@ -0,0 +1,4 @@
+config OVERLAYFS_FS
+ tristate "Overlay filesystem support"
+ help
+ Add support for overlay filesystem.
Index: linux-2.6/fs/overlayfs/Makefile
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/overlayfs/Makefile 2010-09-20 15:10:15.000000000 +0200
@@ -0,0 +1,5 @@
+#
+# Makefile for the overlay filesystem.
+#
+
+obj-$(CONFIG_OVERLAYFS_FS) += overlayfs.o
--
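A minimal user-space sketch of the option parsing done by ovl_parse_opt() above, for anyone who wants to experiment with the "lowerdir=...,upperdir=..." syntax outside the kernel. Everything named demo_* is hypothetical and not part of the patch; the tokenizer stands in for strsep() and the prefix comparison stands in for match_token(). As in the patch, empty tokens are skipped, a repeated option replaces the earlier value, and an unknown option gives -EINVAL. The caller is assumed to free the config on every return path, including errors.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct ovl_config. */
struct demo_config {
	char *lowerdir;
	char *upperdir;
};

/* Release both strings; safe to call after a failed parse. */
static void demo_config_free(struct demo_config *c)
{
	free(c->lowerdir);
	free(c->upperdir);
	c->lowerdir = NULL;
	c->upperdir = NULL;
}

/* Split on ',' in place, behaving like strsep(cursor, ","). */
static char *demo_next_token(char **cursor)
{
	char *start = *cursor;
	char *comma;

	if (!start)
		return NULL;
	comma = strchr(start, ',');
	if (comma) {
		*comma = '\0';
		*cursor = comma + 1;
	} else {
		*cursor = NULL;
	}
	return start;
}

/* Portable replacement for strdup(). */
static char *demo_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Parse the option string in place; opts is modified. */
static int demo_parse_opt(char *opts, struct demo_config *c)
{
	char *p;

	c->lowerdir = NULL;
	c->upperdir = NULL;

	while ((p = demo_next_token(&opts)) != NULL) {
		char **slot;

		if (!*p)
			continue;	/* skip empty tokens such as ",," */
		if (strncmp(p, "lowerdir=", 9) == 0)
			slot = &c->lowerdir;
		else if (strncmp(p, "upperdir=", 9) == 0)
			slot = &c->upperdir;
		else
			return -EINVAL;

		free(*slot);		/* a repeated option replaces the old value */
		*slot = demo_dup(p + 9);
		if (!*slot)
			return -ENOMEM;
	}
	return 0;
}
```

Note that a failed parse can leave one of the two strings allocated, so the error path of the caller has to free the config as well.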
* [PATCH 7/7 v3] overlay: overlay filesystem documentation
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
` (5 preceding siblings ...)
2010-09-20 18:04 ` [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype Miklos Szeredi
@ 2010-09-20 18:04 ` Miklos Szeredi
2010-09-21 1:31 ` [PATCH 0/7 v3] overlay filesystem prototype Neil Brown
7 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-20 18:04 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel; +Cc: vaurora, neilb, viro
[-- Attachment #1: overlayfs-documentation.patch --]
[-- Type: text/plain, Size: 7744 bytes --]
From: Neil Brown <neilb@suse.de>
Document the overlay filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
Documentation/filesystems/overlayfs.txt | 163 ++++++++++++++++++++++++++++++++
1 file changed, 163 insertions(+)
Index: linux-2.6/Documentation/filesystems/overlayfs.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/filesystems/overlayfs.txt 2010-09-20 15:17:15.000000000 +0200
@@ -0,0 +1,163 @@
+Written by: Neil Brown <neilb@suse.de>
+
+Overlay Filesystem
+==================
+
+This document describes a prototype for a new approach to providing
+overlay-filesystem functionality in Linux (sometimes referred to as
+union-filesystems). An overlay-filesystem tries to present a
+filesystem which is the result of overlaying one filesystem on top
+of the other.
+
+The result will inevitably fail to look exactly like a normal
+filesystem for various technical reasons. The expectation is that
+many use cases will be able to ignore these differences.
+
+This approach is 'hybrid' because the objects that appear in the
+filesystem do not all appear to belong to that filesystem. In many
+cases an object accessed in the union will be indistinguishable
+from accessing the corresponding object from the original filesystem.
+This is most obvious from the 'st_dev' field returned by stat(2).
+
+While directories will report an st_dev for the overlay-filesystem,
+all non-directory objects will report an st_dev from whichever of the
+'lower' or 'upper' filesystems is providing the object. Similarly
+st_ino will only be unique when combined with st_dev, and both of
+these can change over the lifetime of a non-directory object. Many
+applications and tools ignore these values and will not be affected.
+
+Upper and Lower
+---------------
+
+An overlay filesystem combines two filesystems - an 'upper' filesystem
+and a 'lower' filesystem. When a name exists in both filesystems, the
+object in the 'upper' filesystem is visible while the object in the
+'lower' filesystem is either hidden or, in the case of directories,
+merged with the 'upper' object.
+
+It would be more correct to refer to an upper and lower 'directory
+tree' rather than 'filesystem' as it is quite possible for both
+directory trees to be in the same filesystem and there is no
+requirement that the root of a filesystem be given for either upper or
+lower.
+
+The lower filesystem can be any filesystem supported by Linux and does
+not need to be writable. Theoretically it could even be another
+overlayfs, but this is not yet supported. The upper filesystem will
+normally be writeable and if it is it must support the creation of
+trusted.* extended attributes, and must provide valid d_type in
+readdir responses, at least for symbolic links - so NFS is not
+suitable.
+
+A read-only overlay of two read-only filesystems may use any
+filesystem type.
+
+Directories
+-----------
+
+Overlaying mainly involves directories. If a given name appears in both
+upper and lower filesystems and refers to a non-directory in either,
+then the lower object is hidden - the name refers only to the upper
+object.
+
+Where both upper and lower objects are directories, a merged directory
+is formed.
+
+At mount time, the two directories given as mount options are combined
+into a merged directory. Then whenever a lookup is requested in such
+a merged directory, the lookup is performed in each actual directory
+and the combined result is cached in the dentry belonging to the overlay
+filesystem. If both actual lookups find directories, both are stored
+and a merged directory is created, otherwise only one is stored: the
+upper if it exists, else the lower.
+
+Only the lists of names from directories are merged. Other content
+such as metadata and extended attributes are reported for the upper
+directory only. These attributes of the lower directory are hidden.
+
+whiteouts and opaque directories
+--------------------------------
+
+In order to support rm and rmdir without changing the lower
+filesystem, an overlay filesystem needs to record in the upper filesystem
+that files have been removed. This is done using whiteouts and opaque
+directories (non-directories are always opaque).
+
+The overlay filesystem uses extended attributes with a
+"trusted.overlay." prefix to record these details.
+
+A whiteout is created as a symbolic link with target
+"(overlay-whiteout)" and with xattr "trusted.overlay.whiteout" set to "y".
+When a whiteout is found in the upper level of a merged directory, any
+matching name in the lower level is ignored, and the whiteout itself
+is also hidden.
+
+A directory is made opaque by setting the xattr "trusted.overlay.opaque"
+to "y". Where the upper filesystem contains an opaque directory, any
+directory in the lower filesystem with the same name is ignored.
+
+readdir
+-------
+
+When a 'readdir' request is made on a merged directory, the upper and
+lower directories are each read and the name lists merged in the
+obvious way (upper is read first, then lower - entries that already
+exist are not re-added). This merged name list is cached in the
+'struct file' and so remains as long as the file is kept open. If the
+directory is opened and read by two processes at the same time, they
+will each have separate caches. A seekdir to the start of the
+directory (offset 0) followed by a readdir will cause the cache to be
+discarded and rebuilt.
+
+This means that changes to the merged directory do not appear while a
+directory is being read. This is unlikely to be noticed by many
+programs.
+
+seek offsets are assigned sequentially when the directories are read.
+Thus if a program does the following:
+ - read part of a directory
+ - remember an offset, and close the directory
+ - re-open the directory some time later
+ - seek to the remembered offset
+
+there may be little correlation between the old and new locations in
+the list of filenames, particularly if anything has changed in the
+directory.
+
+Readdir on directories that are not merged is simply handled by the
+underlying directory (upper or lower).
+
+
+Non-directories
+---------------
+
+Objects that are not directories (files, symlinks, device-special
+files etc) are presented either from the upper or lower filesystem as
+appropriate. When a file in the lower filesystem is accessed in a way
+that requires write-access, such as opening for write access, changing
+some metadata etc, the file is first copied from the lower filesystem
+to the upper filesystem (copy_up). Note that creating a hard-link
+also requires copy-up, though of course creation of a symlink does
+not.
+
+The copy_up process first makes sure that the containing directory
+exists in the upper filesystem - creating it and any parents as
+necessary. It then creates the object with the same metadata (owner,
+mode, mtime, symlink-target etc) and then if the object is a file, the
+data is copied from the lower to the upper filesystem. Finally any
+extended attributes are copied up.
+
+Once the copy_up is complete, the overlay filesystem simply
+provides direct access to the newly created file in the upper
+filesystem - future operations on the file are barely noticed by the
+overlay filesystem (though an operation on the name of the file such as
+rename or unlink will of course be noticed and handled).
+
+Changes to underlying filesystems
+---------------------------------
+
+Offline changes, when the overlay is not mounted, are allowed to either
+the upper or the lower trees.
+
+Changes to the underlying filesystems while part of a mounted overlay
+filesystem are not allowed. This is not yet enforced, but will be in
--
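The readdir merge described in the documentation patch above (upper read first, then lower, names that already exist not re-added, whiteouts hidden) can be sketched in user-space C roughly as follows. This is only an illustration of the rule as documented, not the code from the series: the demo_* names and the array-based interface are assumptions, and the real implementation caches the list in the 'struct file'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry; is_whiteout only matters for upper entries. */
struct demo_dirent {
	const char *name;
	int is_whiteout;
};

/* Return 1 if name appears among the n entries of list (whiteouts included). */
static int demo_seen(const struct demo_dirent *list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i].name, name) == 0)
			return 1;
	return 0;
}

/*
 * Merge upper and lower name lists into out. Returns the number of
 * visible names, or -1 if out_cap is too small.
 */
static int demo_merge(const struct demo_dirent *upper, size_t nu,
		      const struct demo_dirent *lower, size_t nl,
		      const char **out, size_t out_cap)
{
	size_t i, n = 0;

	/* Upper entries come first; whiteouts themselves stay hidden. */
	for (i = 0; i < nu; i++) {
		if (upper[i].is_whiteout)
			continue;
		if (n == out_cap)
			return -1;
		out[n++] = upper[i].name;
	}

	/* Lower entries are added only if no upper entry or whiteout hides them. */
	for (i = 0; i < nl; i++) {
		if (demo_seen(upper, nu, lower[i].name))
			continue;
		if (n == out_cap)
			return -1;
		out[n++] = lower[i].name;
	}
	return (int)n;
}
```

With upper = { "a", whiteout "b" } and lower = { "b", "c", "a" }, the visible list is "a", "c": the whiteout removes "b" and the upper "a" hides the lower one.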
* Re: [PATCH 0/7 v3] overlay filesystem prototype
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
` (6 preceding siblings ...)
2010-09-20 18:04 ` [PATCH 7/7 v3] overlay: overlay filesystem documentation Miklos Szeredi
@ 2010-09-21 1:31 ` Neil Brown
2010-09-22 9:50 ` Miklos Szeredi
7 siblings, 1 reply; 26+ messages in thread
From: Neil Brown @ 2010-09-21 1:31 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, vaurora, viro
On Mon, 20 Sep 2010 20:04:04 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:
> Here's an updated patch series.
>
> For now I reverted Neil's revalidation patch. Not requiring strict
> read-only would make sense for just trying it out and experimenting.
> But for real uses, I'm not sure...
:-)
I think you significantly reduce the value by insisting on read-only but as
this is purely a theoretical perspective at the moment (I have no concrete
use-case) I won't push it.
I had another patch I was working on which caused overlayfs to keep negative
dentries in upperdentry or lowerdentry rather than just setting them to
NULL. This would allow revalidation to notice objects appearing in the
underlying filesystem. I guess you won't want that now .... I think it made
some of the code a bit neater, but I never finished it so I cannot be sure of
the overall effect.
I'm curious as to why upperdentry is now called __upperdentry - it isn't
clear from a quick reading..
Thanks,
NeilBrown
* Re: [PATCH 0/7 v3] overlay filesystem prototype
2010-09-21 1:31 ` [PATCH 0/7 v3] overlay filesystem prototype Neil Brown
@ 2010-09-22 9:50 ` Miklos Szeredi
0 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-22 9:50 UTC (permalink / raw)
To: Neil Brown; +Cc: miklos, linux-fsdevel, linux-kernel, vaurora, viro
On Tue, 21 Sep 2010, Neil Brown wrote:
> On Mon, 20 Sep 2010 20:04:04 +0200
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > Here's an updated patch series.
> >
> > For now I reverted Neil's revalidation patch. Not requiring strict
> > read-only would make sense for just trying it out and experimenting.
> > But for real uses, I'm not sure...
>
> :-)
>
> I think you significantly reduce the value by insisting on read-only but as
> this is purely a theoretical perspective at the moment (I have no concrete
> use-case) I won't push it.
I'm not insisting on read-only. I think there's value to enabling
external modification in special circumstances, but not by default.
I also think it might not be a good idea to waste CPU cycles and brain
cycles on implementing (an always imperfect) revalidation. A similar
mechanism is "mount -oremount mountpoint", which simply throws out all
unused dentries, effectively forcing a revalidation of the whole
filesystem. Less selective than ->d_revalidate(), but it shouldn't be
a big issue since the underlying filesystem dentries remain cached.
> I had another patch I was working on which caused overlayfs to keep negative
> dentries in upperdentry or lowerdentry rather than just setting them to
> NULL. This would allow revalidation to notice objects appearing in the
> underlying filesystem. I guess you won't want that now .... I think it made
> some of the code a bit neater, but I never finished it so I cannot be sure of
> the overall effect.
>
> I'm curious as to why upperdentry is now called __upperdentry - it isn't
> clear from a quick reading..
(I need to add more comments to the code...)
The underscores are meant to imply that, unlike ->lowerdentry, it's not
quite safe to access directly. See discussion about memory barriers:
http://lkml.org/lkml/2010/9/19/142
It's not as clear as one might like. To sum up:
- an smp_read_barrier_depends() is needed on SMP Alpha, but not on
other archs
- an ACCESS_ONCE() is needed on some theoretical compiler that does
weird optimizations, but apparently not on current versions of gcc
The VFS lives happily without either, locklessly accessing
dentry->d_inode.
Thanks,
Miklos
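For readers following the barrier discussion, here is a small user-space sketch of what an accessor for such a field might look like. It assumes the upper pointer can be published by another CPU after lookup, which is my reading of the thread rather than something stated in it. The macro has the same shape as the kernel's ACCESS_ONCE() and relies on the GCC/Clang __typeof__ extension; struct demo_oe and demo_oe_upper() are hypothetical names.

```c
#include <stddef.h>

/* Force exactly one load of x; same shape as the kernel's ACCESS_ONCE(). */
#define DEMO_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the overlay entry discussed above. */
struct demo_oe {
	void *upper;	/* assumed to be set later by another CPU */
	void *lower;	/* assumed fixed once the entry is set up */
};

/* Read the upper pointer once, so the compiler cannot reload it. */
static void *demo_oe_upper(struct demo_oe *oe)
{
	void *upper = DEMO_ACCESS_ONCE(oe->upper);

	/*
	 * In the kernel, smp_read_barrier_depends() would follow here.
	 * It only matters on SMP Alpha, so it is left out of this sketch.
	 */
	return upper;
}
```

Callers would then go through the accessor instead of touching the field, which is presumably what the double underscore is meant to encourage.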
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-20 18:04 ` [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype Miklos Szeredi
@ 2010-09-22 23:21 ` Valerie Aurora
2010-09-24 14:33 ` Jens Axboe
2010-09-24 17:56 ` Valerie Aurora
1 sibling, 1 reply; 26+ messages in thread
From: Valerie Aurora @ 2010-09-22 23:21 UTC (permalink / raw)
To: Miklos Szeredi, Jens Axboe; +Cc: linux-fsdevel, linux-kernel, neilb, viro
On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> This overlay filesystem is a hybrid of entirely filesystem based
> (unionfs, aufs) and entirely VFS based (union mounts) solutions.
[...]
> +static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> +{
> + struct file *old_file;
> + struct file *new_file;
> + int error = 0;
> +
> + if (len == 0)
> + return 0;
> +
> + old_file = path_open(old, O_RDONLY);
> + if (IS_ERR(old_file))
> + return PTR_ERR(old_file);
> +
> + new_file = path_open(new, O_WRONLY);
> + if (IS_ERR(new_file)) {
> + error = PTR_ERR(new_file);
> + goto out_fput;
> + }
> +
> + /* FIXME: copy up sparse files efficiently */
> + while (len) {
> + loff_t offset = new_file->f_pos;
> + size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
> + long bytes;
> +
> + if (len < this_len)
> + this_len = len;
> +
> + if (signal_pending_state(TASK_KILLABLE, current))
> + return -EINTR;
> +
> + bytes = do_splice_direct(old_file, &offset, new_file, this_len,
> + SPLICE_F_MOVE);
Interruptible copyup is good. But it looks like splice setup is kind
of heavyweight and we should do it as seldom as possible.
What about implementing splice flag SPLICE_F_INTERRUPTIBLE instead?
-VAL
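A user-space sketch of the chunked, interruptible copy pattern under discussion. It swaps stdio reads and writes in for do_splice_direct() and a flag in for signal_pending_state(), so it says nothing about splice cost. What it does show is the cleanup path: as far as the quoted hunk goes, the direct "return -EINTR" appears to skip the out_fput label, though the rest of the function is not shown here. All demo_* names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>

#define DEMO_CHUNK 4096

/* A signal handler would set this; it plays the role of a pending fatal signal. */
static volatile sig_atomic_t demo_stop;

/* Copy src to dst in chunks; returns 0 or a negative errno-style code. */
static int demo_copy(const char *src, const char *dst)
{
	char buf[DEMO_CHUNK];
	FILE *in, *out;
	size_t n;
	int err = 0;

	in = fopen(src, "rb");
	if (!in)
		return -EIO;

	out = fopen(dst, "wb");
	if (!out) {
		err = -EIO;
		goto out_close_in;
	}

	for (;;) {
		/* Check for interruption once per chunk, not once per byte. */
		if (demo_stop) {
			err = -EINTR;
			break;	/* fall through to cleanup instead of returning */
		}
		n = fread(buf, 1, sizeof(buf), in);
		if (n == 0) {
			if (ferror(in))
				err = -EIO;
			break;
		}
		if (fwrite(buf, 1, n, out) != n) {
			err = -EIO;
			break;
		}
	}

	if (fclose(out) != 0 && !err)
		err = -EIO;
out_close_in:
	fclose(in);
	return err;
}
```

An interrupted copy still leaves a partial destination file behind, which matches the "need to fix recovery after a failed/interrupted copy-up" note in the cover letter.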
* Re: [PATCH 5/7 v3] vfs: fix possible use after free in finish_open()
2010-09-20 18:04 ` [PATCH 5/7 v3] vfs: fix possible use after free in finish_open() Miklos Szeredi
@ 2010-09-23 20:19 ` Valerie Aurora
0 siblings, 0 replies; 26+ messages in thread
From: Valerie Aurora @ 2010-09-23 20:19 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, neilb, viro, stable
On Mon, Sep 20, 2010 at 08:04:09PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> If open(O_TRUNC) is called and the actual open fails, then nd->path
> will be released by nameidata_to_filp(). If this races with an
> unmount then mnt_drop_write() can Oops.
>
> Fix by acquiring a ref to nd->path and releasing after
> mnt_drop_write().
>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> CC: stable@kernel.org
> ---
> fs/namei.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c 2010-09-20 13:32:35.000000000 +0200
> +++ linux-2.6/fs/namei.c 2010-09-20 13:33:14.000000000 +0200
> @@ -1559,6 +1559,11 @@ static struct file *finish_open(struct n
> mnt_drop_write(nd->path.mnt);
> goto exit;
> }
> + if (will_truncate) {
> + /* nameidata_to_filp() puts nd->path! */
> + path_get(&nd->path);
> + }
> +
> filp = nameidata_to_filp(nd);
> if (!IS_ERR(filp)) {
> error = ima_file_check(filp, acc_mode);
> @@ -1581,8 +1586,10 @@ static struct file *finish_open(struct n
> * because the filp has had a write taken
> * on its behalf.
> */
> - if (will_truncate)
> + if (will_truncate) {
> mnt_drop_write(nd->path.mnt);
> + path_put(&nd->path);
> + }
> return filp;
>
> exit:
>
Nice catch!
Reviewed-by: Valerie Aurora <vaurora@redhat.com>
-VAL
* Re: [PATCH 3/7 v3] vfs: add flag to allow rename to same inode
2010-09-20 18:04 ` [PATCH 3/7 v3] vfs: add flag to allow rename to same inode Miklos Szeredi
@ 2010-09-23 22:04 ` Valerie Aurora
0 siblings, 0 replies; 26+ messages in thread
From: Valerie Aurora @ 2010-09-23 22:04 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, neilb, viro
On Mon, Sep 20, 2010 at 08:04:07PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> The overlay filesystem uses dummy inodes for non-directories. Allow
> rename to work in this case despite the inode being the same.
>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
> fs/namei.c | 4 +++-
> include/linux/fs.h | 1 +
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h 2010-08-25 14:19:34.000000000 +0200
> +++ linux-2.6/include/linux/fs.h 2010-08-25 14:19:53.000000000 +0200
> @@ -179,6 +179,7 @@ struct inodes_stat_t {
> #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
> * during rename() internally.
> */
> +#define FS_RENAME_SELF_ALLOW 65536 /* Allow rename to same inode */
>
> /*
> * These are the fs-independent mount-flags: up to 32 flags are supported
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c 2010-08-25 10:19:53.000000000 +0200
> +++ linux-2.6/fs/namei.c 2010-08-25 14:22:56.000000000 +0200
> @@ -2620,8 +2620,10 @@ int vfs_rename(struct inode *old_dir, st
> int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
> const unsigned char *old_name;
>
> - if (old_dentry->d_inode == new_dentry->d_inode)
> + if (old_dentry->d_inode == new_dentry->d_inode &&
> + !(old_dir->i_sb->s_type->fs_flags & FS_RENAME_SELF_ALLOW)) {
> return 0;
> + }
>
> error = may_delete(old_dir, old_dentry, is_dir);
> if (error)
>
Perhaps a note in the commit message to say that the (inode == inode)
check on the lower layer is done when overlayfs calls vfs_rename() on
the targets?
What other issues arise with dummy inodes?
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-22 23:21 ` Valerie Aurora
@ 2010-09-24 14:33 ` Jens Axboe
2010-09-24 17:16 ` Valerie Aurora
0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2010-09-24 14:33 UTC (permalink / raw)
To: Valerie Aurora; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, neilb, viro
On 2010-09-23 01:21, Valerie Aurora wrote:
> On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
>> From: Miklos Szeredi <mszeredi@suse.cz>
>>
>> This overlay filesystem is a hybrid of entirely filesystem based
>> (unionfs, aufs) and entirely VFS based (union mounts) solutions.
>
> [...]
>
>> +static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
>> +{
>> + struct file *old_file;
>> + struct file *new_file;
>> + int error = 0;
>> +
>> + if (len == 0)
>> + return 0;
>> +
>> + old_file = path_open(old, O_RDONLY);
>> + if (IS_ERR(old_file))
>> + return PTR_ERR(old_file);
>> +
>> + new_file = path_open(new, O_WRONLY);
>> + if (IS_ERR(new_file)) {
>> + error = PTR_ERR(new_file);
>> + goto out_fput;
>> + }
>> +
>> + /* FIXME: copy up sparse files efficiently */
>> + while (len) {
>> + loff_t offset = new_file->f_pos;
>> + size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
>> + long bytes;
>> +
>> + if (len < this_len)
>> + this_len = len;
>> +
>> + if (signal_pending_state(TASK_KILLABLE, current))
>> + return -EINTR;
>> +
>> + bytes = do_splice_direct(old_file, &offset, new_file, this_len,
>> + SPLICE_F_MOVE);
>
> Interruptible copyup is good. But it looks like splice setup is kind
> of heavyweight and we should do it as seldom as possible.
>
> What about implementing splice flag SPLICE_F_INTERRUPTIBLE instead?
The pipe alloc and such? That is lazily done and sticks around.
--
Jens Axboe
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-24 14:33 ` Jens Axboe
@ 2010-09-24 17:16 ` Valerie Aurora
0 siblings, 0 replies; 26+ messages in thread
From: Valerie Aurora @ 2010-09-24 17:16 UTC (permalink / raw)
To: Jens Axboe; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, neilb, viro
On Fri, Sep 24, 2010 at 04:33:05PM +0200, Jens Axboe wrote:
> On 2010-09-23 01:21, Valerie Aurora wrote:
> > On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
> >> From: Miklos Szeredi <mszeredi@suse.cz>
> >>
> >> This overlay filesystem is a hybrid of entirely filesystem based
> >> (unionfs, aufs) and entirely VFS based (union mounts) solutions.
> >
> > [...]
> >
> >> +static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
> >> +{
> >> + struct file *old_file;
> >> + struct file *new_file;
> >> + int error = 0;
> >> +
> >> + if (len == 0)
> >> + return 0;
> >> +
> >> + old_file = path_open(old, O_RDONLY);
> >> + if (IS_ERR(old_file))
> >> + return PTR_ERR(old_file);
> >> +
> >> + new_file = path_open(new, O_WRONLY);
> >> + if (IS_ERR(new_file)) {
> >> + error = PTR_ERR(new_file);
> >> + goto out_fput;
> >> + }
> >> +
> >> + /* FIXME: copy up sparse files efficiently */
> >> + while (len) {
> >> + loff_t offset = new_file->f_pos;
> >> + size_t this_len = OVL_COPY_UP_CHUNK_SIZE;
> >> + long bytes;
> >> +
> >> + if (len < this_len)
> >> + this_len = len;
> >> +
> >> + if (signal_pending_state(TASK_KILLABLE, current))
> >> + return -EINTR;
> >> +
> >> + bytes = do_splice_direct(old_file, &offset, new_file, this_len,
> >> + SPLICE_F_MOVE);
> >
> > Interruptible copyup is good. But it looks like splice setup is kind
> > of heavyweight and we should do it as seldom as possible.
> >
> > What about implementing splice flag SPLICE_F_INTERRUPTIBLE instead?
>
> The pipe alloc and such? That is lazily done and sticks around.
Thanks! So this looks like a good way to implement interruptible
in-kernel file copyup?
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-20 18:04 ` [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype Miklos Szeredi
2010-09-22 23:21 ` Valerie Aurora
@ 2010-09-24 17:56 ` Valerie Aurora
2010-09-27 8:11 ` Miklos Szeredi
1 sibling, 1 reply; 26+ messages in thread
From: Valerie Aurora @ 2010-09-24 17:56 UTC (permalink / raw)
To: Miklos Szeredi, Andreas Gruenbacher, alias ram Ram Pai
Cc: linux-fsdevel, linux-kernel, neilb, viro
On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> This overlay filesystem is a hybrid of entirely filesystem based
> (unionfs, aufs) and entirely VFS based (union mounts) solutions.
[...]
> +static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
> + const char *link)
> +{
> + int err;
> + struct dentry *newdentry;
> + struct dentry *upperdir;
> + struct inode *inode;
> + struct kstat stat = {
> + .mode = mode,
> + .rdev = rdev,
> + };
> +
> + err = -ENOMEM;
> + inode = ovl_new_inode(dentry->d_sb, mode);
> + if (!inode)
> + goto out;
> +
> + err = ovl_copy_up(dentry->d_parent);
> + if (err)
> + goto out_iput;
> +
> + upperdir = ovl_dentry_upper(dentry->d_parent);
> + mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
> +
> + newdentry = ovl_upper_create(upperdir, dentry, &stat, link);
> + err = PTR_ERR(newdentry);
> + if (IS_ERR(newdentry))
> + goto out_unlock;
> +
> + if (ovl_dentry_is_opaque(dentry) && S_ISDIR(mode)) {
> + err = ovl_set_opaque(newdentry);
> + if (err)
> + goto out_dput;
> + }
Andreas Gruenbacher just convinced me that every single new directory
created in the unioned file system should be marked opaque. "New"
means either it replaces a whiteout or has no matching directory on
the lower layer. The theory is that the topmost file system changes
should take precedence and override any changes (off-line) in the
lower file system.
What do you think?
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-24 17:56 ` Valerie Aurora
@ 2010-09-27 8:11 ` Miklos Szeredi
2010-09-27 11:49 ` Andreas Gruenbacher
2010-09-27 18:47 ` Valerie Aurora
0 siblings, 2 replies; 26+ messages in thread
From: Miklos Szeredi @ 2010-09-27 8:11 UTC (permalink / raw)
To: Valerie Aurora
Cc: miklos, agruen, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Fri, 24 Sep 2010, Valerie Aurora wrote:
> On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
> > From: Miklos Szeredi <mszeredi@suse.cz>
> >
> > This overlay filesystem is a hybrid of entirely filesystem based
> > (unionfs, aufs) and entirely VFS based (union mounts) solutions.
>
> [...]
>
> > +static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
> > + const char *link)
> > +{
> > + int err;
> > + struct dentry *newdentry;
> > + struct dentry *upperdir;
> > + struct inode *inode;
> > + struct kstat stat = {
> > + .mode = mode,
> > + .rdev = rdev,
> > + };
> > +
> > + err = -ENOMEM;
> > + inode = ovl_new_inode(dentry->d_sb, mode);
> > + if (!inode)
> > + goto out;
> > +
> > + err = ovl_copy_up(dentry->d_parent);
> > + if (err)
> > + goto out_iput;
> > +
> > + upperdir = ovl_dentry_upper(dentry->d_parent);
> > + mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
> > +
> > + newdentry = ovl_upper_create(upperdir, dentry, &stat, link);
> > + err = PTR_ERR(newdentry);
> > + if (IS_ERR(newdentry))
> > + goto out_unlock;
> > +
> > + if (ovl_dentry_is_opaque(dentry) && S_ISDIR(mode)) {
> > + err = ovl_set_opaque(newdentry);
> > + if (err)
> > + goto out_dput;
> > + }
>
> Andreas Gruenbacher just convinced me that every single new directory
> created in the unioned file system should be marked opaque. "New"
> means either it replaces a whiteout or has no matching directory on
> the lower layer. The theory is that the topmost file system changes
> should take precedence and override any changes (off-line) in the
> lower file system.
That's logical. However marking new directories opaque is only a half
solution. E.g. consider the case when we have /a/b/c/ on the lower fs
and /a/b/ on the upper, which is not opaque. Then /a/b/c/ is created
on the upper fs off-line. The union logic can't notice that "c"
should really be opaque and will merge with the contents of the lower
layer.
The real solution to this problem is to make opaque the default and
only mark *non* opaque directories. These are only created on copy-up
or by explicit admin action on the upper fs.
Thanks,
Miklos
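The two policies being compared can be written down as a tiny decision function, which may make the /a/b/c example easier to follow. This is a sketch only: the "transparent" mark is a hypothetical name for the explicit non-opaque marking suggested above, and none of the demo_* identifiers exist in the series.

```c
/* How an upper directory is marked on disk (hypothetical encoding). */
enum demo_mark {
	DEMO_UNMARKED,		/* no overlay xattr at all */
	DEMO_MARK_OPAQUE,	/* explicitly opaque */
	DEMO_MARK_TRANSPARENT	/* explicitly non-opaque, e.g. set on copy-up */
};

/* What an unmarked upper directory means. */
enum demo_policy {
	DEMO_DEFAULT_MERGE,	/* current behaviour: unmarked means merge */
	DEMO_DEFAULT_OPAQUE	/* proposed: unmarked means opaque */
};

/*
 * Return 1 if an upper directory should be merged with a lower
 * directory of the same name, 0 if the lower one is ignored.
 */
static int demo_should_merge(enum demo_policy policy, enum demo_mark mark)
{
	if (mark == DEMO_MARK_OPAQUE)
		return 0;
	if (mark == DEMO_MARK_TRANSPARENT)
		return 1;
	return policy == DEMO_DEFAULT_MERGE;
}
```

A directory "c" created off-line on the upper fs carries no mark. Under DEMO_DEFAULT_MERGE it is merged with the lower "c"; under DEMO_DEFAULT_OPAQUE it is not, which is the behaviour argued for above.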
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-27 8:11 ` Miklos Szeredi
@ 2010-09-27 11:49 ` Andreas Gruenbacher
2010-09-27 12:15 ` J. R. Okajima
2010-09-27 18:47 ` Valerie Aurora
1 sibling, 1 reply; 26+ messages in thread
From: Andreas Gruenbacher @ 2010-09-27 11:49 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Valerie Aurora, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Monday 27 September 2010 10:11:53 Miklos Szeredi wrote:
> [...] marking new directories opaque is only a half solution.
> E.g. consider the case when we have /a/b/c/ on the lower fs
> and /a/b/ on the upper, which is not opaque. Then /a/b/c/ is created
> on the upper fs off-line. The union logic can't notice that "c"
> should really be opaque and will merge with the contents of the lower
> layer.
I can think of arguments for either behavior, perhaps with a slight preference
for your suggestion (default = opaque).
In any case, admins will need a way to flip opaque flags and remove undesired
whiteouts.
Thanks,
Andreas
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-27 11:49 ` Andreas Gruenbacher
@ 2010-09-27 12:15 ` J. R. Okajima
0 siblings, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2010-09-27 12:15 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: Miklos Szeredi, Valerie Aurora, linuxram, linux-fsdevel,
linux-kernel, neilb, viro
Andreas Gruenbacher:
> I can think of arguments for either behavior, perhaps with a slight preference
> for your suggestion (default = opaque).
>
> In any case, admins will need a way to flip opaque flags and remove undesired
> whiteouts.
Agreed.
Aufs provides two options for this issue, "diropq=always" and
"diropq=whiteouted". Several years ago, the default was "always". But
it soon changed to "whiteouted", and I have never received an objection
from users.
As far as I know, some users are happy that the number of whiteouts was
reduced, because they merge layers manually for system
maintenance. Unnecessary whiteouts are not good for them.
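Roughly, the two options differ in what happens at mkdir time. The
sketch below is a userspace model under that assumption, not aufs code:
enum diropq_mode and new_dir_is_opaque() are hypothetical names, and
"whiteouted" is taken to mean "the new directory replaces a whiteout".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two diropq settings. */
enum diropq_mode { DIROPQ_WHITEOUTED, DIROPQ_ALWAYS };

/*
 * Decide whether a newly created directory gets an opaque marker.
 * "always" marks every new directory; "whiteouted" marks it only when
 * the directory replaces a whiteout, so fewer markers are written.
 */
static bool new_dir_is_opaque(enum diropq_mode mode, bool replaces_whiteout)
{
	assert(mode == DIROPQ_WHITEOUTED || mode == DIROPQ_ALWAYS);

	return mode == DIROPQ_ALWAYS || replaces_whiteout;
}
```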
J. R. Okajima
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-27 8:11 ` Miklos Szeredi
2010-09-27 11:49 ` Andreas Gruenbacher
@ 2010-09-27 18:47 ` Valerie Aurora
2010-09-28 8:24 ` Andreas Gruenbacher
1 sibling, 1 reply; 26+ messages in thread
From: Valerie Aurora @ 2010-09-27 18:47 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: agruen, linuxram, linux-fsdevel, linux-kernel, neilb, viro
On Mon, Sep 27, 2010 at 10:11:53AM +0200, Miklos Szeredi wrote:
> On Fri, 24 Sep 2010, Valerie Aurora wrote:
> > On Mon, Sep 20, 2010 at 08:04:10PM +0200, Miklos Szeredi wrote:
> > > From: Miklos Szeredi <mszeredi@suse.cz>
> > >
> > > This overlay filesystem is a hybrid of entirely filesystem based
> > > (unionfs, aufs) and entirely VFS based (union mounts) solutions.
> >
> > [...]
> >
> > > +static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
> > > + const char *link)
> > > +{
> > > + int err;
> > > + struct dentry *newdentry;
> > > + struct dentry *upperdir;
> > > + struct inode *inode;
> > > + struct kstat stat = {
> > > + .mode = mode,
> > > + .rdev = rdev,
> > > + };
> > > +
> > > + err = -ENOMEM;
> > > + inode = ovl_new_inode(dentry->d_sb, mode);
> > > + if (!inode)
> > > + goto out;
> > > +
> > > + err = ovl_copy_up(dentry->d_parent);
> > > + if (err)
> > > + goto out_iput;
> > > +
> > > + upperdir = ovl_dentry_upper(dentry->d_parent);
> > > + mutex_lock_nested(&upperdir->d_inode->i_mutex, I_MUTEX_PARENT);
> > > +
> > > + newdentry = ovl_upper_create(upperdir, dentry, &stat, link);
> > > + err = PTR_ERR(newdentry);
> > > + if (IS_ERR(newdentry))
> > > + goto out_unlock;
> > > +
> > > + if (ovl_dentry_is_opaque(dentry) && S_ISDIR(mode)) {
> > > + err = ovl_set_opaque(newdentry);
> > > + if (err)
> > > + goto out_dput;
> > > + }
> >
> > Andreas Gruenbacher just convinced me that every single new directory
> > created in the unioned file system should be marked opaque. "New"
> > means either it replaces a whiteout or has no matching directory on
> > the lower layer. The theory is that the topmost file system changes
> > should take precedence and override any changes (off-line) in the
> > lower file system.
>
> That's logical. However marking new directories opaque is only a half
> solution. E.g. consider the case when we have /a/b/c/ on the lower fs
> and /a/b/ on the upper, which is not opaque. Then /a/b/c/ is created
> on the upper fs off-line. The union logic can't notice that "c"
> should really be opaque and will merge with the contents of the lower
> layer.
Maybe I don't understand. It seems like directories created when the
file system is *not* union mounted should definitely be merged with
matching directories on the lower layer.
Take the case of /etc/fstab. The first union mount never touches /etc
and it doesn't exist on the topmost layer. Then we unmount the upper
layer, mount it somewhere else as a plain mount, and create /etc/ and
/etc/fstab. When we union mount it back over the lower layer again,
we still want the lower layer /etc/ to be merged with the topmost
/etc/, or else init.d will disappear.
However, if while the file system is union mounted, /etc/ doesn't
exist, and /etc/ is created, a later mount shouldn't merge a newly
created /etc/ on the lower layer.
> The real solution to this problem is to make opaque the default and
> only mark *non* opaque directories. These are only created on copy-up
> or by explicit admin action on the upper fs.
Again, maybe I'm misunderstanding, but this doesn't make much sense to
me. Say I create:
/upper/a_dir/upper_file
/lower/a_dir/lower_file
Then when I union mount them, I want a_dir/ to be transparent
automatically and show both upper_file and lower_file, without marking
it manually.
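To make the a_dir/ case concrete, here is a small userspace sketch of
the listing a union would produce for one directory. It is only a model
under simplifying assumptions (no whiteouts, names held in plain
arrays); union_listing() and name_in() are hypothetical helpers, not
functions from any of the patch sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if name is present in list[0..n). */
static bool name_in(const char *name, const char **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return true;
	return false;
}

/*
 * Build the listing shown for one directory of the union. Upper names
 * always appear. Lower names are added only when the upper directory
 * is transparent and does not already contain the same name.
 * Returns the number of names written to out (at most cap).
 */
static size_t union_listing(const char **upper, size_t n_upper,
			    const char **lower, size_t n_lower,
			    bool upper_opaque,
			    const char **out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL || cap == 0);

	for (size_t i = 0; i < n_upper && n < cap; i++)
		out[n++] = upper[i];

	if (upper_opaque)
		return n;

	for (size_t i = 0; i < n_lower && n < cap; i++)
		if (!name_in(lower[i], upper, n_upper))
			out[n++] = lower[i];

	return n;
}
```

With upper = { "upper_file" } and lower = { "lower_file" }, a
transparent a_dir/ lists both names, while an opaque one lists only
upper_file.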
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-27 18:47 ` Valerie Aurora
@ 2010-09-28 8:24 ` Andreas Gruenbacher
2010-09-30 21:51 ` Valerie Aurora
0 siblings, 1 reply; 26+ messages in thread
From: Andreas Gruenbacher @ 2010-09-28 8:24 UTC (permalink / raw)
To: Valerie Aurora
Cc: Miklos Szeredi, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Monday 27 September 2010 20:47:47 Valerie Aurora wrote:
> Maybe I don't understand. It seems like directories created when the
> file system is *not* union mounted should definitely be merged with
> matching directories on the lower layer.
>
> Take the case of /etc/fstab. The first union mount never touches /etc
> and it doesn't exist on the topmost layer. Then we unmount the upper
> layer, mount it somewhere else as a plain mount, and create /etc/ and
> /etc/fstab. When we union mount it back over the lower layer again,
> we still want the lower layer /etc/ to be merged with the topmost
> /etc/, or else init.d will disappear.
I can't think of a reason why the upper layer would really *need* to be
modified separately as in this example though, and I'm sure that examples for
opaqueness by default can be constructed as well. Transparency comes at a
cost though (lookup, readdir, whiteouts), and defaulting to opaque directories
will be more efficient in some cases. This is why I think that opaqueness by
default is preferable.
> Again, maybe I'm misunderstanding, but this doesn't make much sense to
> me. Say I create:
>
> /upper/a_dir/upper_file
> /lower/a_dir/lower_file
>
> Then when I union mount them, I want a_dir/ to be transparent
> automatically and show both upper_file and lower_file, without marking
> it manually.
Why?
Thanks,
Andreas
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-28 8:24 ` Andreas Gruenbacher
@ 2010-09-30 21:51 ` Valerie Aurora
2010-10-01 9:34 ` Andreas Gruenbacher
0 siblings, 1 reply; 26+ messages in thread
From: Valerie Aurora @ 2010-09-30 21:51 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: Miklos Szeredi, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Tue, Sep 28, 2010 at 10:24:59AM +0200, Andreas Gruenbacher wrote:
> On Monday 27 September 2010 20:47:47 Valerie Aurora wrote:
> > Maybe I don't understand. It seems like directories created when the
> > file system is *not* union mounted should definitely be merged with
> > matching directories on the lower layer.
> >
> > Take the case of /etc/fstab. The first union mount never touches /etc
> > and it doesn't exist on the topmost layer. Then we unmount the upper
> > layer, mount it somewhere else as a plain mount, and create /etc/ and
> > /etc/fstab. When we union mount it back over the lower layer again,
> > we still want the lower layer /etc/ to be merged with the topmost
> > /etc/, or else init.d will disappear.
>
> I can't think of a reason why the upper layer would really *need* to be
> modified separately as in this example though, and I'm sure that examples for
> opaqueness by default can be constructed as well. Transparency comes at a
> cost though (lookup, readdir, whiteouts), and defaulting to opaque directories
> will be more efficient in some cases. This is why I think that opaqueness by
> default is preferable.
I agree with that for directories created while it is union mounted.
> > Again, maybe I'm misunderstanding, but this doesn't make much sense to
> > me. Say I create:
> >
> > /upper/a_dir/upper_file
> > /lower/a_dir/lower_file
> >
> > Then when I union mount them, I want a_dir/ to be transparent
> > automatically and show both upper_file and lower_file, without marking
> > it manually.
>
> Why?
Hm, this was a pretty basic assumption for me - that you'd want to
construct a topmost image offline that would be "merged" with the
lower layers. So, for example:
Topmost layer contains:
/etc/hostname
Lower layers contain everything else in /etc/. So /etc/ would exist
on the topmost layer at the time of union mount, but we would want it
to be transparent. But if we created a new dir *during* the union
mount, it would be opaque.
What was your model?
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-09-30 21:51 ` Valerie Aurora
@ 2010-10-01 9:34 ` Andreas Gruenbacher
2010-10-06 17:31 ` Valerie Aurora
0 siblings, 1 reply; 26+ messages in thread
From: Andreas Gruenbacher @ 2010-10-01 9:34 UTC (permalink / raw)
To: Valerie Aurora
Cc: Miklos Szeredi, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Thursday 30 September 2010 23:51:15 Valerie Aurora wrote:
> On Tue, Sep 28, 2010 at 10:24:59AM +0200, Andreas Gruenbacher wrote:
> > On Monday 27 September 2010 20:47:47 Valerie Aurora wrote:
> > > Again, maybe I'm misunderstanding, but this doesn't make much sense to
> > > me. Say I create:
> > >
> > > /upper/a_dir/upper_file
> > > /lower/a_dir/lower_file
> > >
> > > Then when I union mount them, I want a_dir/ to be transparent
> > > automatically and show both upper_file and lower_file, without marking
> > > it manually.
> >
> > Why?
>
> Hm, this was a pretty basic assumption for me - that you'd want to
> construct a topmost image offline that would be "merged" with the
> lower layers. So, for example:
>
> Topmost layer contains:
>
> /etc/hostname
>
> Lower layers contain everything else in /etc/. So /etc/ would exist
> on the topmost layer at the time of union mount, but we would want it
> to be transparent. But if we created a new dir *during* the union
> mount, it would be opaque.
>
> What was your model?
The prevalent use case probably is to start out with an empty topmost layer on
top of an existing file system. When things are modified, changes obviously
go into the topmost layer. Additional layers can later be stacked on top of
that, turning the previous topmost layer into a read-only lower layer.
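A name lookup through such a stack could be modelled like this. It is a
userspace sketch only, with hypothetical names (struct layer_dir,
lookup_in_stack()), and it ignores whiteouts: the search walks from the
topmost layer down and stops below the first opaque directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one layer's view of a single directory. */
struct layer_dir {
	const char **names;	/* entries present in this layer */
	size_t n_names;
	bool opaque;		/* if set, the layers below are hidden */
};

/*
 * Search the stack from the top (index 0) down. Returns the index of
 * the first layer that contains name, or -1 if no visible layer has it.
 */
static int lookup_in_stack(const struct layer_dir *stack, size_t depth,
			   const char *name)
{
	assert(stack != NULL && name != NULL);

	for (size_t i = 0; i < depth; i++) {
		for (size_t j = 0; j < stack[i].n_names; j++)
			if (strcmp(stack[i].names[j], name) == 0)
				return (int)i;
		if (stack[i].opaque)
			break;
	}
	return -1;
}
```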
Overlaying preexisting file systems doesn't seem that important; users
commonly should be able to start out with an empty topmost layer instead. To
also cover the less common cases, there should be a way to convert directories
in a union from opaque to transparent and back though, just like there should
be a way to get rid of a whiteout.
Makes sense?
Thanks,
Andreas
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-10-01 9:34 ` Andreas Gruenbacher
@ 2010-10-06 17:31 ` Valerie Aurora
2010-10-11 9:41 ` Michal Suchanek
2010-10-11 13:51 ` Scott James Remnant
0 siblings, 2 replies; 26+ messages in thread
From: Valerie Aurora @ 2010-10-06 17:31 UTC (permalink / raw)
To: Andreas Gruenbacher, Michal Suchanek, Andy Whitcroft,
Scott James Remnant, Vladimir Dronnikov <dronni
Cc: Miklos Szeredi, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On Fri, Oct 01, 2010 at 11:34:57AM +0200, Andreas Gruenbacher wrote:
> On Thursday 30 September 2010 23:51:15 Valerie Aurora wrote:
> >
> > Hm, this was a pretty basic assumption for me - that you'd want to
> > construct a topmost image offline that would be "merged" with the
> > lower layers. So, for example:
> >
> > Topmost layer contains:
> >
> > /etc/hostname
> >
> > Lower layers contain everything else in /etc/. So /etc/ would exist
> > on the topmost layer at the time of union mount, but we would want it
> > to be transparent. But if we created a new dir *during* the union
> > mount, it would be opaque.
> >
> > What was your model?
>
> The prevalent use case probably is to start out with an empty topmost layer on
> top of an existing file system. When things are modified, changes obviously
> go into the topmost layer. Additional layers can later be stacked on top of
> that, turning the previous topmost layer into a read-only lower layer.
>
> Overlaying preexisting file systems doesn't seem that important; users
> commonly should be able to start out with an empty topmost layer instead. To
Okay, that surprises me. Let me check my assumptions. I cc'd several
people who seem to be actively using unionfs or aufs in ways that we
want union mounts to replace. Do you start out with an empty topmost
file system in most cases? Or do you prepopulate with some files in
dirs you want to be transparent?
> also cover the less common cases, there should be a way to convert directories
> in a union from opaque to transparent and back though, just like there should
> be a way to get rid of a whiteout.
Conversion is a requirement, yes.
-VAL
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-10-06 17:31 ` Valerie Aurora
@ 2010-10-11 9:41 ` Michal Suchanek
2010-10-11 13:51 ` Scott James Remnant
1 sibling, 0 replies; 26+ messages in thread
From: Michal Suchanek @ 2010-10-11 9:41 UTC (permalink / raw)
To: Valerie Aurora
Cc: Andreas Gruenbacher, Andy Whitcroft, Scott James Remnant,
Vladimir Dronnikov, Felix Fietkau, Miklos Szeredi, linuxram,
linux-fsdevel, linux-kernel, neilb, viro
On 6 October 2010 19:31, Valerie Aurora <vaurora@redhat.com> wrote:
> On Fri, Oct 01, 2010 at 11:34:57AM +0200, Andreas Gruenbacher wrote:
>> On Thursday 30 September 2010 23:51:15 Valerie Aurora wrote:
>> >
>> > Hm, this was a pretty basic assumption for me - that you'd want to
>> > construct a topmost image offline that would be "merged" with the
>> > lower layers. So, for example:
>> >
>> > Topmost layer contains:
>> >
>> > /etc/hostname
>> >
>> > Lower layers contain everything else in /etc/. So /etc/ would exist
>> > on the topmost layer at the time of union mount, but we would want it
>> > to be transparent. But if we created a new dir *during* the union
>> > mount, it would be opaque.
>> >
>> > What was your model?
>>
>> The prevalent use case probably is to start out with an empty topmost layer on
>> top of an existing file system. When things are modified, changes obviously
>> go into the topmost layer. Additional layers can later be stacked on top of
>> that, turning the previous topmost layer into a read-only lower layer.
>>
>> Overlaying preexisting file systems doesn't seem that important; users
>> commonly should be able to start out with an empty topmost layer instead. To
>
> Okay, that surprises me. Let me check my assumptions. I cc'd several
> people who seem to be actively using unionfs or aufs in ways that we
> want union mounts to replace. Do you start out with an empty topmost
> file system in most cases? Or do you prepopulate with some files in
> dirs you want to be transparent?
In all the cases where I used a union mount I started with a blank top layer.
I commonly use it to make a readonly live CD filesystem writable so
that a system can run on top of it.
I tried to use a union to build software on top of a readonly source
directory.
In both these cases the filesystem starts empty and is only populated
by writing into the union mount.
In both these cases the top layer is often thrown away after use but
can be saved to reconstruct the union later, either as filesystem
image or a tar/cpio archive. In the case of archive reconstructing the
union includes prepopulating the top layer.
In both these cases the bottom should typically not change between
unmounting the union and reconstructing it again but it may change if
the live CD or sources are updated between unmounting the union and
reconstructing it again. There are a number of reasons why this may
break for the user but if unionmount does not support falling through
existing top level directories then this is one more reason, perhaps
unexpected.
Fine control over transparency of top layer is not required in most of
these use cases. A simple flag that can perhaps be specified on mount
and/or saved in superblock could say if the filesystem contains union
entries or not. In a plain non-union filesystem all directories are
transparent and whiteouts or fallthrus are invalid. In a filesystem
previously mounted as top union layer all directories are opaque. This
should cover most cases except the case when the opaque top layer can
hide updates to the bottom layer.
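As a rough userspace sketch of that single flag (all names here are
hypothetical, nothing below is from the patches): one bit in the
superblock decides both whether the directories are opaque and whether
union-specific entries are allowed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kinds of directory entry found on the top layer. */
enum entry_kind { ENTRY_REGULAR, ENTRY_WHITEOUT, ENTRY_FALLTHRU };

/* Hypothetical superblock state: was this fs ever a top union layer? */
struct upper_sb {
	bool has_union_entries;
};

/*
 * Plain filesystem: every directory is transparent.
 * Former top union layer: every directory is opaque.
 */
static bool upper_dirs_opaque(const struct upper_sb *sb)
{
	assert(sb != NULL);
	return sb->has_union_entries;
}

/* Whiteouts and fallthrus are only valid once the flag is set. */
static bool entry_is_valid(const struct upper_sb *sb, enum entry_kind kind)
{
	assert(sb != NULL);
	return kind == ENTRY_REGULAR || sb->has_union_entries;
}
```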
Note that, whatever is implemented, saving the top layer in an archive
is not likely to work, because no archiving program currently
supports whiteouts or fallthrus. With aufs this can work when these are
saved as specially named files or a separate table.
Either way implications of the current implementation should be
clearly documented.
I don't think manipulating the transparency and whiteouts would be
used for much but testing.
I can imagine some specialized tool that compares two bottom images
and then manipulates a saved top layer such that the differences
between the bottom images become visible when the new bottom is
overlaid with the saved top. Still, it will not work in all cases
because changes between top and bottom and between old and new bottom
cannot always be merged automatically. A companion 'revert' tool which
changes an entry in a mounted union into a fallthru would come in handy,
I guess.
I doubt many people would use anything this complex.
Thanks
Michal
* Re: [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype
2010-10-06 17:31 ` Valerie Aurora
2010-10-11 9:41 ` Michal Suchanek
@ 2010-10-11 13:51 ` Scott James Remnant
1 sibling, 0 replies; 26+ messages in thread
From: Scott James Remnant @ 2010-10-11 13:51 UTC (permalink / raw)
To: Valerie Aurora
Cc: Andreas Gruenbacher, Michal Suchanek, Andy Whitcroft,
Scott James Remnant, Vladimir Dronnikov, Felix Fietkau,
Miklos Szeredi, linuxram, linux-fsdevel, linux-kernel, neilb,
viro
On 06/10/2010 18:31, Valerie Aurora wrote:
> On Fri, Oct 01, 2010 at 11:34:57AM +0200, Andreas Gruenbacher wrote:
>> On Thursday 30 September 2010 23:51:15 Valerie Aurora wrote:
>>> Hm, this was a pretty basic assumption for me - that you'd want to
>>> construct a topmost image offline that would be "merged" with the
>>> lower layers. So, for example:
>>>
>>> Topmost layer contains:
>>>
>>> /etc/hostname
>>>
>>> Lower layers contain everything else in /etc/. So /etc/ would exist
>>> on the topmost layer at the time of union mount, but we would want it
>>> to be transparent. But if we created a new dir *during* the union
>>> mount, it would be opaque.
>>>
>>> What was your model?
>> The prevalent use case probably is to start out with an empty topmost layer on
>> top of an existing file system. When things are modified, changes obviously
>> go into the topmost layer. Additional layers can later be stacked on top of
>> that, turning the previous topmost layer into a read-only lower layer.
>>
>> Overlaying preexisting file systems doesn't seem that important; users
>> commonly should be able to start out with an empty topmost layer instead. To
> Okay, that surprises me. Let me check my assumptions. I cc'd several
> people who seem to be actively using unionfs or aufs in ways that we
> want union mounts to replace. Do you start out with an empty topmost
> file system in most cases? Or do you prepopulate with some files in
> dirs you want to be transparent?
>
Our use would be for the Live CD and for Update testing - in both of
these scenarios I imagine that the top-most layer would start empty, yes.
Scott
end of thread, other threads:[~2010-10-11 13:51 UTC | newest]
Thread overview: 26+ messages
2010-09-20 18:04 [PATCH 0/7 v3] overlay filesystem prototype Miklos Szeredi
2010-09-20 18:04 ` [PATCH 1/7 v3] vfs: implement open "forwarding" Miklos Szeredi
2010-09-20 18:04 ` [PATCH 2/7 v3] vfs: make i_op->permission take a dentry instead of an inode Miklos Szeredi
2010-09-20 18:04 ` [PATCH 3/7 v3] vfs: add flag to allow rename to same inode Miklos Szeredi
2010-09-23 22:04 ` Valerie Aurora
2010-09-20 18:04 ` [PATCH 4/7 v3] vfs: export do_splice_direct() to modules Miklos Szeredi
2010-09-20 18:04 ` [PATCH 5/7 v3] vfs: fix possible use after free in finish_open() Miklos Szeredi
2010-09-23 20:19 ` Valerie Aurora
2010-09-20 18:04 ` [PATCH 6/7 v3] overlay: hybrid overlay filesystem prototype Miklos Szeredi
2010-09-22 23:21 ` Valerie Aurora
2010-09-24 14:33 ` Jens Axboe
2010-09-24 17:16 ` Valerie Aurora
2010-09-24 17:56 ` Valerie Aurora
2010-09-27 8:11 ` Miklos Szeredi
2010-09-27 11:49 ` Andreas Gruenbacher
2010-09-27 12:15 ` J. R. Okajima
2010-09-27 18:47 ` Valerie Aurora
2010-09-28 8:24 ` Andreas Gruenbacher
2010-09-30 21:51 ` Valerie Aurora
2010-10-01 9:34 ` Andreas Gruenbacher
2010-10-06 17:31 ` Valerie Aurora
2010-10-11 9:41 ` Michal Suchanek
2010-10-11 13:51 ` Scott James Remnant
2010-09-20 18:04 ` [PATCH 7/7 v3] overlay: overlay filesystem documentation Miklos Szeredi
2010-09-21 1:31 ` [PATCH 0/7 v3] overlay filesystem prototype Neil Brown
2010-09-22 9:50 ` Miklos Szeredi