* [RFC][PATCH 00/27] Read-only bind mounts
@ 2006-06-08 0:10 Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 01/27] Add vfsmount writer count Dave Hansen
` (26 more replies)
0 siblings, 27 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
The following series implements read-only bind mounts. This set does
not take the previously tried approach of pushing down the vfsmount
structure deeply into call paths, such that it might be checked in
functions like permission(), may_create() and may_open(). Instead,
it does checks near the entry points in the kernel, bumping a
reference count in the vfsmount structure and effetively holding
a kind of a lock on the vfsmount.
This approach will come in handy in the future, allowing transitions
from rw to ro mounts while guaranteeing that no writers are currently
active.
I apologize for so many small patches, but breaking them up like this
really did help me find bugs. A rollup is here:
http://sr71.net/~dave/patches/ro-bind-mounts-06-07-2006.patch
This series passes a test I got from Herbert Poetzl, checking to see
that these r/o bind mounts have the same properties as regular r/o
device mounts for a number of syscalls.
http://vserver.13thfloor.at/Experimental/BME/fstest-0.03.tar.bz2
Here is the script I used to generate and compare the output from
that test:
#!/bin/sh
{ umount ro-bind-mount ro-mount;
rm -rf ro-mount-source ro-bind-mount-source ro-mount ro-bind-mount;
dd if=/dev/zero of=ro-mount-source count=10000;
mkfs.ext3 -F ro-mount-source;
mkdir ro-mount ro-bind-mount;
mkdir ro-bind-mount-source
} > /dev/null 2>&1
mount --bind -o ro ro-bind-mount-source ro-bind-mount
mount -t ext3 -o loop,ro ro-mount-source ro-mount
function fstest_and_diff
{
ARG1="$1"
shift
ARG2="$1"
./fstest -d "$ARG1" > "$ARG1".log;
./fstest -d "$ARG2" > "$ARG2".log;
diff -ru "$ARG1".log "$ARG2".log
}
fstest_and_diff ro-mount ro-bind-mount
umount ro-bind-mount ro-mount
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 01/27] Add vfsmount writer count
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:33 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 02/27] vfs_rmdir: change if() into goto Dave Hansen
` (25 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This allows a vfsmount to keep track of how many instances
of files open for write there are at a given time. This
will be useful if it is ever desired to change a mount
from r/w to r/o at runtime.
A mount can also potentially refuse to allow writers; this
is why mnt_want_write() has a return value. However, that
functionality will be added later in the series. For now,
always allow new writers.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namespace.c | 1 +
lxc-dave/include/linux/mount.h | 14 ++++++++++++++
2 files changed, 15 insertions(+)
diff -puN fs/namespace.c~convert-permission-to-file-and-vfs fs/namespace.c
--- lxc/fs/namespace.c~convert-permission-to-file-and-vfs 2006-06-07 16:53:11.000000000 -0700
+++ lxc-dave/fs/namespace.c 2006-06-07 16:53:11.000000000 -0700
@@ -66,6 +66,7 @@ struct vfsmount *alloc_vfsmnt(const char
if (mnt) {
memset(mnt, 0, sizeof(struct vfsmount));
atomic_set(&mnt->mnt_count, 1);
+ /* atomic_set(&mnt->writer_count, 0); */
INIT_LIST_HEAD(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
INIT_LIST_HEAD(&mnt->mnt_mounts);
diff -puN include/linux/mount.h~convert-permission-to-file-and-vfs include/linux/mount.h
--- lxc/include/linux/mount.h~convert-permission-to-file-and-vfs 2006-06-07 16:53:11.000000000 -0700
+++ lxc-dave/include/linux/mount.h 2006-06-07 16:53:11.000000000 -0700
@@ -12,6 +12,8 @@
#define _LINUX_MOUNT_H
#ifdef __KERNEL__
+#include <linux/err.h>
+#include <linux/fs.h>
#include <linux/types.h>
#include <linux/list.h>
#include <linux/spinlock.h>
@@ -38,6 +40,7 @@ struct vfsmount {
struct list_head mnt_mounts; /* list of children, anchored here */
struct list_head mnt_child; /* and going through their mnt_child */
atomic_t mnt_count;
+ atomic_t mnt_writers;
int mnt_flags;
int mnt_expiry_mark; /* true if marked for expiry */
char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */
@@ -58,6 +61,17 @@ static inline struct vfsmount *mntget(st
return mnt;
}
+static inline int mnt_want_write(struct vfsmount *mnt)
+{
+ atomic_inc(&mnt->mnt_writers);
+ return 0;
+}
+
+static inline void mnt_drop_write(struct vfsmount *mnt)
+{
+ atomic_dec(&mnt->mnt_writers);
+}
+
extern void mntput_no_expire(struct vfsmount *mnt);
extern void mnt_pin(struct vfsmount *mnt);
extern void mnt_unpin(struct vfsmount *mnt);
diff -L convert-permission-to-file-and-vfs -puN /dev/null /dev/null
diff -L convert-permission-to-file-and-vfs.patch -puN /dev/null /dev/null
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 02/27] vfs_rmdir: change if() into goto
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 01/27] Add vfsmount writer count Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:37 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 04/27] elevate mnt writers for vfs_unlink() callers Dave Hansen
` (24 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
We're going to add another condition here in a minute, so
add a goto instead of an if.
There will be a lot of patches like this one. They may seem
too finely-grained, but I did encounter a couple of bugs where
I botched one of these. If there is a bug in one, this level
of granularity will really help to track them down with a
chop-search.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff -puN fs/namei.c~rmdir fs/namei.c
--- lxc/fs/namei.c~rmdir 2006-06-07 16:53:12.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:12.000000000 -0700
@@ -2010,10 +2010,11 @@ static long do_rmdir(int dfd, const char
mutex_lock_nested(&nd.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
dentry = lookup_hash(&nd);
error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
- error = vfs_rmdir(nd.dentry->d_inode, dentry);
- dput(dentry);
- }
+ if (IS_ERR(dentry))
+ goto exit2;
+ error = vfs_rmdir(nd.dentry->d_inode, dentry);
+ dput(dentry);
+exit2:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
exit1:
path_release(&nd);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 04/27] elevate mnt writers for vfs_unlink() callers
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 01/27] Add vfsmount writer count Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 02/27] vfs_rmdir: change if() into goto Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 03/27] do_rmdir(): elevate write count Dave Hansen
` (23 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 4 ++++
lxc-dave/ipc/mqueue.c | 5 ++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff -puN fs/namei.c~elevate-writers-vfs_unlink fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_unlink 2006-06-07 16:53:13.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:13.000000000 -0700
@@ -2097,7 +2097,11 @@ static long do_unlinkat(int dfd, const c
inode = dentry->d_inode;
if (inode)
atomic_inc(&inode->i_count);
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto exit2;
error = vfs_unlink(nd.dentry->d_inode, dentry);
+ mnt_drop_write(nd.mnt);
exit2:
dput(dentry);
}
diff -puN fs/nfsd/nfs4recover.c~elevate-writers-vfs_unlink fs/nfsd/nfs4recover.c
diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_unlink fs/nfsd/vfs.c
diff -puN ipc/mqueue.c~elevate-writers-vfs_unlink ipc/mqueue.c
--- lxc/ipc/mqueue.c~elevate-writers-vfs_unlink 2006-06-07 16:53:13.000000000 -0700
+++ lxc-dave/ipc/mqueue.c 2006-06-07 16:53:13.000000000 -0700
@@ -738,8 +738,11 @@ asmlinkage long sys_mq_unlink(const char
inode = dentry->d_inode;
if (inode)
atomic_inc(&inode->i_count);
-
+ err = mnt_want_write(mqueue_mnt);
+ if (err)
+ goto out_err;
err = vfs_unlink(dentry->d_parent->d_inode, dentry);
+ mnt_drop_write(mqueue_mnt);
out_err:
dput(dentry);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 03/27] do_rmdir(): elevate write count
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (2 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 04/27] elevate mnt writers for vfs_unlink() callers Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:42 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 05/27] elevate mnt writers for nfsd caller of vfs_mkdir() Dave Hansen
` (22 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Elevate the write count during the vfs_rmdir() call. There
will be a lot of patches similar to this one coming up.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 5 +++++
1 files changed, 5 insertions(+)
diff -puN fs/namei.c~rmdir1 fs/namei.c
--- lxc/fs/namei.c~rmdir1 2006-06-07 16:53:13.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:13.000000000 -0700
@@ -2012,7 +2012,12 @@ static long do_rmdir(int dfd, const char
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto exit3;
error = vfs_rmdir(nd.dentry->d_inode, dentry);
+ mnt_drop_write(nd.mnt);
+exit3:
dput(dentry);
exit2:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 05/27] elevate mnt writers for nfsd caller of vfs_mkdir()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (3 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 03/27] do_rmdir(): elevate write count Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl() Dave Hansen
` (21 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
fs/open.c | 0
lxc-dave/fs/nfsd/nfs4recover.c | 4 ++++
2 files changed, 4 insertions(+)
diff -puN fs/namei.c~elevate-writers-vfs_mkdir fs/namei.c
diff -puN fs/nfsd/nfs4recover.c~elevate-writers-vfs_mkdir fs/nfsd/nfs4recover.c
--- lxc/fs/nfsd/nfs4recover.c~elevate-writers-vfs_mkdir 2006-06-07 16:53:14.000000000 -0700
+++ lxc-dave/fs/nfsd/nfs4recover.c 2006-06-07 16:53:14.000000000 -0700
@@ -155,7 +155,11 @@ nfsd4_create_clid_dir(struct nfs4_client
dprintk("NFSD: nfsd4_create_clid_dir: DIRECTORY EXISTS\n");
goto out_put;
}
+ status = mnt_want_write(rec_dir.mnt);
+ if (status)
+ goto out_put;
status = vfs_mkdir(rec_dir.dentry->d_inode, dentry, S_IRWXU);
+ mnt_drop_write(rec_dir.mnt);
out_put:
dput(dentry);
out_unlock:
diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_mkdir fs/nfsd/vfs.c
diff -puN ipc/mqueue.c~elevate-writers-vfs_mkdir ipc/mqueue.c
diff -puN fs/open.c~elevate-writers-vfs_mkdir fs/open.c
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (4 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 05/27] elevate mnt writers for nfsd caller of vfs_mkdir() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:44 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 07/27] sys_mkdirat(): collapse if() Dave Hansen
` (20 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Some ioctls need write access, but others don't. Make a helper
function to decode when write access is needed, and take it.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
fs/nfsd/vfs.c | 0
lxc-dave/fs/ncpfs/ioctl.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 53 insertions(+), 1 deletion(-)
diff -puN fs/namei.c~elevate-writers-file_permission-callers fs/namei.c
diff -puN fs/open.c~elevate-writers-file_permission-callers fs/open.c
diff -puN fs/exec.c~elevate-writers-file_permission-callers fs/exec.c
diff -puN fs/ncpfs/ioctl.c~elevate-writers-file_permission-callers fs/ncpfs/ioctl.c
--- lxc/fs/ncpfs/ioctl.c~elevate-writers-file_permission-callers 2006-06-07 16:53:15.000000000 -0700
+++ lxc-dave/fs/ncpfs/ioctl.c 2006-06-07 16:53:15.000000000 -0700
@@ -183,7 +183,7 @@ ncp_get_charsets(struct ncp_server* serv
}
#endif /* CONFIG_NCPFS_NLS */
-int ncp_ioctl(struct inode *inode, struct file *filp,
+static int __ncp_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg)
{
struct ncp_server *server = NCP_SERVER(inode);
@@ -654,3 +654,55 @@ outrel:
/* #endif */
return -EINVAL;
}
+
+static int ncp_ioctl_need_write(unsigned int cmd)
+{
+ switch (cmd) {
+ case NCP_IOC_GET_FS_INFO:
+ case NCP_IOC_GET_FS_INFO_V2:
+ case NCP_IOC_NCPREQUEST:
+ case NCP_IOC_SETDENTRYTTL:
+ case NCP_IOC_SIGN_INIT:
+ case NCP_IOC_LOCKUNLOCK:
+ case NCP_IOC_SET_SIGN_WANTED:
+ return 0;
+ case NCP_IOC_GETOBJECTNAME:
+ case NCP_IOC_SETOBJECTNAME:
+ case NCP_IOC_GETPRIVATEDATA:
+ case NCP_IOC_SETPRIVATEDATA:
+ case NCP_IOC_SETCHARSETS:
+ case NCP_IOC_GETCHARSETS:
+ case NCP_IOC_CONN_LOGGED_IN:
+ case NCP_IOC_GETDENTRYTTL:
+ case NCP_IOC_GETMOUNTUID2:
+ case NCP_IOC_SIGN_WANTED:
+ case NCP_IOC_GETROOT:
+ case NCP_IOC_SETROOT:
+ return 0;
+ default:
+ /* unkown IOCTL command, assume write */
+ WARN_ON();
+ }
+ return 1;
+}
+
+int ncp_ioctl(struct inode *inode, struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+ int ret;
+
+ if (ncp_ioctl_need_write(cmd)) {
+ /*
+ * inside the ioctl(), any failures which
+ * are because of file_permission() are
+ * -EACCESS, so it seems consistent to keep
+ * that here.
+ */
+ if (mnt_want_write(filp->f_vfsmnt))
+ return -EACCESS;
+ }
+ ret = __ncp_ioctl(inode, filp, cmd, arg);
+ if (ncp_ioctl_need_write(cmd)
+ mnt_drop_write(filp->->f_vfsmnt;
+ return ret;
+}
diff -puN fs/nfsd/vfs.c~elevate-writers-file_permission-callers fs/nfsd/vfs.c
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 07/27] sys_mkdirat(): collapse if()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (5 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:46 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 09/27] elevate mnt writers for sys_mkdirat() call of vfs_mkdir() Dave Hansen
` (19 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Take the entire "if (IS_ERR(tmp))" block, and use a goto,
instead.
This should not make any functional changes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 38 +++++++++++++++++++-------------------
1 files changed, 19 insertions(+), 19 deletions(-)
diff -puN fs/namei.c~sys_mkdir_at-move-cases fs/namei.c
--- lxc/fs/namei.c~sys_mkdir_at-move-cases 2006-06-07 16:53:16.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:16.000000000 -0700
@@ -1888,30 +1888,30 @@ asmlinkage long sys_mkdirat(int dfd, con
{
int error = 0;
char * tmp;
+ struct dentry *dentry;
+ struct nameidata nd;
tmp = getname(pathname);
error = PTR_ERR(tmp);
- if (!IS_ERR(tmp)) {
- struct dentry *dentry;
- struct nameidata nd;
+ if (IS_ERR(tmp))
+ goto out_err;
- error = do_path_lookup(dfd, tmp, LOOKUP_PARENT, &nd);
- if (error)
- goto out;
- dentry = lookup_create(&nd, 1);
- error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
- if (!IS_POSIXACL(nd.dentry->d_inode))
- mode &= ~current->fs->umask;
- error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
- dput(dentry);
- }
- mutex_unlock(&nd.dentry->d_inode->i_mutex);
- path_release(&nd);
-out:
- putname(tmp);
+ error = do_path_lookup(dfd, tmp, LOOKUP_PARENT, &nd);
+ if (error)
+ goto out;
+ dentry = lookup_create(&nd, 1);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ if (!IS_POSIXACL(nd.dentry->d_inode))
+ mode &= ~current->fs->umask;
+ error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
+ dput(dentry);
}
-
+ mutex_unlock(&nd.dentry->d_inode->i_mutex);
+ path_release(&nd);
+out:
+ putname(tmp);
+out_err:
return error;
}
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 08/27] sys_mkdirat(): one more goto
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (7 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 09/27] elevate mnt writers for sys_mkdirat() call of vfs_mkdir() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:48 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 10/27] sys_symlinkat() collapse if()s Dave Hansen
` (17 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Add one more goto to collapse another if().
This should not cause any functional chagnes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff -puN fs/namei.c~sys_mkdir_at-move-cases-another-goto fs/namei.c
--- lxc/fs/namei.c~sys_mkdir_at-move-cases-another-goto 2006-06-07 16:53:16.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:16.000000000 -0700
@@ -1901,12 +1901,14 @@ asmlinkage long sys_mkdirat(int dfd, con
goto out;
dentry = lookup_create(&nd, 1);
error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
- if (!IS_POSIXACL(nd.dentry->d_inode))
- mode &= ~current->fs->umask;
- error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
- dput(dentry);
- }
+ if (IS_ERR(dentry))
+ goto out_unlock;
+
+ if (!IS_POSIXACL(nd.dentry->d_inode))
+ mode &= ~current->fs->umask;
+ error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
+ dput(dentry);
+out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 09/27] elevate mnt writers for sys_mkdirat() call of vfs_mkdir()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (6 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 07/27] sys_mkdirat(): collapse if() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 08/27] sys_mkdirat(): one more goto Dave Hansen
` (18 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
fs/nfsd/nfs4recover.c | 0
lxc-dave/fs/namei.c | 5 +++++
2 files changed, 5 insertions(+)
diff -puN fs/namei.c~sys_mkdir_at-do-mnt_want_write fs/namei.c
--- lxc/fs/namei.c~sys_mkdir_at-do-mnt_want_write 2006-06-07 16:53:17.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:17.000000000 -0700
@@ -1906,7 +1906,12 @@ asmlinkage long sys_mkdirat(int dfd, con
if (!IS_POSIXACL(nd.dentry->d_inode))
mode &= ~current->fs->umask;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto out_dput;
error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
+ mnt_drop_write(nd.mnt);
+out_dput:
dput(dentry);
out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
diff -puN fs/nfsd/nfs4recover.c~sys_mkdir_at-do-mnt_want_write fs/nfsd/nfs4recover.c
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 10/27] sys_symlinkat() collapse if()s
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (8 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 08/27] sys_mkdirat(): one more goto Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 11/27] sys_symlinkat() collapse one more if () Dave Hansen
` (16 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 33 +++++++++++++++++----------------
1 files changed, 17 insertions(+), 16 deletions(-)
diff -puN fs/namei.c~elevate-writers-vfs_symlink fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_symlink 2006-06-07 16:53:17.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:18.000000000 -0700
@@ -2170,30 +2170,31 @@ asmlinkage long sys_symlinkat(const char
int error = 0;
char * from;
char * to;
+ struct dentry *dentry;
+ struct nameidata nd;
from = getname(oldname);
if(IS_ERR(from))
return PTR_ERR(from);
to = getname(newname);
error = PTR_ERR(to);
- if (!IS_ERR(to)) {
- struct dentry *dentry;
- struct nameidata nd;
+ if (IS_ERR(to))
+ goto out_putname;
- error = do_path_lookup(newdfd, to, LOOKUP_PARENT, &nd);
- if (error)
- goto out;
- dentry = lookup_create(&nd, 0);
- error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
- error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
- dput(dentry);
- }
- mutex_unlock(&nd.dentry->d_inode->i_mutex);
- path_release(&nd);
-out:
- putname(to);
+ error = do_path_lookup(newdfd, to, LOOKUP_PARENT, &nd);
+ if (error)
+ goto out;
+ dentry = lookup_create(&nd, 0);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+ dput(dentry);
}
+ mutex_unlock(&nd.dentry->d_inode->i_mutex);
+ path_release(&nd);
+out:
+ putname(to);
+out_putname:
putname(from);
return error;
}
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 11/27] sys_symlinkat() collapse one more if ()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (9 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 10/27] sys_symlinkat() collapse if()s Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:49 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 13/27] sys_linkat(): elevate write count around vfs_link() Dave Hansen
` (15 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Collapse another if() to make way for a new condition.
This patch should make no functional changes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff -puN fs/namei.c~elevate-writers-vfs_symlink-part2 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_symlink-part2 2006-06-07 16:53:18.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:18.000000000 -0700
@@ -2186,10 +2186,12 @@ asmlinkage long sys_symlinkat(const char
goto out;
dentry = lookup_create(&nd, 0);
error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
- error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
- dput(dentry);
- }
+ if (IS_ERR(dentry))
+ goto out_unlock;
+
+ error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+ dput(dentry);
+out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 12/27] sys_symlinkat() elevate write count around vfs_symlink()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (11 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 13/27] sys_linkat(): elevate write count around vfs_link() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 14/27] tricky: elevate write count files are open()ed Dave Hansen
` (13 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 5 +++++
1 files changed, 5 insertions(+)
diff -puN fs/namei.c~elevate-writers-vfs_symlink-part3 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_symlink-part3 2006-06-07 16:53:19.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:19.000000000 -0700
@@ -2189,7 +2189,12 @@ asmlinkage long sys_symlinkat(const char
if (IS_ERR(dentry))
goto out_unlock;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto out_dput;
error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+ mnt_drop_write(nd.mnt);
+out_dput:
dput(dentry);
out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 13/27] sys_linkat(): elevate write count around vfs_link()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (10 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 11/27] sys_symlinkat() collapse one more if () Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 12/27] sys_symlinkat() elevate write count around vfs_symlink() Dave Hansen
` (14 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namei.c | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)
diff -puN fs/namei.c~elevate-writers-vfs_link-part1 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_link-part1 2006-06-07 16:53:19.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:19.000000000 -0700
@@ -2285,10 +2285,16 @@ asmlinkage long sys_linkat(int olddfd, c
goto out_release;
new_dentry = lookup_create(&nd, 0);
error = PTR_ERR(new_dentry);
- if (!IS_ERR(new_dentry)) {
- error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
- dput(new_dentry);
- }
+ if (IS_ERR(new_dentry))
+ goto out_unlock;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto out_dput;
+ error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
+ mnt_drop_write(nd.mnt);
+out_dput:
+ dput(new_dentry);
+out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
out_release:
path_release(&nd);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 14/27] tricky: elevate write count files are open()ed
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (12 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 12/27] sys_symlinkat() elevate write count around vfs_symlink() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 10:54 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 15/27] elevate writer count for do_sys_truncate() Dave Hansen
` (12 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This is the first really tricky patch in the series. It
elevates the writer count on a mount each time a
non-special file is opened for write.
This is not completely apparent in the patch because the
two if() conditions in may_open() above the
mnt_want_write() call are, combined, equivalent to
special_file().
There is also an elevated count around the vfs_create()
call in open_namei(). The count does not need to be
kept elevated all the way into the may_open() call because
after creation, the write bits of the acc_mode are cleared.
This keeps may_open() from ever failing. Howver, this may
open one potential race where a change from a r/w to a r/o
mount could occur between the mnt_drop_write() and may_open()
allowing a user to obtain a r/w file on what is now a r/w
mount. But, this functionality does not yet exist.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/file_table.c | 5 ++++-
lxc-dave/fs/namei.c | 29 ++++++++++++++++++++++++++---
lxc-dave/ipc/mqueue.c | 3 +++
3 files changed, 33 insertions(+), 4 deletions(-)
diff -puN fs/namei.c~elevate-writers-opens-part1 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:20.000000000 -0700
@@ -1511,8 +1511,17 @@ int may_open(struct nameidata *nd, int a
return -EACCES;
flag &= ~O_TRUNC;
- } else if (IS_RDONLY(inode) && (flag & FMODE_WRITE))
- return -EROFS;
+ } else if (flag & FMODE_WRITE) {
+ /*
+ * effectively: !special_file()
+ * balanced by __fput()
+ */
+ error = mnt_want_write(nd->mnt);
+ if (error)
+ return error;
+ if (IS_RDONLY(inode))
+ return -EROFS;
+ }
/*
* An append-only file must be opened in append mode for writing.
*/
@@ -1642,10 +1651,24 @@ do_last:
if (!path.dentry->d_inode) {
if (!IS_POSIXACL(dir->d_inode))
mode &= ~current->fs->umask;
- error = vfs_create(dir->d_inode, path.dentry, mode, nd);
+ /*
+ * this serves dual roles, it makes sure there is no
+ * r/o mount, and keeps the write count for what is
+ * the newly created file
+ */
+ error = mnt_want_write(nd->mnt);
+ if (!error)
+ error = vfs_create(dir->d_inode, path.dentry, mode, nd);
mutex_unlock(&dir->d_inode->i_mutex);
dput(nd->dentry);
nd->dentry = path.dentry;
+ /*
+ * Unconditionally drop the write access because
+ * the acc_mode=0 set below will keep may_open()
+ * from ever failing if there was a r/o mount
+ * between here and there
+ */
+ mnt_drop_write(nd->mnt);
if (error)
goto exit;
/* Don't check for write permission, don't truncate */
diff -puN fs/open.c~elevate-writers-opens-part1 fs/open.c
diff -puN include/linux/mount.h~elevate-writers-opens-part1 include/linux/mount.h
diff -puN fs/file_table.c~elevate-writers-opens-part1 fs/file_table.c
--- lxc/fs/file_table.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
+++ lxc-dave/fs/file_table.c 2006-06-07 16:53:20.000000000 -0700
@@ -180,8 +180,11 @@ void fastcall __fput(struct file *file)
if (unlikely(inode->i_cdev != NULL))
cdev_put(inode->i_cdev);
fops_put(file->f_op);
- if (file->f_mode & FMODE_WRITE)
+ if (file->f_mode & FMODE_WRITE) {
put_write_access(inode);
+ if(!special_file(inode->i_mode))
+ mnt_drop_write(mnt);
+ }
file_kill(file);
file->f_dentry = NULL;
file->f_vfsmnt = NULL;
diff -puN ipc/mqueue.c~elevate-writers-opens-part1 ipc/mqueue.c
--- lxc/ipc/mqueue.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
+++ lxc-dave/ipc/mqueue.c 2006-06-07 16:53:20.000000000 -0700
@@ -679,6 +679,9 @@ asmlinkage long sys_mq_open(const char _
goto out;
filp = do_open(dentry, oflag);
} else {
+ error = mnt_want_write(mqueue_mnt);
+ if (error)
+ goto out;
filp = do_create(mqueue_mnt->mnt_root, dentry,
oflag, mode, u_attr);
}
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 15/27] elevate writer count for do_sys_truncate()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (13 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 14/27] tricky: elevate write count files are open()ed Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 16/27] elevate write count for do_sys_utime() Dave Hansen
` (11 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/open.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)
diff -puN fs/open.c~elevate-writers-opens-part2-do_sys_truncate fs/open.c
--- lxc/fs/open.c~elevate-writers-opens-part2-do_sys_truncate 2006-06-07 16:53:21.000000000 -0700
+++ lxc-dave/fs/open.c 2006-06-07 16:53:21.000000000 -0700
@@ -244,28 +244,32 @@ static long do_sys_truncate(const char _
if (!S_ISREG(inode->i_mode))
goto dput_and_out;
- error = vfs_permission(&nd, MAY_WRITE);
+ error = mnt_want_write(nd.mnt);
if (error)
goto dput_and_out;
+ error = vfs_permission(&nd, MAY_WRITE);
+ if (error)
+ goto mnt_drop_write_and_out;
+
error = -EROFS;
if (IS_RDONLY(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
/*
* Make sure that there are no leases.
*/
error = break_lease(inode, FMODE_WRITE);
if (error)
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
error = get_write_access(inode);
if (error)
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
@@ -274,6 +278,8 @@ static long do_sys_truncate(const char _
}
put_write_access(inode);
+mnt_drop_write_and_out:
+ mnt_drop_write(nd.mnt);
dput_and_out:
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 16/27] elevate write count for do_sys_utime()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (14 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 15/27] elevate writer count for do_sys_truncate() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 17/27] elevate write count for do_utimes() Dave Hansen
` (10 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/open.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)
diff -puN fs/open.c~elevate-writers-opens-part2-do_sys_utime fs/open.c
--- lxc/fs/open.c~elevate-writers-opens-part2-do_sys_utime 2006-06-07 16:53:21.000000000 -0700
+++ lxc-dave/fs/open.c 2006-06-07 16:53:21.000000000 -0700
@@ -384,16 +384,20 @@ asmlinkage long sys_utime(char __user *
goto out;
inode = nd.dentry->d_inode;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto dput_and_out;
+
error = -EROFS;
if (IS_RDONLY(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
/* Don't worry, the checks are done in inode_change_ok() */
newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
if (times) {
error = -EPERM;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
error = get_user(newattrs.ia_atime.tv_sec, ×->actime);
newattrs.ia_atime.tv_nsec = 0;
@@ -401,21 +405,23 @@ asmlinkage long sys_utime(char __user *
error = get_user(newattrs.ia_mtime.tv_sec, ×->modtime);
newattrs.ia_mtime.tv_nsec = 0;
if (error)
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET;
} else {
error = -EACCES;
if (IS_IMMUTABLE(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
if (current->fsuid != inode->i_uid &&
(error = vfs_permission(&nd, MAY_WRITE)) != 0)
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
}
mutex_lock(&inode->i_mutex);
error = notify_change(nd.dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
+mnt_drop_write_and_out:
+ mnt_drop_write(nd.mnt);
dput_and_out:
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 17/27] elevate write count for do_utimes()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (15 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 16/27] elevate write count for do_sys_utime() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 18/27] sys_faccessat(): collapse if() Dave Hansen
` (9 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/open.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff -puN fs/open.c~elevate-writers-opens-part2-do_utimes fs/open.c
--- lxc/fs/open.c~elevate-writers-opens-part2-do_utimes 2006-06-07 16:53:22.000000000 -0700
+++ lxc-dave/fs/open.c 2006-06-07 16:53:22.000000000 -0700
@@ -447,16 +447,19 @@ long do_utimes(int dfd, char __user *fil
goto out;
inode = nd.dentry->d_inode;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto dput_and_out;
error = -EROFS;
if (IS_RDONLY(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
/* Don't worry, the checks are done in inode_change_ok() */
newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
if (times) {
error = -EPERM;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
newattrs.ia_atime.tv_sec = times[0].tv_sec;
newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
@@ -466,15 +469,17 @@ long do_utimes(int dfd, char __user *fil
} else {
error = -EACCES;
if (IS_IMMUTABLE(inode))
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
if (current->fsuid != inode->i_uid &&
(error = vfs_permission(&nd, MAY_WRITE)) != 0)
- goto dput_and_out;
+ goto mnt_drop_write_and_out;
}
mutex_lock(&inode->i_mutex);
error = notify_change(nd.dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
+mnt_drop_write_and_out:
+ mnt_drop_write(nd.mnt);
dput_and_out:
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 19/27] sys_faccessat() elevate writer count
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (17 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 18/27] sys_faccessat(): collapse if() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:03 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime() Dave Hansen
` (7 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/open.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)
diff -puN fs/open.c~elevate-writers-opens-part2-sys_faccessat2 fs/open.c
--- lxc/fs/open.c~elevate-writers-opens-part2-sys_faccessat2 2006-06-07 16:53:23.000000000 -0700
+++ lxc-dave/fs/open.c 2006-06-07 16:53:23.000000000 -0700
@@ -542,9 +542,21 @@ asmlinkage long sys_faccessat(int dfd, c
res = vfs_permission(&nd, mode);
/* SuS v2 requires we report a read only fs too */
- if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
- && !special_file(nd.dentry->d_inode->i_mode))
+ if(res)
+ goto out_path_release;
+ if (!(mode & S_IWOTH))
+ goto out_path_release;
+ if (special_file(nd.dentry->d_inode->i_mode))
+ goto out_path_release;
+
+ res = mnt_want_write(nd.mnt);
+ if (res) {
+ mnt_drop_write(nd.mnt);
+ goto out_path_release;
+ }
+ if (IS_RDONLY(nd.dentry->d_inode))
res = -EROFS;
+out_path_release:
path_release(&nd);
out:
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 18/27] sys_faccessat(): collapse if()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (16 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 17/27] elevate write count for do_utimes() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:05 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 19/27] sys_faccessat() elevate writer count Dave Hansen
` (8 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This patch should make no functional changes.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/open.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)
diff -puN fs/open.c~elevate-writers-opens-part2-sys_faccessat fs/open.c
--- lxc/fs/open.c~elevate-writers-opens-part2-sys_faccessat 2006-06-07 16:53:23.000000000 -0700
+++ lxc-dave/fs/open.c 2006-06-07 16:53:23.000000000 -0700
@@ -537,15 +537,17 @@ asmlinkage long sys_faccessat(int dfd, c
current->cap_effective = current->cap_permitted;
res = __user_walk_fd(dfd, filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd);
- if (!res) {
- res = vfs_permission(&nd, mode);
- /* SuS v2 requires we report a read only fs too */
- if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
- && !special_file(nd.dentry->d_inode->i_mode))
- res = -EROFS;
- path_release(&nd);
- }
+ if (res)
+ goto out;
+ res = vfs_permission(&nd, mode);
+ /* SuS v2 requires we report a read only fs too */
+ if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
+ && !special_file(nd.dentry->d_inode->i_mode))
+ res = -EROFS;
+ path_release(&nd);
+
+out:
current->fsuid = old_fsuid;
current->fsgid = old_fsgid;
current->cap_effective = old_cap;
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (18 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 19/27] sys_faccessat() elevate writer count Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:07 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 21/27] mount_is_safe(): add comment Dave Hansen
` (6 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/net/unix/af_unix.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff -puN net/unix/af_unix.c~elevate-writers-opens-part4 net/unix/af_unix.c
--- lxc/net/unix/af_unix.c~elevate-writers-opens-part4 2006-06-07 16:53:24.000000000 -0700
+++ lxc-dave/net/unix/af_unix.c 2006-06-07 16:53:24.000000000 -0700
@@ -676,21 +676,27 @@ static struct sock *unix_find_other(stru
err = path_lookup(sunname->sun_path, LOOKUP_FOLLOW, &nd);
if (err)
goto fail;
+
+ err = mnt_want_write(nd.mnt);
+ if (err)
+ goto put_path_fail;
+
err = vfs_permission(&nd, MAY_WRITE);
if (err)
- goto put_fail;
+ goto put_write_fail;
err = -ECONNREFUSED;
if (!S_ISSOCK(nd.dentry->d_inode->i_mode))
- goto put_fail;
+ goto put_write_fail;
u=unix_find_socket_byinode(nd.dentry->d_inode);
if (!u)
- goto put_fail;
+ goto put_write_fail;
if (u->sk_type == type)
touch_atime(nd.mnt, nd.dentry);
path_release(&nd);
+ mnt_drop_write(nd.mnt);
err=-EPROTOTYPE;
if (u->sk_type != type) {
@@ -710,7 +716,9 @@ static struct sock *unix_find_other(stru
}
return u;
-put_fail:
+put_write_fail:
+ mnt_drop_write(nd.mnt);
+put_path_fail:
path_release(&nd);
fail:
*error=err;
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 21/27] mount_is_safe(): add comment
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (19 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
` (5 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This area of code is currently #ifdef'd out, so add a comment
for the time when it is actually used.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namespace.c | 4 ++++
1 files changed, 4 insertions(+)
diff -puN fs/namespace.c~elevate-writers-opens-part5 fs/namespace.c
--- lxc/fs/namespace.c~elevate-writers-opens-part5 2006-06-07 16:53:24.000000000 -0700
+++ lxc-dave/fs/namespace.c 2006-06-07 16:53:24.000000000 -0700
@@ -679,6 +679,10 @@ static int mount_is_safe(struct nameidat
if (current->uid != nd->dentry->d_inode->i_uid)
return -EPERM;
}
+ /*
+ * We will eventually check for the mnt->writer_count here,
+ * but since the code is not used now, skipt it - Dave Hansen
+ */
if (vfs_permission(nd, MAY_WRITE))
return -EPERM;
return 0;
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (20 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 21/27] mount_is_safe(): add comment Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:16 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 23/27] elevate write count over calls to vfs_rename() Dave Hansen
` (4 subsequent siblings)
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
ipc/mqueue.c | 0
lxc-dave/fs/namei.c | 48 ++++++++++++++++++++++++++++----------------
lxc-dave/fs/nfsd/vfs.c | 4 +++
lxc-dave/net/unix/af_unix.c | 4 +++
4 files changed, 39 insertions(+), 17 deletions(-)
diff -puN fs/namei.c~elevate-writers-vfs_mknod-try2 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:25.000000000 -0700
@@ -1852,26 +1852,40 @@ asmlinkage long sys_mknodat(int dfd, con
if (!IS_POSIXACL(nd.dentry->d_inode))
mode &= ~current->fs->umask;
- if (!IS_ERR(dentry)) {
- switch (mode & S_IFMT) {
- case 0: case S_IFREG:
- error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
- break;
- case S_IFCHR: case S_IFBLK:
- error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
- new_decode_dev(dev));
- break;
- case S_IFIFO: case S_IFSOCK:
- error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
+ if (IS_ERR(dentry))
+ goto out_unlock;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ goto out_dput;
+ switch (mode & S_IFMT) {
+ case 0: case S_IFREG:
+ error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
+ break;
+ case S_IFCHR: case S_IFBLK:
+ error = mnt_want_write(nd.mnt);
+ if (error)
break;
- case S_IFDIR:
- error = -EPERM;
+ error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
+ new_decode_dev(dev));
+ mnt_drop_write(nd.mnt);
+ break;
+ case S_IFIFO: case S_IFSOCK:
+ error = mnt_want_write(nd.mnt);
+ if (error)
break;
- default:
- error = -EINVAL;
- }
- dput(dentry);
+ error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
+ mnt_drop_write(nd.mnt);
+ break;
+ case S_IFDIR:
+ error = -EPERM;
+ break;
+ default:
+ error = -EINVAL;
}
+ mnt_drop_write(nd.mnt);
+out_dput:
+ dput(dentry);
+out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
path_release(&nd);
out:
diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_mknod-try2 fs/nfsd/vfs.c
--- lxc/fs/nfsd/vfs.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
+++ lxc-dave/fs/nfsd/vfs.c 2006-06-07 16:53:25.000000000 -0700
@@ -1153,6 +1153,9 @@ nfsd_create(struct svc_rqst *rqstp, stru
/*
* Get the dir op function pointer.
*/
+ err = mnt_want_write(fhp->fh_export->ex_mnt);
+ if (err)
+ goto out_nfserr;
err = nfserr_perm;
switch (type) {
case S_IFREG:
@@ -1171,6 +1174,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
printk("nfsd: bad file type %o in nfsd_create\n", type);
err = -EINVAL;
}
+ mnt_drop_write(fhp->fh_export->ex_mnt);
if (err < 0)
goto out_nfserr;
diff -puN net/unix/af_unix.c~elevate-writers-vfs_mknod-try2 net/unix/af_unix.c
--- lxc/net/unix/af_unix.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
+++ lxc-dave/net/unix/af_unix.c 2006-06-07 16:53:25.000000000 -0700
@@ -789,7 +789,11 @@ static int unix_bind(struct socket *sock
*/
mode = S_IFSOCK |
(SOCK_INODE(sock)->i_mode & ~current->fs->umask);
+ err = mnt_want_write(nd.mnt);
+ if (err)
+ goto out_mknod_dput;
err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
+ mnt_drop_write(nd.mnt);
if (err)
goto out_mknod_dput;
mutex_unlock(&nd.dentry->d_inode->i_mutex);
diff -puN ipc/mqueue.c~elevate-writers-vfs_mknod-try2 ipc/mqueue.c
diff -L xattr.patch -puN /dev/null /dev/null
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (21 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:23 ` Herbert Poetzl
2006-06-12 18:18 ` Al Viro
2006-06-08 0:10 ` [RFC][PATCH 24/27] elevate mount count for extended attributes Dave Hansen
` (3 subsequent siblings)
26 siblings, 2 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
fs/xattr.c | 0
lxc-dave/fs/namei.c | 40 ++++++++++++++++++++++++++--------------
lxc-dave/fs/nfsd/vfs.c | 12 +++++++++++-
3 files changed, 37 insertions(+), 15 deletions(-)
diff -puN fs/namei.c~elevate-writers-vfs_rename-part1 fs/namei.c
--- lxc/fs/namei.c~elevate-writers-vfs_rename-part1 2006-06-07 16:53:26.000000000 -0700
+++ lxc-dave/fs/namei.c 2006-06-07 16:53:26.000000000 -0700
@@ -2507,29 +2507,37 @@ static int do_rename(int olddfd, const c
if (error)
goto exit;
- error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
+ error = mnt_want_write(oldnd.mnt);
if (error)
goto exit1;
+ error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
+ if (error)
+ goto exit2;
+
+ error = mnt_want_write(oldnd.mnt);
+ if (error)
+ goto exit3;
+
error = -EXDEV;
if (oldnd.mnt != newnd.mnt)
- goto exit2;
+ goto exit4;
old_dir = oldnd.dentry;
error = -EBUSY;
if (oldnd.last_type != LAST_NORM)
- goto exit2;
+ goto exit4;
new_dir = newnd.dentry;
if (newnd.last_type != LAST_NORM)
- goto exit2;
+ goto exit4;
trap = lock_rename(new_dir, old_dir);
old_dentry = lookup_hash(&oldnd);
error = PTR_ERR(old_dentry);
if (IS_ERR(old_dentry))
- goto exit3;
+ goto exit5;
/* source must exist */
error = -ENOENT;
if (!old_dentry->d_inode)
@@ -2538,33 +2546,37 @@ static int do_rename(int olddfd, const c
if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
- goto exit4;
+ goto exit6;
if (newnd.last.name[newnd.last.len])
- goto exit4;
+ goto exit6;
}
/* source should not be ancestor of target */
error = -EINVAL;
if (old_dentry == trap)
- goto exit4;
+ goto exit6;
new_dentry = lookup_hash(&newnd);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
- goto exit4;
+ goto exit6;
/* target should not be an ancestor of source */
error = -ENOTEMPTY;
if (new_dentry == trap)
- goto exit5;
+ goto exit7;
error = vfs_rename(old_dir->d_inode, old_dentry,
new_dir->d_inode, new_dentry);
-exit5:
+exit7:
dput(new_dentry);
-exit4:
+exit6:
dput(old_dentry);
-exit3:
+exit5:
unlock_rename(new_dir, old_dir);
-exit2:
+exit4:
+ mnt_drop_write(newnd.mnt);
+exit3:
path_release(&newnd);
+exit2:
+ mnt_drop_write(oldnd.mnt);
exit1:
path_release(&oldnd);
exit:
diff -L ser -puN /dev/null /dev/null
diff -puN fs/ecryptfs/inode.c~elevate-writers-vfs_rename-part1 fs/ecryptfs/inode.c
diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_rename-part1 fs/nfsd/vfs.c
--- lxc/fs/nfsd/vfs.c~elevate-writers-vfs_rename-part1 2006-06-07 16:53:26.000000000 -0700
+++ lxc-dave/fs/nfsd/vfs.c 2006-06-07 16:53:26.000000000 -0700
@@ -1601,13 +1601,23 @@ nfsd_rename(struct svc_rqst *rqstp, stru
err = -EPERM;
} else
#endif
+ err = mnt_want_write(ffhp->fh_export->ex_mnt);
+ if (err)
+ goto out_dput_new;
+
+ err = mnt_want_write(tfhp->fh_export->ex_mnt);
+ if (err)
+ goto out_mnt_drop_write_old;
+
err = vfs_rename(fdir, odentry, tdir, ndentry);
if (!err && EX_ISSYNC(tfhp->fh_export)) {
err = nfsd_sync_dir(tdentry);
if (!err)
err = nfsd_sync_dir(fdentry);
}
-
+ mnt_drop_write(tfhp->fh_export->ex_mnt);
+ out_mnt_drop_write_old:
+ mnt_drop_write(ffhp->fh_export->ex_mnt);
out_dput_new:
dput(ndentry);
out_dput_old:
diff -puN fs/nfsd/nfsfh.c~elevate-writers-vfs_rename-part1 fs/nfsd/nfsfh.c
diff -puN fs/nfsd/nfs3proc.c~elevate-writers-vfs_rename-part1 fs/nfsd/nfs3proc.c
diff -puN fs/xattr.c~elevate-writers-vfs_rename-part1 fs/xattr.c
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 24/27] elevate mount count for extended attributes
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (22 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 23/27] elevate write count over calls to vfs_rename() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt Dave Hansen
` (2 subsequent siblings)
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
This basically audits the callers of xattr_permission(), which
calls permission() itself.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
fs/namei.c | 0
lxc-dave/fs/nfsd/nfs4proc.c | 7 ++++++-
lxc-dave/fs/xattr.c | 14 ++++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
diff -puN fs/xattr.c~xattr fs/xattr.c
--- lxc/fs/xattr.c~xattr 2006-06-07 16:53:27.000000000 -0700
+++ lxc-dave/fs/xattr.c 2006-06-07 16:53:27.000000000 -0700
@@ -12,6 +12,7 @@
#include <linux/smp_lock.h>
#include <linux/file.h>
#include <linux/xattr.h>
+#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/security.h>
#include <linux/syscalls.h>
@@ -210,7 +211,11 @@ sys_setxattr(char __user *path, char __u
error = user_path_walk(path, &nd);
if (error)
return error;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ return error;
error = setxattr(nd.dentry, name, value, size, flags);
+ mnt_drop_write(nd.mnt);
path_release(&nd);
return error;
}
@@ -225,7 +230,11 @@ sys_lsetxattr(char __user *path, char __
error = user_path_walk_link(path, &nd);
if (error)
return error;
+ error = mnt_want_write(nd.mnt);
+ if (error)
+ return error;
error = setxattr(nd.dentry, name, value, size, flags);
+ mnt_drop_write(nd.mnt);
path_release(&nd);
return error;
}
@@ -241,9 +250,14 @@ sys_fsetxattr(int fd, char __user *name,
f = fget(fd);
if (!f)
return error;
+ error = mnt_want_write(f->f_vfsmnt);
+ if (error)
+ goto out_fput;
dentry = f->f_dentry;
audit_inode(NULL, dentry->d_inode, 0);
error = setxattr(dentry, name, value, size, flags);
+ mnt_drop_write(f->f_vfsmnt);
+out_fput:
fput(f);
return error;
}
diff -puN fs/nfsd/vfs.c~xattr fs/nfsd/vfs.c
diff -puN fs/nfsd/nfs4proc.c~xattr fs/nfsd/nfs4proc.c
--- lxc/fs/nfsd/nfs4proc.c~xattr 2006-06-07 16:53:27.000000000 -0700
+++ lxc-dave/fs/nfsd/nfs4proc.c 2006-06-07 16:53:27.000000000 -0700
@@ -604,13 +604,18 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
return status;
}
}
+ status = mnt_want_write(current_fh->fh_export->ex_mnt);
+ if (status)
+ return status;
status = nfs_ok;
if (setattr->sa_acl != NULL)
status = nfsd4_set_nfs4_acl(rqstp, current_fh, setattr->sa_acl);
if (status)
- return status;
+ goto out;
status = nfsd_setattr(rqstp, current_fh, &setattr->sa_iattr,
0, (time_t)0);
+out:
+ mnt_drop_write(current_fh->fh_export->ex_mnt);
return status;
}
diff -puN fs/namei.c~xattr fs/namei.c
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (23 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 24/27] elevate mount count for extended attributes Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:25 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 27/27] create and pass read-only mnt flag into do_loopback() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest Dave Hansen
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Originally from: Herbert Poetzl <herbert@13thfloor.at>
Right now, show_vfsmnt() will produce output such as "ro" or
"nodev" based on sb flags or mount flags. Each output string
can only come from one of these sources.
This patch prepares show_vfsmnt() so that a single string can
come from either source. We need this for r/o bind mounts.
The for loop's terminating condition was getting a little bit
long, so I used ARRAY_SIZE() instead of checking for the NULL
terminator.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namespace.c | 46 ++++++++++++++++++++++++----------------------
1 files changed, 24 insertions(+), 22 deletions(-)
diff -puN fs/namespace.c~D6-proc-show-ro-attr fs/namespace.c
--- lxc/fs/namespace.c~D6-proc-show-ro-attr 2006-06-07 16:53:27.000000000 -0700
+++ lxc-dave/fs/namespace.c 2006-06-07 16:53:27.000000000 -0700
@@ -354,24 +354,22 @@ static int show_vfsmnt(struct seq_file *
{
struct vfsmount *mnt = v;
int err = 0;
+ int i;
static struct proc_fs_info {
- int flag;
- char *str;
+ int s_flag;
+ int mnt_flag;
+ char *set_str;
+ char *unset_str;
} fs_info[] = {
- { MS_SYNCHRONOUS, ",sync" },
- { MS_DIRSYNC, ",dirsync" },
- { MS_MANDLOCK, ",mand" },
- { 0, NULL }
- };
- static struct proc_fs_info mnt_info[] = {
- { MNT_NOSUID, ",nosuid" },
- { MNT_NODEV, ",nodev" },
- { MNT_NOEXEC, ",noexec" },
- { MNT_NOATIME, ",noatime" },
- { MNT_NODIRATIME, ",nodiratime" },
- { 0, NULL }
+ { MS_SYNCHRONOUS, 0, ",sync", NULL },
+ { MS_DIRSYNC, 0, ",dirsync", NULL },
+ { MS_MANDLOCK, 0, ",mand", NULL },
+ { 0, MNT_NOSUID, ",nosuid", NULL },
+ { 0, MNT_NODEV, ",nodev", NULL },
+ { 0, MNT_NOEXEC, ",noexec", NULL },
+ { 0, MNT_NOATIME, ",noatime", NULL },
+ { 0, MNT_NODIRATIME, ",nodiratime", NULL }
};
- struct proc_fs_info *fs_infop;
mangle(m, mnt->mnt_devname ? mnt->mnt_devname : "none");
seq_putc(m, ' ');
@@ -379,13 +377,17 @@ static int show_vfsmnt(struct seq_file *
seq_putc(m, ' ');
mangle(m, mnt->mnt_sb->s_type->name);
seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
- for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
- if (mnt->mnt_sb->s_flags & fs_infop->flag)
- seq_puts(m, fs_infop->str);
- }
- for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
- if (mnt->mnt_flags & fs_infop->flag)
- seq_puts(m, fs_infop->str);
+ for (i = 0; i < ARRAY_SIZE(fs_info); i++) {
+ struct proc_fs_info *fs_infop = &fs_info[i];
+ char *str = NULL;
+ if ((mnt->mnt_sb->s_flags & fs_infop->s_flag) ||
+ (mnt->mnt_flags & fs_infop->mnt_flag))
+ str = fs_infop->set_str;
+ else
+ str = fs_infop->unset_str;
+
+ if (str)
+ seq_puts(m, str);
}
if (mnt->mnt_sb->s_op->show_options)
err = mnt->mnt_sb->s_op->show_options(m, mnt);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (25 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 27/27] create and pass read-only mnt flag into do_loopback() Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 11:26 ` Herbert Poetzl
26 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Originally from: Herbert Poetzl <herbert@13thfloor.at>
Now that we have the set/unset fields in the fs_info[]
array, we can stop special-casing the ro/rw bits.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/namespace.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletion(-)
diff -puN fs/namespace.c~D7-proc-show-ro-attr-actual fs/namespace.c
--- lxc/fs/namespace.c~D7-proc-show-ro-attr-actual 2006-06-07 16:53:28.000000000 -0700
+++ lxc-dave/fs/namespace.c 2006-06-07 16:53:28.000000000 -0700
@@ -361,6 +361,7 @@ static int show_vfsmnt(struct seq_file *
char *set_str;
char *unset_str;
} fs_info[] = {
+ { MS_RDONLY, MNT_RDONLY, "ro", "rw" },
{ MS_SYNCHRONOUS, 0, ",sync", NULL },
{ MS_DIRSYNC, 0, ",dirsync", NULL },
{ MS_MANDLOCK, 0, ",mand", NULL },
@@ -376,7 +377,7 @@ static int show_vfsmnt(struct seq_file *
seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
seq_putc(m, ' ');
mangle(m, mnt->mnt_sb->s_type->name);
- seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
+ seq_putc(m, ' ');
for (i = 0; i < ARRAY_SIZE(fs_info); i++) {
struct proc_fs_info *fs_infop = &fs_info[i];
char *str = NULL;
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* [RFC][PATCH 27/27] create and pass read-only mnt flag into do_loopback()
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
` (24 preceding siblings ...)
2006-06-08 0:10 ` [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt Dave Hansen
@ 2006-06-08 0:10 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest Dave Hansen
26 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 0:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: herbert, viro, hch, trond.myklebust, Dave Hansen
Originally from: Herbert Poetzl <herbert@13thfloor.at>
This is the core of the read-only bind mount patch set.
This patch introduces MNT_RDONLY and ensures that it is
checked before any users may obtain write permissions
for a file through any mount.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/fs/file.c | 1 +
lxc-dave/fs/inode.c | 2 +-
lxc-dave/fs/namespace.c | 8 ++++++--
lxc-dave/include/linux/mount.h | 3 +++
lxc-dave/mm/filemap.c | 3 +++
5 files changed, 14 insertions(+), 3 deletions(-)
diff -puN fs/file.c~D8-actually-add-flags fs/file.c
--- lxc/fs/file.c~D8-actually-add-flags 2006-06-07 16:53:29.000000000 -0700
+++ lxc-dave/fs/file.c 2006-06-07 16:53:29.000000000 -0700
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/file.h>
+#include <linux/mount.h>
#include <linux/bitops.h>
#include <linux/interrupt.h>
#include <linux/spinlock.h>
diff -puN fs/inode.c~D8-actually-add-flags fs/inode.c
--- lxc/fs/inode.c~D8-actually-add-flags 2006-06-07 16:53:29.000000000 -0700
+++ lxc-dave/fs/inode.c 2006-06-07 16:53:29.000000000 -0700
@@ -1185,7 +1185,7 @@ void touch_atime(struct vfsmount *mnt, s
struct inode *inode = dentry->d_inode;
struct timespec now;
- if (IS_RDONLY(inode))
+ if (IS_RDONLY(inode) || (mnt->mnt_flags & MNT_RDONLY))
return;
if ((inode->i_flags & S_NOATIME) ||
diff -puN fs/namespace.c~D8-actually-add-flags fs/namespace.c
--- lxc/fs/namespace.c~D8-actually-add-flags 2006-06-07 16:53:29.000000000 -0700
+++ lxc-dave/fs/namespace.c 2006-06-07 16:53:29.000000000 -0700
@@ -905,7 +905,8 @@ static int do_change_type(struct nameida
/*
* do loopback mount.
*/
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name,
+ int recurse, int mnt_flags)
{
struct nameidata old_nd;
struct vfsmount *mnt = NULL;
@@ -943,6 +944,7 @@ static int do_loopback(struct nameidata
spin_unlock(&vfsmount_lock);
release_mounts(&umount_list);
}
+ mnt->mnt_flags = mnt_flags;
out:
up_write(&namespace_sem);
@@ -1403,6 +1405,8 @@ long do_mount(char *dev_name, char *dir_
((char *)data_page)[PAGE_SIZE - 1] = 0;
/* Separate the per-mountpoint flags */
+ if (flags & MS_RDONLY)
+ mnt_flags |= MNT_RDONLY;
if (flags & MS_NOSUID)
mnt_flags |= MNT_NOSUID;
if (flags & MS_NODEV)
@@ -1430,7 +1434,7 @@ long do_mount(char *dev_name, char *dir_
retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
data_page);
else if (flags & MS_BIND)
- retval = do_loopback(&nd, dev_name, flags & MS_REC);
+ retval = do_loopback(&nd, dev_name, flags & MS_REC, mnt_flags);
else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
retval = do_change_type(&nd, flags);
else if (flags & MS_MOVE)
diff -puN include/linux/mount.h~D8-actually-add-flags include/linux/mount.h
--- lxc/include/linux/mount.h~D8-actually-add-flags 2006-06-07 16:53:29.000000000 -0700
+++ lxc-dave/include/linux/mount.h 2006-06-07 16:53:29.000000000 -0700
@@ -24,6 +24,7 @@
#define MNT_NOEXEC 0x04
#define MNT_NOATIME 0x08
#define MNT_NODIRATIME 0x10
+#define MNT_RDONLY 0x20
#define MNT_SHRINKABLE 0x100
@@ -63,6 +64,8 @@ static inline struct vfsmount *mntget(st
static inline int mnt_want_write(struct vfsmount *mnt)
{
+ if (mnt->mnt_flags & MNT_RDONLY)
+ return -EROFS;
atomic_inc(&mnt->mnt_writers);
return 0;
}
diff -puN mm/filemap.c~D8-actually-add-flags mm/filemap.c
--- lxc/mm/filemap.c~D8-actually-add-flags 2006-06-07 16:53:29.000000000 -0700
+++ lxc-dave/mm/filemap.c 2006-06-07 16:53:29.000000000 -0700
@@ -23,6 +23,7 @@
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/file.h>
+#include <linux/mount.h>
#include <linux/uio.h>
#include <linux/hash.h>
#include <linux/writeback.h>
@@ -1960,6 +1961,8 @@ inline int generic_write_checks(struct f
return -EINVAL;
if (!isblk) {
+ if (file->f_vfsmnt->mnt_flags & MNT_RDONLY)
+ return -EROFS;
/* FIXME: this is for backwards compatibility with 2.4 */
if (file->f_flags & O_APPEND)
*pos = i_size_read(inode);
_
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 01/27] Add vfsmount writer count
2006-06-08 0:10 ` [RFC][PATCH 01/27] Add vfsmount writer count Dave Hansen
@ 2006-06-08 10:33 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:33 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:14PM -0700, Dave Hansen wrote:
>
> This allows a vfsmount to keep track of how many instances
> of files open for write there are at a given time. This
> will be useful if it is ever desired to change a mount
> from r/w to r/o at runtime.
>
> A mount can also potentially refuse to allow writers; this
> is why mnt_want_write() has a return value. However, that
> functionality will be added later in the series. For now,
> always allow new writers.
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namespace.c | 1 +
> lxc-dave/include/linux/mount.h | 14 ++++++++++++++
> 2 files changed, 15 insertions(+)
>
> diff -puN fs/namespace.c~convert-permission-to-file-and-vfs fs/namespace.c
> --- lxc/fs/namespace.c~convert-permission-to-file-and-vfs 2006-06-07 16:53:11.000000000 -0700
> +++ lxc-dave/fs/namespace.c 2006-06-07 16:53:11.000000000 -0700
> @@ -66,6 +66,7 @@ struct vfsmount *alloc_vfsmnt(const char
> if (mnt) {
> memset(mnt, 0, sizeof(struct vfsmount));
> atomic_set(&mnt->mnt_count, 1);
> + /* atomic_set(&mnt->writer_count, 0); */
IMHO you should not assume that atomic_set(,0) will not
be required after memset(,0,), so doing it here is the
right thing (tm)
> INIT_LIST_HEAD(&mnt->mnt_hash);
> INIT_LIST_HEAD(&mnt->mnt_child);
> INIT_LIST_HEAD(&mnt->mnt_mounts);
> diff -puN include/linux/mount.h~convert-permission-to-file-and-vfs include/linux/mount.h
> --- lxc/include/linux/mount.h~convert-permission-to-file-and-vfs 2006-06-07 16:53:11.000000000 -0700
> +++ lxc-dave/include/linux/mount.h 2006-06-07 16:53:11.000000000 -0700
> @@ -12,6 +12,8 @@
> #define _LINUX_MOUNT_H
> #ifdef __KERNEL__
>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> #include <linux/types.h>
> #include <linux/list.h>
> #include <linux/spinlock.h>
> @@ -38,6 +40,7 @@ struct vfsmount {
> struct list_head mnt_mounts; /* list of children, anchored here */
> struct list_head mnt_child; /* and going through their mnt_child */
> atomic_t mnt_count;
> + atomic_t mnt_writers;
> int mnt_flags;
> int mnt_expiry_mark; /* true if marked for expiry */
> char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */
> @@ -58,6 +61,17 @@ static inline struct vfsmount *mntget(st
> return mnt;
> }
>
> +static inline int mnt_want_write(struct vfsmount *mnt)
> +{
> + atomic_inc(&mnt->mnt_writers);
> + return 0;
> +}
> +
> +static inline void mnt_drop_write(struct vfsmount *mnt)
> +{
> + atomic_dec(&mnt->mnt_writers);
> +}
> +
> extern void mntput_no_expire(struct vfsmount *mnt);
> extern void mnt_pin(struct vfsmount *mnt);
> extern void mnt_unpin(struct vfsmount *mnt);
> diff -L convert-permission-to-file-and-vfs -puN /dev/null /dev/null
> diff -L convert-permission-to-file-and-vfs.patch -puN /dev/null /dev/null
> _
best,
Herbert
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 02/27] vfs_rmdir: change if() into goto
2006-06-08 0:10 ` [RFC][PATCH 02/27] vfs_rmdir: change if() into goto Dave Hansen
@ 2006-06-08 10:37 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:37 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:15PM -0700, Dave Hansen wrote:
>
> We're going to add another condition here in a minute, so
> add a goto instead of an if.
>
> There will be a lot of patches like this one. They may seem
> too finely-grained, but I did encounter a couple of bugs where
> I botched one of these. If there is a bug in one, this level
> of granularity will really help to track them down with a
> chop-search.
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namei.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff -puN fs/namei.c~rmdir fs/namei.c
> --- lxc/fs/namei.c~rmdir 2006-06-07 16:53:12.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:12.000000000 -0700
> @@ -2010,10 +2010,11 @@ static long do_rmdir(int dfd, const char
> mutex_lock_nested(&nd.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> dentry = lookup_hash(&nd);
> error = PTR_ERR(dentry);
> - if (!IS_ERR(dentry)) {
> - error = vfs_rmdir(nd.dentry->d_inode, dentry);
> - dput(dentry);
> - }
> + if (IS_ERR(dentry))
> + goto exit2;
> + error = vfs_rmdir(nd.dentry->d_inode, dentry);
> + dput(dentry);
> +exit2:
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> exit1:
> path_release(&nd);
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 03/27] do_rmdir(): elevate write count
2006-06-08 0:10 ` [RFC][PATCH 03/27] do_rmdir(): elevate write count Dave Hansen
@ 2006-06-08 10:42 ` Herbert Poetzl
2006-06-08 15:04 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:42 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:16PM -0700, Dave Hansen wrote:
>
> Elevate the write count during the vfs_rmdir() call. There
> will be a lot of patches similar to this one coming up.
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namei.c | 5 +++++
> 1 files changed, 5 insertions(+)
>
> diff -puN fs/namei.c~rmdir1 fs/namei.c
> --- lxc/fs/namei.c~rmdir1 2006-06-07 16:53:13.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:13.000000000 -0700
> @@ -2012,7 +2012,12 @@ static long do_rmdir(int dfd, const char
> error = PTR_ERR(dentry);
> if (IS_ERR(dentry))
> goto exit2;
> + error = mnt_want_write(nd.mnt);
> + if (error)
> + goto exit3;
> error = vfs_rmdir(nd.dentry->d_inode, dentry);
> + mnt_drop_write(nd.mnt);
> +exit3:
IMHO, for consistency this should be named similar in
all the different places, so maybe make that?
out_put:
or maybe:
out_rdonly:
(same goest for the following patches)
otherwise fine
best,
Herbert
> dput(dentry);
> exit2:
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl()
2006-06-08 0:10 ` [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl() Dave Hansen
@ 2006-06-08 10:44 ` Herbert Poetzl
2006-06-08 15:07 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:44 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:18PM -0700, Dave Hansen wrote:
>
> Some ioctls need write access, but others don't. Make a helper
> function to decode when write access is needed, and take it.
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> fs/nfsd/vfs.c | 0
> lxc-dave/fs/ncpfs/ioctl.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 53 insertions(+), 1 deletion(-)
>
> diff -puN fs/namei.c~elevate-writers-file_permission-callers fs/namei.c
> diff -puN fs/open.c~elevate-writers-file_permission-callers fs/open.c
> diff -puN fs/exec.c~elevate-writers-file_permission-callers fs/exec.c
> diff -puN fs/ncpfs/ioctl.c~elevate-writers-file_permission-callers fs/ncpfs/ioctl.c
> --- lxc/fs/ncpfs/ioctl.c~elevate-writers-file_permission-callers 2006-06-07 16:53:15.000000000 -0700
> +++ lxc-dave/fs/ncpfs/ioctl.c 2006-06-07 16:53:15.000000000 -0700
> @@ -183,7 +183,7 @@ ncp_get_charsets(struct ncp_server* serv
> }
> #endif /* CONFIG_NCPFS_NLS */
>
> -int ncp_ioctl(struct inode *inode, struct file *filp,
> +static int __ncp_ioctl(struct inode *inode, struct file *filp,
> unsigned int cmd, unsigned long arg)
> {
> struct ncp_server *server = NCP_SERVER(inode);
> @@ -654,3 +654,55 @@ outrel:
> /* #endif */
> return -EINVAL;
> }
> +
> +static int ncp_ioctl_need_write(unsigned int cmd)
> +{
> + switch (cmd) {
> + case NCP_IOC_GET_FS_INFO:
> + case NCP_IOC_GET_FS_INFO_V2:
> + case NCP_IOC_NCPREQUEST:
> + case NCP_IOC_SETDENTRYTTL:
> + case NCP_IOC_SIGN_INIT:
> + case NCP_IOC_LOCKUNLOCK:
> + case NCP_IOC_SET_SIGN_WANTED:
> + return 0;
> + case NCP_IOC_GETOBJECTNAME:
> + case NCP_IOC_SETOBJECTNAME:
> + case NCP_IOC_GETPRIVATEDATA:
> + case NCP_IOC_SETPRIVATEDATA:
> + case NCP_IOC_SETCHARSETS:
> + case NCP_IOC_GETCHARSETS:
> + case NCP_IOC_CONN_LOGGED_IN:
> + case NCP_IOC_GETDENTRYTTL:
> + case NCP_IOC_GETMOUNTUID2:
> + case NCP_IOC_SIGN_WANTED:
> + case NCP_IOC_GETROOT:
> + case NCP_IOC_SETROOT:
> + return 0;
I'd assume one of those should be a return 1 :)
best,
Herbert
> + default:
> + /* unkown IOCTL command, assume write */
> + WARN_ON();
> + }
> + return 1;
> +}
> +
> +int ncp_ioctl(struct inode *inode, struct file *filp,
> + unsigned int cmd, unsigned long arg)
> +{
> + int ret;
> +
> + if (ncp_ioctl_need_write(cmd)) {
> + /*
> + * inside the ioctl(), any failures which
> + * are because of file_permission() are
> + * -EACCESS, so it seems consistent to keep
> + * that here.
> + */
> + if (mnt_want_write(filp->f_vfsmnt))
> + return -EACCESS;
> + }
> + ret = __ncp_ioctl(inode, filp, cmd, arg);
> + if (ncp_ioctl_need_write(cmd)
> + mnt_drop_write(filp->->f_vfsmnt;
> + return ret;
> +}
> diff -puN fs/nfsd/vfs.c~elevate-writers-file_permission-callers fs/nfsd/vfs.c
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 07/27] sys_mkdirat(): collapse if()
2006-06-08 0:10 ` [RFC][PATCH 07/27] sys_mkdirat(): collapse if() Dave Hansen
@ 2006-06-08 10:46 ` Herbert Poetzl
2006-06-08 15:10 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:46 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:19PM -0700, Dave Hansen wrote:
>
> Take the entire "if (IS_ERR(tmp))" block, and use a goto,
> instead.
>
> This should not make any functional changes.
AFAICR, my test tool was done before the *at funtions
were added to the kernel. did you extend it or test
them in addition to the normal tests?
best,
Herbert
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namei.c | 38 +++++++++++++++++++-------------------
> 1 files changed, 19 insertions(+), 19 deletions(-)
>
> diff -puN fs/namei.c~sys_mkdir_at-move-cases fs/namei.c
> --- lxc/fs/namei.c~sys_mkdir_at-move-cases 2006-06-07 16:53:16.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:16.000000000 -0700
> @@ -1888,30 +1888,30 @@ asmlinkage long sys_mkdirat(int dfd, con
> {
> int error = 0;
> char * tmp;
> + struct dentry *dentry;
> + struct nameidata nd;
>
> tmp = getname(pathname);
> error = PTR_ERR(tmp);
> - if (!IS_ERR(tmp)) {
> - struct dentry *dentry;
> - struct nameidata nd;
> + if (IS_ERR(tmp))
> + goto out_err;
>
> - error = do_path_lookup(dfd, tmp, LOOKUP_PARENT, &nd);
> - if (error)
> - goto out;
> - dentry = lookup_create(&nd, 1);
> - error = PTR_ERR(dentry);
> - if (!IS_ERR(dentry)) {
> - if (!IS_POSIXACL(nd.dentry->d_inode))
> - mode &= ~current->fs->umask;
> - error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
> - dput(dentry);
> - }
> - mutex_unlock(&nd.dentry->d_inode->i_mutex);
> - path_release(&nd);
> -out:
> - putname(tmp);
> + error = do_path_lookup(dfd, tmp, LOOKUP_PARENT, &nd);
> + if (error)
> + goto out;
> + dentry = lookup_create(&nd, 1);
> + error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + if (!IS_POSIXACL(nd.dentry->d_inode))
> + mode &= ~current->fs->umask;
> + error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
> + dput(dentry);
> }
> -
> + mutex_unlock(&nd.dentry->d_inode->i_mutex);
> + path_release(&nd);
> +out:
> + putname(tmp);
> +out_err:
> return error;
> }
>
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 08/27] sys_mkdirat(): one more goto
2006-06-08 0:10 ` [RFC][PATCH 08/27] sys_mkdirat(): one more goto Dave Hansen
@ 2006-06-08 10:48 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:48 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:20PM -0700, Dave Hansen wrote:
>
> Add one more goto to collapse another if().
>
> This should not cause any functional chagnes.
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namei.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
> diff -puN fs/namei.c~sys_mkdir_at-move-cases-another-goto fs/namei.c
> --- lxc/fs/namei.c~sys_mkdir_at-move-cases-another-goto 2006-06-07 16:53:16.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:16.000000000 -0700
> @@ -1901,12 +1901,14 @@ asmlinkage long sys_mkdirat(int dfd, con
> goto out;
> dentry = lookup_create(&nd, 1);
> error = PTR_ERR(dentry);
> - if (!IS_ERR(dentry)) {
> - if (!IS_POSIXACL(nd.dentry->d_inode))
> - mode &= ~current->fs->umask;
> - error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
> - dput(dentry);
> - }
> + if (IS_ERR(dentry))
> + goto out_unlock;
> +
> + if (!IS_POSIXACL(nd.dentry->d_inode))
> + mode &= ~current->fs->umask;
> + error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
> + dput(dentry);
> +out_unlock:
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> path_release(&nd);
> out:
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 11/27] sys_symlinkat() collapse one more if ()
2006-06-08 0:10 ` [RFC][PATCH 11/27] sys_symlinkat() collapse one more if () Dave Hansen
@ 2006-06-08 10:49 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:49 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:23PM -0700, Dave Hansen wrote:
>
> Collapse another if() to make way for a new condition.
>
> This patch should make no functional changes.
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namei.c | 10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff -puN fs/namei.c~elevate-writers-vfs_symlink-part2 fs/namei.c
> --- lxc/fs/namei.c~elevate-writers-vfs_symlink-part2 2006-06-07 16:53:18.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:18.000000000 -0700
> @@ -2186,10 +2186,12 @@ asmlinkage long sys_symlinkat(const char
> goto out;
> dentry = lookup_create(&nd, 0);
> error = PTR_ERR(dentry);
> - if (!IS_ERR(dentry)) {
> - error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
> - dput(dentry);
> - }
> + if (IS_ERR(dentry))
> + goto out_unlock;
> +
> + error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
> + dput(dentry);
> +out_unlock:
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> path_release(&nd);
> out:
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 14/27] tricky: elevate write count files are open()ed
2006-06-08 0:10 ` [RFC][PATCH 14/27] tricky: elevate write count files are open()ed Dave Hansen
@ 2006-06-08 10:54 ` Herbert Poetzl
2006-06-08 15:12 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 10:54 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:25PM -0700, Dave Hansen wrote:
>
> This is the first really tricky patch in the series. It
> elevates the writer count on a mount each time a
> non-special file is opened for write.
>
> This is not completely apparent in the patch because the
> two if() conditions in may_open() above the
> mnt_want_write() call are, combined, equivalent to
> special_file().
>
> There is also an elevated count around the vfs_create()
> call in open_namei(). The count does not need to be
> kept elevated all the way into the may_open() call because
> after creation, the write bits of the acc_mode are cleared.
> This keeps may_open() from ever failing. Howver, this may
> open one potential race where a change from a r/w to a r/o
> mount could occur between the mnt_drop_write() and may_open()
> allowing a user to obtain a r/w file on what is now a r/w
probably means 'read only mount'
what about using atomic_dec_and_test() to avoid
having such cases (or a lock if required)?
best,
Herbert
> mount. But, this functionality does not yet exist.
>
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/file_table.c | 5 ++++-
> lxc-dave/fs/namei.c | 29 ++++++++++++++++++++++++++---
> lxc-dave/ipc/mqueue.c | 3 +++
> 3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff -puN fs/namei.c~elevate-writers-opens-part1 fs/namei.c
> --- lxc/fs/namei.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:20.000000000 -0700
> @@ -1511,8 +1511,17 @@ int may_open(struct nameidata *nd, int a
> return -EACCES;
>
> flag &= ~O_TRUNC;
> - } else if (IS_RDONLY(inode) && (flag & FMODE_WRITE))
> - return -EROFS;
> + } else if (flag & FMODE_WRITE) {
> + /*
> + * effectively: !special_file()
> + * balanced by __fput()
> + */
> + error = mnt_want_write(nd->mnt);
> + if (error)
> + return error;
> + if (IS_RDONLY(inode))
> + return -EROFS;
> + }
> /*
> * An append-only file must be opened in append mode for writing.
> */
> @@ -1642,10 +1651,24 @@ do_last:
> if (!path.dentry->d_inode) {
> if (!IS_POSIXACL(dir->d_inode))
> mode &= ~current->fs->umask;
> - error = vfs_create(dir->d_inode, path.dentry, mode, nd);
> + /*
> + * this serves dual roles, it makes sure there is no
> + * r/o mount, and keeps the write count for what is
> + * the newly created file
> + */
> + error = mnt_want_write(nd->mnt);
> + if (!error)
> + error = vfs_create(dir->d_inode, path.dentry, mode, nd);
> mutex_unlock(&dir->d_inode->i_mutex);
> dput(nd->dentry);
> nd->dentry = path.dentry;
> + /*
> + * Unconditionally drop the write access because
> + * the acc_mode=0 set below will keep may_open()
> + * from ever failing if there was a r/o mount
> + * between here and there
> + */
> + mnt_drop_write(nd->mnt);
> if (error)
> goto exit;
> /* Don't check for write permission, don't truncate */
> diff -puN fs/open.c~elevate-writers-opens-part1 fs/open.c
> diff -puN include/linux/mount.h~elevate-writers-opens-part1 include/linux/mount.h
> diff -puN fs/file_table.c~elevate-writers-opens-part1 fs/file_table.c
> --- lxc/fs/file_table.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
> +++ lxc-dave/fs/file_table.c 2006-06-07 16:53:20.000000000 -0700
> @@ -180,8 +180,11 @@ void fastcall __fput(struct file *file)
> if (unlikely(inode->i_cdev != NULL))
> cdev_put(inode->i_cdev);
> fops_put(file->f_op);
> - if (file->f_mode & FMODE_WRITE)
> + if (file->f_mode & FMODE_WRITE) {
> put_write_access(inode);
> + if(!special_file(inode->i_mode))
> + mnt_drop_write(mnt);
> + }
> file_kill(file);
> file->f_dentry = NULL;
> file->f_vfsmnt = NULL;
> diff -puN ipc/mqueue.c~elevate-writers-opens-part1 ipc/mqueue.c
> --- lxc/ipc/mqueue.c~elevate-writers-opens-part1 2006-06-07 16:53:20.000000000 -0700
> +++ lxc-dave/ipc/mqueue.c 2006-06-07 16:53:20.000000000 -0700
> @@ -679,6 +679,9 @@ asmlinkage long sys_mq_open(const char _
> goto out;
> filp = do_open(dentry, oflag);
> } else {
> + error = mnt_want_write(mqueue_mnt);
> + if (error)
> + goto out;
> filp = do_create(mqueue_mnt->mnt_root, dentry,
> oflag, mode, u_attr);
> }
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 19/27] sys_faccessat() elevate writer count
2006-06-08 0:10 ` [RFC][PATCH 19/27] sys_faccessat() elevate writer count Dave Hansen
@ 2006-06-08 11:03 ` Herbert Poetzl
2006-06-08 15:15 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:03 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:29PM -0700, Dave Hansen wrote:
this is the first non-trivial change to the
condition checks (regarding goto out_path_release)
why didn't you break that out too?
best,
Herbert
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/open.c | 16 ++++++++++++++--
> 1 files changed, 14 insertions(+), 2 deletions(-)
>
> diff -puN fs/open.c~elevate-writers-opens-part2-sys_faccessat2 fs/open.c
> --- lxc/fs/open.c~elevate-writers-opens-part2-sys_faccessat2 2006-06-07 16:53:23.000000000 -0700
> +++ lxc-dave/fs/open.c 2006-06-07 16:53:23.000000000 -0700
> @@ -542,9 +542,21 @@ asmlinkage long sys_faccessat(int dfd, c
>
> res = vfs_permission(&nd, mode);
> /* SuS v2 requires we report a read only fs too */
> - if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
> - && !special_file(nd.dentry->d_inode->i_mode))
> + if(res)
> + goto out_path_release;
> + if (!(mode & S_IWOTH))
> + goto out_path_release;
> + if (special_file(nd.dentry->d_inode->i_mode))
> + goto out_path_release;
> +
> + res = mnt_want_write(nd.mnt);
> + if (res) {
> + mnt_drop_write(nd.mnt);
> + goto out_path_release;
> + }
> + if (IS_RDONLY(nd.dentry->d_inode))
> res = -EROFS;
> +out_path_release:
> path_release(&nd);
>
> out:
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 18/27] sys_faccessat(): collapse if()
2006-06-08 0:10 ` [RFC][PATCH 18/27] sys_faccessat(): collapse if() Dave Hansen
@ 2006-06-08 11:05 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:05 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:29PM -0700, Dave Hansen wrote:
>
> This patch should make no functional changes.
wrong order, but IMHO this one should do the
goto out_path_release change of the previous
one
best,
Herbert
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/open.c | 18 ++++++++++--------
> 1 files changed, 10 insertions(+), 8 deletions(-)
>
> diff -puN fs/open.c~elevate-writers-opens-part2-sys_faccessat fs/open.c
> --- lxc/fs/open.c~elevate-writers-opens-part2-sys_faccessat 2006-06-07 16:53:23.000000000 -0700
> +++ lxc-dave/fs/open.c 2006-06-07 16:53:23.000000000 -0700
> @@ -537,15 +537,17 @@ asmlinkage long sys_faccessat(int dfd, c
> current->cap_effective = current->cap_permitted;
>
> res = __user_walk_fd(dfd, filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd);
> - if (!res) {
> - res = vfs_permission(&nd, mode);
> - /* SuS v2 requires we report a read only fs too */
> - if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
> - && !special_file(nd.dentry->d_inode->i_mode))
> - res = -EROFS;
> - path_release(&nd);
> - }
> + if (res)
> + goto out;
>
> + res = vfs_permission(&nd, mode);
> + /* SuS v2 requires we report a read only fs too */
> + if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
> + && !special_file(nd.dentry->d_inode->i_mode))
> + res = -EROFS;
> + path_release(&nd);
> +
> +out:
> current->fsuid = old_fsuid;
> current->fsgid = old_fsgid;
> current->cap_effective = old_cap;
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime()
2006-06-08 0:10 ` [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime() Dave Hansen
@ 2006-06-08 11:07 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:07 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:30PM -0700, Dave Hansen wrote:
>
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/net/unix/af_unix.c | 16 ++++++++++++----
> 1 files changed, 12 insertions(+), 4 deletions(-)
>
> diff -puN net/unix/af_unix.c~elevate-writers-opens-part4 net/unix/af_unix.c
> --- lxc/net/unix/af_unix.c~elevate-writers-opens-part4 2006-06-07 16:53:24.000000000 -0700
> +++ lxc-dave/net/unix/af_unix.c 2006-06-07 16:53:24.000000000 -0700
> @@ -676,21 +676,27 @@ static struct sock *unix_find_other(stru
> err = path_lookup(sunname->sun_path, LOOKUP_FOLLOW, &nd);
> if (err)
> goto fail;
> +
> + err = mnt_want_write(nd.mnt);
> + if (err)
> + goto put_path_fail;
> +
> err = vfs_permission(&nd, MAY_WRITE);
> if (err)
> - goto put_fail;
> + goto put_write_fail;
mnt_drop_write_fail maybe?
btw, IMHO *drop_write* would be enough for
the error labels, no need for the mnt part
best,
Herbert
> err = -ECONNREFUSED;
> if (!S_ISSOCK(nd.dentry->d_inode->i_mode))
> - goto put_fail;
> + goto put_write_fail;
> u=unix_find_socket_byinode(nd.dentry->d_inode);
> if (!u)
> - goto put_fail;
> + goto put_write_fail;
>
> if (u->sk_type == type)
> touch_atime(nd.mnt, nd.dentry);
>
> path_release(&nd);
> + mnt_drop_write(nd.mnt);
>
> err=-EPROTOTYPE;
> if (u->sk_type != type) {
> @@ -710,7 +716,9 @@ static struct sock *unix_find_other(stru
> }
> return u;
>
> -put_fail:
> +put_write_fail:
> + mnt_drop_write(nd.mnt);
> +put_path_fail:
> path_release(&nd);
> fail:
> *error=err;
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create()
2006-06-08 0:10 ` [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
@ 2006-06-08 11:16 ` Herbert Poetzl
2006-06-08 15:23 ` Dave Hansen
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:16 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:35PM -0700, Dave Hansen wrote:
> This takes care of all of the direct callers of vfs_mknod().
> Since a few of these cases also handle normal file creation
> as well, this also covers some calls to vfs_create().
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> ipc/mqueue.c | 0
> lxc-dave/fs/namei.c | 48 ++++++++++++++++++++++++++++----------------
> lxc-dave/fs/nfsd/vfs.c | 4 +++
> lxc-dave/net/unix/af_unix.c | 4 +++
> 4 files changed, 39 insertions(+), 17 deletions(-)
>
> diff -puN fs/namei.c~elevate-writers-vfs_mknod-try2 fs/namei.c
> --- lxc/fs/namei.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:25.000000000 -0700
> @@ -1852,26 +1852,40 @@ asmlinkage long sys_mknodat(int dfd, con
>
> if (!IS_POSIXACL(nd.dentry->d_inode))
> mode &= ~current->fs->umask;
> - if (!IS_ERR(dentry)) {
> - switch (mode & S_IFMT) {
> - case 0: case S_IFREG:
> - error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
> - break;
> - case S_IFCHR: case S_IFBLK:
> - error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
> - new_decode_dev(dev));
> - break;
> - case S_IFIFO: case S_IFSOCK:
> - error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
> + if (IS_ERR(dentry))
> + goto out_unlock;
> + error = mnt_want_write(nd.mnt);
> + if (error)
> + goto out_dput;
> + switch (mode & S_IFMT) {
> + case 0: case S_IFREG:
> + error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
> + break;
> + case S_IFCHR: case S_IFBLK:
hmm, could you outline some cases where the following
fails, but the previous (above) check succeeded?
(except for dubious locking issues)
> + error = mnt_want_write(nd.mnt);
> + if (error)
> break;
> - case S_IFDIR:
> - error = -EPERM;
> + error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
> + new_decode_dev(dev));
> + mnt_drop_write(nd.mnt);
> + break;
> + case S_IFIFO: case S_IFSOCK:
same here
> + error = mnt_want_write(nd.mnt);
> + if (error)
> break;
> - default:
> - error = -EINVAL;
> - }
> - dput(dentry);
> + error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
> + mnt_drop_write(nd.mnt);
> + break;
> + case S_IFDIR:
> + error = -EPERM;
this will give -EPERM on a r/w filesystem, but
-EROFS on r/o, is that consistant with the current
behaviour?
> + break;
> + default:
> + error = -EINVAL;
same here for -EINVAL and -EROFS, maybe the
mnt_want_write() wants to go into vfs_mknod()?
TIA,
Herbert
> }
> + mnt_drop_write(nd.mnt);
> +out_dput:
> + dput(dentry);
> +out_unlock:
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> path_release(&nd);
> out:
> diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_mknod-try2 fs/nfsd/vfs.c
> --- lxc/fs/nfsd/vfs.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
> +++ lxc-dave/fs/nfsd/vfs.c 2006-06-07 16:53:25.000000000 -0700
> @@ -1153,6 +1153,9 @@ nfsd_create(struct svc_rqst *rqstp, stru
> /*
> * Get the dir op function pointer.
> */
> + err = mnt_want_write(fhp->fh_export->ex_mnt);
> + if (err)
> + goto out_nfserr;
> err = nfserr_perm;
> switch (type) {
> case S_IFREG:
> @@ -1171,6 +1174,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
> printk("nfsd: bad file type %o in nfsd_create\n", type);
> err = -EINVAL;
> }
> + mnt_drop_write(fhp->fh_export->ex_mnt);
> if (err < 0)
> goto out_nfserr;
>
> diff -puN net/unix/af_unix.c~elevate-writers-vfs_mknod-try2 net/unix/af_unix.c
> --- lxc/net/unix/af_unix.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
> +++ lxc-dave/net/unix/af_unix.c 2006-06-07 16:53:25.000000000 -0700
> @@ -789,7 +789,11 @@ static int unix_bind(struct socket *sock
> */
> mode = S_IFSOCK |
> (SOCK_INODE(sock)->i_mode & ~current->fs->umask);
> + err = mnt_want_write(nd.mnt);
> + if (err)
> + goto out_mknod_dput;
> err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
> + mnt_drop_write(nd.mnt);
> if (err)
> goto out_mknod_dput;
> mutex_unlock(&nd.dentry->d_inode->i_mutex);
> diff -puN ipc/mqueue.c~elevate-writers-vfs_mknod-try2 ipc/mqueue.c
> diff -L xattr.patch -puN /dev/null /dev/null
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-08 0:10 ` [RFC][PATCH 23/27] elevate write count over calls to vfs_rename() Dave Hansen
@ 2006-06-08 11:23 ` Herbert Poetzl
2006-06-08 15:24 ` Dave Hansen
2006-06-12 18:18 ` Al Viro
1 sibling, 1 reply; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:23 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:36PM -0700, Dave Hansen wrote:
>
>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> fs/xattr.c | 0
> lxc-dave/fs/namei.c | 40 ++++++++++++++++++++++++++--------------
> lxc-dave/fs/nfsd/vfs.c | 12 +++++++++++-
> 3 files changed, 37 insertions(+), 15 deletions(-)
>
> diff -puN fs/namei.c~elevate-writers-vfs_rename-part1 fs/namei.c
> --- lxc/fs/namei.c~elevate-writers-vfs_rename-part1 2006-06-07 16:53:26.000000000 -0700
> +++ lxc-dave/fs/namei.c 2006-06-07 16:53:26.000000000 -0700
> @@ -2507,29 +2507,37 @@ static int do_rename(int olddfd, const c
> if (error)
> goto exit;
>
> - error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> + error = mnt_want_write(oldnd.mnt);
> if (error)
> goto exit1;
same here, I'd suspect it changes the return code
for the case where the 'files' reside on different
mount points from -EXDEV and for the busy case
from -EBUSY to -EROFS ....
don't get me wrong, I'm probably fine with those
changes if they are consistant and do not depend
on whether the inode is RO or the entire mnt point
best,
Herbert
PS: IIRC, my test does not check this case either.
bugger!
>
> + error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> + if (error)
> + goto exit2;
> +
> + error = mnt_want_write(oldnd.mnt);
> + if (error)
> + goto exit3;
> +
> error = -EXDEV;
> if (oldnd.mnt != newnd.mnt)
> - goto exit2;
> + goto exit4;
>
> old_dir = oldnd.dentry;
> error = -EBUSY;
> if (oldnd.last_type != LAST_NORM)
> - goto exit2;
> + goto exit4;
>
> new_dir = newnd.dentry;
> if (newnd.last_type != LAST_NORM)
> - goto exit2;
> + goto exit4;
>
> trap = lock_rename(new_dir, old_dir);
>
> old_dentry = lookup_hash(&oldnd);
> error = PTR_ERR(old_dentry);
> if (IS_ERR(old_dentry))
> - goto exit3;
> + goto exit5;
> /* source must exist */
> error = -ENOENT;
> if (!old_dentry->d_inode)
> @@ -2538,33 +2546,37 @@ static int do_rename(int olddfd, const c
> if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
> error = -ENOTDIR;
> if (oldnd.last.name[oldnd.last.len])
> - goto exit4;
> + goto exit6;
> if (newnd.last.name[newnd.last.len])
> - goto exit4;
> + goto exit6;
> }
> /* source should not be ancestor of target */
> error = -EINVAL;
> if (old_dentry == trap)
> - goto exit4;
> + goto exit6;
> new_dentry = lookup_hash(&newnd);
> error = PTR_ERR(new_dentry);
> if (IS_ERR(new_dentry))
> - goto exit4;
> + goto exit6;
> /* target should not be an ancestor of source */
> error = -ENOTEMPTY;
> if (new_dentry == trap)
> - goto exit5;
> + goto exit7;
>
> error = vfs_rename(old_dir->d_inode, old_dentry,
> new_dir->d_inode, new_dentry);
> -exit5:
> +exit7:
> dput(new_dentry);
> -exit4:
> +exit6:
> dput(old_dentry);
> -exit3:
> +exit5:
> unlock_rename(new_dir, old_dir);
> -exit2:
> +exit4:
> + mnt_drop_write(newnd.mnt);
> +exit3:
> path_release(&newnd);
> +exit2:
> + mnt_drop_write(oldnd.mnt);
> exit1:
> path_release(&oldnd);
> exit:
> diff -L ser -puN /dev/null /dev/null
> diff -puN fs/ecryptfs/inode.c~elevate-writers-vfs_rename-part1 fs/ecryptfs/inode.c
> diff -puN fs/nfsd/vfs.c~elevate-writers-vfs_rename-part1 fs/nfsd/vfs.c
> --- lxc/fs/nfsd/vfs.c~elevate-writers-vfs_rename-part1 2006-06-07 16:53:26.000000000 -0700
> +++ lxc-dave/fs/nfsd/vfs.c 2006-06-07 16:53:26.000000000 -0700
> @@ -1601,13 +1601,23 @@ nfsd_rename(struct svc_rqst *rqstp, stru
> err = -EPERM;
> } else
> #endif
> + err = mnt_want_write(ffhp->fh_export->ex_mnt);
> + if (err)
> + goto out_dput_new;
> +
> + err = mnt_want_write(tfhp->fh_export->ex_mnt);
> + if (err)
> + goto out_mnt_drop_write_old;
> +
> err = vfs_rename(fdir, odentry, tdir, ndentry);
> if (!err && EX_ISSYNC(tfhp->fh_export)) {
> err = nfsd_sync_dir(tdentry);
> if (!err)
> err = nfsd_sync_dir(fdentry);
> }
> -
> + mnt_drop_write(tfhp->fh_export->ex_mnt);
> + out_mnt_drop_write_old:
> + mnt_drop_write(ffhp->fh_export->ex_mnt);
> out_dput_new:
> dput(ndentry);
> out_dput_old:
> diff -puN fs/nfsd/nfsfh.c~elevate-writers-vfs_rename-part1 fs/nfsd/nfsfh.c
> diff -puN fs/nfsd/nfs3proc.c~elevate-writers-vfs_rename-part1 fs/nfsd/nfs3proc.c
> diff -puN fs/xattr.c~elevate-writers-vfs_rename-part1 fs/xattr.c
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt
2006-06-08 0:10 ` [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt Dave Hansen
@ 2006-06-08 11:25 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:25 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:38PM -0700, Dave Hansen wrote:
>
> Originally from: Herbert Poetzl <herbert@13thfloor.at>
>
> Right now, show_vfsmnt() will produce output such as "ro" or
> "nodev" based on sb flags or mount flags. Each output string
> can only come from one of these sources.
>
> This patch prepares show_vfsmnt() so that a single string can
> come from either source. We need this for r/o bind mounts.
>
> The for loop's terminating condition was getting a little bit
> long, so I used ARRAY_SIZE() instead of checking for the NULL
> terminator.
IMHO this can go in unconditionally, even if there
are no read only MNT flags present, as it is supposed
to work now and in the future ... maybe I should
give that a special test run on its own ...
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namespace.c | 46 ++++++++++++++++++++++++----------------------
> 1 files changed, 24 insertions(+), 22 deletions(-)
>
> diff -puN fs/namespace.c~D6-proc-show-ro-attr fs/namespace.c
> --- lxc/fs/namespace.c~D6-proc-show-ro-attr 2006-06-07 16:53:27.000000000 -0700
> +++ lxc-dave/fs/namespace.c 2006-06-07 16:53:27.000000000 -0700
> @@ -354,24 +354,22 @@ static int show_vfsmnt(struct seq_file *
> {
> struct vfsmount *mnt = v;
> int err = 0;
> + int i;
> static struct proc_fs_info {
> - int flag;
> - char *str;
> + int s_flag;
> + int mnt_flag;
> + char *set_str;
> + char *unset_str;
> } fs_info[] = {
> - { MS_SYNCHRONOUS, ",sync" },
> - { MS_DIRSYNC, ",dirsync" },
> - { MS_MANDLOCK, ",mand" },
> - { 0, NULL }
> - };
> - static struct proc_fs_info mnt_info[] = {
> - { MNT_NOSUID, ",nosuid" },
> - { MNT_NODEV, ",nodev" },
> - { MNT_NOEXEC, ",noexec" },
> - { MNT_NOATIME, ",noatime" },
> - { MNT_NODIRATIME, ",nodiratime" },
> - { 0, NULL }
> + { MS_SYNCHRONOUS, 0, ",sync", NULL },
> + { MS_DIRSYNC, 0, ",dirsync", NULL },
> + { MS_MANDLOCK, 0, ",mand", NULL },
> + { 0, MNT_NOSUID, ",nosuid", NULL },
> + { 0, MNT_NODEV, ",nodev", NULL },
> + { 0, MNT_NOEXEC, ",noexec", NULL },
> + { 0, MNT_NOATIME, ",noatime", NULL },
> + { 0, MNT_NODIRATIME, ",nodiratime", NULL }
> };
> - struct proc_fs_info *fs_infop;
>
> mangle(m, mnt->mnt_devname ? mnt->mnt_devname : "none");
> seq_putc(m, ' ');
> @@ -379,13 +377,17 @@ static int show_vfsmnt(struct seq_file *
> seq_putc(m, ' ');
> mangle(m, mnt->mnt_sb->s_type->name);
> seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
> - for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
> - if (mnt->mnt_sb->s_flags & fs_infop->flag)
> - seq_puts(m, fs_infop->str);
> - }
> - for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
> - if (mnt->mnt_flags & fs_infop->flag)
> - seq_puts(m, fs_infop->str);
> + for (i = 0; i < ARRAY_SIZE(fs_info); i++) {
> + struct proc_fs_info *fs_infop = &fs_info[i];
> + char *str = NULL;
> + if ((mnt->mnt_sb->s_flags & fs_infop->s_flag) ||
> + (mnt->mnt_flags & fs_infop->mnt_flag))
> + str = fs_infop->set_str;
> + else
> + str = fs_infop->unset_str;
> +
> + if (str)
> + seq_puts(m, str);
> }
> if (mnt->mnt_sb->s_op->show_options)
> err = mnt->mnt_sb->s_op->show_options(m, mnt);
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest
2006-06-08 0:10 ` [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest Dave Hansen
@ 2006-06-08 11:26 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 11:26 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:39PM -0700, Dave Hansen wrote:
>
> Originally from: Herbert Poetzl <herbert@13thfloor.at>
>
> Now that we have the set/unset fields in the fs_info[]
> array, we can stop special-casing the ro/rw bits.
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> ---
>
> lxc-dave/fs/namespace.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletion(-)
>
> diff -puN fs/namespace.c~D7-proc-show-ro-attr-actual fs/namespace.c
> --- lxc/fs/namespace.c~D7-proc-show-ro-attr-actual 2006-06-07 16:53:28.000000000 -0700
> +++ lxc-dave/fs/namespace.c 2006-06-07 16:53:28.000000000 -0700
> @@ -361,6 +361,7 @@ static int show_vfsmnt(struct seq_file *
> char *set_str;
> char *unset_str;
> } fs_info[] = {
> + { MS_RDONLY, MNT_RDONLY, "ro", "rw" },
> { MS_SYNCHRONOUS, 0, ",sync", NULL },
> { MS_DIRSYNC, 0, ",dirsync", NULL },
> { MS_MANDLOCK, 0, ",mand", NULL },
> @@ -376,7 +377,7 @@ static int show_vfsmnt(struct seq_file *
> seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
> seq_putc(m, ' ');
> mangle(m, mnt->mnt_sb->s_type->name);
> - seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
> + seq_putc(m, ' ');
> for (i = 0; i < ARRAY_SIZE(fs_info); i++) {
> struct proc_fs_info *fs_infop = &fs_info[i];
> char *str = NULL;
> _
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 03/27] do_rmdir(): elevate write count
2006-06-08 10:42 ` Herbert Poetzl
@ 2006-06-08 15:04 ` Dave Hansen
0 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:04 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 12:42 +0200, Herbert Poetzl wrote:
> > diff -puN fs/namei.c~rmdir1 fs/namei.c
> > --- lxc/fs/namei.c~rmdir1 2006-06-07 16:53:13.000000000 -0700
> > +++ lxc-dave/fs/namei.c 2006-06-07 16:53:13.000000000 -0700
> > @@ -2012,7 +2012,12 @@ static long do_rmdir(int dfd, const char
> > error = PTR_ERR(dentry);
> > if (IS_ERR(dentry))
> > goto exit2;
> > + error = mnt_want_write(nd.mnt);
> > + if (error)
> > + goto exit3;
> > error = vfs_rmdir(nd.dentry->d_inode, dentry);
> > + mnt_drop_write(nd.mnt);
> > +exit3:
>
> IMHO, for consistency this should be named similar in
> all the different places, so maybe make that?
>
> out_put:
>
> or maybe:
>
> out_rdonly:
>
> (same goest for the following patches)
> otherwise fine
I'm not too set on it either way. I tried to be consistent with some of
the style in the functions, especially when they had a _lot_ of goto
conditions. This one is right on the cusp of where I want to rock the
boat. ;)
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl()
2006-06-08 10:44 ` Herbert Poetzl
@ 2006-06-08 15:07 ` Dave Hansen
0 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:07 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 12:44 +0200, Herbert Poetzl wrote:
>
> I'd assume one of those should be a return 1 :)
Uh huh. Thanks for catching that. I'll fix it up.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 07/27] sys_mkdirat(): collapse if()
2006-06-08 10:46 ` Herbert Poetzl
@ 2006-06-08 15:10 ` Dave Hansen
2006-06-08 15:54 ` Herbert Poetzl
0 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:10 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 12:46 +0200, Herbert Poetzl wrote:
> On Wed, Jun 07, 2006 at 05:10:19PM -0700, Dave Hansen wrote:
> >
> > Take the entire "if (IS_ERR(tmp))" block, and use a goto,
> > instead.
> >
> > This should not make any functional changes.
>
> AFAICR, my test tool was done before the *at funtions
> were added to the kernel. did you extend it or test
> them in addition to the normal tests?
No, I didn't extend it. However, I believe all of the internal kernel
implementations which are plain, like sys_mknod() call into the *at()
variants. I could look at extending that test, but I think all of the
code paths are still being exercised.
I actually audited from the bottom up, looking for callers of the
permission functions, and anywhere that checked an inode for a read-only
superblock.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 14/27] tricky: elevate write count files are open()ed
2006-06-08 10:54 ` Herbert Poetzl
@ 2006-06-08 15:12 ` Dave Hansen
2006-06-08 16:07 ` Herbert Poetzl
0 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:12 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 12:54 +0200, Herbert Poetzl wrote:
> > There is also an elevated count around the vfs_create()
> > call in open_namei(). The count does not need to be
> > kept elevated all the way into the may_open() call because
> > after creation, the write bits of the acc_mode are cleared.
> > This keeps may_open() from ever failing. Howver, this may
> > open one potential race where a change from a r/w to a r/o
> > mount could occur between the mnt_drop_write() and may_open()
> > allowing a user to obtain a r/w file on what is now a r/w
>
> probably means 'read only mount'
>
> what about using atomic_dec_and_test() to avoid
> having such cases (or a lock if required)?
I thought about that, but it isn't that simple. Such a case needs to
have controlled transitions away from when the counter is at '0'. It is
a similar problem as i_writecount and put/deny_write_access(), which end
up having to use a spinlock. I don't think we can do it with simple
atomics.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 19/27] sys_faccessat() elevate writer count
2006-06-08 11:03 ` Herbert Poetzl
@ 2006-06-08 15:15 ` Dave Hansen
0 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:15 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 13:03 +0200, Herbert Poetzl wrote:
> this is the first non-trivial change to the
> condition checks (regarding goto out_path_release)
> why didn't you break that out too?
Honestly, it was because I did this one before I even tried to compile
and boot the thing. Once I hit a bug or two, I started separating out
the control structure changes from the actual addition of the write
count changes. I admit, it is a bit inconsistent. I can definitely
break things out further in the next set, but I thought 27 patches was
enough for now. ;)
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create()
2006-06-08 11:16 ` Herbert Poetzl
@ 2006-06-08 15:23 ` Dave Hansen
0 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:23 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 13:16 +0200, Herbert Poetzl wrote:
> On Wed, Jun 07, 2006 at 05:10:35PM -0700, Dave Hansen wrote:
> > This takes care of all of the direct callers of vfs_mknod().
> > Since a few of these cases also handle normal file creation
> > as well, this also covers some calls to vfs_create().
> >
> > Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> > ---
> >
> > ipc/mqueue.c | 0
> > lxc-dave/fs/namei.c | 48 ++++++++++++++++++++++++++++----------------
> > lxc-dave/fs/nfsd/vfs.c | 4 +++
> > lxc-dave/net/unix/af_unix.c | 4 +++
> > 4 files changed, 39 insertions(+), 17 deletions(-)
> >
> > diff -puN fs/namei.c~elevate-writers-vfs_mknod-try2 fs/namei.c
> > --- lxc/fs/namei.c~elevate-writers-vfs_mknod-try2 2006-06-07 16:53:25.000000000 -0700
> > +++ lxc-dave/fs/namei.c 2006-06-07 16:53:25.000000000 -0700
> > @@ -1852,26 +1852,40 @@ asmlinkage long sys_mknodat(int dfd, con
> >
> > if (!IS_POSIXACL(nd.dentry->d_inode))
> > mode &= ~current->fs->umask;
> > - if (!IS_ERR(dentry)) {
> > - switch (mode & S_IFMT) {
> > - case 0: case S_IFREG:
> > - error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
> > - break;
> > - case S_IFCHR: case S_IFBLK:
> > - error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
> > - new_decode_dev(dev));
> > - break;
> > - case S_IFIFO: case S_IFSOCK:
> > - error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
> > + if (IS_ERR(dentry))
> > + goto out_unlock;
> > + error = mnt_want_write(nd.mnt);
> > + if (error)
> > + goto out_dput;
> > + switch (mode & S_IFMT) {
> > + case 0: case S_IFREG:
> > + error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
> > + break;
> > + case S_IFCHR: case S_IFBLK:
>
> hmm, could you outline some cases where the following
> fails, but the previous (above) check succeeded?
> (except for dubious locking issues)
Are you asking about the vfs_create() call, or the second
mnt_want_write() call?
I appear to have munged two patches together here. One which elevates
the count over the entire switch(), and the other which does the
elevation only around the actual vfs_*() calls. It should only have the
lower-level calls.
> > + error = mnt_want_write(nd.mnt);
> > + if (error)
> > break;
> > - case S_IFDIR:
> > - error = -EPERM;
> > + error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
> > + new_decode_dev(dev));
> > + mnt_drop_write(nd.mnt);
> > + break;
> > + case S_IFIFO: case S_IFSOCK:
>
> same here
>
> > + error = mnt_want_write(nd.mnt);
> > + if (error)
> > break;
> > - default:
> > - error = -EINVAL;
> > - }
> > - dput(dentry);
> > + error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
> > + mnt_drop_write(nd.mnt);
> > + break;
> > + case S_IFDIR:
> > + error = -EPERM;
>
> this will give -EPERM on a r/w filesystem, but
> -EROFS on r/o, is that consistant with the current
> behaviour?
Nope. That's a bug, I think. I'll fix it up.
> > + break;
> > + default:
> > + error = -EINVAL;
>
> same here for -EINVAL and -EROFS, maybe the
> mnt_want_write() wants to go into vfs_mknod()?
I think I tried that once, and it didn't quite work out. In this case,
I think just giving vfs_mknod() the same treatment that I gave
vfs_create() should do for now.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-08 11:23 ` Herbert Poetzl
@ 2006-06-08 15:24 ` Dave Hansen
0 siblings, 0 replies; 55+ messages in thread
From: Dave Hansen @ 2006-06-08 15:24 UTC (permalink / raw)
To: Herbert Poetzl; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, 2006-06-08 at 13:23 +0200, Herbert Poetzl wrote:
> On Wed, Jun 07, 2006 at 05:10:36PM -0700, Dave Hansen wrote:
> > diff -puN fs/namei.c~elevate-writers-vfs_rename-part1 fs/namei.c
> > --- lxc/fs/namei.c~elevate-writers-vfs_rename-part1 2006-06-07 16:53:26.000000000 -0700
> > +++ lxc-dave/fs/namei.c 2006-06-07 16:53:26.000000000 -0700
> > @@ -2507,29 +2507,37 @@ static int do_rename(int olddfd, const c
> > if (error)
> > goto exit;
> >
> > - error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> > + error = mnt_want_write(oldnd.mnt);
> > if (error)
> > goto exit1;
>
> same here, I'd suspect it changes the return code
> for the case where the 'files' reside on different
> mount points from -EXDEV and for the busy case
> from -EBUSY to -EROFS ....
>
> don't get me wrong, I'm probably fine with those
> changes if they are consistant and do not depend
> on whether the inode is RO or the entire mnt point
I think I might be able to fix this one up a bit by changing the order
around. I'll have a good look at it.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 07/27] sys_mkdirat(): collapse if()
2006-06-08 15:10 ` Dave Hansen
@ 2006-06-08 15:54 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 15:54 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, Jun 08, 2006 at 08:10:40AM -0700, Dave Hansen wrote:
> On Thu, 2006-06-08 at 12:46 +0200, Herbert Poetzl wrote:
> > On Wed, Jun 07, 2006 at 05:10:19PM -0700, Dave Hansen wrote:
> > >
> > > Take the entire "if (IS_ERR(tmp))" block, and use a goto,
> > > instead.
> > >
> > > This should not make any functional changes.
> >
> > AFAICR, my test tool was done before the *at funtions
> > were added to the kernel. did you extend it or test
> > them in addition to the normal tests?
>
> No, I didn't extend it. However, I believe all of the internal kernel
> implementations which are plain, like sys_mknod() call into the *at()
> variants. I could look at extending that test, but I think all of the
> code paths are still being exercised.
>
> I actually audited from the bottom up, looking for callers of the
> permission functions, and anywhere that checked an inode for a
> read-only superblock.
well, this is a good start, but not nearly sufficient
to ensure that the semantics are not changed, see
my other replies to that issue ...
best,
Herbert
> -- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 14/27] tricky: elevate write count files are open()ed
2006-06-08 15:12 ` Dave Hansen
@ 2006-06-08 16:07 ` Herbert Poetzl
0 siblings, 0 replies; 55+ messages in thread
From: Herbert Poetzl @ 2006-06-08 16:07 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, viro, hch, trond.myklebust
On Thu, Jun 08, 2006 at 08:12:52AM -0700, Dave Hansen wrote:
> On Thu, 2006-06-08 at 12:54 +0200, Herbert Poetzl wrote:
> > > There is also an elevated count around the vfs_create()
> > > call in open_namei(). The count does not need to be
> > > kept elevated all the way into the may_open() call because
> > > after creation, the write bits of the acc_mode are cleared.
> > > This keeps may_open() from ever failing. Howver, this may
> > > open one potential race where a change from a r/w to a r/o
> > > mount could occur between the mnt_drop_write() and may_open()
> > > allowing a user to obtain a r/w file on what is now a r/w
> >
> > probably means 'read only mount'
> >
> > what about using atomic_dec_and_test() to avoid
> > having such cases (or a lock if required)?
>
> I thought about that, but it isn't that simple. Such a case needs to
> have controlled transitions away from when the counter is at '0'. It is
> a similar problem as i_writecount and put/deny_write_access(), which end
> up having to use a spinlock. I don't think we can do it with simple
> atomics.
hmm, care to elaborate? IIRC, certain parts like
the entire md subsystem,as well as parts of the
procfs use it quite fine ...
TIA,
Herbert
> -- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-08 0:10 ` [RFC][PATCH 23/27] elevate write count over calls to vfs_rename() Dave Hansen
2006-06-08 11:23 ` Herbert Poetzl
@ 2006-06-12 18:18 ` Al Viro
2006-06-12 18:29 ` Dave Hansen
1 sibling, 1 reply; 55+ messages in thread
From: Al Viro @ 2006-06-12 18:18 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, herbert, hch, trond.myklebust
On Wed, Jun 07, 2006 at 05:10:36PM -0700, Dave Hansen wrote:
> + error = mnt_want_write(oldnd.mnt);
> if (error)
> goto exit1;
>
> + error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> + if (error)
> + goto exit2;
> +
> + error = mnt_want_write(oldnd.mnt);
WTF? Why raise it twice?
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-12 18:18 ` Al Viro
@ 2006-06-12 18:29 ` Dave Hansen
2006-06-12 19:03 ` Al Viro
0 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2006-06-12 18:29 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, herbert, hch, trond.myklebust
On Mon, 2006-06-12 at 19:18 +0100, Al Viro wrote:
> On Wed, Jun 07, 2006 at 05:10:36PM -0700, Dave Hansen wrote:
> > + error = mnt_want_write(oldnd.mnt);
> > if (error)
> > goto exit1;
> >
> > + error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> > + if (error)
> > + goto exit2;
> > +
> > + error = mnt_want_write(oldnd.mnt);
>
> WTF? Why raise it twice?
That's a bug. The second raise should be for newnd.mnt. The fix will
be there in the next set.
-- Dave
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 23/27] elevate write count over calls to vfs_rename()
2006-06-12 18:29 ` Dave Hansen
@ 2006-06-12 19:03 ` Al Viro
0 siblings, 0 replies; 55+ messages in thread
From: Al Viro @ 2006-06-12 19:03 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-fsdevel, herbert, hch, trond.myklebust
On Mon, Jun 12, 2006 at 11:29:24AM -0700, Dave Hansen wrote:
> On Mon, 2006-06-12 at 19:18 +0100, Al Viro wrote:
> > On Wed, Jun 07, 2006 at 05:10:36PM -0700, Dave Hansen wrote:
> > > + error = mnt_want_write(oldnd.mnt);
> > > if (error)
> > > goto exit1;
> > >
> > > + error = do_path_lookup(newdfd, newname, LOOKUP_PARENT, &newnd);
> > > + if (error)
> > > + goto exit2;
> > > +
> > > + error = mnt_want_write(oldnd.mnt);
> >
> > WTF? Why raise it twice?
>
> That's a bug. The second raise should be for newnd.mnt.
No, it should not. These two should be collapsed into one and done after
we'd checked that vfsmounts are equal
^ permalink raw reply [flat|nested] 55+ messages in thread
end of thread, other threads:[~2006-06-12 19:03 UTC | newest]
Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-08 0:10 [RFC][PATCH 00/27] Read-only bind mounts Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 01/27] Add vfsmount writer count Dave Hansen
2006-06-08 10:33 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 02/27] vfs_rmdir: change if() into goto Dave Hansen
2006-06-08 10:37 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 04/27] elevate mnt writers for vfs_unlink() callers Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 03/27] do_rmdir(): elevate write count Dave Hansen
2006-06-08 10:42 ` Herbert Poetzl
2006-06-08 15:04 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 05/27] elevate mnt writers for nfsd caller of vfs_mkdir() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 06/27] elevate write count during entire ncp_ioctl() Dave Hansen
2006-06-08 10:44 ` Herbert Poetzl
2006-06-08 15:07 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 07/27] sys_mkdirat(): collapse if() Dave Hansen
2006-06-08 10:46 ` Herbert Poetzl
2006-06-08 15:10 ` Dave Hansen
2006-06-08 15:54 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 09/27] elevate mnt writers for sys_mkdirat() call of vfs_mkdir() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 08/27] sys_mkdirat(): one more goto Dave Hansen
2006-06-08 10:48 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 10/27] sys_symlinkat() collapse if()s Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 11/27] sys_symlinkat() collapse one more if () Dave Hansen
2006-06-08 10:49 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 13/27] sys_linkat(): elevate write count around vfs_link() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 12/27] sys_symlinkat() elevate write count around vfs_symlink() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 14/27] tricky: elevate write count files are open()ed Dave Hansen
2006-06-08 10:54 ` Herbert Poetzl
2006-06-08 15:12 ` Dave Hansen
2006-06-08 16:07 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 15/27] elevate writer count for do_sys_truncate() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 16/27] elevate write count for do_sys_utime() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 17/27] elevate write count for do_utimes() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 18/27] sys_faccessat(): collapse if() Dave Hansen
2006-06-08 11:05 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 19/27] sys_faccessat() elevate writer count Dave Hansen
2006-06-08 11:03 ` Herbert Poetzl
2006-06-08 15:15 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 20/27] unix_find_other() elevate write count for touch_atime() Dave Hansen
2006-06-08 11:07 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 21/27] mount_is_safe(): add comment Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 22/27] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
2006-06-08 11:16 ` Herbert Poetzl
2006-06-08 15:23 ` Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 23/27] elevate write count over calls to vfs_rename() Dave Hansen
2006-06-08 11:23 ` Herbert Poetzl
2006-06-08 15:24 ` Dave Hansen
2006-06-12 18:18 ` Al Viro
2006-06-12 18:29 ` Dave Hansen
2006-06-12 19:03 ` Al Viro
2006-06-08 0:10 ` [RFC][PATCH 24/27] elevate mount count for extended attributes Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 25/27] /proc/mounts: prep for flags from sb or mnt Dave Hansen
2006-06-08 11:25 ` Herbert Poetzl
2006-06-08 0:10 ` [RFC][PATCH 27/27] create and pass read-only mnt flag into do_loopback() Dave Hansen
2006-06-08 0:10 ` [RFC][PATCH 26/27] /proc/mounts: treat ro/rw like the rest Dave Hansen
2006-06-08 11:26 ` Herbert Poetzl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).