linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/5] do remounts without consulting sb->s_files list
@ 2008-04-29 18:59 Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 1/5] r/o bind mounts: change spinlock to mutex Dave Hansen
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen

This series implements the ability to check a superblock
for writers without consulting the sb->s_files list.  This
is on Al and Christoph's wishlist as something that can
happen now that we have r/o bind mounts merged.

Trond, I'm ccing you because there is some minor NFS
touching going on here.

-- Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC][PATCH 1/5] r/o bind mounts: change spinlock to mutex
  2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
@ 2008-04-29 18:59 ` Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS Dave Hansen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen


In order to lock out new writers during remount operations,
we need to hold the cpu_writer->lock's.  But, remounts can
sleep, so we can not use spinlocks.  Make them mutexes.

We do the fiddling in mnt_drop_write() because we used to
rely on the spin_lock() disabling preemption.  Since the
mutex does not do that, we have to use another method
to avoid underflow.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/fs/namespace.c |   52 +++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff -puN fs/namespace.c~make-cpu_writer-lock-a-mutex fs/namespace.c
--- linux-2.6.git/fs/namespace.c~make-cpu_writer-lock-a-mutex	2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:39.000000000 -0700
@@ -173,7 +173,7 @@ struct mnt_writer {
 	 * If holding multiple instances of this lock, they
 	 * must be ordered by cpu number.
 	 */
-	spinlock_t lock;
+	struct mutex lock;
 	struct lock_class_key lock_class; /* compiles out with !lockdep */
 	unsigned long count;
 	struct vfsmount *mnt;
@@ -185,7 +185,7 @@ static int __init init_mnt_writers(void)
 	int cpu;
 	for_each_possible_cpu(cpu) {
 		struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
-		spin_lock_init(&writer->lock);
+		mutex_init(&writer->lock);
 		lockdep_set_class(&writer->lock, &writer->lock_class);
 		writer->count = 0;
 	}
@@ -200,7 +200,7 @@ static void unlock_mnt_writers(void)
 
 	for_each_possible_cpu(cpu) {
 		cpu_writer = &per_cpu(mnt_writers, cpu);
-		spin_unlock(&cpu_writer->lock);
+		mutex_unlock(&cpu_writer->lock);
 	}
 }
 
@@ -252,8 +252,8 @@ int mnt_want_write(struct vfsmount *mnt)
 	int ret = 0;
 	struct mnt_writer *cpu_writer;
 
-	cpu_writer = &get_cpu_var(mnt_writers);
-	spin_lock(&cpu_writer->lock);
+	cpu_writer = &__get_cpu_var(mnt_writers);
+	mutex_lock(&cpu_writer->lock);
 	if (__mnt_is_readonly(mnt)) {
 		ret = -EROFS;
 		goto out;
@@ -261,8 +261,7 @@ int mnt_want_write(struct vfsmount *mnt)
 	use_cpu_writer_for_mount(cpu_writer, mnt);
 	cpu_writer->count++;
 out:
-	spin_unlock(&cpu_writer->lock);
-	put_cpu_var(mnt_writers);
+	mutex_unlock(&cpu_writer->lock);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mnt_want_write);
@@ -274,7 +273,7 @@ static void lock_mnt_writers(void)
 
 	for_each_possible_cpu(cpu) {
 		cpu_writer = &per_cpu(mnt_writers, cpu);
-		spin_lock(&cpu_writer->lock);
+		mutex_lock(&cpu_writer->lock);
 		__clear_mnt_count(cpu_writer);
 		cpu_writer->mnt = NULL;
 	}
@@ -333,18 +332,34 @@ void mnt_drop_write(struct vfsmount *mnt
 	int must_check_underflow = 0;
 	struct mnt_writer *cpu_writer;
 
-	cpu_writer = &get_cpu_var(mnt_writers);
-	spin_lock(&cpu_writer->lock);
+retry:
+	cpu_writer = &__get_cpu_var(mnt_writers);
+	mutex_lock(&cpu_writer->lock);
 
 	use_cpu_writer_for_mount(cpu_writer, mnt);
 	if (cpu_writer->count > 0) {
 		cpu_writer->count--;
 	} else {
-		must_check_underflow = 1;
+		/*
+		 * Without this check, it is theoretically
+		 * possible for large number of processes
+		 * to atomic_dec(__mnt_writers) and cause
+		 * it to underflow.  Because we are under
+		 * the mutex (and there are NR_CPUS
+		 * mutexes), this limits the number of
+		 * concurrent decrements and underflow
+		 * level to NR_CPUS.
+		 */
+		if (atomic_read(&mnt->__mnt_writers) <
+			MNT_WRITER_UNDERFLOW_LIMIT) {
+			mutex_unlock(&cpu_writer->lock);
+			goto retry;
+		}
 		atomic_dec(&mnt->__mnt_writers);
+		must_check_underflow = 1;
 	}
 
-	spin_unlock(&cpu_writer->lock);
+	mutex_unlock(&cpu_writer->lock);
 	/*
 	 * Logically, we could call this each time,
 	 * but the __mnt_writers cacheline tends to
@@ -352,15 +367,6 @@ void mnt_drop_write(struct vfsmount *mnt
 	 */
 	if (must_check_underflow)
 		handle_write_count_underflow(mnt);
-	/*
-	 * This could be done right after the spinlock
-	 * is taken because the spinlock keeps us on
-	 * the cpu, and disables preemption.  However,
-	 * putting it here bounds the amount that
-	 * __mnt_writers can underflow.  Without it,
-	 * we could theoretically wrap __mnt_writers.
-	 */
-	put_cpu_var(mnt_writers);
 }
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
@@ -615,7 +621,7 @@ static inline void __mntput(struct vfsmo
 		struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu);
 		if (cpu_writer->mnt != mnt)
 			continue;
-		spin_lock(&cpu_writer->lock);
+		mutex_lock(&cpu_writer->lock);
 		atomic_add(cpu_writer->count, &mnt->__mnt_writers);
 		cpu_writer->count = 0;
 		/*
@@ -624,7 +630,7 @@ static inline void __mntput(struct vfsmo
 		 * it to be valid.
 		 */
 		cpu_writer->mnt = NULL;
-		spin_unlock(&cpu_writer->lock);
+		mutex_unlock(&cpu_writer->lock);
 	}
 	/*
 	 * This probably indicates that somebody messed
_

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS
  2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 1/5] r/o bind mounts: change spinlock to mutex Dave Hansen
@ 2008-04-29 18:59 ` Dave Hansen
  2008-04-29 21:34   ` Trond Myklebust
  2008-04-29 18:59 ` [RFC][PATCH 3/5] reintroduce list of vfsmounts over superblock Dave Hansen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen


The next patch will introduce a list of all mounts for a given
superblock.  In order to keep the list, we need to make sure
all filesystems attaching a mount to a superblock get added
to this list.

NFS currently bypasses the simple_set_mnt() function, and sets
mnt_sb directly.  This patch makes it use a helper function,
instead.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/fs/namespace.c     |   11 +++++++++--
 linux-2.6.git-dave/fs/nfs/super.c     |   10 +++++-----
 linux-2.6.git-dave/include/linux/fs.h |    1 +
 3 files changed, 15 insertions(+), 7 deletions(-)

diff -puN fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS fs/namespace.c
--- linux-2.6.git/fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS	2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:39.000000000 -0700
@@ -402,12 +402,19 @@ static void __mnt_unmake_readonly(struct
 	spin_unlock(&vfsmount_lock);
 }
 
-int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
+int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
-	mnt->mnt_root = dget(sb->s_root);
+	add_mount_to_sb_list(mnt, sb);
 	return 0;
 }
+EXPORT_SYMBOL(simple_set_mnt_no_get);
+
+int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
+{
+	mnt->mnt_root = dget(sb->s_root);
+	return simple_set_mnt_no_get(mnt, sb);
+}
 
 EXPORT_SYMBOL(simple_set_mnt);
 
diff -puN fs/nfs/super.c~introduce_simple_set_mnt_no_get_helper_for_NFS fs/nfs/super.c
--- linux-2.6.git/fs/nfs/super.c~introduce_simple_set_mnt_no_get_helper_for_NFS	2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/fs/nfs/super.c	2008-04-29 11:56:39.000000000 -0700
@@ -1635,7 +1635,7 @@ static int nfs_get_sb(struct file_system
 		goto error_splat_root;
 
 	s->s_flags |= MS_ACTIVE;
-	mnt->mnt_sb = s;
+	simple_set_mnt_no_get(mnt, s);
 	mnt->mnt_root = mntroot;
 	error = 0;
 
@@ -1727,7 +1727,7 @@ static int nfs_xdev_get_sb(struct file_s
 	}
 
 	s->s_flags |= MS_ACTIVE;
-	mnt->mnt_sb = s;
+	simple_set_mnt_no_get(mnt, s);
 	mnt->mnt_root = mntroot;
 
 	/* clone any lsm security options from the parent to the new sb */
@@ -1998,7 +1998,7 @@ static int nfs4_get_sb(struct file_syste
 	}
 
 	s->s_flags |= MS_ACTIVE;
-	mnt->mnt_sb = s;
+	simple_set_mnt_no_get(mnt, s);
 	mnt->mnt_root = mntroot;
 	error = 0;
 
@@ -2089,7 +2089,7 @@ static int nfs4_xdev_get_sb(struct file_
 	}
 
 	s->s_flags |= MS_ACTIVE;
-	mnt->mnt_sb = s;
+	simple_set_mnt_no_get(mnt, s);
 	mnt->mnt_root = mntroot;
 
 	dprintk("<-- nfs4_xdev_get_sb() = 0\n");
@@ -2168,7 +2168,7 @@ static int nfs4_referral_get_sb(struct f
 	}
 
 	s->s_flags |= MS_ACTIVE;
-	mnt->mnt_sb = s;
+	simple_set_mnt_no_get(mnt, s);
 	mnt->mnt_root = mntroot;
 
 	dprintk("<-- nfs4_referral_get_sb() = 0\n");
diff -puN include/linux/fs.h~introduce_simple_set_mnt_no_get_helper_for_NFS include/linux/fs.h
--- linux-2.6.git/include/linux/fs.h~introduce_simple_set_mnt_no_get_helper_for_NFS	2008-04-29 11:56:39.000000000 -0700
+++ linux-2.6.git-dave/include/linux/fs.h	2008-04-29 11:56:39.000000000 -0700
@@ -1521,6 +1521,7 @@ extern int get_sb_pseudo(struct file_sys
 	const struct super_operations *ops, unsigned long,
 	struct vfsmount *mnt);
 extern int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb);
+extern int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb);
 int __put_super(struct super_block *sb);
 int __put_super_and_need_restart(struct super_block *sb);
 void unnamed_dev_init(void);
_

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC][PATCH 3/5] reintroduce list of vfsmounts over superblock
  2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 1/5] r/o bind mounts: change spinlock to mutex Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS Dave Hansen
@ 2008-04-29 18:59 ` Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer Dave Hansen
  2008-04-29 18:59 ` [RFC][PATCH 5/5] check mount writers at superblock remount Dave Hansen
  4 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen


We're moving a big chunk of the burden of keeping people from
writing to r/o filesystems from the superblock into the
vfsmount.  This requires that we consult the superblock's
vfsmounts when things like remounts occur.

So, introduce a list of vfsmounts hanging off the superblock.
We'll use this in a bit.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/fs/namespace.c        |   12 ++++++++++++
 linux-2.6.git-dave/fs/super.c            |    1 +
 linux-2.6.git-dave/include/linux/fs.h    |    1 +
 linux-2.6.git-dave/include/linux/mount.h |    1 +
 4 files changed, 15 insertions(+)

diff -puN fs/namespace.c~reintroduce_list_of_vfsmounts_over_superblock fs/namespace.c
--- linux-2.6.git/fs/namespace.c~reintroduce_list_of_vfsmounts_over_superblock	2008-04-29 11:56:40.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:40.000000000 -0700
@@ -402,6 +402,13 @@ static void __mnt_unmake_readonly(struct
 	spin_unlock(&vfsmount_lock);
 }
 
+void add_mount_to_sb_list(struct vfsmount *mnt, struct super_block *sb)
+{
+	spin_lock(&vfsmount_lock);
+	list_add(&mnt->mnt_sb_list, &sb->s_vfsmounts);
+	spin_unlock(&vfsmount_lock);
+}
+
 int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -420,6 +427,10 @@ EXPORT_SYMBOL(simple_set_mnt);
 
 void free_vfsmnt(struct vfsmount *mnt)
 {
+	spin_lock(&vfsmount_lock);
+	if (mnt->mnt_sb)
+		list_del(&mnt->mnt_sb_list);
+	spin_unlock(&vfsmount_lock);
 	kfree(mnt->mnt_devname);
 	mnt_free_id(mnt);
 	kmem_cache_free(mnt_cache, mnt);
@@ -585,6 +596,7 @@ static struct vfsmount *clone_mnt(struct
 		mnt->mnt_root = dget(root);
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
+		add_mount_to_sb_list(mnt, sb);
 
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
diff -puN fs/super.c~reintroduce_list_of_vfsmounts_over_superblock fs/super.c
--- linux-2.6.git/fs/super.c~reintroduce_list_of_vfsmounts_over_superblock	2008-04-29 11:56:40.000000000 -0700
+++ linux-2.6.git-dave/fs/super.c	2008-04-29 11:56:40.000000000 -0700
@@ -67,6 +67,7 @@ static struct super_block *alloc_super(s
 		INIT_LIST_HEAD(&s->s_io);
 		INIT_LIST_HEAD(&s->s_more_io);
 		INIT_LIST_HEAD(&s->s_files);
+		INIT_LIST_HEAD(&s->s_vfsmounts);
 		INIT_LIST_HEAD(&s->s_instances);
 		INIT_HLIST_HEAD(&s->s_anon);
 		INIT_LIST_HEAD(&s->s_inodes);
diff -puN include/linux/fs.h~reintroduce_list_of_vfsmounts_over_superblock include/linux/fs.h
--- linux-2.6.git/include/linux/fs.h~reintroduce_list_of_vfsmounts_over_superblock	2008-04-29 11:56:40.000000000 -0700
+++ linux-2.6.git-dave/include/linux/fs.h	2008-04-29 11:56:40.000000000 -0700
@@ -1058,6 +1058,7 @@ struct super_block {
 	struct list_head	s_io;		/* parked for writeback */
 	struct list_head	s_more_io;	/* parked for more writeback */
 	struct hlist_head	s_anon;		/* anonymous dentries for (nfs) exporting */
+	struct list_head	s_vfsmounts;
 	struct list_head	s_files;
 
 	struct block_device	*s_bdev;
diff -puN include/linux/mount.h~reintroduce_list_of_vfsmounts_over_superblock include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~reintroduce_list_of_vfsmounts_over_superblock	2008-04-29 11:56:40.000000000 -0700
+++ linux-2.6.git-dave/include/linux/mount.h	2008-04-29 11:56:40.000000000 -0700
@@ -44,6 +44,7 @@ struct vfsmount {
 	struct dentry *mnt_mountpoint;	/* dentry of mountpoint */
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */
+	struct list_head mnt_sb_list;	/* list of all mounts on same sb */
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	int mnt_flags;
_

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer
  2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
                   ` (2 preceding siblings ...)
  2008-04-29 18:59 ` [RFC][PATCH 3/5] reintroduce list of vfsmounts over superblock Dave Hansen
@ 2008-04-29 18:59 ` Dave Hansen
  2008-04-30  9:25   ` Miklos Szeredi
  2008-04-29 18:59 ` [RFC][PATCH 5/5] check mount writers at superblock remount Dave Hansen
  4 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen


We need lock_mnt_writers() during a remount in order
to keep mnt->__mnt_writers from changing so that we
get a consistent look at if a sb currently has anyone
writing to it.

But, we need to lock writers out for an extended
period, even during the ->remount_fs() operation.
That's because we do conclusively make the fs
r/o until *after* the ->remount_fs().
lock_mnt_writers() is an awfully blunt instrument
for this since it will lock out *ALL* new writers
on the entire system.  That's bad because that
remount might take a bit.

So, we introduce a new semantic.  Before a new
cpu_writer is established to a mount, it must
lock_super().  This is already a slow path because
the old cpu_writer->count must be coalesced into
mnt->__mnt_writers.  The fast path where (mnt ==
cpu_writer->mnt) is unaffected.


Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/fs/namespace.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN fs/namespace.c~must-hold-lock_super-to-set-initial-mnt_writer fs/namespace.c
--- linux-2.6.git/fs/namespace.c~must-hold-lock_super-to-set-initial-mnt_writer	2008-04-29 11:56:41.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:41.000000000 -0700
@@ -225,8 +225,20 @@ static inline void use_cpu_writer_for_mo
 {
 	if (cpu_writer->mnt == mnt)
 		return;
+	/*
+	 * This is a slow path taken the first time that
+	 * a particular cpu_writer is used for a given
+	 * mount.  It lets someone (like remount) clear
+	 * all the cpu_writer->mnts by lock_mnt_writers(),
+	 * take lock_super(), and guarantee that no one
+	 * else can get through here and acquire a new
+	 * mnt_want_write() until after they
+	 * unlock_super().
+	 */
+	lock_super(mnt->mnt_sb);
 	__clear_mnt_count(cpu_writer);
 	cpu_writer->mnt = mnt;
+	unlock_super(mnt->mnt_sb);
 }
 
 /*
_

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC][PATCH 5/5] check mount writers at superblock remount
  2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
                   ` (3 preceding siblings ...)
  2008-04-29 18:59 ` [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer Dave Hansen
@ 2008-04-29 18:59 ` Dave Hansen
  4 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2008-04-29 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, viro, hch, trond.myklebust, Dave Hansen


We're moving a big chunk of the burden of keeping people from
writing to r/o filesystems from the superblock into the
vfsmount.  This requires that we keep track somehow of the
mounts that might allow writes to a superblock.

You could track this directly by keeping a count of such
mounts in the superblock, but I think this is much simpler
in practice.

On remount rw->ro operations:
1. Use lock_mnt_writers() to exclude any new mnt_writers
   for any mount on the entire system, which also clears
   each cpu_writer->mnt.
2. Lock superblock to stop anyone from setting any new
   cpu_writer->mnt if the mount is of this sb.
3. unlock_mnt_writers() to let other fs's on the system
   to go about their business.
4. Consult each mount of the sb to examine its
   mnt->__mnt_writers.  Remember, this is stable because
   (1) cleared out all the per-cpu writers, and in (2)
   the sb lock is held, keeping out any new writers.
5. release sb lock at end of remount operation (after
   ->remount_fs())

I'm not completely happy with the sb_locked thing in
do_remount_sb().  Expanding the use of lock_super() over
a larger area may make the code look simpler, but will
not be as obviously correct.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/fs/file_table.c       |   24 -----------
 linux-2.6.git-dave/fs/namespace.c        |    4 -
 linux-2.6.git-dave/fs/super.c            |   65 ++++++++++++++++++++++++++-----
 linux-2.6.git-dave/include/linux/fs.h    |    2 
 linux-2.6.git-dave/include/linux/mount.h |   10 +++-
 5 files changed, 65 insertions(+), 40 deletions(-)

diff -puN fs/file_table.c~check_mount_writers_at_superblock_remount fs/file_table.c
--- linux-2.6.git/fs/file_table.c~check_mount_writers_at_superblock_remount	2008-04-29 11:56:42.000000000 -0700
+++ linux-2.6.git-dave/fs/file_table.c	2008-04-29 11:56:42.000000000 -0700
@@ -365,30 +365,6 @@ void file_kill(struct file *file)
 	}
 }
 
-int fs_may_remount_ro(struct super_block *sb)
-{
-	struct file *file;
-
-	/* Check that no files are currently opened for writing. */
-	file_list_lock();
-	list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
-		struct inode *inode = file->f_path.dentry->d_inode;
-
-		/* File with pending delete? */
-		if (inode->i_nlink == 0)
-			goto too_bad;
-
-		/* Writeable file? */
-		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
-			goto too_bad;
-	}
-	file_list_unlock();
-	return 1; /* Tis' cool bro. */
-too_bad:
-	file_list_unlock();
-	return 0;
-}
-
 void __init files_init(unsigned long mempages)
 { 
 	int n; 
diff -puN fs/namespace.c~check_mount_writers_at_superblock_remount fs/namespace.c
--- linux-2.6.git/fs/namespace.c~check_mount_writers_at_superblock_remount	2008-04-29 11:56:42.000000000 -0700
+++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:42.000000000 -0700
@@ -193,7 +193,7 @@ static int __init init_mnt_writers(void)
 }
 fs_initcall(init_mnt_writers);
 
-static void unlock_mnt_writers(void)
+void unlock_mnt_writers(void)
 {
 	int cpu;
 	struct mnt_writer *cpu_writer;
@@ -278,7 +278,7 @@ out:
 }
 EXPORT_SYMBOL_GPL(mnt_want_write);
 
-static void lock_mnt_writers(void)
+void lock_mnt_writers(void)
 {
 	int cpu;
 	struct mnt_writer *cpu_writer;
diff -puN fs/super.c~check_mount_writers_at_superblock_remount fs/super.c
--- linux-2.6.git/fs/super.c~check_mount_writers_at_superblock_remount	2008-04-29 11:56:42.000000000 -0700
+++ linux-2.6.git-dave/fs/super.c	2008-04-29 11:56:42.000000000 -0700
@@ -36,6 +36,7 @@
 #include <linux/writeback.h>		/* for the emergency remount stuff */
 #include <linux/idr.h>
 #include <linux/kobject.h>
+#include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/file.h>
 #include <asm/uaccess.h>
@@ -598,6 +599,32 @@ retry:
 }
 
 /**
+ *	__sb_can_remount_ro - check a superblock for active writers
+ *	@sb: superblock in question
+ *
+ *	Must ensure that no new writers to the superblock can come
+ *	in (must hold lock_mnt_writers()) and that the s_vfsmounts
+ *	list can not change (must acquire lock_super()).
+ *
+ *	Returns 0 on success.
+ */
+static int __sb_can_remount_ro(struct super_block *sb)
+{
+	struct list_head *entry;
+	int ret = 0;
+
+	list_for_each(entry, &sb->s_vfsmounts) {
+		struct vfsmount *mnt;
+		mnt = list_entry(entry, struct vfsmount, mnt_sb_list);
+		if (!atomic_read(&mnt->__mnt_writers))
+			continue;
+		ret = -EBUSY;
+		break;
+	}
+	return ret;
+}
+
+/**
  *	do_remount_sb - asks filesystem to change mount options.
  *	@sb:	superblock in question
  *	@flags:	numeric part of options
@@ -608,8 +635,9 @@ retry:
  */
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
-	int retval;
+	int retval = 0;
 	int remount_rw;
+	int sb_locked = 0;
 	
 #ifdef CONFIG_BLOCK
 	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
@@ -623,27 +651,44 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			lock_mnt_writers();
+			/*
+			 * this locks out any new writers
+			 * on any of this sb's mounts.
+			 */
+			lock_super(sb);
+			sb_locked = 1;
+			retval = __sb_can_remount_ro(sb);
+			unlock_mnt_writers();
+			if (retval)
+				goto out;
+		}
 		retval = DQUOT_OFF(sb, 1);
-		if (retval < 0 && retval != -ENOSYS)
-			return -EBUSY;
+		if (retval < 0 && retval != -ENOSYS) {
+			retval = -EBUSY;
+			goto out;
+		}
 	}
 	remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY);
 
 	if (sb->s_op->remount_fs) {
-		lock_super(sb);
+		if (!sb_locked)
+			lock_super(sb);
+		sb_locked = 1;
 		retval = sb->s_op->remount_fs(sb, &flags, data);
-		unlock_super(sb);
 		if (retval)
-			return retval;
+			goto out;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 	if (remount_rw)
 		DQUOT_ON_REMOUNT(sb);
-	return 0;
+out:
+	if (sb_locked)
+		unlock_super(sb);
+	return retval;
 }
 
 static void do_emergency_remount(unsigned long foo)
diff -puN include/linux/fs.h~check_mount_writers_at_superblock_remount include/linux/fs.h
--- linux-2.6.git/include/linux/fs.h~check_mount_writers_at_superblock_remount	2008-04-29 11:56:42.000000000 -0700
+++ linux-2.6.git-dave/include/linux/fs.h	2008-04-29 11:56:42.000000000 -0700
@@ -1699,8 +1699,6 @@ extern const struct file_operations read
 extern const struct file_operations write_fifo_fops;
 extern const struct file_operations rdwr_fifo_fops;
 
-extern int fs_may_remount_ro(struct super_block *);
-
 #ifdef CONFIG_BLOCK
 /*
  * return READ, READA, or WRITE
diff -puN include/linux/mount.h~check_mount_writers_at_superblock_remount include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~check_mount_writers_at_superblock_remount	2008-04-29 11:56:42.000000000 -0700
+++ linux-2.6.git-dave/include/linux/mount.h	2008-04-29 11:56:42.000000000 -0700
@@ -69,8 +69,12 @@ struct vfsmount {
 	int mnt_pinned;
 	int mnt_ghosts;
 	/*
-	 * This value is not stable unless all of the mnt_writers[] spinlocks
-	 * are held, and all mnt_writer[]s on this mount have 0 as their ->count
+	 * This value is stable for all mounts when all of the
+	 * mnt_writers[] spinlocks are held.
+	 *
+	 * It is stable for all mounts of an sb when a lock_super()
+	 * is performed under a lock_mnt_writers() as long as the
+	 * sb stays locked.  See do_remount_sb(). - Dave Hansen
 	 */
 	atomic_t __mnt_writers;
 };
@@ -84,6 +88,8 @@ static inline struct vfsmount *mntget(st
 
 extern int mnt_want_write(struct vfsmount *mnt);
 extern void mnt_drop_write(struct vfsmount *mnt);
+extern void lock_mnt_writers(void);
+extern void unlock_mnt_writers(void);
 extern void mntput_no_expire(struct vfsmount *mnt);
 extern void mnt_pin(struct vfsmount *mnt);
 extern void mnt_unpin(struct vfsmount *mnt);
_

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS
  2008-04-29 18:59 ` [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS Dave Hansen
@ 2008-04-29 21:34   ` Trond Myklebust
  2008-04-30  1:46     ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2008-04-29 21:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-fsdevel, viro, hch


On Tue, 2008-04-29 at 11:59 -0700, Dave Hansen wrote:
> The next patch will introduce a list of all mounts for a given
> superblock.  In order to keep the list, we need to make sure
> all filesystems attaching a mount to a superblock get added
> to this list.
> 
> NFS currently bypasses the simple_set_mnt() function, and sets
> mnt_sb directly.  This patch makes it use a helper function,
> instead.
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> ---
> 
>  linux-2.6.git-dave/fs/namespace.c     |   11 +++++++++--
>  linux-2.6.git-dave/fs/nfs/super.c     |   10 +++++-----
>  linux-2.6.git-dave/include/linux/fs.h |    1 +
>  3 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff -puN fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS fs/namespace.c
> --- linux-2.6.git/fs/namespace.c~introduce_simple_set_mnt_no_get_helper_for_NFS	2008-04-29 11:56:39.000000000 -0700
> +++ linux-2.6.git-dave/fs/namespace.c	2008-04-29 11:56:39.000000000 -0700
> @@ -402,12 +402,19 @@ static void __mnt_unmake_readonly(struct
>  	spin_unlock(&vfsmount_lock);
>  }
>  
> -int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
> +int simple_set_mnt_no_get(struct vfsmount *mnt, struct super_block *sb)
>  {
>  	mnt->mnt_sb = sb;
> -	mnt->mnt_root = dget(sb->s_root);
> +	add_mount_to_sb_list(mnt, sb);
>  	return 0;
>  }
> +EXPORT_SYMBOL(simple_set_mnt_no_get);
> +
> +int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
> +{
> +	mnt->mnt_root = dget(sb->s_root);
> +	return simple_set_mnt_no_get(mnt, sb);
> +}
>  

My only concern is the proliferation of 'simple_*' operations: in some
cases in libfs.c we explicitly label those as being for in-memory/ramfs
filesystems, whereas in other cases (such as this one) the name appears
to be used for more generic functions.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS
  2008-04-29 21:34   ` Trond Myklebust
@ 2008-04-30  1:46     ` Dave Hansen
  2008-05-01 16:58       ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2008-04-30  1:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-fsdevel, viro, hch

On Tue, 2008-04-29 at 14:34 -0700, Trond Myklebust wrote:
> My only concern is the proliferation of 'simple_*' operations: in some
> cases in libfs.c we explicitly label those as being for in-memory/ramfs
> filesystems, whereas in other cases (such as this one) the name appears
> to be used for more generic functions.

I'm all to happy to give it a different name.  Any suggestions?

-- Dave


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer
  2008-04-29 18:59 ` [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer Dave Hansen
@ 2008-04-30  9:25   ` Miklos Szeredi
  2008-05-01 16:26     ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2008-04-30  9:25 UTC (permalink / raw)
  To: dave; +Cc: linux-kernel, linux-fsdevel, viro, hch, trond.myklebust, dave

> We need lock_mnt_writers() during a remount in order
> to keep mnt->__mnt_writers from changing so that we
> get a consistent look at if a sb currently has anyone
> writing to it.
> 
> But, we need to lock writers out for an extended
> period, even during the ->remount_fs() operation.
> That's because we do conclusively make the fs
> r/o until *after* the ->remount_fs().

So?  Why don't we mark the fs r/o _before_ calling ->remount_fs() and
if that fails, just mark it r/w again.

OK, we'll deny writes in that interval, but I don't see that as a big
problem.  And it would simplify the implementation considerably.

Miklos

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer
  2008-04-30  9:25   ` Miklos Szeredi
@ 2008-05-01 16:26     ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2008-05-01 16:26 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, viro, hch, trond.myklebust

On Wed, 2008-04-30 at 11:25 +0200, Miklos Szeredi wrote:
> > We need lock_mnt_writers() during a remount in order
> > to keep mnt->__mnt_writers from changing so that we
> > get a consistent look at if a sb currently has anyone
> > writing to it.
> > 
> > But, we need to lock writers out for an extended
> > period, even during the ->remount_fs() operation.
> > That's because we do conclusively make the fs
> > r/o until *after* the ->remount_fs().
> 
> So?  Why don't we mark the fs r/o _before_ calling ->remount_fs() and
> if that fails, just mark it r/w again.
> 
> OK, we'll deny writes in that interval, but I don't see that as a big
> problem.  And it would simplify the implementation considerably.

Personally, I think that's a bit messy.  People might start getting
-EROFS when they never, ever *HAD* a r/o FS.  They may have made a
request for one, but they never actually hard one.

I also understand what you're saying.  If we were able to loosen up some
of the requirements it would certainly make the patches simpler.  How
does everyone else feel about this?

-- Dave


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS
  2008-04-30  1:46     ` Dave Hansen
@ 2008-05-01 16:58       ` Matthew Wilcox
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2008-05-01 16:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Trond Myklebust, linux-kernel, linux-fsdevel, viro, hch

On Tue, Apr 29, 2008 at 06:46:55PM -0700, Dave Hansen wrote:
> On Tue, 2008-04-29 at 14:34 -0700, Trond Myklebust wrote:
> > My only concern is the proliferation of 'simple_*' operations: in some
> > cases in libfs.c we explicitly label those as being for in-memory/ramfs
> > filesystems, whereas in other cases (such as this one) the name appears
> > to be used for more generic functions.
> 
> I'm all to happy to give it a different name.  Any suggestions?

would the vfs_* prefix make sense here?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-05-01 16:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-29 18:59 [RFC][PATCH 0/5] do remounts without consulting sb->s_files list Dave Hansen
2008-04-29 18:59 ` [RFC][PATCH 1/5] r/o bind mounts: change spinlock to mutex Dave Hansen
2008-04-29 18:59 ` [RFC][PATCH 2/5] introduce simple_set_mnt_no_get() helper for NFS Dave Hansen
2008-04-29 21:34   ` Trond Myklebust
2008-04-30  1:46     ` Dave Hansen
2008-05-01 16:58       ` Matthew Wilcox
2008-04-29 18:59 ` [RFC][PATCH 3/5] reintroduce list of vfsmounts over superblock Dave Hansen
2008-04-29 18:59 ` [RFC][PATCH 4/5] must hold lock_super() to set initial mount writer Dave Hansen
2008-04-30  9:25   ` Miklos Szeredi
2008-05-01 16:26     ` Dave Hansen
2008-04-29 18:59 ` [RFC][PATCH 5/5] check mount writers at superblock remount Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).