getting rid of filp search in fs_may_remount

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* getting rid of filp search in fs_may_remount_ro()
       [not found] ` <20071226141214.GA31455@lst.de>
@ 2007-12-31 19:54   ` Dave Hansen
  2007-12-31 22:54     ` Dave Hansen
  2008-01-02 20:31     ` Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Hansen @ 2007-12-31 19:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel@vger.kernel.org

On Wed, 2007-12-26 at 15:12 +0100, Christoph Hellwig wrote:
> Btw, I just noticed in current -mm fs_may_remount_ro() is still around
> and not replaced by ther per-sb writers count.  That surely sounds like
> some kind of mismerge..

I was actually leaving that for later.  Getting rid of the filp search
is a great benefit of the r/o bind patches, but it isn't strictly
necessary and it doesn't really hurt anything to keep it.

The reason that it was contentious was that we need some way to be able
to do an sb-to-mount mapping.  When remounting the sb, we need to
determine whether *any* of the mounts of that sb have any writers.

We don't currently have any mechanisms to do direct lookups from sb to
mount.  The only alternative I can see right now is to walk over all
tasks, then walk over all vfs namespaces, and walk each mount tree to
see if any mounts are of the sb we're looking for.  This needs to be
done while already holding the mnt_writers[] locks so that no new mnt
writers can come in.

*THAT* is going to be a heavyweight operation.  I need to go look in
detail at how the mount trees are kept, and we'll need some kind of
mechanism to keep track of which vfs namespaces we've looked at during
the search so we don't search them twice. 

Can you think of a simpler way to do it?

-- Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: getting rid of filp search in fs_may_remount_ro()
  2007-12-31 19:54   ` getting rid of filp search in fs_may_remount_ro() Dave Hansen
@ 2007-12-31 22:54     ` Dave Hansen
  2008-01-02 20:33       ` Christoph Hellwig
  2008-01-02 20:31     ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2007-12-31 22:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel@vger.kernel.org

On Mon, 2007-12-31 at 11:54 -0800, Dave Hansen wrote:
> On Wed, 2007-12-26 at 15:12 +0100, Christoph Hellwig wrote:
> > Btw, I just noticed in current -mm fs_may_remount_ro() is still around
> > and not replaced by ther per-sb writers count.  That surely sounds like
> > some kind of mismerge..
> 
> I was actually leaving that for later.  Getting rid of the filp search
> is a great benefit of the r/o bind patches, but it isn't strictly
> necessary and it doesn't really hurt anything to keep it.
> 
> The reason that it was contentious was that we need some way to be able
> to do an sb-to-mount mapping.  When remounting the sb, we need to
> determine whether *any* of the mounts of that sb have any writers.
> 
> We don't currently have any mechanisms to do direct lookups from sb to
> mount.  The only alternative I can see right now is to walk over all
> tasks, then walk over all vfs namespaces, and walk each mount tree to
> see if any mounts are of the sb we're looking for.  This needs to be
> done while already holding the mnt_writers[] locks so that no new mnt
> writers can come in.
> 
> *THAT* is going to be a heavyweight operation.  I need to go look in
> detail at how the mount trees are kept, and we'll need some kind of
> mechanism to keep track of which vfs namespaces we've looked at during
> the search so we don't search them twice. 
> 
> Can you think of a simpler way to do it?

Here's one blatantly untested idea I have.  The idea is to keep track if
anyone might be writing to a mnt.  We keep track on a flag in the mnt.
When we set the flag, we increment a counter in the sb and decrement
when the flag is cleared.

We can't simply look at mnt->__mnt_writers because there might be
"checked-out" writers in the mnt_writers[] array.  We also have to keep
new writers from coming in while we do this, so we use the spinlocks in
the mnt_writers[] array for exclusion.  This is a pretty heavyweight
lock, but it only gets used at rw->ro transitions.

-- Dave

--- ./fs/file_table.c.orig	2007-12-31 11:14:59.000000000 -0800
+++ ./fs/file_table.c	2007-12-31 14:38:19.000000000 -0800
@@ -376,26 +376,12 @@
 
 int fs_may_remount_ro(struct super_block *sb)
 {
-	struct file *file;
-
-	/* Check that no files are currently opened for writing. */
-	file_list_lock();
-	list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
-		struct inode *inode = file->f_path.dentry->d_inode;
-
-		/* File with pending delete? */
-		if (inode->i_nlink == 0)
-			goto too_bad;
-
-		/* Writeable file? */
-		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
-			goto too_bad;
-	}
-	file_list_unlock();
-	return 1; /* Tis' cool bro. */
-too_bad:
-	file_list_unlock();
-	return 0;
+	int ret = 1;
+	lock_mnt_writers();
+	if (atomic_read(&sb->__s_mnt_writers))
+		ret = 0;
+	unlock_mnt_writers();
+	return ret;
 }
 
 void __init files_init(unsigned long mempages)
--- ./fs/namespace.c.orig	2007-12-31 13:52:36.000000000 -0800
+++ ./fs/namespace.c	2007-12-31 14:53:56.000000000 -0800
@@ -138,7 +138,7 @@
 }
 fs_initcall(init_mnt_writers);
 
-static void mnt_unlock_cpus(void)
+void mnt_unlock_cpus(void)
 {
 	int cpu;
 	struct mnt_writer *cpu_writer;
@@ -149,6 +149,22 @@
 	}
 }
 
+static void mark_mnt_has_writer(struct vfsmount *mnt)
+{
+	if (!test_and_set_bit(ilog2(MNT_MAY_HAVE_WRITERS), &mnt->mnt_flags))
+		atomic_inc(&mnt->mnt_sb->s_possible_mnt_writers);
+}
+
+static void check_mnt_for_writers(struct vfsmount *mnt)
+{
+	int bitnr = ilog2(MNT_MAY_HAVE_WRITERS);
+
+	if (atomic_read(&mnt->__mnt_writers))
+		mark_mnt_has_writer(mnt);
+	else if (test_and_clear_bit(bitnr, &mnt->mnt_flags))
+		atomic_dec(&mnt->mnt_sb->s_possible_mnt_writers);
+}
+
 static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
 {
 	if (!cpu_writer->mnt)
@@ -199,6 +215,7 @@
 	}
 	use_cpu_writer_for_mount(cpu_writer, mnt);
 	cpu_writer->count++;
+	mark_mnt_has_writer(mnt);
 out:
 	spin_unlock(&cpu_writer->lock);
 	put_cpu_var(mnt_writers);
@@ -215,11 +232,33 @@
 		cpu_writer = &per_cpu(mnt_writers, cpu);
 		spin_lock(&cpu_writer->lock);
 		__clear_mnt_count(cpu_writer);
+		/*
+		 * __mnt_writers may temporarily hit zero
+		 * (and trigger MNT_MAY_HAVE_WRITERS to get
+		 * cleared), but it will get set again if
+		 * and when another mnt_writer[] has an
+		 * entry for that mnt later in this loop.
+		 */
+		check_mnt_has_writers(cpu_writer->mnt);
 		cpu_writer->mnt = NULL;
 	}
 }
 
 /*
+ * This is just an external interface.  I want
+ * to use the long names in here, but leave the
+ * simpler names for external users.
+ */
+void lock_mnt_writers()
+{
+	lock_and_coalesce_cpu_mnt_writer_counts();
+}
+void unlock_mnt_writers()
+{
+	mnt_unlock_cpus();
+}
+
+/*
  * These per-cpu write counts are not guaranteed to have
  * matched increments and decrements on any given cpu.
  * A file open()ed for write on one cpu and close()d on
--- ./include/linux/fs.h.orig	2007-12-31 14:08:20.000000000 -0800
+++ ./include/linux/fs.h	2007-12-31 14:31:12.000000000 -0800
@@ -1005,6 +1005,9 @@
 #ifdef CONFIG_SECURITY
 	void                    *s_security;
 #endif
+	atomic_t		__s_mnt_writers;/* how many mounts of this sb might
+						 * have writers.  Only stable with
+						 * all mnt_writers[] locks held */
 	struct xattr_handler	**s_xattr;
 
 	struct list_head	s_inodes;	/* all inodes */
--- ./include/linux/mount.h.orig	2007-12-31 14:00:13.000000000 -0800
+++ ./include/linux/mount.h	2007-12-31 14:33:44.000000000 -0800
@@ -33,6 +33,7 @@
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_IMBALANCED_WRITE_COUNT	0x200 /* just for debugging */
+#define MNT_MAY_HAVE_WRITERS		0x400 /* did this ever have a writer? */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
@@ -80,6 +81,8 @@
 
 extern int mnt_want_write(struct vfsmount *mnt);
 extern void mnt_drop_write(struct vfsmount *mnt);
+extern void lock_mnt_writers(void);
+extern void unlock_mnt_writers(void);
 extern void mntput_no_expire(struct vfsmount *mnt);
 extern void mnt_pin(struct vfsmount *mnt);
 extern void mnt_unpin(struct vfsmount *mnt);



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: getting rid of filp search in fs_may_remount_ro()
  2007-12-31 19:54   ` getting rid of filp search in fs_may_remount_ro() Dave Hansen
  2007-12-31 22:54     ` Dave Hansen
@ 2008-01-02 20:31     ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2008-01-02 20:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Christoph Hellwig, linux-kernel@vger.kernel.org, Peter Zijlstra

On Mon, Dec 31, 2007 at 11:54:26AM -0800, Dave Hansen wrote:
> > Btw, I just noticed in current -mm fs_may_remount_ro() is still around
> > and not replaced by ther per-sb writers count.  That surely sounds like
> > some kind of mismerge..
> 
> I was actually leaving that for later.  Getting rid of the filp search
> is a great benefit of the r/o bind patches, but it isn't strictly
> necessary and it doesn't really hurt anything to keep it.

Well, it's one of the primary arguments for the expensive infrastructure
behind the per-mount r/o patches, as checkin the counter here means
we can now do exact tracking of the writable status of the superblock
and thus avoid the remount r/o races.

But there's another argument for it which is unrelated to the per-mount
r/o patches.  fs_may_remount_ro is the last non-trivial user of
sb->s_files and the file_list_lock.  Peter Zijlstra identified the
latter as scalabilty problem, and I'd really prefer to get rid of the
list and the lock instead of hacking around these issues.  Removing
sb->s_files will also allow removing a list_head from struct file
with some additional hackery to the tty core which should be a nice
memory safer on big systems.

> The reason that it was contentious was that we need some way to be able
> to do an sb-to-mount mapping.  When remounting the sb, we need to
> determine whether *any* of the mounts of that sb have any writers.
> 
> We don't currently have any mechanisms to do direct lookups from sb to
> mount.  The only alternative I can see right now is to walk over all
> tasks, then walk over all vfs namespaces, and walk each mount tree to
> see if any mounts are of the sb we're looking for.  This needs to be
> done while already holding the mnt_writers[] locks so that no new mnt
> writers can come in.
> 
> *THAT* is going to be a heavyweight operation.  I need to go look in
> detail at how the mount trees are kept, and we'll need some kind of
> mechanism to keep track of which vfs namespaces we've looked at during
> the search so we don't search them twice. 
> 
> Can you think of a simpler way to do it?

Well, the first revisions of the patches added a list of vfsmounts
to the superblock, and I wasn't quite happy with that because I though
we could get around that easily.  Given that we can't get around it
easily I'd be happy to take my comments back.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: getting rid of filp search in fs_may_remount_ro()
  2007-12-31 22:54     ` Dave Hansen
@ 2008-01-02 20:33       ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2008-01-02 20:33 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Christoph Hellwig, linux-kernel@vger.kernel.org

On Mon, Dec 31, 2007 at 02:54:48PM -0800, Dave Hansen wrote:
> Here's one blatantly untested idea I have.  The idea is to keep track if
> anyone might be writing to a mnt.  We keep track on a flag in the mnt.
> When we set the flag, we increment a counter in the sb and decrement
> when the flag is cleared.
> 
> We can't simply look at mnt->__mnt_writers because there might be
> "checked-out" writers in the mnt_writers[] array.  We also have to keep
> new writers from coming in while we do this, so we use the spinlocks in
> the mnt_writers[] array for exclusion.  This is a pretty heavyweight
> lock, but it only gets used at rw->ro transitions.

This idea looks good to me, but if you're going to a final version
please just inline fs_may_remount_ro into its only caller.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-01-02 20:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20071126135220.GA17244@lst.de>
     [not found] ` <20071226141214.GA31455@lst.de>
2007-12-31 19:54   ` getting rid of filp search in fs_may_remount_ro() Dave Hansen
2007-12-31 22:54     ` Dave Hansen
2008-01-02 20:33       ` Christoph Hellwig
2008-01-02 20:31     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox