linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] read-only remount fixes
@ 2010-10-05 10:31 Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

Al,

This series fixes some of the races and other bugs in the read-only
remount code.  Please review.

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git read-only-remount-fixes

Thanks,
Miklos
---

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:49   ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-clone-flags.patch --]
[-- Type: text/plain, Size: 979 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

If clone_mnt() happens while mnt_make_readonly() is running, the
cloned mount might have MNT_WRITE_HOLD flag set, which results in
mnt_want_write() spinning forever on this mount.

Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
accidentally.  But if it does happen it can hang the machine.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-01 21:54:17.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-01 21:54:30.000000000 +0200
@@ -595,7 +595,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags;
+		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/9] vfs: ignore error on forced remount
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-force-remount-readonly.patch --]
[-- Type: text/plain, Size: 843 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

On emergency remount we want to force MS_RDONLY on the super block
even if ->remount_fs() failed for some reason.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/super.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-01 20:40:42.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-01 21:55:01.000000000 +0200
@@ -579,7 +579,8 @@ int do_remount_sb(struct super_block *sb
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
-		if (retval)
+		/* If forced remount, go ahead despite any errors */
+		if (retval && !force)
 			return retval;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/9] vfs: fix per mount read-write
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-bind-remount-read-write.patch --]
[-- Type: text/plain, Size: 1332 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Don't allow remounting the vfsmount read-write if the superblock is
read-only.  Return -EROFS in this case.

Rename __mnt_unmake_readonly() to mnt_make_writable() to match
mnt_make_readonly().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-09-30 16:53:30.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-09-30 16:57:47.000000000 +0200
@@ -399,11 +399,18 @@ static int mnt_make_readonly(struct vfsm
 	return ret;
 }
 
-static void __mnt_unmake_readonly(struct vfsmount *mnt)
+static int mnt_make_writable(struct vfsmount *mnt)
 {
+	int err = 0;
+
 	br_write_lock(vfsmount_lock);
-	mnt->mnt_flags &= ~MNT_READONLY;
+	if (mnt->mnt_sb->s_flags & MS_RDONLY)
+		err = -EROFS;
+	else
+		mnt->mnt_flags &= ~MNT_READONLY;
 	br_write_unlock(vfsmount_lock);
+
+	return err;
 }
 
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
@@ -1600,7 +1607,7 @@ static int change_mount_flags(struct vfs
 	if (readonly_request)
 		error = mnt_make_readonly(mnt);
 	else
-		__mnt_unmake_readonly(mnt);
+		error = mnt_make_writable(mnt);
 	return error;
 }
 

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/9] vfs: add sb_force_remount_readonly() helper
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (2 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-force_remount_readonly-helper.patch --]
[-- Type: text/plain, Size: 10917 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add helper for filesystems to call when forcefully remounting
read-only (such as on filesystem errors).

Functionally this doesn't change anything.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 drivers/staging/pohmelfs/net.c |    2 +-
 fs/affs/amigaffs.c             |    2 +-
 fs/ext2/super.c                |    2 +-
 fs/ext3/super.c                |    4 ++--
 fs/ext4/super.c                |    4 ++--
 fs/fat/misc.c                  |    2 +-
 fs/hpfs/super.c                |    2 +-
 fs/jfs/super.c                 |    2 +-
 fs/namespace.c                 |    6 ++++++
 fs/nilfs2/super.c              |    2 +-
 fs/ocfs2/super.c               |    2 +-
 fs/reiserfs/journal.c          |    2 +-
 fs/reiserfs/prints.c           |    4 ++--
 fs/ubifs/io.c                  |    2 +-
 fs/ufs/super.c                 |    4 ++--
 include/linux/fs.h             |    1 +
 16 files changed, 25 insertions(+), 18 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:12:10.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 12:15:17.000000000 +0200
@@ -413,6 +413,12 @@ static int mnt_make_writable(struct vfsm
 	return err;
 }
 
+void sb_force_remount_readonly(struct super_block *sb)
+{
+	sb->s_flags |= MS_RDONLY;
+}
+EXPORT_SYMBOL(sb_force_remount_readonly);
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/drivers/staging/pohmelfs/net.c
===================================================================
--- linux-2.6.orig/drivers/staging/pohmelfs/net.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/drivers/staging/pohmelfs/net.c	2010-10-04 12:13:00.000000000 +0200
@@ -673,7 +673,7 @@ static int pohmelfs_root_cap_response(st
 	psb->state_flags = cap->flags;
 
 	if (psb->state_flags & POHMELFS_FLAGS_RO) {
-		psb->sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(psb->sb);
 		printk(KERN_INFO "Mounting POHMELFS (%d) read-only.\n", psb->idx);
 	}
 
Index: linux-2.6/fs/affs/amigaffs.c
===================================================================
--- linux-2.6.orig/fs/affs/amigaffs.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/affs/amigaffs.c	2010-10-04 12:13:00.000000000 +0200
@@ -458,7 +458,7 @@ affs_error(struct super_block *sb, const
 		function,ErrorBuffer);
 	if (!(sb->s_flags & MS_RDONLY))
 		printk(KERN_WARNING "AFFS: Remounting filesystem read-only\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 }
 
 void
Index: linux-2.6/fs/ext2/super.c
===================================================================
--- linux-2.6.orig/fs/ext2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -69,7 +69,7 @@ void ext2_error (struct super_block * sb
 	if (test_opt(sb, ERRORS_RO)) {
 		ext2_msg(sb, KERN_CRIT,
 			     "error: remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 }
 
Index: linux-2.6/fs/ext3/super.c
===================================================================
--- linux-2.6.orig/fs/ext3/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext3/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -188,7 +188,7 @@ static void ext3_handle_error(struct sup
 	if (test_opt (sb, ERRORS_RO)) {
 		ext3_msg(sb, KERN_CRIT,
 			"error: remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	ext3_commit_super(sb, es, 1);
 	if (test_opt(sb, ERRORS_PANIC))
@@ -295,7 +295,7 @@ void ext3_abort (struct super_block * sb
 	ext3_msg(sb, KERN_CRIT,
 		"error: remounting filesystem read-only");
 	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
 	if (EXT3_SB(sb)->s_journal)
 		journal_abort(EXT3_SB(sb)->s_journal, -EIO);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 11:57:35.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 12:13:00.000000000 +0200
@@ -2057,6 +2057,7 @@ extern const struct file_operations writ
 extern const struct file_operations rdwr_pipefifo_fops;
 
 extern int fs_may_remount_ro(struct super_block *);
+extern void sb_force_remount_readonly(struct super_block *);
 
 #ifdef CONFIG_BLOCK
 /*
Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext4/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -371,7 +371,7 @@ static void ext4_handle_error(struct sup
 	}
 	if (test_opt(sb, ERRORS_RO)) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	if (test_opt(sb, ERRORS_PANIC))
 		panic("EXT4-fs (device %s): panic forced after error\n",
@@ -526,7 +526,7 @@ void __ext4_abort(struct super_block *sb
 
 	if ((sb->s_flags & MS_RDONLY) == 0) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 		EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED;
 		if (EXT4_SB(sb)->s_journal)
 			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
Index: linux-2.6/fs/fat/misc.c
===================================================================
--- linux-2.6.orig/fs/fat/misc.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/fat/misc.c	2010-10-04 12:13:00.000000000 +0200
@@ -38,7 +38,7 @@ void __fat_fs_error(struct super_block *
 	if (opts->errors == FAT_ERRORS_PANIC)
 		panic("FAT: fs panic from previous error\n");
 	else if (opts->errors == FAT_ERRORS_RO && !(s->s_flags & MS_RDONLY)) {
-		s->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(s);
 		printk(KERN_ERR "FAT: Filesystem has been set read-only\n");
 	}
 }
Index: linux-2.6/fs/hpfs/super.c
===================================================================
--- linux-2.6.orig/fs/hpfs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/hpfs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -71,7 +71,7 @@ void hpfs_error(struct super_block *s, c
 			else {
 				printk("; remounting read-only\n");
 				mark_dirty(s);
-				s->s_flags |= MS_RDONLY;
+				sb_force_remount_readonly(s);
 			}
 		} else if (s->s_flags & MS_RDONLY) printk("; going on - but anything won't be destroyed because it's read-only\n");
 		else printk("; corrupted filesystem mounted read/write - your computer will explode within 20 seconds ... but you wanted it so!\n");
Index: linux-2.6/fs/jfs/super.c
===================================================================
--- linux-2.6.orig/fs/jfs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/jfs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -86,7 +86,7 @@ static void jfs_handle_error(struct supe
 		jfs_err("ERROR: (device %s): remounting filesystem "
 			"as read-only\n",
 			sb->s_id);
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 
 	/* nothing is done for continue beyond marking the superblock dirty */
Index: linux-2.6/fs/nilfs2/super.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/nilfs2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -124,7 +124,7 @@ void nilfs_error(struct super_block *sb,
 
 		if (nilfs_test_opt(sbi, ERRORS_RO)) {
 			printk(KERN_CRIT "Remounting filesystem read-only\n");
-			sb->s_flags |= MS_RDONLY;
+			sb_force_remount_readonly(sb);
 		}
 	}
 
Index: linux-2.6/fs/ocfs2/super.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ocfs2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -2504,7 +2504,7 @@ static void ocfs2_handle_error(struct su
 	printk(KERN_CRIT "File system is now read-only due to the potential "
 	       "of on-disk corruption. Please run fsck.ocfs2 once the file "
 	       "system is unmounted.\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	ocfs2_set_ro_flag(osb, 0);
 }
 
Index: linux-2.6/fs/reiserfs/journal.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/journal.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/reiserfs/journal.c	2010-10-04 12:13:00.000000000 +0200
@@ -4383,7 +4383,7 @@ void reiserfs_abort_journal(struct super
 	if (!journal->j_errno)
 		journal->j_errno = errno;
 
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	set_bit(J_ABORTED, &journal->j_state);
 
 #ifdef CONFIG_REISERFS_CHECK
Index: linux-2.6/fs/reiserfs/prints.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/prints.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/reiserfs/prints.c	2010-10-04 12:13:00.000000000 +0200
@@ -387,7 +387,7 @@ void __reiserfs_error(struct super_block
 		return;
 
 	reiserfs_info(sb, "Remounting filesystem read-only\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	reiserfs_abort_journal(sb, -EIO);
 }
 
@@ -406,7 +406,7 @@ void reiserfs_abort(struct super_block *
 	printk(KERN_CRIT "REISERFS abort (device %s): %s\n", sb->s_id,
 	       error_buf);
 
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	reiserfs_abort_journal(sb, errno);
 }
 
Index: linux-2.6/fs/ubifs/io.c
===================================================================
--- linux-2.6.orig/fs/ubifs/io.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ubifs/io.c	2010-10-04 12:13:00.000000000 +0200
@@ -64,7 +64,7 @@ void ubifs_ro_mode(struct ubifs_info *c,
 	if (!c->ro_media) {
 		c->ro_media = 1;
 		c->no_chk_data_crc = 0;
-		c->vfs_sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(c->vfs_sb);
 		ubifs_warn("switched to read-only mode, error %d", err);
 		dbg_dump_stack();
 	}
Index: linux-2.6/fs/ufs/super.c
===================================================================
--- linux-2.6.orig/fs/ufs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ufs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -288,7 +288,7 @@ void ufs_error (struct super_block * sb,
 		usb1->fs_clean = UFS_FSBAD;
 		ubh_mark_buffer_dirty(USPI_UBH(uspi));
 		sb->s_dirt = 1;
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	va_start (args, fmt);
 	vsnprintf (error_buf, sizeof(error_buf), fmt, args);
@@ -325,7 +325,7 @@ void ufs_panic (struct super_block * sb,
 	va_start (args, fmt);
 	vsnprintf (error_buf, sizeof(error_buf), fmt, args);
 	va_end (args);
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	printk (KERN_CRIT "UFS-fs panic (device %s): %s: %s\n",
 		sb->s_id, function, error_buf);
 }

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/9] vfs: allow mnt_want_write() to sleep
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (3 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-sleeping-mnt_want_write.patch --]
[-- Type: text/plain, Size: 2706 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow mnt_want_write() to sleep.

This is necessary to enable holding off write requests for the
duration of ->remount_fs(), which may sleep.

A quick audit didn't turn up any callers from atomic context.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c     |   11 +++++++----
 fs/super.c         |    1 +
 include/linux/fs.h |    5 +++++
 3 files changed, 13 insertions(+), 4 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:15:17.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 16:50:44.000000000 +0200
@@ -281,8 +281,12 @@ int mnt_want_write(struct vfsmount *mnt)
 	 * incremented count after it has set MNT_WRITE_HOLD.
 	 */
 	smp_mb();
-	while (mnt->mnt_flags & MNT_WRITE_HOLD)
-		cpu_relax();
+	if (unlikely(mnt->mnt_flags & MNT_WRITE_HOLD)) {
+		preempt_enable();
+		wait_event(mnt->mnt_sb->s_wait_remount_readonly,
+			   !(mnt->mnt_flags & MNT_WRITE_HOLD));
+		preempt_disable();
+	}
 	/*
 	 * After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will
 	 * be set to match its requirements. So we must not load that until
@@ -292,9 +296,7 @@ int mnt_want_write(struct vfsmount *mnt)
 	if (__mnt_is_readonly(mnt)) {
 		dec_mnt_writers(mnt);
 		ret = -EROFS;
-		goto out;
 	}
-out:
 	preempt_enable();
 	return ret;
 }
@@ -395,6 +397,7 @@ static int mnt_make_readonly(struct vfsm
 	 */
 	smp_wmb();
 	mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	wake_up_all(&mnt->mnt_sb->s_wait_remount_readonly);
 	br_write_unlock(vfsmount_lock);
 	return ret;
 }
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:12:01.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 16:50:25.000000000 +0200
@@ -107,6 +107,7 @@ static struct super_block *alloc_super(s
 		mutex_init(&s->s_dquot.dqonoff_mutex);
 		init_rwsem(&s->s_dquot.dqptr_sem);
 		init_waitqueue_head(&s->s_wait_unfrozen);
+		init_waitqueue_head(&s->s_wait_remount_readonly);
 		s->s_maxbytes = MAX_NON_LFS;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 12:13:00.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 16:50:25.000000000 +0200
@@ -1385,6 +1385,11 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Wait queue for remouting read-only
+	 */
+	wait_queue_head_t s_wait_remount_readonly;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/9] vfs: keep list of mounts for each superblock
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (4 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-mount-list.patch --]
[-- Type: text/plain, Size: 3370 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Keep track of vfsmounts belonging to a superblock.  List is protected
by vfsmount_lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c        |    5 +++++
 fs/super.c            |    7 +++++++
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 4 files changed, 14 insertions(+)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 12:38:35.000000000 +0200
@@ -1346,6 +1346,7 @@ struct super_block {
 #else
 	struct list_head	s_files;
 #endif
+	struct list_head	s_mounts;	/* list of mounts */
 	/* s_dentry_lru and s_nr_dentry_unused are protected by dcache_lock */
 	struct list_head	s_dentry_lru;	/* unused dentry lru */
 	int			s_nr_dentry_unused;	/* # of dentry on lru */
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-01 20:40:41.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-04 12:38:35.000000000 +0200
@@ -54,6 +54,7 @@ struct vfsmount {
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
+	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
 	int mnt_flags;
 	/* 4 bytes hole on 64bits arches without fsnotify */
 #ifdef CONFIG_FSNOTIFY
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 12:38:35.000000000 +0200
@@ -638,6 +638,10 @@ static struct vfsmount *clone_mnt(struct
 			if (!list_empty(&old->mnt_expire))
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
+
+		br_write_lock(vfsmount_lock);
+		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
+		br_write_unlock(vfsmount_lock);
 	}
 	return mnt;
 
@@ -677,6 +681,7 @@ repeat:
 		return;
 	}
 	if (likely(!mnt->mnt_pinned)) {
+		list_del(&mnt->mnt_instance);
 		br_write_unlock(vfsmount_lock);
 		__mntput(mnt);
 		return;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 12:38:35.000000000 +0200
@@ -74,6 +74,7 @@ static struct super_block *alloc_super(s
 		INIT_HLIST_HEAD(&s->s_anon);
 		INIT_LIST_HEAD(&s->s_inodes);
 		INIT_LIST_HEAD(&s->s_dentry_lru);
+		INIT_LIST_HEAD(&s->s_mounts);
 		init_rwsem(&s->s_umount);
 		mutex_init(&s->s_lock);
 		lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -128,6 +129,7 @@ static inline void destroy_super(struct
 	free_percpu(s->s_files);
 #endif
 	security_sb_free(s);
+	WARN_ON(!list_empty(&s->s_mounts));
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	kfree(s);
@@ -969,6 +971,11 @@ vfs_kern_mount(struct file_system_type *
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
 	free_secdata(secdata);
+
+	br_write_lock(vfsmount_lock);
+	list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
+	br_write_unlock(vfsmount_lock);
+
 	return mnt;
 out_sb:
 	dput(mnt->mnt_root);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (5 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-22  6:46   ` Al Viro
  2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi
  8 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-protect-sb-remount.patch --]
[-- Type: text/plain, Size: 6118 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, set MNT_WRITE_HOLD on all
mounts so that writes are held off until the remount either succeeds
or fails.

After this proceed with the remount and if successful mark all mounts
MNT_READONLY and release MNT_WRITE_HOLD.  If unsuccessful, just
release MNT_WRITE_HOLD so that write operations can proceed normally.

Careful thought needs to be given to places where mnt_flags are set
such as do_add_mount() and clone_mnt() to ensure that a read-write
mount is not added to a read-only superblock.

Small races still remain because delayed writes due to nlink going to
zero but inode still having a reference are not dealt with by this
patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h  |    3 ++
 fs/namespace.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/super.c     |   25 +++++++++++++++++---
 3 files changed, 93 insertions(+), 5 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-01 20:40:41.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-04 13:02:24.000000000 +0200
@@ -69,6 +69,9 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
+extern void sb_cancel_remount_readonly(struct super_block *);
+extern void sb_finish_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:38:35.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 13:02:24.000000000 +0200
@@ -422,6 +422,61 @@ void sb_force_remount_readonly(struct su
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	br_write_unlock(vfsmount_lock);
+
+	if (err)
+		sb_cancel_remount_readonly(sb);
+
+	return err;
+}
+
+void sb_cancel_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD)
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	}
+
+	br_write_unlock(vfsmount_lock);
+	wake_up_all(&sb->s_wait_remount_readonly);
+}
+
+void sb_finish_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_READONLY;
+			smp_wmb();
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
+	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
+	wake_up_all(&sb->s_wait_remount_readonly);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -611,7 +666,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
+		mnt->mnt_flags = old->mnt_flags & MNT_PROPAGATION_MASK;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);
@@ -639,7 +694,13 @@ static struct vfsmount *clone_mnt(struct
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
 
+		/*
+		 * Add new mount to s_mounts.  Do this after fiddling
+		 * with propagation flags to prevent races with
+		 * remount ro/rw.
+		 */
 		br_write_lock(vfsmount_lock);
+		mnt->mnt_flags |= old->mnt_flags & ~MNT_PROPAGATION_MASK;
 		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
 		br_write_unlock(vfsmount_lock);
 	}
@@ -1804,7 +1865,14 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/* Locking is necessary to prevent racing with remount r/o */
+	br_read_lock(vfsmount_lock);
+	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
+
 	newmnt->mnt_flags = mnt_flags;
+	br_read_unlock(vfsmount_lock);
+
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:38:35.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 13:02:24.000000000 +0200
@@ -574,18 +574,31 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
-		if (retval && !force)
+		if (retval && !force) {
+			if (remount_ro)
+				goto cancel_readonly;
 			return retval;
+		}
 	}
+	if (remount_ro && !force)
+		sb_finish_remount_readonly(sb);
+
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*
@@ -599,6 +612,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb_cancel_remount_readonly(sb);
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (6 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-remove-unnecessary-remount-check.patch --]
[-- Type: text/plain, Size: 1017 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Now a successful sb_prepare_remount_readonly() should ensure that no
writable files remain for this superblock.  So turn this check into a
WARN_ON.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/file_table.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/file_table.c
===================================================================
--- linux-2.6.orig/fs/file_table.c	2010-10-04 10:44:23.000000000 +0200
+++ linux-2.6/fs/file_table.c	2010-10-04 15:07:29.000000000 +0200
@@ -437,8 +437,11 @@ int fs_may_remount_ro(struct super_block
 		if (inode->i_nlink == 0)
 			goto too_bad;
 
-		/* Writeable file? */
-		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
+		/*
+		 * Writable file?
+		 * Should be caught by sb_prepare_remount_readonly().
+		 */
+		if (WARN_ON(S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE)))
 			goto too_bad;
 	} while_file_list_for_each_entry;
 	lg_global_unlock(files_lglock);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 9/9] vfs: mark mounts read-only on forced remount
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (7 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-force-mounts-readonly.patch --]
[-- Type: text/plain, Size: 3483 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

When the superblock is forcefully remounted read-only, as in case of
an emergency remount or filesystem errors, make sure that the mounts
belonging to the superblock are also marked read-only.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h  |    1 +
 fs/namespace.c |   16 ++++++++++++++++
 fs/super.c     |   16 ++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-04 15:09:49.000000000 +0200
@@ -72,6 +72,7 @@ extern struct vfsmount *copy_tree(struct
 extern int sb_prepare_remount_readonly(struct super_block *);
 extern void sb_cancel_remount_readonly(struct super_block *);
 extern void sb_finish_remount_readonly(struct super_block *);
+extern void mark_mounts_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 15:09:49.000000000 +0200
@@ -418,7 +418,13 @@ static int mnt_make_writable(struct vfsm
 
 void sb_force_remount_readonly(struct super_block *sb)
 {
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance)
+		mnt->mnt_flags |= MNT_READONLY;
 	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
@@ -477,6 +483,16 @@ void sb_finish_remount_readonly(struct s
 	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
+void mark_mounts_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance)
+		mnt->mnt_flags |= MNT_READONLY;
+	br_write_unlock(vfsmount_lock);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 15:09:49.000000000 +0200
@@ -555,6 +555,7 @@ int do_remount_sb(struct super_block *sb
 {
 	int retval;
 	int remount_ro;
+	bool old_ro;
 
 	if (sb->s_frozen != SB_UNFROZEN)
 		return -EBUSY;
@@ -569,12 +570,14 @@ int do_remount_sb(struct super_block *sb
 	shrink_dcache_sb(sb);
 	sync_filesystem(sb);
 
-	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
+	old_ro = (sb->s_flags & MS_RDONLY) != 0;
+	remount_ro = (flags & MS_RDONLY) && !old_ro;
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
 		if (force) {
+			mark_mounts_readonly(sb);
 			mark_files_ro(sb);
 		} else {
 			retval = sb_prepare_remount_readonly(sb);
@@ -596,9 +599,14 @@ int do_remount_sb(struct super_block *sb
 			return retval;
 		}
 	}
-	if (remount_ro && !force)
-		sb_finish_remount_readonly(sb);
-
+	if (remount_ro) {
+		WARN_ON(!(flags & MS_RDONLY));
+		if (!force)
+			sb_finish_remount_readonly(sb);
+	} else if ((flags & MS_RDONLY) && !old_ro) {
+		/* filesystem remounted r/o without being asked to */
+		sb_force_remount_readonly(sb);
+	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
@ 2010-10-05 10:49   ` Miklos Szeredi
  0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:49 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Tue, 05 Oct 2010, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> If clone_mnt() happens while mnt_make_readonly() is running, the
> cloned mount might have MNT_WRITE_HOLD flag set, which results in
> mnt_want_write() spinning forever on this mount.
> 
> Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
> accidentally.  But if it does happen it can hang the machine.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

Forgot to add

CC: stable@kernel.org

> ---
>  fs/namespace.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c	2010-10-01 21:54:17.000000000 +0200
> +++ linux-2.6/fs/namespace.c	2010-10-01 21:54:30.000000000 +0200
> @@ -595,7 +595,7 @@ static struct vfsmount *clone_mnt(struct
>  				goto out_free;
>  		}
>  
> -		mnt->mnt_flags = old->mnt_flags;
> +		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
>  		atomic_inc(&sb->s_active);
>  		mnt->mnt_sb = sb;
>  		mnt->mnt_root = dget(root);
> 
> -- 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
@ 2010-10-22  6:46   ` Al Viro
  2010-10-22 12:25     ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2010-10-22  6:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Tue, Oct 05, 2010 at 12:31:15PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Currently remouting superblock read-only is racy in a major way.
> 
> With the per mount read-only infrastructure it is now possible to
> prevent most races, which this patch attempts.
> 
> Before starting the remount read-only, set MNT_WRITE_HOLD on all
> mounts so that writes are held off until the remount either succeeds
> or fails.

Umm...  What'll happen if your remount will say mnt_want_write() on
e.g. internal vfsmount over that sb?  Or, more subtle, tries to
update atime on some opened struct file on that sb.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-22  6:46   ` Al Viro
@ 2010-10-22 12:25     ` Miklos Szeredi
  2010-10-22 16:10       ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 12:25 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Fri, 22 Oct 2010, Al Viro wrote:
> On Tue, Oct 05, 2010 at 12:31:15PM +0200, Miklos Szeredi wrote:
> > From: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > Currently remouting superblock read-only is racy in a major way.
> > 
> > With the per mount read-only infrastructure it is now possible to
> > prevent most races, which this patch attempts.
> > 
> > Before starting the remount read-only, set MNT_WRITE_HOLD on all
> > mounts so that writes are held off until the remount either succeeds
> > or fails.
> 
> Umm...  What'll happen if your remount will say mnt_want_write() on
> e.g. internal vfsmount over that sb?

Quickly looking through filesystems I didn't find this sort of usage.

>   Or, more subtle, tries to
> update atime on some opened struct file on that sb.

Hmm, load_nls() will apparently do that.  Drat.

Plan B: remount all vfsmounts read-only before ->remount_fs() but
remember which ones were read-write and restore on failure.  Will
result in ugly transient write failures if remount_fs() fails, but I
don't think anybody would care in practice.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-22 12:25     ` Miklos Szeredi
@ 2010-10-22 16:10       ` Miklos Szeredi
  2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 16:10 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Fri, 22 Oct 2010, Miklos Szeredi wrote:
> >   Or, more subtle, tries to
> > update atime on some opened struct file on that sb.
> 
> Hmm, load_nls() will apparently do that.  Drat.
> 
> Plan B: remount all vfsmounts read-only before ->remount_fs() but
> remember which ones were read-write and restore on failure.  Will
> result in ugly transient write failures if remount_fs() fails, but I
> don't think anybody would care in practice.

And it's actually a pretty simple change.  Here's the incremental
patch to "vfs: protect remounting superblock read-only", with "vfs:
allow mnt_want_write() to sleep" reverted.

The full, updated patch with follow.

Thanks,
Miklos



Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-22 16:24:41.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-22 16:17:59.000000000 +0200
@@ -435,11 +435,17 @@ int sb_prepare_remount_readonly(struct s
 			}
 		}
 	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD) {
+			if (!err) {
+				mnt->mnt_flags |= MNT_READONLY | MNT_WAS_WRITE;
+				smp_wmb();
+			}
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
 	br_write_unlock(vfsmount_lock);
 
-	if (err)
-		sb_cancel_remount_readonly(sb);
-
 	return err;
 }
 
@@ -449,12 +455,11 @@ void sb_cancel_remount_readonly(struct s
 
 	br_write_lock(vfsmount_lock);
 	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~(MNT_WAS_WRITE | MNT_READONLY);
 	}
 
 	br_write_unlock(vfsmount_lock);
-	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
 void sb_finish_remount_readonly(struct super_block *sb)
@@ -463,15 +468,11 @@ void sb_finish_remount_readonly(struct s
 
 	br_write_lock(vfsmount_lock);
 	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt_flags & MNT_READONLY)) {
-			mnt->mnt_flags |= MNT_READONLY;
-			smp_wmb();
-			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
-		}
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~MNT_WAS_WRITE;
 	}
 	sb->s_flags |= MS_RDONLY;
 	br_write_unlock(vfsmount_lock);
-	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-22 16:20:26.000000000 +0200
@@ -30,6 +30,7 @@ struct mnt_namespace;
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
+#define MNT_WAS_WRITE	0x400   /* used by remount to remember write mounts */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-22 16:10       ` Miklos Szeredi
@ 2010-10-22 16:14         ` Miklos Szeredi
  2010-10-23 16:19           ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 16:14 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

From: Miklos Szeredi <mszeredi@suse.cz>

Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, set MNT_READONLY on all mounts
so that writes are prevented during the remount.  Also set
MNT_WAS_WRITE on mounts that were not read-only.

If the remounting is unsuccessful, restore the mounts to read-write.
This can result in transient EROFS errors.  Unfortunately hodling off
writes is difficult as remount itself may touch the filesystem
(e.g. through load_nls()) which would then deadlock.

Careful thought needs to be given to places where mnt_flags are set
such as do_add_mount() and clone_mnt() to ensure that a read-write
mount is not added to a read-only superblock.

Small races still remain because delayed writes due to nlink going to
zero but inode still having a reference are not dealt with by this
patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h         |    3 ++
 fs/namespace.c        |   73 ++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/super.c            |   25 ++++++++++++++---
 include/linux/mount.h |    1 
 4 files changed, 96 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-21 19:38:36.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-22 17:48:16.000000000 +0200
@@ -69,6 +69,9 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
+extern void sb_cancel_remount_readonly(struct super_block *);
+extern void sb_finish_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-22 17:59:30.000000000 +0200
@@ -419,6 +419,62 @@ void sb_force_remount_readonly(struct su
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD) {
+			if (!err) {
+				mnt->mnt_flags |= MNT_READONLY | MNT_WAS_WRITE;
+				smp_wmb();
+			}
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
+	br_write_unlock(vfsmount_lock);
+
+	return err;
+}
+
+void sb_cancel_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~(MNT_WAS_WRITE | MNT_READONLY);
+	}
+
+	br_write_unlock(vfsmount_lock);
+}
+
+void sb_finish_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~MNT_WAS_WRITE;
+	}
+	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -608,7 +664,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
+		mnt->mnt_flags = old->mnt_flags & MNT_PROPAGATION_MASK;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);
@@ -636,7 +692,13 @@ static struct vfsmount *clone_mnt(struct
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
 
+		/*
+		 * Add new mount to s_mounts.  Do this after fiddling
+		 * with propagation flags to prevent races with
+		 * remount ro/rw.
+		 */
 		br_write_lock(vfsmount_lock);
+		mnt->mnt_flags |= old->mnt_flags & ~MNT_PROPAGATION_MASK;
 		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
 		br_write_unlock(vfsmount_lock);
 	}
@@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
 
 	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);
 
+	/* Locking is necessary to prevent racing with remount r/o */
+	down_read(&newmnt->mnt_sb->s_umount);
+	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
+
+	newmnt->mnt_flags = mnt_flags;
+	up_read(&newmnt->mnt_sb->s_umount);
+
 	down_write(&namespace_sem);
 	/* Something was mounted here while we slept */
 	while (d_mountpoint(path->dentry) &&
@@ -1801,7 +1871,6 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-22 17:48:16.000000000 +0200
@@ -573,18 +573,31 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
-		if (retval && !force)
+		if (retval && !force) {
+			if (remount_ro)
+				goto cancel_readonly;
 			return retval;
+		}
 	}
+	if (remount_ro && !force)
+		sb_finish_remount_readonly(sb);
+
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*
@@ -598,6 +611,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb_cancel_remount_readonly(sb);
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-22 17:59:30.000000000 +0200
@@ -30,6 +30,7 @@ struct mnt_namespace;
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
+#define MNT_WAS_WRITE	0x400   /* used by remount to remember write mounts */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
@ 2010-10-23 16:19           ` Al Viro
  2010-10-23 19:35             ` Miklos Szeredi
  2010-10-25 12:36             ` Miklos Szeredi
  0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2010-10-23 16:19 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Fri, Oct 22, 2010 at 06:14:01PM +0200, Miklos Szeredi wrote:

> @@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
>  
>  	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);

Obviously not enough - you've just added a new flag that needs to be
trimmed from mnt_flags.

> +	/* Locking is necessary to prevent racing with remount r/o */
> +	down_read(&newmnt->mnt_sb->s_umount);
> +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> +		mnt_flags |= MNT_READONLY;
> +
> +	newmnt->mnt_flags = mnt_flags;
> +	up_read(&newmnt->mnt_sb->s_umount);

FWIW, I really don't like the way you are doing that; what we really need
there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
mnt_want_write() bumping per-sb write count, which would solve all these
problems quite nicely.

NOTE: vfsmount being ro and sb being ro are *independent* things; either
is enough to deny writes.  Having remount ro + remount rw lose the state
of other vfsmounts is a Bad Thing(tm).

Another thing:
	"If clone_mnt() happens while mnt_make_readonly() is running, the
	cloned mount might have MNT_WRITE_HOLD flag set, which results in
	mnt_want_write() spinning forever on this mount."
actually means
	"neither clone_mnt() nor remounts should ever be done without
namespace_sem held exclusive; if that ever happens, we have a serious
bug that can lead to any number of bad things happening".

Do you actually see such places?  If so, that's what needs fixing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 16:19           ` Al Viro
@ 2010-10-23 19:35             ` Miklos Szeredi
  2010-10-23 21:42               ` Al Viro
  2010-10-25 12:36             ` Miklos Szeredi
  1 sibling, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-23 19:35 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Sat, 23 Oct 2010, Al Viro wrote:
> On Fri, Oct 22, 2010 at 06:14:01PM +0200, Miklos Szeredi wrote:
> 
> > @@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
> >  
> >  	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);
> 
> Obviously not enough - you've just added a new flag that needs to be
> trimmed from mnt_flags.
> 
> > +	/* Locking is necessary to prevent racing with remount r/o */
> > +	down_read(&newmnt->mnt_sb->s_umount);
> > +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> > +		mnt_flags |= MNT_READONLY;
> > +
> > +	newmnt->mnt_flags = mnt_flags;
> > +	up_read(&newmnt->mnt_sb->s_umount);
> 
> FWIW, I really don't like the way you are doing that; what we really need
> there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
> mnt_want_write() bumping per-sb write count, which would solve all these
> problems quite nicely.
> 
> NOTE: vfsmount being ro and sb being ro are *independent* things;

Yes, except the mount(2) API which doens't quite let them be changed
independently.

>  either
> is enough to deny writes.  Having remount ro + remount rw lose the state
> of other vfsmounts is a Bad Thing(tm).

Hmm.

> 
> Another thing:
> 	"If clone_mnt() happens while mnt_make_readonly() is running, the
> 	cloned mount might have MNT_WRITE_HOLD flag set, which results in
> 	mnt_want_write() spinning forever on this mount."
> actually means
> 	"neither clone_mnt() nor remounts should ever be done without
> namespace_sem held exclusive; if that ever happens, we have a serious
> bug that can lead to any number of bad things happening".
> 
> Do you actually see such places?  If so, that's what needs fixing.

do_remount() takes s_umount, but not namespace_sem.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 19:35             ` Miklos Szeredi
@ 2010-10-23 21:42               ` Al Viro
  0 siblings, 0 replies; 19+ messages in thread
From: Al Viro @ 2010-10-23 21:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Sat, Oct 23, 2010 at 09:35:16PM +0200, Miklos Szeredi wrote:

> > Another thing:
> > 	"If clone_mnt() happens while mnt_make_readonly() is running, the
> > 	cloned mount might have MNT_WRITE_HOLD flag set, which results in
> > 	mnt_want_write() spinning forever on this mount."
> > actually means
> > 	"neither clone_mnt() nor remounts should ever be done without
> > namespace_sem held exclusive; if that ever happens, we have a serious
> > bug that can lead to any number of bad things happening".
> > 
> > Do you actually see such places?  If so, that's what needs fixing.
> 
> do_remount() takes s_umount, but not namespace_sem.

Duh...  Right, ignore that part; we really don't want to do anything
blocking beyond simple allocations under namespace_sem (e.g. everything
that gets unmounted is collected to be dropped after namespace_sem
is released).

Applied.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 16:19           ` Al Viro
  2010-10-23 19:35             ` Miklos Szeredi
@ 2010-10-25 12:36             ` Miklos Szeredi
  1 sibling, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-25 12:36 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Sat, 23 Oct 2010, Al Viro wrote:
> > +	/* Locking is necessary to prevent racing with remount r/o */
> > +	down_read(&newmnt->mnt_sb->s_umount);
> > +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> > +		mnt_flags |= MNT_READONLY;
> > +
> > +	newmnt->mnt_flags = mnt_flags;
> > +	up_read(&newmnt->mnt_sb->s_umount);
> 
> FWIW, I really don't like the way you are doing that; what we really need
> there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
> mnt_want_write() bumping per-sb write count, which would solve all these
> problems quite nicely.

And add a nice bit of overhead to a hot path...  Making it per-cpu
would help, but then we'd end up with the same ifdef mess as
mnt_writers and still a non-zero overhead.

But what's the point anyway, the per-sb write count always equals
SUM(mnt->mnt_writers).

How about this variant?  It addresses the vfsmount vs. sb read only
status independence:


Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-25 12:48:08.000000000 +0200
@@ -69,6 +69,7 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-25 14:14:35.000000000 +0200
@@ -251,6 +251,15 @@ static unsigned int count_mnt_writers(st
 #endif
 }
 
+static int mnt_is_readonly(struct vfsmount *mnt)
+{
+	if (mnt->mnt_sb->s_readonly_remount)
+		return 1;
+	/* Order wrt setting s_flags/s_readonly_remount in do_remount() */
+	smp_rmb();
+	return __mnt_is_readonly(mnt);
+}
+
 /*
  * Most r/o checks on a fs are for operations that take
  * discrete amounts of time, like a write() or unlink().
@@ -289,7 +298,7 @@ int mnt_want_write(struct vfsmount *mnt)
 	 * MNT_WRITE_HOLD is cleared.
 	 */
 	smp_rmb();
-	if (__mnt_is_readonly(mnt)) {
+	if (mnt_is_readonly(mnt)) {
 		dec_mnt_writers(mnt);
 		ret = -EROFS;
 		goto out;
@@ -406,6 +415,35 @@ static void __mnt_unmake_readonly(struct
 	br_write_unlock(vfsmount_lock);
 }
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!err) {
+		sb->s_readonly_remount = 1;
+		smp_wmb();
+	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD)
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	}
+	br_write_unlock(vfsmount_lock);
+
+	return err;
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-25 14:08:58.000000000 +0200
@@ -573,19 +573,29 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
 		if (retval && !force)
-			return retval;
+			goto cancel_readonly;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
+	/* Needs to be ordered wrt mnt_is_readonly() */
+	smp_wmb();
+	sb->s_readonly_remount = 0;
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
@@ -598,6 +608,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb->s_readonly_remount = 0;
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-25 11:43:25.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-25 13:38:26.000000000 +0200
@@ -1386,6 +1386,9 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/* Being remounted read-only */
+	int s_readonly_remount;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-10-25 12:36 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
2010-10-05 10:49   ` Miklos Szeredi
2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
2010-10-22  6:46   ` Al Viro
2010-10-22 12:25     ` Miklos Szeredi
2010-10-22 16:10       ` Miklos Szeredi
2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
2010-10-23 16:19           ` Al Viro
2010-10-23 19:35             ` Miklos Szeredi
2010-10-23 21:42               ` Al Viro
2010-10-25 12:36             ` Miklos Szeredi
2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).