Re: [PATCH 20/20] honor r/w changes at do_remount() time

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@ftp.linux.org.uk>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org,
	herbert@13thfloor.at, serue@us.ibm.com
Subject: Re: [PATCH 20/20] honor r/w changes at do_remount() time
Date: Wed, 28 Jun 2006 06:19:35 +0100	[thread overview]
Message-ID: <20060628051935.GA29920@ftp.linux.org.uk> (raw)
In-Reply-To: <20060627221457.04ADBF71@localhost.localdomain>

On Tue, Jun 27, 2006 at 03:14:57PM -0700, Dave Hansen wrote:
> 
> Originally from: Herbert Poetzl <herbert@13thfloor.at>
> 
> This is the core of the read-only bind mount patch set.
> 
> Note that this does _not_ add a "ro" option directly to
> the bind mount operation.  If you require such a mount,
> you must first do the bind, then follow it up with a
> 'mount -o remount,ro' operation.

I guess the fundamental problem I have with that approach is that it's
a cop-out - we just declare rw state of vfsmount independent from that
of filesystem and add a "if a flag is set, act upon vfsmount".

And yes, some of that does make sense.  Fine, let's separate that
stuff; but then we'd better decide what rw superblock *is*.

We have a number of vfsmounts over given superblock.  OK, some are
"we don't even ask them to be r/w".  Some are "we want them r/w, but
don't actually use as such at the moment".  Some are "pinned down for
write now".  And we do get logics for "can't make it r/o right now".

But look - we have the _same_ logics for superblock itself.  Only it's
full of holes.  And since you have rw states for those completely
unrelated to those of vfsmount, we get a ridiculous situation - we
*do* mark the moments when superblock becomes impossible to remount
r/o and we even mostly get the moments when it ceases to be busy
writing (unlinked-but-opened files are major exception).  But we can't
use that information.

So "can we remount superblock ro?" turns into kinda-sorta duplicate of
the same for vfsmounts, but it's racy as hell and bloody incomplete;
we don't even get "if some vfsmount over it is busy writing, we won't
remount r/o".  Approximation is done, but that's it.  E.g. mkdir() in
progress does *not* stop remount of superblock r/o (it does prevent
remount of vfsmount with your patchset).

FWIW, I suspect that the root of the problem is that we confuse different
states of filesystem.  E.g. one obviously useful feature would be to have
soft r/o - filesystem that is (from the driver POV) mounted readonly,
but would get transparently switched r/w at the first request.  And you
have all vfsmount-side infrastructure for that, BTW.  Add something like
mechanism we use for expiry and you've got a very tasty feature for e.g.
laptop users: e.g. userland asking to switch fs soft-ro every 15 minutes
and if nobody had wanted it r/w since the last time, do the transition;
if asked r/w again, r/w it goes on its own.  IOW, there's more to it than
one bit.  And I'm talking about superblock state...

BTW, it might be worth doing the following:
	* reintroduce the list of vfsmounts over given superblock (protected
by vfsmount_lock)
	* keep ro flag separate from counter and split it in two.
	* all decrements are with atomic_dec_and_lock()
	* all increments are with atomic_add_unless() + spin_lock() +
check flags + atomic_add_return() + possible spin_unlock
	* if writers count goes from non-zero to zero or vice versa
increment/decrement superblock counter (number of vfsmounts that really
want write access).
	* make the moments when i_nlink hits 0 bump the superblock writers
count; drop it when such sucker gets freed on final iput.
	* kill the sodding "traverse the list of opened files" logics in
remounting superblock r/o.  Instead of that, grab spinlock, check writers
count, bail out if non-zero, grab vfsmount_lock, traverse the list over
superblock and set one of the flags, drop the locks and proceed.
	* when remounting superblock r/w, traverse the list and knock out
the same flag.

At least that way we'd get the majority of "can remount ro" logics right...

Another fun issues:
	a) MS_REC handling with MS_BIND remounts (trivial)
	b) figuring out what (if anything) should be done with propagation
when we have shared subtrees... (not trivial at all)

next prev parent reply	other threads:[~2006-06-28  5:20 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-27 22:14 [PATCH 00/20] Mount writer count and read-only bind mounts (v3) Dave Hansen
2006-06-27 22:14 ` [PATCH 01/20] prepare for write access checks: collapse if() Dave Hansen
2006-06-27 22:14 ` [PATCH 02/20] r/o bind mount prepwork: move open_namei()'s vfs_create() Dave Hansen
2006-06-27 22:14 ` [PATCH 03/20] Add vfsmount writer count Dave Hansen
2006-06-27 22:14 ` [PATCH 04/20] elevate mnt writers for callers of vfs_mkdir() Dave Hansen
2006-06-27 22:14 ` [PATCH 05/20] elevate write count during entire ncp_ioctl() Dave Hansen
2006-06-27 22:14 ` [PATCH 06/20] sys_symlinkat() elevate write count around vfs_symlink() Dave Hansen
2006-06-27 22:14 ` [PATCH 07/20] elevate mount count for extended attributes Dave Hansen
2006-06-27 22:14 ` [PATCH 08/20] sys_linkat(): elevate write count around vfs_link() Dave Hansen
2006-06-27 22:14 ` [PATCH 09/20] mount_is_safe(): add comment Dave Hansen
2006-06-27 22:14 ` [PATCH 11/20] elevate write count over calls to vfs_rename() Dave Hansen
2006-06-27 22:14 ` [PATCH 10/20] unix_find_other() elevate write count for touch_atime() Dave Hansen
2006-06-27 22:14 ` [PATCH 12/20] tricky: elevate write count files are open()ed Dave Hansen
2006-06-27 22:14 ` [PATCH 13/20] elevate writer count for do_sys_truncate() Dave Hansen
2006-06-27 22:14 ` [PATCH 14/20] elevate write count for do_utimes() Dave Hansen
2006-06-27 22:14 ` [PATCH 15/20] elevate write count for do_sys_utime() and touch_atime() Dave Hansen
2006-06-27 22:14 ` [PATCH 16/20] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
2006-06-27 22:14 ` [PATCH 17/20] elevate mnt writers for vfs_unlink() callers Dave Hansen
2006-06-27 22:14 ` [PATCH 18/20] do_rmdir(): elevate write count Dave Hansen
2006-06-27 22:14 ` [PATCH 19/20] elevate writer count for custom 'struct file' Dave Hansen
2006-06-28  2:40   ` Andrew Morton
2006-06-28  3:59     ` Dave Hansen
2006-06-27 22:14 ` [PATCH 20/20] honor r/w changes at do_remount() time Dave Hansen
2006-06-28  5:19   ` Al Viro [this message]
2006-06-28 14:41     ` Herbert Poetzl
2006-06-28 15:01       ` Serge E. Hallyn
2006-07-03 17:30     ` Dave Hansen
2006-07-03 17:48       ` Al Viro
2006-07-03 18:23         ` Joshua Hudson
2006-07-03 18:38           ` Al Viro
2006-06-28  1:38 ` [PATCH 00/20] Mount writer count and read-only bind mounts (v3) Andrew Morton
2006-06-28  1:50   ` Dave Hansen
2006-06-28  2:17     ` Andrew Morton
2006-06-28  2:24       ` Randy.Dunlap
2006-06-28  2:30         ` Andrew Morton
2006-06-28  8:42       ` Arjan van de Ven
2006-06-28  5:56     ` Christoph Hellwig
2006-06-28 14:52     ` Herbert Poetzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060628051935.GA29920@ftp.linux.org.uk \
    --to=viro@ftp.linux.org.uk \
    --cc=akpm@osdl.org \
    --cc=haveblue@us.ibm.com \
    --cc=herbert@13thfloor.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=serue@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox