Re: [PATCH 20/20] honor r/w changes at do_remount() time

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@ftp.linux.org.uk>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org,
	herbert@13thfloor.at, serue@us.ibm.com
Subject: Re: [PATCH 20/20] honor r/w changes at do_remount() time
Date: Wed, 28 Jun 2006 06:19:35 +0100	[thread overview]
Message-ID: <20060628051935.GA29920@ftp.linux.org.uk> (raw)
In-Reply-To: <20060627221457.04ADBF71@localhost.localdomain>

On Tue, Jun 27, 2006 at 03:14:57PM -0700, Dave Hansen wrote:
> 
> Originally from: Herbert Poetzl <herbert@13thfloor.at>
> 
> This is the core of the read-only bind mount patch set.
> 
> Note that this does _not_ add a "ro" option directly to
> the bind mount operation.  If you require such a mount,
> you must first do the bind, then follow it up with a
> 'mount -o remount,ro' operation.

I guess the fundamental problem I have with that approach is that it's
a cop-out - we just declare rw state of vfsmount independent from that
of filesystem and add a "if a flag is set, act upon vfsmount".

And yes, some of that does make sense.  Fine, let's separate that
stuff; but then we'd better decide what rw superblock *is*.

We have a number of vfsmounts over given superblock.  OK, some are
"we don't even ask them to be r/w".  Some are "we want them r/w, but
don't actually use as such at the moment".  Some are "pinned down for
write now".  And we do get logics for "can't make it r/o right now".

But look - we have the _same_ logics for superblock itself.  Only it's
full of holes.  And since you have rw states for those completely
unrelated to those of vfsmount, we get a ridiculous situation - we
*do* mark the moments when superblock becomes impossible to remount
r/o and we even mostly get the moments when it ceases to be busy
writing (unlinked-but-opened files are major exception).  But we can't
use that information.

So "can we remount superblock ro?" turns into kinda-sorta duplicate of
the same for vfsmounts, but it's racy as hell and bloody incomplete;
we don't even get "if some vfsmount over it is busy writing, we won't
remount r/o".  Approximation is done, but that's it.  E.g. mkdir() in
progress does *not* stop remount of superblock r/o (it does prevent
remount of vfsmount with your patchset).

FWIW, I suspect that the root of the problem is that we confuse different
states of filesystem.  E.g. one obviously useful feature would be to have
soft r/o - filesystem that is (from the driver POV) mounted readonly,
but would get transparently switched r/w at the first request.  And you
have all vfsmount-side infrastructure for that, BTW.  Add something like
mechanism we use for expiry and you've got a very tasty feature for e.g.
laptop users: e.g. userland asking to switch fs soft-ro every 15 minutes
and if nobody had wanted it r/w since the last time, do the transition;
if asked r/w again, r/w it goes on its own.  IOW, there's more to it than
one bit.  And I'm talking about superblock state...

BTW, it might be worth doing the following:
	* reintroduce the list of vfsmounts over given superblock (protected
by vfsmount_lock)
	* keep ro flag separate from counter and split it in two.
	* all decrements are with atomic_dec_and_lock()
	* all increments are with atomic_add_unless() + spin_lock() +
check flags + atomic_add_return() + possible spin_unlock
	* if writers count goes from non-zero to zero or vice versa
increment/decrement superblock counter (number of vfsmounts that really
want write access).
	* make the moments when i_nlink hits 0 bump the superblock writers
count; drop it when such sucker gets freed on final iput.
	* kill the sodding "traverse the list of opened files" logics in
remounting superblock r/o.  Instead of that, grab spinlock, check writers
count, bail out if non-zero, grab vfsmount_lock, traverse the list over
superblock and set one of the flags, drop the locks and proceed.
	* when remounting superblock r/w, traverse the list and knock out
the same flag.

At least that way we'd get the majority of "can remount ro" logics right...

Another fun issues:
	a) MS_REC handling with MS_BIND remounts (trivial)
	b) figuring out what (if anything) should be done with propagation
when we have shared subtrees... (not trivial at all)

next prev parent reply	other threads:[~2006-06-28  5:20 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-27 22:14 [PATCH 00/20] Mount writer count and read-only bind mounts (v3) Dave Hansen
2006-06-27 22:14 ` [PATCH 01/20] prepare for write access checks: collapse if() Dave Hansen
2006-06-27 22:14 ` [PATCH 02/20] r/o bind mount prepwork: move open_namei()'s vfs_create() Dave Hansen
2006-06-27 22:14 ` [PATCH 03/20] Add vfsmount writer count Dave Hansen
2006-06-27 22:14 ` [PATCH 04/20] elevate mnt writers for callers of vfs_mkdir() Dave Hansen
2006-06-27 22:14 ` [PATCH 05/20] elevate write count during entire ncp_ioctl() Dave Hansen
2006-06-27 22:14 ` [PATCH 06/20] sys_symlinkat() elevate write count around vfs_symlink() Dave Hansen
2006-06-27 22:14 ` [PATCH 07/20] elevate mount count for extended attributes Dave Hansen
2006-06-27 22:14 ` [PATCH 08/20] sys_linkat(): elevate write count around vfs_link() Dave Hansen
2006-06-27 22:14 ` [PATCH 09/20] mount_is_safe(): add comment Dave Hansen
2006-06-27 22:14 ` [PATCH 10/20] unix_find_other() elevate write count for touch_atime() Dave Hansen
2006-06-27 22:14 ` [PATCH 11/20] elevate write count over calls to vfs_rename() Dave Hansen
2006-06-27 22:14 ` [PATCH 12/20] tricky: elevate write count files are open()ed Dave Hansen
2006-06-27 22:14 ` [PATCH 13/20] elevate writer count for do_sys_truncate() Dave Hansen
2006-06-27 22:14 ` [PATCH 14/20] elevate write count for do_utimes() Dave Hansen
2006-06-27 22:14 ` [PATCH 15/20] elevate write count for do_sys_utime() and touch_atime() Dave Hansen
2006-06-27 22:14 ` [PATCH 16/20] sys_mknodat(): elevate write count for vfs_mknod/create() Dave Hansen
2006-06-27 22:14 ` [PATCH 17/20] elevate mnt writers for vfs_unlink() callers Dave Hansen
2006-06-27 22:14 ` [PATCH 18/20] do_rmdir(): elevate write count Dave Hansen
2006-06-27 22:14 ` [PATCH 19/20] elevate writer count for custom 'struct file' Dave Hansen
2006-06-28  2:40   ` Andrew Morton
2006-06-28  3:59     ` Dave Hansen
2006-06-27 22:14 ` [PATCH 20/20] honor r/w changes at do_remount() time Dave Hansen
2006-06-28  5:19   ` Al Viro [this message]
2006-06-28 14:41     ` Herbert Poetzl
2006-06-28 15:01       ` Serge E. Hallyn
2006-07-03 17:30     ` Dave Hansen
2006-07-03 17:48       ` Al Viro
2006-07-03 18:23         ` Joshua Hudson
2006-07-03 18:38           ` Al Viro
2006-06-28  1:38 ` [PATCH 00/20] Mount writer count and read-only bind mounts (v3) Andrew Morton
2006-06-28  1:50   ` Dave Hansen
2006-06-28  2:17     ` Andrew Morton
2006-06-28  2:24       ` Randy.Dunlap
2006-06-28  2:30         ` Andrew Morton
2006-06-28  8:42       ` Arjan van de Ven
2006-06-28  5:56     ` Christoph Hellwig
2006-06-28 14:52     ` Herbert Poetzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060628051935.GA29920@ftp.linux.org.uk \
    --to=viro@ftp.linux.org.uk \
    --cc=akpm@osdl.org \
    --cc=haveblue@us.ibm.com \
    --cc=herbert@13thfloor.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=serue@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.