Re: [PATCH] fs: Provide helpers for manipulating sb->s_readonly_remount

From: Dave Chinner <david@fromorbit.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH] fs: Provide helpers for manipulating sb->s_readonly_remount
Date: Tue, 20 Jun 2023 09:11:20 +1000
Message-ID: <ZJDgmDuoeSwinR27@dread.disaster.area>
In-Reply-To: <20230617-hitze-weingut-17034408ebc2@brauner>

On Sat, Jun 17, 2023 at 05:05:25PM +0200, Christian Brauner wrote:
> On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote:
> > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > > Provide helpers to set and clear sb->s_readonly_remount including
> > > appropriate memory barriers. Also use this opportunity to document what
> > > the barriers pair with and why they are needed.
> > > 
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > The helper conversion looks fine so from that perspective the patch
> > looks good.
> > 
> > However, I'm not sure the use of memory barriers is correct.
> > 
> > IIUC, we want mnt_is_readonly() to return true whenever
> > s_readonly_remount is set. Is that the behaviour we are trying to
> > achieve for both ro->rw and rw->ro transitions?
> > 
> > > ---
> > >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> > >  fs/namespace.c     | 10 ++++------
> > >  fs/super.c         | 17 ++++++-----------
> > >  include/linux/fs.h |  2 +-
> > >  4 files changed, 37 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/internal.h b/fs/internal.h
> > > index bd3b2810a36b..01bff3f6db79 100644
> > > --- a/fs/internal.h
> > > +++ b/fs/internal.h
> > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> > >  extern bool mount_capable(struct fs_context *);
> > >  int sb_init_dio_done_wq(struct super_block *sb);
> > >  
> > > +/*
> > > + * Prepare superblock for changing its read-only state (i.e., either remount
> > > + * read-write superblock read-only or vice versa). After this function returns
> > > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > > + * caller is able to observe any changes done by the remount. This holds until
> > > + * sb_end_ro_state_change() is called.
> > > + */
> > > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > > +{
> > > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > > +	smp_wmb();
> > > +}
> > 
> > I'm not sure how this wmb pairs with the memory barrier in
> > mnt_is_readonly() to provide the correct behavior. The barrier in
> > mnt_is_readonly() happens after it checks s_readonly_remount, so
> > the s_readonly_remount in mnt_is_readonly is not ordered in any way
> > against this barrier.
> > 
> > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> > and MNT_READONLY are ordered after the load of s_readonly_remount,
> > but we don't change those flags until a long way after
> > s_readonly_remount is set.
> > 
> > Hence if this is a ro->rw transition, then I can see that racing on
> > s_readonly_remount being set isn't an issue, because the mount/sb
> > flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> > will be done (i.e. the code between sb_start_ro_state_change()
> > and sb_end_ro_state_change() is considered RO).
> > 
> > However, it's not obvious (to me, anyway) how this works at all for
> > a rw->ro transition - if we race on s_readonly_remount being set
> > then we'll consider the fs to still be read-write regardless of the
> > smp_rmb() in mnt_is_readonly() because neither SB_RDONLY nor
> > MNT_READONLY is set at this point.
> 
> Let me try and remember it all. I've documented a good portion of this
> in the relevant functions but I should probably upstream some more
> longer documentation blurb as well.
> 
> A rw->ro transition happens in two ways.
> 
> (1) A mount or mount tree is made read-only via
>     mount_setattr(MNT_ATTR_READONLY) or
>     mount(MS_BIND|MS_RDONLY|MS_REMOUNT).
> (2) The filesystem/superblock is made read-only via fspick()+fsconfig()
>     or mount(MS_REMOUNT|MS_RDONLY).
> 
> For both (1) and (2) we grab lock_mount_hash() in relevant codepaths
> (because that's required for any vfsmount->mnt_flags changes) and then
> call mnt_hold_writers().
> 
> mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags
> before checking the write counter of that mount to see whether there are
> any active writers on that mount. If there are any active writers we'll
> fail mnt_hold_writers() and the whole rw->ro transition.
> 
> A memory barrier is used to order raising MNT_WRITE_HOLD against the
> increment of the write counter of that mount in __mnt_want_write().
> If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after
> it incremented the write counter it will spin until MNT_WRITE_HOLD is
> cleared via mnt_unhold_writers(). This uses another memory barrier to
> ensure ordering with the mnt_is_readonly() check in __mnt_want_write().
> 
> __mnt_want_write() doesn't know about the ro/rw state of the mount at
> all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly().
> 
> If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was
> cleared, __mnt_want_write() will back off. If not, write access to the
> mount is granted.
> 
> A superblock rw->ro transition is done the same way. It also requires
> mnt_hold_writers() to be done. This is done in
> sb_prepare_remount_readonly() which is called in reconfigure_super().
> 
> Only after mnt_hold_writers() has been raised successfully on every
> mount of that filesystem (i.e., all bind mounts) will
> sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and
> mnt_is_readonly() is called, either sb->s_readonly_remount is guaranteed
> to be visible, or MNT_READONLY or SB_RDONLY is visible. The memory barrier in
> sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't
> protect/order the rw->ro transition itself.
> 
> (The only exception is an emergency read-only remount where we don't
> know what state the fs is in and don't care about any active writers on
> that superblock, so we omit wading through all the mounts of that
> filesystem. But that's only doable from within the kernel via
> SB_FORCE.)
> 
> Provided I understand your question/concern correctly.

Thanks for the info, Christian.  Yes, you did understand my
concern: that the memory barriers are poorly and/or incorrectly
documented. :/

Nothing I read in the code around the s_readonly_remount variable or
mnt_is_readonly() indicated that there was any serialisation around
__mnt_want_write() and MNT_WRITE_HOLD. I was completely unable to
make that jump from the code as it was written, or from the patch
that Jan proposed.

Now that I see mnt_hold_writers() and mnt_unhold_writers(), I see
more memory barriers, but I don't see any reference to
sb->s_readonly_remount in that code or the comments. Given that it
appears that these memory barriers are deeply intertwined, I think
that the mnt_[un]hold_writers() helpers also need to have their
documentation updated to indicate how they interact with
s_readonly_remount.

And for sb->s_readonly_remount helpers, some form of the above
explanation also needs to be added, or maybe just a pointer to the
documentation on the mnt_[un]hold_writers() helpers that also
explains how sb->s_readonly_remount factors into the memory barriers
in those helpers...
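
To make the interaction described above concrete, here is a rough
userspace C11 model of the handshake as I read Christian's explanation.
It is a sketch under stated assumptions, not kernel code: every model_*
name and flag value is made up for illustration, atomic_thread_fence()
merely stands in for smp_mb()/smp_wmb()/smp_rmb(), and only a single
mount per superblock is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's definitions or values. */
#define MODEL_MNT_READONLY   0x1u
#define MODEL_MNT_WRITE_HOLD 0x2u
#define MODEL_SB_RDONLY      0x1u

struct model_sb {
	atomic_int  s_readonly_remount;
	atomic_uint s_flags;
};

struct model_mnt {
	atomic_uint mnt_flags;
	atomic_int  writers;
	struct model_sb *sb;
};

static void model_init(struct model_sb *sb, struct model_mnt *m)
{
	atomic_init(&sb->s_readonly_remount, 0);
	atomic_init(&sb->s_flags, 0u);
	atomic_init(&m->mnt_flags, 0u);
	atomic_init(&m->writers, 0);
	m->sb = sb;
}

static bool model_mnt_is_readonly(struct model_mnt *m)
{
	if (atomic_load_explicit(&m->sb->s_readonly_remount, memory_order_relaxed))
		return true;
	/* Stands in for smp_rmb(): flag loads are ordered after the load above. */
	atomic_thread_fence(memory_order_acquire);
	return (atomic_load_explicit(&m->mnt_flags, memory_order_relaxed) & MODEL_MNT_READONLY) ||
	       (atomic_load_explicit(&m->sb->s_flags, memory_order_relaxed) & MODEL_SB_RDONLY);
}

static int model_mnt_want_write(struct model_mnt *m)
{
	atomic_fetch_add_explicit(&m->writers, 1, memory_order_relaxed);
	/* Stands in for smp_mb(): count the writer before looking at WRITE_HOLD. */
	atomic_thread_fence(memory_order_seq_cst);
	while (atomic_load_explicit(&m->mnt_flags, memory_order_relaxed) & MODEL_MNT_WRITE_HOLD)
		; /* spin until the remount side drops the hold */
	/* Stands in for smp_rmb(): read the ro state only after the hold cleared. */
	atomic_thread_fence(memory_order_acquire);
	if (model_mnt_is_readonly(m)) {
		atomic_fetch_sub_explicit(&m->writers, 1, memory_order_relaxed);
		return -EROFS;
	}
	return 0;
}

static void model_mnt_drop_write(struct model_mnt *m)
{
	int old = atomic_fetch_sub_explicit(&m->writers, 1, memory_order_relaxed);

	assert(old > 0);
	(void)old;
}

/* Hold writers, and only if there are none, flag the pending ro change. */
static int model_sb_prepare_remount_ro(struct model_mnt *m)
{
	int ret = 0;

	atomic_fetch_or_explicit(&m->mnt_flags, MODEL_MNT_WRITE_HOLD, memory_order_relaxed);
	/* Stands in for smp_mb(): raise the hold before sampling the write count. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&m->writers, memory_order_relaxed) > 0)
		ret = -EBUSY;
	if (!ret) {
		atomic_store_explicit(&m->sb->s_readonly_remount, 1, memory_order_relaxed);
		/* Stands in for smp_wmb(): publish the flag before dropping the hold. */
		atomic_thread_fence(memory_order_release);
	}
	atomic_fetch_and_explicit(&m->mnt_flags, ~MODEL_MNT_WRITE_HOLD, memory_order_relaxed);
	return ret;
}

/* Set the real ro flag, then clear the "change in progress" marker. */
static void model_sb_finish_remount_ro(struct model_sb *sb)
{
	atomic_fetch_or_explicit(&sb->s_flags, MODEL_SB_RDONLY, memory_order_relaxed);
	/* Stands in for smp_wmb(): SB_RDONLY is visible before the marker clears. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sb->s_readonly_remount, 0, memory_order_relaxed);
}
```

In this model a writer that slips in before the hold is raised makes
model_sb_prepare_remount_ro() fail with -EBUSY, and a writer that
arrives afterwards cannot get past the hold until s_readonly_remount
(and later SB_RDONLY) is visible, so it backs off with -EROFS. That
dependency on the hold is the part I could not work out from the
comments alone.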

Cheers,

Dave.


-- 
Dave Chinner
david@fromorbit.com
