From: Dave Chinner <david@fromorbit.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
David Howells <dhowells@redhat.com>
Subject: Re: [PATCH] fs: Provide helpers for manipulating sb->s_readonly_remount
Date: Tue, 20 Jun 2023 09:11:20 +1000 [thread overview]
Message-ID: <ZJDgmDuoeSwinR27@dread.disaster.area> (raw)
In-Reply-To: <20230617-hitze-weingut-17034408ebc2@brauner>
On Sat, Jun 17, 2023 at 05:05:25PM +0200, Christian Brauner wrote:
> On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote:
> > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > > Provide helpers to set and clear sb->s_readonly_remount including
> > > appropriate memory barriers. Also use this opportunity to document what
> > > the barriers pair with and why they are needed.
> > >
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> >
> > The helper conversion looks fine so from that perspective the patch
> > looks good.
> >
> > However, I'm not sure the use of memory barriers is correct, though.
> >
> > IIUC, we want mnt_is_readonly() to return true when ever
> > s_readonly_remount is set. Is that the behaviour we are trying to
> > acheive for both ro->rw and rw->ro transactions?
> >
> > > ---
> > > fs/internal.h | 26 ++++++++++++++++++++++++++
> > > fs/namespace.c | 10 ++++------
> > > fs/super.c | 17 ++++++-----------
> > > include/linux/fs.h | 2 +-
> > > 4 files changed, 37 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/fs/internal.h b/fs/internal.h
> > > index bd3b2810a36b..01bff3f6db79 100644
> > > --- a/fs/internal.h
> > > +++ b/fs/internal.h
> > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> > > extern bool mount_capable(struct fs_context *);
> > > int sb_init_dio_done_wq(struct super_block *sb);
> > >
> > > +/*
> > > + * Prepare superblock for changing its read-only state (i.e., either remount
> > > + * read-write superblock read-only or vice versa). After this function returns
> > > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > > + * caller is able to observe any changes done by the remount. This holds until
> > > + * sb_end_ro_state_change() is called.
> > > + */
> > > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > > +{
> > > + WRITE_ONCE(sb->s_readonly_remount, 1);
> > > + /* The barrier pairs with the barrier in mnt_is_readonly() */
> > > + smp_wmb();
> > > +}
> >
> > I'm not sure how this wmb pairs with the memory barrier in
> > mnt_is_readonly() to provide the correct behavior. The barrier in
> > mnt_is_readonly() happens after it checks s_readonly_remount, so
> > the s_readonly_remount in mnt_is_readonly is not ordered in any way
> > against this barrier.
> >
> > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> > and MNT_READONLY are ordered after s_readonly_remount(), but we
> > don't change those flags until a long way after s_readonly_remount
> > is set.
> >
> > Hence if this is a ro->rw transistion, then I can see that racing on
> > s_readonly_remount being isn't an issue, because the mount/sb
> > flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> > will be done (i.e. consider code between sb_start_ro_state_change()
> > and sb_end_ro_state_change() is RO).
> >
> > However, it's not obvious (to me, anyway) how this works at all for
> > a rw->ro transition - if we race on s_readonly_remount being set
> > then we'll consider the fs to still be read-write regardless of the
> > smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or
> > MNT_READONLY are set at this point.
>
> Let me try and remember it all. I've documented a good portion of this
> in the relevant functions but I should probably upstream some more
> longer documentation blurb as well.
>
> A rw->ro transition happen in two ways.
>
> (1) A mount or mount tree is made read-only via
> mount_setattr(MNT_ATTR_READONLY) or
> mount(MS_BIND|MS_RDONLY|MS_REMOUNT).
> (2) The filesystems/superblock is made read-only via fspick()+fsconfig()
> or mount(MS_REMOUNT|MS_RDONLY).
>
> For both (1) and (2) we grab lock_mount_hash() in relevant codepaths
> (because that's required for any vfsmount->mnt_flags changes) and then
> call mnt_hold_writers().
>
> mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags
> before checking the write counter of that mount to see whether there are
> any active writers on that mount. If there are any active writers we'll
> fail mnt_hold_writers() and the whole rw->ro transition.
>
> A memory barrier is used to order raising MNT_WRITE_HOLD against the
> increment of the write counter of that mount in __mnt_want_write().
> If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after
> it incremented the write counter it will spin until MNT_WRITE_HOLD is
> cleared via mnt_unhold_writers(). This uses another memory barrier to
> ensure ordering with the mnt_is_readonly() check in __mnt_want_write().
>
> __mnt_want_write() doesn't know about the ro/rw state of the mount at
> all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly().
>
> If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was
> cleared __mnt_want_write() will back off. If not write access to the
> mount is granted.
>
> A superblock rw->ro transition is done the same way. It also requires
> mnt_hold_writers() to be done. This is done in
> sb_prepare_remount_readonly() which is called in reconfigure_super().
>
> Only after mnt_hold_writers() has been raised successfully on every
> mount of that filesystem (i.e., all bind mounts) will
> sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and
> mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be
> visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in
> sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't
> protect/order the rw->ro transition itself.
>
> (The only exception is an emergency read-only remount where we don't
> know what state the fs is in and don't care for any active writers on
> that superblock so omit wading through all the mounts of that
> filesystem. But that's only doable from withing the kernel via
> SB_FORCE.)
>
> Provided I understand your question/concern correctly.
Thank's for the info, Christian. Yes, you did understand my
concern: that the memory barriers are poorly and/or incorrectly
documented. :/
Nothing I read in the code around the s_readonly_remount variable or
mnt_is_readonly() indicated that there was any serialisation around
__mnt_want_write() and MNT_WRITE_HOLD. I was completely unable to
make that jump from the code as it was written, or from the patch
that Jan proposed.
Now that I see mnt_hold_writers() and mnt_unhold_writers(), I see
more memory barriers, but I don't see any reference to
sb->s_readonly_remount in that code or the comments. Given that it
appears that these memory barriers are deeply intertwined, I think
that the mnt_[un]hold_writers() helpers also need to have their
documentation updated to indicate how they interact with
s_readonly_remount..
And for sb->s_readonly_remount helpers, some form of the above
explanation also needs to be added, or maybe just a pointer to the
documentation on the mnt_[un]hold_writers() helpers that also
explains how sb->s_readonly_remount factors into the memory barriers
in those helpers...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-06-19 23:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-16 16:38 [PATCH] fs: Provide helpers for manipulating sb->s_readonly_remount Jan Kara
2023-06-16 23:33 ` Dave Chinner
2023-06-17 15:05 ` Christian Brauner
2023-06-19 23:11 ` Dave Chinner [this message]
2023-06-19 11:05 ` Jan Kara
2023-06-19 23:16 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZJDgmDuoeSwinR27@dread.disaster.area \
--to=david@fromorbit.com \
--cc=brauner@kernel.org \
--cc=dhowells@redhat.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox