From: Jan Kara <jack@suse.cz>
To: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Al Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Chris Mason <clm@fb.com>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Jaegeuk Kim <jaegeuk@kernel.org>, Chao Yu <chao@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Jens Axboe <axboe@kernel.dk>,
linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
linux-nilfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH 02/12] nilfs2: use setup_bdev_super to de-duplicate the mount code
Date: Thu, 10 Aug 2023 20:14:23 +0200
Message-ID: <20230810181423.dfz3lrezwvutls2w@quack3>
In-Reply-To: <CAKFNMon_3A7dC+k1q_RjEnoXXNtxBJAUQud_FD4s4VrHZdCVzg@mail.gmail.com>

On Fri 11-08-23 01:39:10, Ryusuke Konishi wrote:
> On Thu, Aug 10, 2023 at 8:05 PM Jan Kara wrote:
> >
> > On Fri 04-08-23 11:01:39, Ryusuke Konishi wrote:
> > > On Thu, Aug 3, 2023 at 8:46 PM Jan Kara wrote:
> > > >
> > > > On Wed 02-08-23 17:41:21, Christoph Hellwig wrote:
> > > > > Use the generic setup_bdev_super helper to open the main block device
> > > > > and do various bits of superblock setup instead of duplicating the
> > > > > logic. This includes moving to the new scheme implemented in common
> > > > > > code that only opens the block device after the superblock has been allocated.
> > > > >
> > > > > It does not yet convert nilfs2 to the new mount API, but doing so will
> > > > > become a bit simpler after this first step.
> > > > >
> > > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > >
> > > > AFAICS nilfs2 could *almost* use mount_bdev() directly and then just do its
> > > > snapshot thing after mount_bdev() returns. But it has this weird logic
> > > > that: "if the superblock is already mounted but we can shrink the whole
> > > > dcache, then do remount instead of ignoring mount options". Firstly, this
> > > > looks racy - what prevents someone from say opening a file on the sb just
> > > > after nilfs_tree_is_busy() shrinks dcache? Secondly, it is inconsistent
> > > > with any other filesystem so it's going to surprise sysadmins not
> > > > intimately knowing nilfs2. Thirdly, from userspace you cannot tell what
> > > > your mount call is going to do. Last but not least, what is it really good
> > > > for? Ryusuke, can you explain please?
> > > >
> > > > Honza
> > >
> > > I think you are referring to the following part:
> > >
> > > > if (!s->s_root) {
> > > ...
> > > > } else if (!sd.cno) {
> > > >         if (nilfs_tree_is_busy(s->s_root)) {
> > > >                 if ((flags ^ s->s_flags) & SB_RDONLY) {
> > > >                         nilfs_err(s,
> > > >                                   "the device already has a %s mount.",
> > > >                                   sb_rdonly(s) ? "read-only" : "read/write");
> > > >                         err = -EBUSY;
> > > >                         goto failed_super;
> > > >                 }
> > > >         } else {
> > > >                 /*
> > > >                  * Try remount to setup mount states if the current
> > > >                  * tree is not mounted and only snapshots use this sb.
> > > >                  */
> > > >                 err = nilfs_remount(s, &flags, data);
> > > >                 if (err)
> > > >                         goto failed_super;
> > > >         }
> > > > }
> > >
> > > What this logic does is this: if there is already a nilfs2 mount
> > > instance for the device and we are trying to mount the current tree
> > > (sd.cno is 0, so this is not a snapshot mount), the behavior switches
> > > depending on whether the current tree is already mounted:
> > >
> > >  - If the current tree is mounted, it's just like a normal filesystem.
> > >    (A read-only mount and a read/write mount can't coexist, so check
> > >    that, and reuse the instance if possible.)
> > >  - Otherwise, i.e. if only snapshots are mounted, do whatever is necessary
> > >    to add a new current mount, such as starting a log writer.
> > >    Since this is the same thing that nilfs_remount() does,
> > >    nilfs_remount() is used there.
> > >
> > > Whether or not there is a current tree mount can be determined by
> > > d_count(s->s_root) > 1, as nilfs_tree_is_busy() does.
> > > Here s->s_root is always the root dentry of the current tree, not
> > > that of the mounted snapshot.
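
The decision described in the quoted explanation can be sketched as a small
userspace model. This is only an illustration, not the nilfs2 code: the
toy_* types and functions are hypothetical, the reference count stands in
for d_count(s->s_root), and the dcache shrinking, the snapshot path and any
failure of the remount itself are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock state the quoted code inspects. */
struct toy_sb {
    bool has_root;       /* models s->s_root != NULL */
    bool rdonly;         /* models sb_rdonly(s) */
    unsigned root_refs;  /* models d_count(s->s_root) */
};

enum toy_action {
    TOY_FILL_SUPER,      /* no root yet: first mount of the device */
    TOY_SNAPSHOT_PATH,   /* cno != 0: not covered by the quoted excerpt */
    TOY_REUSE,           /* current tree mounted, flags compatible */
    TOY_REMOUNT,         /* only snapshots mounted: set up the current tree */
    TOY_FAIL,            /* ro/rw conflict with the existing current mount */
};

/* Models the d_count(s->s_root) > 1 test mentioned in the thread. */
static bool toy_tree_is_busy(const struct toy_sb *sb)
{
    return sb->root_refs > 1;
}

static enum toy_action toy_mount(const struct toy_sb *sb,
                                 unsigned long long cno,
                                 bool want_rdonly, int *err)
{
    assert(sb != NULL && err != NULL);
    *err = 0;

    if (!sb->has_root)
        return TOY_FILL_SUPER;
    if (cno != 0)
        return TOY_SNAPSHOT_PATH;

    if (toy_tree_is_busy(sb)) {
        /* Same test as (flags ^ s->s_flags) & SB_RDONLY. */
        if (want_rdonly != sb->rdonly) {
            *err = -EBUSY;
            return TOY_FAIL;
        }
        return TOY_REUSE;
    }
    /* Current tree not mounted: the real code calls nilfs_remount(). */
    return TOY_REMOUNT;
}
```

In this model, mounting the current tree while only snapshots hold the
superblock yields TOY_REMOUNT, while a read-only request against a busy
read/write current tree yields TOY_FAIL with -EBUSY.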
> >
> > I see now, thanks for the explanation! But one thing is still not clear to me.
> > If you, say, have a snapshot mounted read-write and then you mount the
> > current snapshot (cno == 0) read-only, you'll switch the whole superblock
> > to read-only state. So the mounted snapshot is suddenly read-only as well,
> > which is unexpected and presumably breaks things because you can
> > still have file handles open for writing on the snapshot etc. So how do
> > you solve that?
> >
> > Honza
>
> One thing I have to tell you as a premise is that nilfs2's snapshot
> mounts (cno != 0) are read-only.
>
> The read-only option is mandatory for nilfs2 snapshot mounts, so
> remounting to read/write mode will result in an error.
> This constraint is checked in nilfs_parse_snapshot_option() which is
> called from nilfs_identify().
>
> In fact, any write mode file/inode operations on a snapshot mount will
> result in an EROFS error, regardless of whether the coexisting current
> tree mount is read-only or read/write (i.e. regardless of the
> read-only flag of the superblock instance).
>
> This is mostly (and possibly entirely) accomplished at the vfs layer
> by checking the MNT_READONLY flag in mnt_flags of the vfsmount
> structure, and even on the nilfs2 side, iops->permission
> (=nilfs_permission) rejects write operations on snapshot mounts.
>
> Therefore, the problem you pointed out shouldn't occur in the first
> place since the situation where a snapshot with a handle in write mode
> suddenly becomes read-only doesn't happen. Unless I'm missing
> something..

No, I think you are correct. This particular case should be safe because
the MNT_READONLY flags on the mounts used by snapshots will still keep them
read-only even if you remount the superblock to read-write mode for the
current snapshot. So I see why this is useful, and I agree this isn't easy
to implement using mount_bdev(), so no special code reduction here ;).
Thanks for the patient explanation!

Honza
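
For illustration, the two-level read-only state discussed above can be
modelled in userspace roughly as follows. This is a sketch under
assumptions: the ro_* names are hypothetical, mnt_readonly stands in for
MNT_READONLY on the vfsmount and sb_rdonly for the read-only flag of the
shared superblock; it is not the VFS implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one superblock shared by several mounts. */
struct ro_sb {
    bool sb_rdonly;          /* models the superblock read-only flag */
};

struct ro_mnt {
    const struct ro_sb *sb;  /* superblock this mount belongs to */
    bool mnt_readonly;       /* models MNT_READONLY on this mount */
};

/*
 * A write through a mount is allowed only if neither the mount nor the
 * superblock is read-only; otherwise the caller gets -EROFS.
 */
static int ro_want_write(const struct ro_mnt *m)
{
    assert(m != NULL && m->sb != NULL);
    if (m->mnt_readonly || m->sb->sb_rdonly)
        return -EROFS;
    return 0;
}
```

With a snapshot mount that has mnt_readonly set and a current-tree mount
that does not, flipping sb_rdonly on the shared superblock changes the
result only for the current tree; the snapshot mount keeps returning
-EROFS either way.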
--
Jan Kara <jack@suse.com>
SUSE Labs, CR