From: Filipe Manana <fdmanana@kernel.org>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"david@fromorbit.com" <david@fromorbit.com>
Subject: Re: [PATCH v2 2/4] btrfs: mark device addition as mnt_want_write_file
Date: Thu, 17 Mar 2022 10:50:16 +0000 [thread overview]
Message-ID: <YjMSaLIhKNcKUuHM@debian9.Home> (raw)
In-Reply-To: <20220317073628.slh4iyexpen7lmjh@naota-xeon>
On Thu, Mar 17, 2022 at 07:36:29AM +0000, Naohiro Aota wrote:
> On Wed, Mar 16, 2022 at 04:06:26PM +0000, Filipe Manana wrote:
> > On Wed, Mar 16, 2022 at 10:22:38PM +0900, Naohiro Aota wrote:
> > > btrfs_init_new_device() calls btrfs_relocate_sys_chunk() which incurs
> > > file-system internal writing. That writing can cause a deadlock with
> > > FS freezing like as described in like as described in commit
> > > 26559780b953 ("btrfs: zoned: mark relocation as writing").
> > >
> > > Mark the device addition as mnt_want_write_file. This is also consistent
> > > with the removing device ioctl counterpart.
> > >
> > > Cc: stable@vger.kernel.org # 4.9+
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > > fs/btrfs/ioctl.c | 11 +++++++++--
> > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index 60c907b14547..a6982a1fde65 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -3474,8 +3474,10 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
> > > return ret;
> > > }
> > >
> > > -static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > > +static long btrfs_ioctl_add_dev(struct file *file, void __user *arg)
> > > {
> > > + struct inode *inode = file_inode(file);
> > > + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> > > struct btrfs_ioctl_vol_args *vol_args;
> > > bool restore_op = false;
> > > int ret;
> > > @@ -3488,6 +3490,10 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > > return -EINVAL;
> > > }
> > >
> > > + ret = mnt_want_write_file(file);
> > > + if (ret)
> > > + return ret;
> >
> > So, this now breaks all test cases that exercise device seeding, and I clearly
> > forgot about seeding when I asked about why not use mnt_want_write_file()
> > instead of a bare call to sb_start_write():
>
> Ah, yes, I also confirmed they fail.
>
> >
> > $ ./check btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> ><snip>
> > Ran: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> > Failures: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> > Failed 5 of 5 tests
> >
> > So device seeding introduces a special case. If we mount a seeding
> > filesystem, it's RO, so the mnt_want_write_file() fails.
>
> Yeah, so we are in a mixed state here. It's RO with a seeding
> device. Or, it must be RW otherwise (checked in
> btrfs_init_new_device()).
>
> > Something like this deals with it and it makes the tests pass:
> >
> ><snip>
> >
> > We are also changing the semantics as we no longer allow for adding a device
> > to a RO filesystem. So the lack of a mnt_want_write_file() was intentional
> > to deal with the seeding filesystem case. But calling mnt_want_write_file()
> > if we are not seeding, changes the semantics - I'm not sure if anyone relies
> > on the ability to add a device to a fs mounted RO, I'm not seeing if it's an
> > useful use case.
>
> Adding a device to RO FS anyway returns -EROFS from
> btrfs_init_new_device(). So, there is no change.
>
> > So either we do that special casing like in that diff, or we always do the
> > sb_start_write() / sb_end_write() - in any case please add a comment explaining
> > why we do it like that, why we can't use mnt_want_write_file().
>
> The conditional using of sb_start_write() or mnt_want_write_file()
> seems a bit dirty. And, I just thought, marking the FS "writing" when
> it's read-only also seems odd.
>
> I'm now thinking we should have sb_start_write() around here where the
> FS is surely RW.
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 393fc7db99d3..50e02dc4e2b2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2731,6 +2731,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>
> mutex_unlock(&fs_devices->device_list_mutex);
>
> + sb_start_write(fs_info->sb);
> +
> if (seeding_dev) {
> mutex_lock(&fs_info->chunk_mutex);
> ret = init_first_rw_device(trans);
> @@ -2786,6 +2788,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> ret = btrfs_commit_transaction(trans);
> }
>
> + sb_end_write(fs_info->sb);
> +
> /*
> * Now that we have written a new super block to this device, check all
> * other fs_devices list if device_path alienates any other scanned
> @@ -2801,6 +2805,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> return ret;
>
> error_sysfs:
> + sb_end_write(fs_info->sb);
> +
> btrfs_sysfs_remove_device(device);
> mutex_lock(&fs_info->fs_devices->device_list_mutex);
> mutex_lock(&fs_info->chunk_mutex);
Why not just reduce the scope to surround the btrfs_relocate_sys_chunks() call?
It's simpler, and I don't see why all the other code needs to be surround by
sb_start_write() and sb_end_write().
Actually, relocating system chunks does not create ordered extents - that only
happens for data block groups. So we could could get away with all this, and
have the relocation code do the assertion only if we are relocating a data
block group - so no need to touch anything in the device add path.
Thanks.
>
> > Thanks.
> >
> >
> > > +
> > > if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_DEV_ADD)) {
> > > if (!btrfs_exclop_start_try_lock(fs_info, BTRFS_EXCLOP_DEV_ADD))
> > > return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
> > > @@ -3520,6 +3526,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > > btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> > > else
> > > btrfs_exclop_finish(fs_info);
> > > + mnt_drop_write_file(file);
> > > return ret;
> > > }
> > >
> > > @@ -5443,7 +5450,7 @@ long btrfs_ioctl(struct file *file, unsigned int
> > > case BTRFS_IOC_RESIZE:
> > > return btrfs_ioctl_resize(file, argp);
> > > case BTRFS_IOC_ADD_DEV:
> > > - return btrfs_ioctl_add_dev(fs_info, argp);
> > > + return btrfs_ioctl_add_dev(file, argp);
> > > case BTRFS_IOC_RM_DEV:
> > > return btrfs_ioctl_rm_dev(file, argp);
> > > case BTRFS_IOC_RM_DEV_V2:
> > > --
> > > 2.35.1
> > >
next prev parent reply other threads:[~2022-03-17 10:50 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-16 13:22 [PATCH v2 0/4] protect relocation with sb_start_write Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 1/4] btrfs: mark resumed async balance as writing Naohiro Aota
2022-03-16 15:57 ` Filipe Manana
2022-03-17 7:11 ` Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 2/4] btrfs: mark device addition as mnt_want_write_file Naohiro Aota
2022-03-16 16:06 ` Filipe Manana
2022-03-17 7:36 ` Naohiro Aota
2022-03-17 10:50 ` Filipe Manana [this message]
2022-03-22 4:30 ` Naohiro Aota
2022-03-22 13:11 ` Filipe Manana
2022-03-23 2:26 ` Naohiro Aota
2022-03-18 7:56 ` [btrfs] cd452af388: xfstests.btrfs.218.fail kernel test robot
2022-03-16 13:22 ` [PATCH v2 3/4] fs: add check functions for sb_start_{write,pagefault,intwrite} Naohiro Aota
2022-03-17 8:34 ` Dave Chinner
2022-03-17 11:13 ` Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 4/4] btrfs: assert that relocation is protected with sb_start_write() Naohiro Aota
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjMSaLIhKNcKUuHM@debian9.Home \
--to=fdmanana@kernel.org \
--cc=Johannes.Thumshirn@wdc.com \
--cc=Naohiro.Aota@wdc.com \
--cc=david@fromorbit.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).