From: Wang Yugui <wangyugui@e16-tech.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org, Sherry Yang <sherry.yang@oracle.com>,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan
Date: Thu, 23 Mar 2023 19:57:11 +0800 [thread overview]
Message-ID: <20230323195710.433E.409509F4@e16-tech.com> (raw)
In-Reply-To: <998486a6a1bddf36c3d3dc92df08eaa1b3a6ee7f.1679557827.git.anand.jain@oracle.com>
Hi,
> During the device scan initiated by systemd-udevd, other user space
> EXCL operations such as mkfs, mount, or check may get blocked and result
> in a "Device or resource busy" error. This is because the device
> scan process opens the device with the EXCL flag in the kernel.
>
> Two reports were received:
>
> . One with the btrfs/179 testcase, where the fsck command failed with
> the -EBUSY error; and
>
> . Another with the LTP pwritev03 testcase, where mkfs.vfs failed with
> the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
> on the device.
>
> In both cases, fsck and mkfs (respectively) were racing with a
> systemd-udevd device scan, and systemd-udevd won, resulting in the
> -EBUSY error for fsck and mkfs.
>
> Reproducing the problem has been difficult because there is a very
> small timeframe during which these userspace threads can race to
> acquire the exclusive device open. Even on the system where the problem
> was observed, the problem occurances were anywhere between 10 to 400
> iterations and chances of reproducing lessen with debug printk()s.
>
> However, an exclusive device open is unnecessary for the scan process,
> as there are no write operations on the device during scan. Furthermore,
> during the mount process, the superblock is re-read in the below
> function stack.
>
> btrfs_mount_root
> btrfs_open_devices
> open_fs_devices
> btrfs_open_one_device
> btrfs_get_bdev_and_sb
>
> So, to fix this issue, this patch removes the FMODE_EXCL flag from the scan
> operation, and adds a comment.
>
> Reported-by: Sherry Yang <sherry.yang@oracle.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>
> This patch should be cc-ed to stable-5.15.y and stable-6.1.y. As for
> stable-5.10.y and stable-5.4.y, a conflict fix is necessary, which I
> will send separately.
>
> fs/btrfs/volumes.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 93bc45001e68..cc1871767c8c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
> * So, we need to add a special mount option to scan for
> * later supers, using BTRFS_SUPER_MIRROR_MAX instead
> */
> - flags |= FMODE_EXCL;
>
> + /*
> + * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
> + * initiate the device scan which may race with the user's mount
> + * or mkfs command, resulting in failure.
for FMODE_READ | FMODE_EXCL, we need some sleep/retry,
for FMODE_WRITE | FMODE_EXCL, we should fail immediately?
scan race with with mkfs may result worse?
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/03/23
> + * Since the device scan is solely for reading purposes, there is
> + * no need for FMODE_EXCL. Additionally, the devices are read again
> + * during the mount process. It is ok to get some inconsistent
> + * values temporarily, as the device paths of the fsid are the only
> + * required information for assembling the volume.
> + */
> bdev = blkdev_get_by_path(path, flags, holder);
> if (IS_ERR(bdev))
> return ERR_CAST(bdev);
> --
> 2.38.1
next prev parent reply other threads:[~2023-03-23 11:58 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-23 7:56 [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan Anand Jain
2023-03-23 11:57 ` Wang Yugui [this message]
2023-03-23 13:14 ` Anand Jain
2023-03-23 13:27 ` Wang Yugui
2023-03-23 18:27 ` David Sterba
2023-03-23 18:24 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230323195710.433E.409509F4@e16-tech.com \
--to=wangyugui@e16-tech.com \
--cc=anand.jain@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=oliver.sang@intel.com \
--cc=sherry.yang@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox