public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Wang Yugui <wangyugui@e16-tech.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org, Sherry Yang <sherry.yang@oracle.com>,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan
Date: Thu, 23 Mar 2023 19:57:11 +0800	[thread overview]
Message-ID: <20230323195710.433E.409509F4@e16-tech.com> (raw)
In-Reply-To: <998486a6a1bddf36c3d3dc92df08eaa1b3a6ee7f.1679557827.git.anand.jain@oracle.com>

Hi,

> During the device scan initiated by systemd-udevd, other user space
> EXCL operations such as mkfs, mount, or check may get blocked and result
> in a "Device or resource busy" error. This is because the device
> scan process opens the device with the EXCL flag in the kernel.
> 
> Two reports were received:
> 
>  . One with the btrfs/179 testcase, where the fsck command failed with
>    the -EBUSY error; and
> 
>  . Another with the LTP pwritev03 testcase, where mkfs.vfs failed with
>    the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
>    on the device.
> 
> In both cases, fsck and mkfs (respectively) were racing with a
> systemd-udevd device scan, and systemd-udevd won, resulting in the
> -EBUSY error for fsck and mkfs.
> 
> Reproducing the problem has been difficult because there is a very
> small timeframe during which these userspace threads can race to
> acquire the exclusive device open. Even on the system where the problem
> was observed, the problem occurances were anywhere between 10 to 400
> iterations and chances of reproducing lessen with debug printk()s.
> 
> However, an exclusive device open is unnecessary for the scan process,
> as there are no write operations on the device during scan. Furthermore,
> during the mount process, the superblock is re-read in the below
> function stack.
> 
>   btrfs_mount_root
>    btrfs_open_devices
>     open_fs_devices
>      btrfs_open_one_device
>        btrfs_get_bdev_and_sb
> 
> So, to fix this issue, this patch removes the FMODE_EXCL flag from the scan
> operation, and adds a comment.
> 
> Reported-by: Sherry Yang <sherry.yang@oracle.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> 
>  This patch should be cc-ed to stable-5.15.y and stable-6.1.y. As for
>  stable-5.10.y and stable-5.4.y, a conflict fix is necessary, which I
>  will send separately.
> 
>  fs/btrfs/volumes.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 93bc45001e68..cc1871767c8c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
>  	 * So, we need to add a special mount option to scan for
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
> -	flags |= FMODE_EXCL;
>  
> +	/*
> +	 * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
> +	 * initiate the device scan which may race with the user's mount
> +	 * or mkfs command, resulting in failure.

for  FMODE_READ | FMODE_EXCL, we need some sleep/retry,
for  FMODE_WRITE | FMODE_EXCL, we should fail immediately?

scan race with with mkfs may result worse?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/03/23


> +	 * Since the device scan is solely for reading purposes, there is
> +	 * no need for FMODE_EXCL. Additionally, the devices are read again
> +	 * during the mount process. It is ok to get some inconsistent
> +	 * values temporarily, as the device paths of the fsid are the only
> +	 * required information for assembling the volume.
> +	 */
>  	bdev = blkdev_get_by_path(path, flags, holder);
>  	if (IS_ERR(bdev))
>  		return ERR_CAST(bdev);
> -- 
> 2.38.1



  reply	other threads:[~2023-03-23 11:58 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23  7:56 [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan Anand Jain
2023-03-23 11:57 ` Wang Yugui [this message]
2023-03-23 13:14   ` Anand Jain
2023-03-23 13:27     ` Wang Yugui
2023-03-23 18:27       ` David Sterba
2023-03-23 18:24 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230323195710.433E.409509F4@e16-tech.com \
    --to=wangyugui@e16-tech.com \
    --cc=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=sherry.yang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox