Re: [PATCH] mkfs: acquire flock before modifying the device superblock

public inbox for linux-xfs@vger.kernel.org

From: "Darrick J. Wong" <djwong@kernel.org>
To: Wu Guanghao <wuguanghao3@huawei.com>
Cc: cem@kernel.org, linux-xfs@vger.kernel.org,
	"liuzhiqiang (I)" <liuzhiqiang26@huawei.com>
Subject: Re: [PATCH] mkfs: acquire flock before modifying the device superblock
Date: Fri, 14 Oct 2022 08:38:18 -0700
Message-ID: <Y0mCauklwsDwImi8@magnolia>
In-Reply-To: <b359751c-2397-bcd1-9065-583afb2f93ef@huawei.com>

On Fri, Oct 14, 2022 at 04:41:35PM +0800, Wu Guanghao wrote:
> We noticed a systemd issue in which symlinks become unreliable when a
> filesystem is formatted while systemd is operating on the same device.
> Issue Link: https://github.com/systemd/systemd/issues/23746
> 
> According to the systemd documentation, a BSD flock needs to be acquired
> before formatting the device.
> Related Link: https://systemd.io/BLOCK_DEVICE_LOCKING/

TLDR: udevd wants fs utilities to use advisory file locking to
coordinate (re)writes to block devices to avoid collisions between mkfs
and all the udev magic.

Critically, udev calls flock(LOCK_SH | LOCK_NB) to trylock the device in
shared mode to avoid blocking on fs utilities; if the trylock fails,
it moves on and tries again later.  The old O_EXCL-on-blockdevs trick
will not work for that use case (I guess) because it's not a shared
reader lock.  It's also not the file locking API.
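
To make the trylock behaviour concrete, here is a minimal sketch of the
pattern described above.  It is not udev or xfsprogs code: the function
names are hypothetical, and it uses a scratch file under /tmp instead of a
block device (an assumption, so that it can run unprivileged).  A "writer"
descriptor holds LOCK_EX while a second descriptor attempts
LOCK_SH | LOCK_NB, which should fail with EWOULDBLOCK until the writer
unlocks.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to take a shared lock without blocking.
 * Returns 1 if the lock was taken, 0 if it is held exclusively by
 * someone else, and -1 on any other error.
 */
static int
try_shared_lock(int fd)
{
	if (flock(fd, LOCK_SH | LOCK_NB) == 0)
		return 1;
	return errno == EWOULDBLOCK ? 0 : -1;
}

/*
 * Hypothetical demo on a scratch file (assumption: /tmp is writable).
 * Returns 0 if the trylock is refused while the writer holds LOCK_EX
 * and succeeds once the writer has unlocked; -1 otherwise.
 */
int
demo_trylock_contention(void)
{
	char	path[] = "/tmp/flock-demo-XXXXXX";
	int	writer_fd, reader_fd;
	int	while_locked, after_unlock;

	writer_fd = mkstemp(path);
	if (writer_fd < 0)
		return -1;

	/* A second open() gives an independent open file description. */
	reader_fd = open(path, O_RDONLY);
	unlink(path);
	if (reader_fd < 0) {
		close(writer_fd);
		return -1;
	}

	if (flock(writer_fd, LOCK_EX) != 0) {
		close(reader_fd);
		close(writer_fd);
		return -1;
	}

	while_locked = try_shared_lock(reader_fd);	/* writer active */
	flock(writer_fd, LOCK_UN);
	after_unlock = try_shared_lock(reader_fd);	/* writer done */

	close(reader_fd);
	close(writer_fd);
	return (while_locked == 0 && after_unlock == 1) ? 0 : -1;
}
```

The two descriptors conflict even within one process because flock locks
belong to the open file description, not to the process.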

> So we acquire the flock after opening the device but before
> writing the superblock.

xfs_db and xfs_repair can write to the filesystem too; shouldn't this
locking apply to them as well?

> Signed-off-by: wuguanghao <wuguanghao3@huawei.com>
> ---
>  mkfs/xfs_mkfs.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 9dd0e79c..b83cb043 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -13,6 +13,7 @@
>  #include "libfrog/crc32cselftest.h"
>  #include "proto.h"
>  #include <ini.h>
> +#include <sys/file.h>
> 
>  #define TERABYTES(count, blog) ((uint64_t)(count) << (40 - (blog)))
>  #define GIGABYTES(count, blog) ((uint64_t)(count) << (30 - (blog)))
> @@ -2758,6 +2759,30 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
> 
>  }
> 
> +static void
> +lock_device(dev_t dev, int flag, char *name)
> +{
> +       int fd = libxfs_device_to_fd(dev);
> +       int readonly = flag & LIBXFS_ISREADONLY;
> +
> +       if (!readonly && fd > 0)
> +               if (flock(fd, LOCK_EX) != 0) {
> +                       fprintf(stderr, "%s: failed to get lock.\n", name);
> +                       exit(1);
> +               }

So yes, this belongs in libxfs_device_open.

If we're opening the bdevs in readonly mode, shouldn't we take LOCK_SH
to prevent mkfs from colliding with (say) xfs_metadump?
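
As a rough illustration of that question, the sketch below picks the lock
mode from the access mode: LOCK_SH for read-only opens, LOCK_EX otherwise.
This is only an assumption about how such a helper might look, not what
libxfs_device_open does; lock_for_access() and the demo function are
hypothetical names, and a scratch file under /tmp stands in for the bdev.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: readers take a shared lock, writers take an
 * exclusive one.  With nonblock set, a conflicting lock makes flock()
 * fail with EWOULDBLOCK instead of sleeping.
 */
static int
lock_for_access(int fd, int readonly, int nonblock)
{
	int	op = readonly ? LOCK_SH : LOCK_EX;

	if (nonblock)
		op |= LOCK_NB;
	return flock(fd, op);
}

/*
 * Demo on a scratch file (assumption: /tmp is writable).  Returns 0 if
 * two readers can hold the lock together while a writer is refused.
 */
int
demo_reader_writer_modes(void)
{
	char	path[] = "/tmp/flock-mode-XXXXXX";
	int	fd[3] = { -1, -1, -1 };
	int	r1 = -1, r2 = -1, w = 0, w_errno = 0;
	int	i;

	fd[0] = mkstemp(path);
	if (fd[0] < 0)
		return -1;
	fd[1] = open(path, O_RDONLY);
	fd[2] = open(path, O_RDWR);
	unlink(path);

	if (fd[1] >= 0 && fd[2] >= 0) {
		r1 = lock_for_access(fd[0], 1, 1);	/* first reader */
		r2 = lock_for_access(fd[1], 1, 1);	/* second reader */
		w = lock_for_access(fd[2], 0, 1);	/* writer */
		w_errno = errno;
	}

	for (i = 0; i < 3; i++)
		if (fd[i] >= 0)
			close(fd[i]);

	return (r1 == 0 && r2 == 0 && w == -1 &&
		w_errno == EWOULDBLOCK) ? 0 : -1;
}
```

Under this scheme a read-only tool and udev's shared trylock can coexist,
while a writer that takes LOCK_EX excludes both.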

Bonus question: Shouldn't the /kernel/ also effectively be taking
LOCK_SH when it opens the bdevs to mount the filesystem?

--D

> +}
> +
> +static void
> +lock_devices(struct libxfs_xinit *xi)
> +{
> +       if (!xi->disfile)
> +               lock_device(xi->ddev, xi->dcreat, xi->dname);
> +       if (xi->logdev && !xi->lisfile)
> +               lock_device(xi->logdev, xi->lcreat, xi->logname);
> +       if (xi->rtdev && !xi->risfile)
> +               lock_device(xi->rtdev, xi->rcreat, xi->rtname);
> +}
> +
>  static void
>  open_devices(
>         struct mkfs_params      *cfg,
> @@ -4208,6 +4233,7 @@ main(
>          * Open and validate the device configurations
>          */
>         open_devices(&cfg, &xi);
> +       lock_devices(&xi);
>         validate_overwrite(dfile, force_overwrite);
>         validate_datadev(&cfg, &cli);
>         validate_logdev(&cfg, &cli, &logfile);
> --
> 2.27.0
