linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Zdenek Kabelac <zkabelac@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: [PATCH] fix writing to the filesystem after unmount
Date: Wed, 6 Sep 2023 08:22:45 -0700	[thread overview]
Message-ID: <20230906152245.GD28160@frogsfrogsfrogs> (raw)
In-Reply-To: <59b54cc3-b98b-aff9-14fc-dc25c61111c6@redhat.com>

On Wed, Sep 06, 2023 at 03:26:21PM +0200, Mikulas Patocka wrote:
> lvm may suspend any logical volume anytime. If lvm suspend races with
> unmount, it may be possible that the kernel writes to the filesystem after
> unmount successfully returned. The problem can be demonstrated with this
> script:
> 
> #!/bin/sh -ex
> modprobe brd rd_size=4194304
> vgcreate vg /dev/ram0
> lvcreate -L 16M -n lv vg
> mkfs.ext4 /dev/vg/lv
> mount -t ext4 /dev/vg/lv /mnt/test
> dmsetup suspend /dev/vg/lv
> (sleep 1; dmsetup resume /dev/vg/lv) &
> umount /mnt/test
> md5sum /dev/vg/lv
> md5sum /dev/vg/lv
> dmsetup remove_all
> rmmod brd
> 
> The script unmounts the filesystem and runs md5sum twice, the result is
> that these two invocations return different hash.
> 
> What happens:
> * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it
>   increments sb->s_active
> * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls
>   deactivate_super, deactivate_super sees that sb->s_active is 2, so it
>   decreases it to 1 and does nothing - the umount syscall returns
>   successfully
> * now we have a mounted filesystem despite the fact that umount returned
> * we call md5sum, this waits for the block device being unblocked
> * dmsetup resume unblocks the block device and calls thaw_bdev, that calls
>   thaw_super and thaw_super_locked
> * thaw_super_locked calls deactivate_locked_super, this actually drops the
>   refcount and performs the unmount. The unmount races with md5sum. md5sum
>   wins the race and it returns the hash of the filesystem before it was
>   unmounted
> * the second md5sum returns the hash after the filesystem was unmounted
> 
> In order to fix this bug, this patch introduces a new function
> wait_and_deactivate_super that will wait if the filesystem is frozen and
> perform deactivate_locked_super only after that.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
> 
> ---
>  fs/namespace.c     |    2 +-
>  fs/super.c         |   20 ++++++++++++++++++++
>  include/linux/fs.h |    1 +
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c	2023-09-06 09:45:54.000000000 +0200
> +++ linux-2.6/fs/namespace.c	2023-09-06 09:47:15.000000000 +0200
> @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn
>  	}
>  	fsnotify_vfsmount_delete(&mnt->mnt);
>  	dput(mnt->mnt.mnt_root);
> -	deactivate_super(mnt->mnt.mnt_sb);
> +	wait_and_deactivate_super(mnt->mnt.mnt_sb);
>  	mnt_free_id(mnt);
>  	call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
>  }
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c	2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/fs/super.c	2023-09-06 09:52:20.000000000 +0200
> @@ -36,6 +36,7 @@
>  #include <linux/lockdep.h>
>  #include <linux/user_namespace.h>
>  #include <linux/fs_context.h>
> +#include <linux/delay.h>
>  #include <uapi/linux/mount.h>
>  #include "internal.h"
>  
> @@ -365,6 +366,25 @@ void deactivate_super(struct super_block
>  EXPORT_SYMBOL(deactivate_super);
>  
>  /**
> + *	wait_and_deactivate_super	-	wait for unfreeze and drop a reference
> + *	@s: superblock to deactivate
> + *
> + *	Variant of deactivate_super(), except that we wait if the filesystem is
> + *	frozen. This is required on unmount, to make sure that the filesystem is
> + *	really unmounted when this function returns.
> + */
> +void wait_and_deactivate_super(struct super_block *s)
> +{
> +	down_write(&s->s_umount);
> +	while (s->s_writers.frozen != SB_UNFROZEN) {
> +		up_write(&s->s_umount);
> +		msleep(1);
> +		down_write(&s->s_umount);
> +	}

Instead of msleep, could you use wait_var_event_killable like
wait_for_partially_frozen() does?

--D

> +	deactivate_locked_super(s);
> +}
> +
> +/**
>   *	grab_super - acquire an active reference
>   *	@s: reference we are trying to make active
>   *
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/include/linux/fs.h	2023-09-06 09:46:56.000000000 +0200
> @@ -2262,6 +2262,7 @@ void kill_anon_super(struct super_block
>  void kill_litter_super(struct super_block *sb);
>  void deactivate_super(struct super_block *sb);
>  void deactivate_locked_super(struct super_block *sb);
> +void wait_and_deactivate_super(struct super_block *sb);
>  int set_anon_super(struct super_block *s, void *data);
>  int set_anon_super_fc(struct super_block *s, struct fs_context *fc);
>  int get_anon_bdev(dev_t *);
> 

  parent reply	other threads:[~2023-09-06 15:22 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-06 13:26 [PATCH] fix writing to the filesystem after unmount Mikulas Patocka
2023-09-06 14:27 ` Christian Brauner
2023-09-06 15:03   ` Mikulas Patocka
2023-09-06 15:33     ` Christian Brauner
2023-09-06 15:58       ` Christian Brauner
2023-09-06 16:01       ` Mikulas Patocka
2023-09-06 16:19         ` Christian Brauner
2023-09-06 16:52           ` Mikulas Patocka
2023-09-07  9:44             ` Jan Kara
2023-09-07 10:43               ` Christian Brauner
2023-09-07 12:04                 ` Mikulas Patocka
2023-09-08  7:32                   ` Jan Kara
2023-09-08  9:29                     ` Zdenek Kabelac
2023-09-08 10:20                       ` Jan Kara
     [not found]                         ` <15c62097-d58f-4e66-bdf5-e0edb1306b2f@redhat.com>
2023-09-08 11:32                           ` Christian Brauner
2023-09-08 12:07                             ` Zdenek Kabelac
2023-09-08 12:34                               ` Christian Brauner
2023-09-12  9:10                           ` Jan Kara
2023-09-08 12:02                         ` Christian Brauner
2023-09-08 16:49                           ` John Stoffel
2023-09-09 11:21                         ` Christoph Hellwig
2023-09-08 12:01                     ` Pavel Machek
2023-09-08 11:59               ` Pavel Machek
2023-09-06 17:10         ` Al Viro
2023-09-06 17:08     ` Al Viro
2023-09-06 15:22 ` Darrick J. Wong [this message]
2023-09-06 15:38   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230906152245.GD28160@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=brauner@kernel.org \
    --cc=dm-devel@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zkabelac@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).