From: "Darrick J. Wong" <djwong@kernel.org>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>,
linux-kernel@vger.kernel.org, dm-devel@redhat.com,
Alexander Viro <viro@zeniv.linux.org.uk>,
Zdenek Kabelac <zkabelac@redhat.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [dm-devel] [PATCH] fix writing to the filesystem after unmount
Date: Wed, 6 Sep 2023 08:22:45 -0700 [thread overview]
Message-ID: <20230906152245.GD28160@frogsfrogsfrogs> (raw)
In-Reply-To: <59b54cc3-b98b-aff9-14fc-dc25c61111c6@redhat.com>
On Wed, Sep 06, 2023 at 03:26:21PM +0200, Mikulas Patocka wrote:
> lvm may suspend any logical volume anytime. If lvm suspend races with
> unmount, it may be possible that the kernel writes to the filesystem after
> unmount successfully returned. The problem can be demonstrated with this
> script:
>
> #!/bin/sh -ex
> modprobe brd rd_size=4194304
> vgcreate vg /dev/ram0
> lvcreate -L 16M -n lv vg
> mkfs.ext4 /dev/vg/lv
> mount -t ext4 /dev/vg/lv /mnt/test
> dmsetup suspend /dev/vg/lv
> (sleep 1; dmsetup resume /dev/vg/lv) &
> umount /mnt/test
> md5sum /dev/vg/lv
> md5sum /dev/vg/lv
> dmsetup remove_all
> rmmod brd
>
> The script unmounts the filesystem and runs md5sum twice, the result is
> that these two invocations return different hash.
>
> What happens:
> * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it
> increments sb->s_active
> * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls
> deactivate_super, deactivate_super sees that sb->s_active is 2, so it
> decreases it to 1 and does nothing - the umount syscall returns
> successfully
> * now we have a mounted filesystem despite the fact that umount returned
> * we call md5sum, this waits for the block device being unblocked
> * dmsetup resume unblocks the block device and calls thaw_bdev, that calls
> thaw_super and thaw_super_locked
> * thaw_super_locked calls deactivate_locked_super, this actually drops the
> refcount and performs the unmount. The unmount races with md5sum. md5sum
> wins the race and it returns the hash of the filesystem before it was
> unmounted
> * the second md5sum returns the hash after the filesystem was unmounted
>
> In order to fix this bug, this patch introduces a new function
> wait_and_deactivate_super that will wait if the filesystem is frozen and
> perform deactivate_locked_super only after that.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
> fs/namespace.c | 2 +-
> fs/super.c | 20 ++++++++++++++++++++
> include/linux/fs.h | 1 +
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c 2023-09-06 09:45:54.000000000 +0200
> +++ linux-2.6/fs/namespace.c 2023-09-06 09:47:15.000000000 +0200
> @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn
> }
> fsnotify_vfsmount_delete(&mnt->mnt);
> dput(mnt->mnt.mnt_root);
> - deactivate_super(mnt->mnt.mnt_sb);
> + wait_and_deactivate_super(mnt->mnt.mnt_sb);
> mnt_free_id(mnt);
> call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
> }
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c 2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/fs/super.c 2023-09-06 09:52:20.000000000 +0200
> @@ -36,6 +36,7 @@
> #include <linux/lockdep.h>
> #include <linux/user_namespace.h>
> #include <linux/fs_context.h>
> +#include <linux/delay.h>
> #include <uapi/linux/mount.h>
> #include "internal.h"
>
> @@ -365,6 +366,25 @@ void deactivate_super(struct super_block
> EXPORT_SYMBOL(deactivate_super);
>
> /**
> + * wait_and_deactivate_super - wait for unfreeze and drop a reference
> + * @s: superblock to deactivate
> + *
> + * Variant of deactivate_super(), except that we wait if the filesystem is
> + * frozen. This is required on unmount, to make sure that the filesystem is
> + * really unmounted when this function returns.
> + */
> +void wait_and_deactivate_super(struct super_block *s)
> +{
> + down_write(&s->s_umount);
> + while (s->s_writers.frozen != SB_UNFROZEN) {
> + up_write(&s->s_umount);
> + msleep(1);
> + down_write(&s->s_umount);
> + }
Instead of msleep, could you use wait_var_event_killable like
wait_for_partially_frozen() does?
--D
> + deactivate_locked_super(s);
> +}
> +
> +/**
> * grab_super - acquire an active reference
> * @s: reference we are trying to make active
> *
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h 2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/include/linux/fs.h 2023-09-06 09:46:56.000000000 +0200
> @@ -2262,6 +2262,7 @@ void kill_anon_super(struct super_block
> void kill_litter_super(struct super_block *sb);
> void deactivate_super(struct super_block *sb);
> void deactivate_locked_super(struct super_block *sb);
> +void wait_and_deactivate_super(struct super_block *sb);
> int set_anon_super(struct super_block *s, void *data);
> int set_anon_super_fc(struct super_block *s, struct fs_context *fc);
> int get_anon_bdev(dev_t *);
>
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: "Darrick J. Wong" <djwong@kernel.org>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Zdenek Kabelac <zkabelac@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
dm-devel@redhat.com
Subject: Re: [PATCH] fix writing to the filesystem after unmount
Date: Wed, 6 Sep 2023 08:22:45 -0700 [thread overview]
Message-ID: <20230906152245.GD28160@frogsfrogsfrogs> (raw)
In-Reply-To: <59b54cc3-b98b-aff9-14fc-dc25c61111c6@redhat.com>
On Wed, Sep 06, 2023 at 03:26:21PM +0200, Mikulas Patocka wrote:
> lvm may suspend any logical volume anytime. If lvm suspend races with
> unmount, it may be possible that the kernel writes to the filesystem after
> unmount successfully returned. The problem can be demonstrated with this
> script:
>
> #!/bin/sh -ex
> modprobe brd rd_size=4194304
> vgcreate vg /dev/ram0
> lvcreate -L 16M -n lv vg
> mkfs.ext4 /dev/vg/lv
> mount -t ext4 /dev/vg/lv /mnt/test
> dmsetup suspend /dev/vg/lv
> (sleep 1; dmsetup resume /dev/vg/lv) &
> umount /mnt/test
> md5sum /dev/vg/lv
> md5sum /dev/vg/lv
> dmsetup remove_all
> rmmod brd
>
> The script unmounts the filesystem and runs md5sum twice, the result is
> that these two invocations return different hash.
>
> What happens:
> * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it
> increments sb->s_active
> * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls
> deactivate_super, deactivate_super sees that sb->s_active is 2, so it
> decreases it to 1 and does nothing - the umount syscall returns
> successfully
> * now we have a mounted filesystem despite the fact that umount returned
> * we call md5sum, this waits for the block device being unblocked
> * dmsetup resume unblocks the block device and calls thaw_bdev, that calls
> thaw_super and thaw_super_locked
> * thaw_super_locked calls deactivate_locked_super, this actually drops the
> refcount and performs the unmount. The unmount races with md5sum. md5sum
> wins the race and it returns the hash of the filesystem before it was
> unmounted
> * the second md5sum returns the hash after the filesystem was unmounted
>
> In order to fix this bug, this patch introduces a new function
> wait_and_deactivate_super that will wait if the filesystem is frozen and
> perform deactivate_locked_super only after that.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
> fs/namespace.c | 2 +-
> fs/super.c | 20 ++++++++++++++++++++
> include/linux/fs.h | 1 +
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c 2023-09-06 09:45:54.000000000 +0200
> +++ linux-2.6/fs/namespace.c 2023-09-06 09:47:15.000000000 +0200
> @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn
> }
> fsnotify_vfsmount_delete(&mnt->mnt);
> dput(mnt->mnt.mnt_root);
> - deactivate_super(mnt->mnt.mnt_sb);
> + wait_and_deactivate_super(mnt->mnt.mnt_sb);
> mnt_free_id(mnt);
> call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
> }
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c 2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/fs/super.c 2023-09-06 09:52:20.000000000 +0200
> @@ -36,6 +36,7 @@
> #include <linux/lockdep.h>
> #include <linux/user_namespace.h>
> #include <linux/fs_context.h>
> +#include <linux/delay.h>
> #include <uapi/linux/mount.h>
> #include "internal.h"
>
> @@ -365,6 +366,25 @@ void deactivate_super(struct super_block
> EXPORT_SYMBOL(deactivate_super);
>
> /**
> + * wait_and_deactivate_super - wait for unfreeze and drop a reference
> + * @s: superblock to deactivate
> + *
> + * Variant of deactivate_super(), except that we wait if the filesystem is
> + * frozen. This is required on unmount, to make sure that the filesystem is
> + * really unmounted when this function returns.
> + */
> +void wait_and_deactivate_super(struct super_block *s)
> +{
> + down_write(&s->s_umount);
> + while (s->s_writers.frozen != SB_UNFROZEN) {
> + up_write(&s->s_umount);
> + msleep(1);
> + down_write(&s->s_umount);
> + }
Instead of msleep, could you use wait_var_event_killable like
wait_for_partially_frozen() does?
--D
> + deactivate_locked_super(s);
> +}
> +
> +/**
> * grab_super - acquire an active reference
> * @s: reference we are trying to make active
> *
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h 2023-09-05 21:09:16.000000000 +0200
> +++ linux-2.6/include/linux/fs.h 2023-09-06 09:46:56.000000000 +0200
> @@ -2262,6 +2262,7 @@ void kill_anon_super(struct super_block
> void kill_litter_super(struct super_block *sb);
> void deactivate_super(struct super_block *sb);
> void deactivate_locked_super(struct super_block *sb);
> +void wait_and_deactivate_super(struct super_block *sb);
> int set_anon_super(struct super_block *s, void *data);
> int set_anon_super_fc(struct super_block *s, struct fs_context *fc);
> int get_anon_bdev(dev_t *);
>
next prev parent reply other threads:[~2023-09-06 15:30 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-06 13:26 [dm-devel] [PATCH] fix writing to the filesystem after unmount Mikulas Patocka
2023-09-06 13:26 ` Mikulas Patocka
2023-09-06 14:27 ` [dm-devel] " Christian Brauner
2023-09-06 14:27 ` Christian Brauner
2023-09-06 15:03 ` [dm-devel] " Mikulas Patocka
2023-09-06 15:03 ` Mikulas Patocka
2023-09-06 15:33 ` [dm-devel] " Christian Brauner
2023-09-06 15:33 ` Christian Brauner
2023-09-06 15:58 ` [dm-devel] " Christian Brauner
2023-09-06 15:58 ` Christian Brauner
2023-09-06 16:01 ` [dm-devel] " Mikulas Patocka
2023-09-06 16:01 ` Mikulas Patocka
2023-09-06 16:19 ` [dm-devel] " Christian Brauner
2023-09-06 16:19 ` Christian Brauner
2023-09-06 16:52 ` [dm-devel] " Mikulas Patocka
2023-09-06 16:52 ` Mikulas Patocka
2023-09-07 9:44 ` [dm-devel] " Jan Kara
2023-09-07 9:44 ` Jan Kara
2023-09-07 10:43 ` [dm-devel] " Christian Brauner
2023-09-07 10:43 ` Christian Brauner
2023-09-07 12:04 ` [dm-devel] " Mikulas Patocka
2023-09-07 12:04 ` Mikulas Patocka
2023-09-08 7:32 ` [dm-devel] " Jan Kara
2023-09-08 7:32 ` Jan Kara
2023-09-08 9:29 ` [dm-devel] " Zdenek Kabelac
2023-09-08 9:29 ` Zdenek Kabelac
2023-09-08 10:20 ` [dm-devel] " Jan Kara
2023-09-08 10:20 ` Jan Kara
2023-09-08 10:51 ` [dm-devel] " Zdenek Kabelac
2023-09-08 11:32 ` Christian Brauner
2023-09-08 11:32 ` Christian Brauner
2023-09-08 12:07 ` [dm-devel] " Zdenek Kabelac
2023-09-08 12:07 ` Zdenek Kabelac
2023-09-08 12:07 ` [dm-devel] " Zdenek Kabelac
2023-09-08 12:34 ` Christian Brauner
2023-09-08 12:34 ` Christian Brauner
2023-09-12 9:10 ` [dm-devel] " Jan Kara
2023-09-12 9:10 ` Jan Kara
2023-09-08 12:02 ` [dm-devel] " Christian Brauner
2023-09-08 12:02 ` Christian Brauner
2023-09-08 16:49 ` [dm-devel] " John Stoffel
2023-09-08 16:49 ` John Stoffel
2023-09-09 11:21 ` [dm-devel] " Christoph Hellwig
2023-09-09 11:21 ` Christoph Hellwig
2023-09-08 12:01 ` [dm-devel] " Pavel Machek
2023-09-08 12:01 ` Pavel Machek
2023-09-08 11:59 ` [dm-devel] " Pavel Machek
2023-09-08 11:59 ` Pavel Machek
2023-09-06 17:10 ` [dm-devel] " Al Viro
2023-09-06 17:10 ` Al Viro
2023-09-06 17:08 ` [dm-devel] " Al Viro
2023-09-06 17:08 ` Al Viro
2023-09-06 15:22 ` Darrick J. Wong [this message]
2023-09-06 15:22 ` Darrick J. Wong
2023-09-06 15:38 ` [dm-devel] " Christian Brauner
2023-09-06 15:38 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230906152245.GD28160@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=brauner@kernel.org \
--cc=dm-devel@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=viro@zeniv.linux.org.uk \
--cc=zkabelac@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.