Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: reject transaction creation for read-only mount except for log recovery
Date: Thu, 16 Dec 2021 18:25:26 +0800	[thread overview]
Message-ID: <2a3876e3-e646-caae-96ca-55f1d5c1cac4@suse.com> (raw)
In-Reply-To: <20211216091441.53270-1-wqu@suse.com>



On 2021/12/16 17:14, Qu Wenruo wrote:
> [BUG]
> The following super simple script would crash btrfs at unmount time, if
> CONFIG_BTRFS_ASSERT() is set.
> 
>   mkfs.btrfs -f $dev
>   mount $dev $mnt
>   xfs_io -f -c "pwrite 0 4k" $mnt/file
>   umount $mnt
>   mount -r ro $dev $mnt
>   btrfs scrub start -Br $mnt
>   umount $mnt
> 
> This will trigger the following ASSERT() introduced by commit
> 0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
> late stage of umount").
> 
> That patch is deifnitely not the cause, it just makes enough noise for
> us developer.
> 
> [CAUSE]
> We will start transaction for the following call chain during scrub:
> 
> scrub_enumerate_chunks()
> |- btrfs_inc_block_group_ro()
>     |- btrfs_join_transaction()
> 
> However for RO mount, there is no running transaction at all, thus
> btrfs_join_transaction() will start a new transaction.
> 
> But since the fs is already read-only, there is no way to commit the
> transaction, thus triggering the ASSERT().
> 
> The bug should be there for a long time. As I can still reproduce the
> crash at v5.10 kernel.
> 
> [FIX]
> Currently I choose to separate the log recovery code transaction with
> other transactions, and reject all other transactions if the filesystem
> is mounted read-only.
> 
> But I'm not sure if this is the best solution, thus this patch still
> requires for comments.
> 
> There is some alternatives I can thing of:
> 
> - Don't start new transaction in btrfs_inc_block_group_ro().
>    We have btrfs_join_transaction_nostart(), but even with that we still
>    can't ensure there is no new transaction started and committed after
>    we called btrfs_join_transaction_nostart() and got -ENOENT.
> 
> - Allow btrfs to commit empty transaction without writing any thing
>    If we know this transaction contains no dirty metadata, we allow
>    it to be "committed" even on RO mount, although no real data will
>    be written to disk.

This seems to be a better solution, after more testing with this patch.

There is another call sites which would need transaction start on 
read-only mount, adding new device to seed device.

If there is an hole, there will be more holes, and I don't think 
creating so many special cases is a good idea.


On the other hand, for such empty transaction method, we only need to 
make sure such transaction is always empty, and then can clean it up 
properly.

No extra impact on log recovery nor seed device.

Thanks,
Qu

> 
> And even with current fix, I'm not 100% sure if we won't get a crash if
> we run read-only scrub with frequently ro/rw remount.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/block-group.c |  6 +++++-
>   fs/btrfs/transaction.c | 14 ++++++++++++++
>   fs/btrfs/transaction.h |  5 +++++
>   fs/btrfs/tree-log.c    |  2 +-
>   4 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 1db24e6d6d90..13391d562189 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2546,6 +2546,9 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
>   
>   	do {
>   		trans = btrfs_join_transaction(root);
> +		if (IS_ERR(trans) && PTR_ERR(trans) == -EROFS)
> +			return 0;
> +
>   		if (IS_ERR(trans))
>   			return PTR_ERR(trans);
>   
> @@ -2621,7 +2624,8 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache)
>   	struct btrfs_space_info *sinfo = cache->space_info;
>   	u64 num_bytes;
>   
> -	BUG_ON(!cache->ro);
> +	if (cache->ro)
> +		return;
>   
>   	spin_lock(&sinfo->lock);
>   	spin_lock(&cache->lock);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 03de89b45f27..306eaeb41ec9 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -591,6 +591,13 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
>   	if (BTRFS_FS_ERROR(fs_info))
>   		return ERR_PTR(-EROFS);
>   
> +	/*
> +	 * If the FS is mounted RO, we only allow transaction for log recovery,
> +	 * no regular transaction can be started.
> +	 */
> +	if (sb_rdonly(fs_info->sb) && !(type & __TRANS_START_IGNORE_RO))
> +		return ERR_PTR(-EROFS);
> +
>   	if (current->journal_info) {
>   		WARN_ON(type & TRANS_EXTWRITERS);
>   		h = current->journal_info;
> @@ -781,6 +788,13 @@ struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root,
>   				 BTRFS_RESERVE_FLUSH_ALL, true);
>   }
>   
> +struct btrfs_trans_handle *btrfs_start_transaction_log_recover(
> +			struct btrfs_root *root, unsigned int num_items)
> +{
> +	return start_transaction(root, num_items, TRANS_START_LOG_RECOVER,
> +				 BTRFS_RESERVE_FLUSH_ALL, true);
> +}
> +
>   struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv(
>   					struct btrfs_root *root,
>   					unsigned int num_items)
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 1852ed9de7fd..24a743d91eff 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -106,8 +106,11 @@ struct btrfs_transaction {
>   #define __TRANS_JOIN_NOLOCK	(1U << 12)
>   #define __TRANS_DUMMY		(1U << 13)
>   #define __TRANS_JOIN_NOSTART	(1U << 14)
> +#define __TRANS_START_IGNORE_RO	(1U << 15)
>   
>   #define TRANS_START		(__TRANS_START | __TRANS_FREEZABLE)
> +#define TRANS_START_LOG_RECOVER	(__TRANS_START | __TRANS_FREEZABLE | \
> +				 __TRANS_START_IGNORE_RO)
>   #define TRANS_ATTACH		(__TRANS_ATTACH)
>   #define TRANS_JOIN		(__TRANS_JOIN | __TRANS_FREEZABLE)
>   #define TRANS_JOIN_NOLOCK	(__TRANS_JOIN_NOLOCK)
> @@ -201,6 +204,8 @@ static inline void btrfs_clear_skip_qgroup(struct btrfs_trans_handle *trans)
>   int btrfs_end_transaction(struct btrfs_trans_handle *trans);
>   struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root,
>   						   unsigned int num_items);
> +struct btrfs_trans_handle *btrfs_start_transaction_log_recover(
> +			struct btrfs_root *root, unsigned int num_items);
>   struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv(
>   					struct btrfs_root *root,
>   					unsigned int num_items);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 69f901813ea8..42369fa9a038 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -6478,7 +6478,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   
>   	set_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags);
>   
> -	trans = btrfs_start_transaction(fs_info->tree_root, 0);
> +	trans = btrfs_start_transaction_log_recover(fs_info->tree_root, 0);
>   	if (IS_ERR(trans)) {
>   		ret = PTR_ERR(trans);
>   		goto error;


      parent reply	other threads:[~2021-12-16 10:25 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-16  9:14 [PATCH RFC] btrfs: reject transaction creation for read-only mount except for log recovery Qu Wenruo
2021-12-16  9:32 ` Qu Wenruo
2021-12-16 10:25 ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a3876e3-e646-caae-96ca-55f1d5c1cac4@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox