linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.de>
To: Nikolay Borisov <nborisov@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
Date: Thu, 21 Feb 2019 22:32:10 +0800	[thread overview]
Message-ID: <e07446e9-7aff-9f12-582e-310135c8ec30@suse.de> (raw)
In-Reply-To: <c2aea5d2-5039-0243-6fd7-fed0880a0be0@suse.com>



On 2019/2/21 下午10:25, Nikolay Borisov wrote:
> 
> 
> On 21.02.19 г. 10:22 ч., Qu Wenruo wrote:
>> There are a lot of error reports complaining about transid error in the
>> mail list.
>>
>> Under most case, the on-disk transid is lower than expected transid.
>> This may indicate that some tree blocks are not written back to disk
>> before writing super blocks.
>>
>> This patch will add a safe net for developers, by calling
>> btrfs_write_and_wait_transaction() before setting transaction unblocked
>> and double check btree_inode and dirty_pages io_tree, to ensure no tree
>> blocks are still dirty or under writeback.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> The reason for RFC is, I'm not sure why we currently call
>> btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.
>>
>> It looks like an optimization, but I don't see much performance
>> difference during regression test.
>>
>> I hope to move the call before we unblock transaction so we can do such
>> sanity check for all builds and hope to catch some clue of transid
>> error.
> 
> Even current code ensures that all allocated blocks in the current
> transaction (which is what all those EXTENT_DIRTY extents in the
> dirty_pages tree ) are written before the new superblocks are.
> 
> Slight offtopic: In fact instead of playing games with the flags and
> having an extent_io_tree called dirty_pages o_O it can be replaced with
> a simple linked list that holds all newly allocated buffers so writing
> all such buffers will result in simply iterating the list.
> 
> In any case this patch is buggy, see below on why
> 
>> ---
>>  fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 4ec2b660d014..30b7ed0bf873 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>  
>>  	btrfs_trans_release_chunk_metadata(trans);
>>  
>> +	/* Last safenet or developer to catch any unwritten tree blocks */
>> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
>> +		u64 found_start = 0;
>> +		u64 found_end = 0;
>> +
>> +		ret = btrfs_write_and_wait_transaction(trans);
>> +		if (ret) {
>> +			btrfs_handle_fs_error(fs_info, ret,
>> +					      "Error while writing out transaction");
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +
>> +		/* No dirty extent should exist in btree inode */
>> +		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
>> +				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
> 
> Why do you check EXTENT_WRITEBACK, AFAICS that flag is not currently
> used in the code and should perhahps be deleted? I don't see anything
> setting it, it's only being checked for (as part of EXTENT_IOBITS).
> 
> Additionally this check is pointless because
> btrfs_write_and_wait_transaction calls clear_btree_io_tree which purges
> the tree.

But we still have BTRFS_I(fs_info->btree_inode)->io_tree, and that's the
main part of the check.

I don't really think the dirty_pages is really an important thing
compared to btree_inode.
If there is some way to find any dirty pages from an address_space, it
would be even better.

Thanks,
Qu

> 
>> +				0, NULL);
>> +		if (ret > 0) {
>> +			WARN(1,
>> +		"dirty_pages not fully written back, start=%llu len=%llu\n",
>> +			     found_start, found_end + 1 - found_start);
>> +			ret = -EUCLEAN;
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
>> +				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>> +				     0, NULL);
>> +		if (ret > 0) {
>> +			WARN(1,
>> +		"btree io_tree not fully written back, start=%llu len=%llu\n",
>> +			     found_start, found_end + 1 - found_start);
>> +			ret = -EUCLEAN;
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +	}
>> +
>>  	spin_lock(&fs_info->trans_lock);
>>  	cur_trans->state = TRANS_STATE_UNBLOCKED;
>>  	fs_info->running_transaction = NULL;
>>

  reply	other threads:[~2019-02-21 14:32 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-21  8:22 [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks Qu Wenruo
2019-02-21 14:25 ` Nikolay Borisov
2019-02-21 14:32   ` Qu Wenruo [this message]
2019-02-21 15:01     ` Nikolay Borisov
2019-02-26  7:38 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e07446e9-7aff-9f12-582e-310135c8ec30@suse.de \
    --to=wqu@suse.de \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).