Re: [PATCH v2] btrfs: don't update the block group item if used bytes are the same

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Josef Bacik <josef@toxicpanda.com>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: don't update the block group item if used bytes are the same
Date: Fri, 9 Sep 2022 06:35:15 +0800	[thread overview]
Message-ID: <91c7d385-4687-efd9-2c9a-bd02afae6141@gmx.com> (raw)
In-Reply-To: <YxnrXj7GqhPg7vRa@localhost.localdomain>



On 2022/9/8 21:17, Josef Bacik wrote:
> On Thu, Sep 08, 2022 at 02:33:48PM +0800, Qu Wenruo wrote:
>> [BACKGROUND]
>>
>> When committing a transaction, we will update block group items for all
>> dirty block groups.
>>
>> But in fact, dirty block groups don't always need to update their block
>> group items.
>> It's pretty common to have a metadata block group which experienced
>> several CoW operations, but still have the same amount of used bytes.
>>
>> In that case, we may unnecessarily CoW a tree block doing nothing.
>>
>> [ENHANCEMENT]
>>
>> This patch will introduce btrfs_block_group::commit_used member to
>> remember the last used bytes, and use that new member to skip
>> unnecessary block group item update.
>>
>> This would be more common for large fs, which metadata block group can
>> be as large as 1GiB, containing at most 64K metadata items.
>>
>> In that case, if CoW added and the deleted one metadata item near the end
>> of the block group, then it's completely possible we don't need to touch
>> the block group item at all.
>>
>> [BENCHMARK]
>>
>> To properly benchmark how many block group items we skipped the update,
>> I added some members into btrfs_tranaction to record how many times
>> update_block_group_item() is called, and how many of them are skipped.
>>
>> Then with a single line fio to trigger the workload on a newly created
>> btrfs:
>>
>>    fio --filename=$mnt/file --size=4G --rw=randrw --bs=32k --ioengine=libaio \
>>        --direct=1 --iodepth=64 --runtime=120 --numjobs=4 --name=random_rw \
>>        --fallocate=posix
>>
>> Then I got 101 transaction committed during that fio command, and a
>> total of 2062 update_block_group_item() calls, in which 1241 can be
>> skipped.
>>
>> This is already a good 60% got skipped.
>>
>> The full analyse can be found here:
>> https://docs.google.com/spreadsheets/d/e/2PACX-1vTbjhqqqxoebnQM_ZJzSM1rF7EY7I1IRbAzZjv19imcDHsVDA7qeA_-MzXxltFZ0kHBjxMA15s2CSH8/pubhtml
>>
>> Furthermore, since I have per-trans accounting, it shows that initially
>> we have a very low percentage of skipped block group item update.
>>
>> This is expected since we're inserting a lot of new file extents
>> initially, thus the metadata usage is going to increase.
>>
>> But after the initial 3 transactions, all later transactions are have a
>> very stable 40~80% skip rate, mostly proving the patch is working.
>>
>> Although such high skip rate is not always a huge win, as for
>> our later block group tree feature, we will have a much higher chance to
>> have a block group item already COWed, thus can skip some COW work.
>>
>> But still, skipping a full COW search on extent tree is always a good
>> thing, especially when the benefit almost comes from thin-air.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> [Josef pinned down the race and provided a fix]
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>
> Generally I like this change, any time we can avoid a tree search will be good.
> I would like to see if this makes any difference timing wise.  Could you record
> transaction commit times for this job with and without your change?

Sure, would add extra benchmarking the transaction commit time consumption.

However previous fio performance tests show no observable change in
performance numbers, so just in case I would also record the total and
individual execution time for update_block_group_item() function just in
case.

Thanks,
Qu

> That would
> likely show a difference.  In any case the code is fine
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> Thanks,
>
> Josef

     prev parent reply	other threads:[~2022-09-08 22:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08  6:33 [PATCH v2] btrfs: don't update the block group item if used bytes are the same Qu Wenruo
2022-09-08  6:42 ` Qu Wenruo
2022-09-08 10:31 ` Filipe Manana
2022-09-08 11:41   ` Qu Wenruo
2022-09-08 12:18     ` Filipe Manana
2022-09-08 13:17 ` Josef Bacik
2022-09-08 22:35   ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91c7d385-4687-efd9-2c9a-bd02afae6141@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox