linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ethanlien <ethanlien@synology.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-btrfs@vger.kernel.org, linux-btrfs-owner@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: use customized batch size for total_bytes_pinned
Date: Fri, 13 Jul 2018 10:27:36 +0800	[thread overview]
Message-ID: <5de5cbc9058ec891dc7e777c46687dab@synology.com> (raw)
In-Reply-To: <20180712221936.GA8064@vader>

Omar Sandoval 於 2018-07-13 06:19 寫到:
> On Wed, Jul 11, 2018 at 11:59:36PM +0800, Ethan Lien wrote:
>> In commit b150a4f10d878 ("Btrfs: use a percpu to keep track of 
>> possibly
>> pinned bytes") we use total_bytes_pinned to track how many bytes we 
>> are
>> going to free in this transaction. When we are close to ENOSPC, we 
>> check it
>> and know if we can make the allocation by commit the current 
>> transaction.
>> For every data/metadata extent we are going to free, we add
>> total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), 
>> and
>> release it in unpin_extent_range() when we finish the transaction. So 
>> this
>> is a variable we frequently update but rarely read - just the suitable
>> use of percpu_counter. But in previous commit we update 
>> total_bytes_pinned
>> by default 32 batch size, making every update essentially a spin lock
>> protected update. Since every spin lock/unlock operation involves 
>> syncing
>> a globally used variable and some kind of barrier in a SMP system, 
>> this is
>> more expensive than using total_bytes_pinned as a simple atomic64_t. 
>> So
>> fix this by using a customized batch size. Since we only read
>> total_bytes_pinned when we are close to ENOSPC and fail to alloc new 
>> chunk,
>> we can use a really large batch size and have nearly no penalty in 
>> most
>> cases.
>> 
>> 
>> [Test]
>> We test the patch on a 4-cores x86 machine:
>> 1. falloate a 16GiB size test file.
>> 2. take snapshot (so all following writes will be cow write).
>> 3. run a 180 sec, 4 jobs, 4K random write fio on test file.
>> 
>> We also add a temporary lockdep class on percpu_counter's spin lock 
>> used
>> by total_bytes_pinned to track lock_stat.
>> 
>> 
>> [Results]
>> unpatched:
>> lock_stat version 0.4
>> -----------------------------------------------------------------------
>>                               class name    con-bounces    contentions
>> waittime-min   waittime-max waittime-total   waittime-avg    
>> acq-bounces
>> acquisitions   holdtime-min   holdtime-max holdtime-total   
>> holdtime-avg
>> 
>>                total_bytes_pinned_percpu:            82             82
>>         0.21           0.61          29.46           0.36         
>> 298340
>>       635973           0.09          11.01      173476.25           
>> 0.27
>> 
>> 
>> patched:
>> lock_stat version 0.4
>> -----------------------------------------------------------------------
>>                               class name    con-bounces    contentions
>> waittime-min   waittime-max waittime-total   waittime-avg    
>> acq-bounces
>> acquisitions   holdtime-min   holdtime-max holdtime-total   
>> holdtime-avg
>> 
>>                total_bytes_pinned_percpu:             1              1
>>         0.62           0.62           0.62           0.62          
>> 13601
>>        31542           0.14           9.61       11016.90           
>> 0.35
>> 
>> 
>> [Analysis]
>> Since the spin lock only protect a single in-memory variable, the
>> contentions (number of lock acquisitions that had to wait) in both
>> unpatched and patched version are low. But when we see acquisitions 
>> and
>> acq-bounces, we get much lower counts in patched version. Here the 
>> most
>> important metric is acq-bounces. It means how many times the lock get
>> transferred between different cpus, so the patch can really recude
>> cacheline bouncing of spin lock (also the global counter of 
>> percpu_counter)
>> in a SMP system.
>> 
>> Fixes: b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
>> pinned bytes")
>> 
>> Signed-off-by: Ethan Lien <ethanlien@synology.com>
>> ---
>> 
>> V2:
>> 	Rewrite commit comments.
>> 	Add lock_stat test.
>> 	Pull dirty_metadata_bytes out to a separate patch.
>> 
>>  fs/btrfs/ctree.h       |  1 +
>>  fs/btrfs/extent-tree.c | 46 
>> ++++++++++++++++++++++++++++--------------
>>  2 files changed, 32 insertions(+), 15 deletions(-)
>> 
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 118346aceea9..df682a521635 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -422,6 +422,7 @@ struct btrfs_space_info {
>>  	 * time the transaction commits.
>>  	 */
>>  	struct percpu_counter total_bytes_pinned;
>> +	s32 total_bytes_pinned_batch;
> 
> Can this just be a constant instead of adding it to space_info?

Yes constant is better here, I'll resend it, thanks.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" 
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


      reply	other threads:[~2018-07-13  2:40 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 15:59 [PATCH v2] btrfs: use customized batch size for total_bytes_pinned Ethan Lien
2018-07-12  7:07 ` Nikolay Borisov
2018-07-12 17:13   ` ethanlien
2018-07-12 22:19 ` Omar Sandoval
2018-07-13  2:27   ` ethanlien [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5de5cbc9058ec891dc7e777c46687dab@synology.com \
    --to=ethanlien@synology.com \
    --cc=linux-btrfs-owner@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=osandov@osandov.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).