linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/9] Improve preemptive ENOSPC flushing
Date: Tue, 6 Oct 2020 15:55:53 +0300	[thread overview]
Message-ID: <4ad555d4-70a9-e3a7-3b16-bbf77f123a3e@suse.com> (raw)
In-Reply-To: <cover.1601495426.git.josef@toxicpanda.com>



On 30.09.20 г. 23:01 ч., Josef Bacik wrote:
> Hello,
> 
> A while ago Nikolay started digging into a problem where they were seeing an
> around 20% regression on random writes, and he bisected it down to
> 
>   btrfs: don't end the transaction for delayed refs in throttle
> 
> However this wasn't actually the cause of the problem.
> 
> This patch removed the code that would preemptively end the transactions if we
> were low on space.  Because we had just introduced the ticketing code, this was
> no longer necessary and was causing a lot of transaction commits.
> 
> And in Nikolay's testing he validated this, we would see like 100x more
> transaction commits without that patch than with it, but the write regression
> clearly appeared when this patch was applied.
> 
> The root cause of this is that the transaction commits were essentially
> happening so quickly that we didn't end up needing to wait on space in the
> ENOSPC ticketing code as much, and thus were able to write pretty quickly.  With
> this gone, we now were getting a sawtoothy sort of behavior where we'd run up,
> stop while we flushed metadata space, run some more, stop again etc.
> 
> When I implemented the ticketing infrastructure, I was trying to get us out of
> excessively flushing space because we would sometimes over create block groups,
> and thus short circuited flushing if we no longer had tickets.  This had the
> side effect of breaking the preemptive flushing code, where we attempted to
> flush space in the background before we were forced to wait for space.
> 
> Enter this patchset.  We still have some of this preemption logic sprinkled
> everywhere, so I've separated it out of the normal ticketed flushing code, and
> made preemptive flushing it's own thing.
> 
> The preemptive flushing logic is more specialized than the standard flushing
> logic.  It attempts to flush in whichever pool has the highest usage.  This
> means that if most of our space is tied up in pinned extents, we'll commit the
> transaction.  If most of the space is tied up in delalloc, we'll flush delalloc,
> etc.
> 
> To test this out I used the fio job that Nikolay used, this needs to be adjusted
> so the overall IO size is at least 2x the RAM size for the box you are testing
> 
> fio --direct=0 --ioengine=sync --thread --directory=/mnt/test --invalidate=1 \
>         --group_reporting=1 --runtime=300 --fallocate=none --ramp_time=10 \
>         --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite \
>         --size=2g --numjobs=4 --bs=4k --fsync_on_close=0 --end_fsync=0 \
>         --filename_format=FioWorkloads.\$jobnum
> 
> I got the following results
> 
> misc-next:	bw=13.4MiB/s (14.0MB/s), 13.4MiB/s-13.4MiB/s (14.0MB/s-14.0MB/s), io=4015MiB (4210MB), run=300323-300323msec
> pre-throttling:	bw=16.9MiB/s (17.7MB/s), 16.9MiB/s-16.9MiB/s (17.7MB/s-17.7MB/s), io=5068MiB (5314MB), run=300069-300069msec
> my patches:	bw=18.0MiB/s (18.9MB/s), 18.0MiB/s-18.0MiB/s (18.9MB/s-18.9MB/s), io=5403MiB (5666MB), run=300001-300001msec
> 
> Thanks,
> 
> Josef
> 
> Josef Bacik (9):
>   btrfs: add a trace point for reserve tickets
>   btrfs: improve preemptive background space flushing
>   btrfs: rename need_do_async_reclaim
>   btrfs: check reclaim_size in need_preemptive_reclaim
>   btrfs: rework btrfs_calc_reclaim_metadata_size
>   btrfs: simplify the logic in need_preemptive_flushing
>   btrfs: implement space clamping for preemptive flushing
>   btrfs: adjust the flush trace point to include the source
>   btrfs: add a trace class for dumping the current ENOSPC state
> 
>  fs/btrfs/ctree.h             |   1 +
>  fs/btrfs/disk-io.c           |   1 +
>  fs/btrfs/space-info.c        | 216 +++++++++++++++++++++++++++++------
>  fs/btrfs/space-info.h        |   3 +
>  include/trace/events/btrfs.h | 112 +++++++++++++++++-
>  5 files changed, 292 insertions(+), 41 deletions(-)
> 



Here are my results from testing misc-next/your patches/4.19 for buffered io: 

With your patches:
 WRITE: bw=29.1MiB/s (30.5MB/s), 29.1MiB/s-29.1MiB/s (30.5MB/s-30.5MB/s), io=8192MiB (8590MB), run=281678-281678msec
 WRITE: bw=30.0MiB/s (32.5MB/s), 30.0MiB/s-30.0MiB/s (32.5MB/s-32.5MB/s), io=8192MiB (8590MB), run=264337-264337msec
 WRITE: bw=29.6MiB/s (31.1MB/s), 29.6MiB/s-29.6MiB/s (31.1MB/s-31.1MB/s), io=8192MiB (8590MB), run=276312-276312msec
 WRITE: bw=29.8MiB/s (31.2MB/s), 29.8MiB/s-29.8MiB/s (31.2MB/s-31.2MB/s), io=8192MiB (8590MB), run=274916-274916msec
 WRITE: bw=30.4MiB/s (31.9MB/s), 30.4MiB/s-30.4MiB/s (31.9MB/s-31.9MB/s), io=8192MiB (8590MB), run=269030-269030msec

Without-misc-next:
  WRITE: bw=20.2MiB/s (21.2MB/s), 20.2MiB/s-20.2MiB/s (21.2MB/s-21.2MB/s), io=8192MiB (8590MB), run=404831-404831msec
  WRITE: bw=20.8MiB/s (21.8MB/s), 20.8MiB/s-20.8MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=394749-394749msec
  WRITE: bw=20.8MiB/s (21.8MB/s), 20.8MiB/s-20.8MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=393291-393291msec
  WRITE: bw=20.7MiB/s (21.8MB/s), 20.7MiB/s-20.7MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=394918-394918msec
  WRITE: bw=21.1MiB/s (22.1MB/s), 21.1MiB/s-21.1MiB/s (22.1MB/s-22.1MB/s), io=8192MiB (8590MB), run=388499-388499msec

4.19.x:
  WRITE: bw=23.3MiB/s (24.4MB/s), 23.3MiB/s-23.3MiB/s (24.4MB/s-24.4MB/s), io=6387MiB (6697MB), run=274460-274460msec
  WRITE: bw=23.3MiB/s (24.5MB/s), 23.3MiB/s-23.3MiB/s (24.5MB/s-24.5MB/s), io=6643MiB (6966MB), run=284518-284518msec
  WRITE: bw=23.4MiB/s (24.5MB/s), 23.4MiB/s-23.4MiB/s (24.5MB/s-24.5MB/s), io=6643MiB (6966MB), run=284372-284372msec
  WRITE: bw=23.6MiB/s (24.7MB/s), 23.6MiB/s-23.6MiB/s (24.7MB/s-24.7MB/s), io=6387MiB (6697MB), run=271200-271200msec
  WRITE: bw=23.4MiB/s (24.6MB/s), 23.4MiB/s-23.4MiB/s (24.6MB/s-24.6MB/s), io=6387MiB (6697MB), run=272670-272670msec

So there is indeed a net-gain in performance, however, (expectedly, due to the increased number of transaction commits) there is a regressions in DIO: 


DIO-josef:
  WRITE: bw=47.1MiB/s (49.4MB/s), 47.1MiB/s-47.1MiB/s (49.4MB/s-49.4MB/s), io=8192MiB (8590MB), run=174049-174049msec
  WRITE: bw=48.5MiB/s (50.8MB/s), 48.5MiB/s-48.5MiB/s (50.8MB/s-50.8MB/s), io=8192MiB (8590MB), run=169045-169045msec
  WRITE: bw=45.0MiB/s (48.2MB/s), 45.0MiB/s-45.0MiB/s (48.2MB/s-48.2MB/s), io=8192MiB (8590MB), run=178196-178196msec
  WRITE: bw=46.1MiB/s (48.3MB/s), 46.1MiB/s-46.1MiB/s (48.3MB/s-48.3MB/s), io=8192MiB (8590MB), run=177861-177861msec
  WRITE: bw=46.4MiB/s (48.7MB/s), 46.4MiB/s-46.4MiB/s (48.7MB/s-48.7MB/s), io=8192MiB (8590MB), run=176376-176376msec

DIO-misc-next:
  WRITE: bw=50.1MiB/s (52.6MB/s), 50.1MiB/s-50.1MiB/s (52.6MB/s-52.6MB/s), io=8192MiB (8590MB), run=163365-163365msec
  WRITE: bw=50.3MiB/s (52.8MB/s), 50.3MiB/s-50.3MiB/s (52.8MB/s-52.8MB/s), io=8192MiB (8590MB), run=162753-162753msec
  WRITE: bw=50.6MiB/s (53.1MB/s), 50.6MiB/s-50.6MiB/s (53.1MB/s-53.1MB/s), io=8192MiB (8590MB), run=161766-161766msec
  WRITE: bw=50.2MiB/s (52.7MB/s), 50.2MiB/s-50.2MiB/s (52.7MB/s-52.7MB/s), io=8192MiB (8590MB), run=163074-163074msec
  WRITE: bw=50.5MiB/s (52.9MB/s), 50.5MiB/s-50.5MiB/s (52.9MB/s-52.9MB/s), io=8192MiB (8590MB), run=162252-162252msec


With that: 

Tested-by: Nikolay Borisov <nborisov@suse.com>


      parent reply	other threads:[~2020-10-06 12:55 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-30 20:01 [PATCH 0/9] Improve preemptive ENOSPC flushing Josef Bacik
2020-09-30 20:01 ` [PATCH 1/9] btrfs: add a trace point for reserve tickets Josef Bacik
2020-10-01  5:54   ` Nikolay Borisov
2020-10-01 21:33     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 2/9] btrfs: improve preemptive background space flushing Josef Bacik
2020-10-01 13:19   ` Nikolay Borisov
2020-10-01 21:35     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 3/9] btrfs: rename need_do_async_reclaim Josef Bacik
2020-10-01 13:20   ` Nikolay Borisov
2020-10-01 13:24   ` Nikolay Borisov
2020-10-01 21:37     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 4/9] btrfs: check reclaim_size in need_preemptive_reclaim Josef Bacik
2020-10-01 13:23   ` Nikolay Borisov
2020-10-01 21:36     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 5/9] btrfs: rework btrfs_calc_reclaim_metadata_size Josef Bacik
2020-10-01 13:59   ` Nikolay Borisov
2020-10-01 21:38     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 6/9] btrfs: simplify the logic in need_preemptive_flushing Josef Bacik
2020-10-01 14:09   ` Nikolay Borisov
2020-10-01 21:40     ` Josef Bacik
2020-10-02  7:13   ` Nikolay Borisov
2020-09-30 20:01 ` [PATCH 7/9] btrfs: implement space clamping for preemptive flushing Josef Bacik
2020-10-01 14:49   ` Nikolay Borisov
2020-10-01 21:41     ` Josef Bacik
2020-09-30 20:01 ` [PATCH 8/9] btrfs: adjust the flush trace point to include the source Josef Bacik
2020-10-01 15:32   ` Nikolay Borisov
2020-09-30 20:01 ` [PATCH 9/9] btrfs: add a trace class for dumping the current ENOSPC state Josef Bacik
2020-10-02  8:30   ` Nikolay Borisov
2020-10-02 13:45     ` Josef Bacik
2020-10-06 12:55 ` Nikolay Borisov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ad555d4-70a9-e3a7-3b16-bbf77f123a3e@suse.com \
    --to=nborisov@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).