From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/9] Improve preemptive ENOSPC flushing
Date: Tue, 6 Oct 2020 15:55:53 +0300 [thread overview]
Message-ID: <4ad555d4-70a9-e3a7-3b16-bbf77f123a3e@suse.com> (raw)
In-Reply-To: <cover.1601495426.git.josef@toxicpanda.com>
On 30.09.20 at 23:01, Josef Bacik wrote:
> Hello,
>
> A while ago Nikolay started digging into a problem where he was seeing around
> a 20% regression on random writes, and he bisected it down to
>
> btrfs: don't end the transaction for delayed refs in throttle
>
> However, this wasn't actually the cause of the problem.
>
> This patch removed the code that would preemptively end the transactions if we
> were low on space. Because we had just introduced the ticketing code, this was
> no longer necessary and was causing a lot of transaction commits.
>
> And in Nikolay's testing he validated this: we would see something like 100x
> more transaction commits without that patch than with it, yet the write
> regression clearly appeared once this patch was applied.
>
> The root cause of this is that the transaction commits were essentially
> happening so quickly that we didn't end up needing to wait on space in the
> ENOSPC ticketing code as much, and thus were able to write pretty quickly.
> With this gone, we were now getting a sawtoothy sort of behavior where we'd
> run up, stop while we flushed metadata space, run some more, stop again, etc.
>
> When I implemented the ticketing infrastructure, I was trying to get us out
> of excessively flushing space because we would sometimes over-create block
> groups, and thus I short-circuited flushing if we no longer had tickets. This
> had the side effect of breaking the preemptive flushing code, where we
> attempted to flush space in the background before we were forced to wait for
> space.
>
> Enter this patchset. We still have some of this preemption logic sprinkled
> everywhere, so I've separated it out of the normal ticketed flushing code and
> made preemptive flushing its own thing.
>
> The preemptive flushing logic is more specialized than the standard flushing
> logic. It attempts to flush in whichever pool has the highest usage. This
> means that if most of our space is tied up in pinned extents, we'll commit the
> transaction. If most of the space is tied up in delalloc, we'll flush delalloc,
> etc.
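
As an aside, the per-pool decision described above is easy to picture with a
toy model like the one below. This is only a simplified, self-contained
illustration of the "flush whichever pool is biggest" idea; all names and
numbers are made up and it is not the code from the patches.

/* Toy illustration: pick a flush operation based on the largest pool. */
#include <stdio.h>
#include <stdint.h>

enum flush_op {
        FLUSH_DELALLOC,         /* most space held by delalloc reservations */
        RUN_DELAYED_REFS,       /* most space held for delayed refs */
        COMMIT_TRANSACTION,     /* most space held by pinned extents */
};

struct pools {
        uint64_t delalloc_bytes;
        uint64_t delayed_refs_bytes;
        uint64_t pinned_bytes;
};

static enum flush_op pick_flush_op(const struct pools *p)
{
        uint64_t biggest = p->delalloc_bytes;
        enum flush_op op = FLUSH_DELALLOC;

        if (p->delayed_refs_bytes > biggest) {
                biggest = p->delayed_refs_bytes;
                op = RUN_DELAYED_REFS;
        }
        if (p->pinned_bytes > biggest)
                op = COMMIT_TRANSACTION;

        return op;
}

int main(void)
{
        /* Made-up numbers: most of the space here is pinned... */
        struct pools p = {
                .delalloc_bytes     = 16ULL << 20,
                .delayed_refs_bytes =  4ULL << 20,
                .pinned_bytes       = 64ULL << 20,
        };

        /* ...so the toy picks COMMIT_TRANSACTION (prints 2). */
        printf("flush op = %d\n", pick_flush_op(&p));
        return 0;
}

The real series obviously considers more state than this, but that is the
shape of the decision.
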
>
> To test this out I used the fio job that Nikolay used; it needs to be
> adjusted so the overall IO size is at least 2x the RAM size of the box you
> are testing on:
>
> fio --direct=0 --ioengine=sync --thread --directory=/mnt/test --invalidate=1 \
> --group_reporting=1 --runtime=300 --fallocate=none --ramp_time=10 \
> --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite \
> --size=2g --numjobs=4 --bs=4k --fsync_on_close=0 --end_fsync=0 \
> --filename_format=FioWorkloads.\$jobnum
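
As a concrete sizing example (numbers purely for illustration): with
--numjobs=4 and --size=2g the job above writes 4 x 2 GiB = 8 GiB in total,
which only satisfies the 2x rule on a box with up to ~4 GiB of RAM; on, say,
a 16 GiB machine you would bump --size to at least 8g (or raise --numjobs)
so the total IO stays at 2x RAM.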
>
> I got the following results:
>
> misc-next: bw=13.4MiB/s (14.0MB/s), 13.4MiB/s-13.4MiB/s (14.0MB/s-14.0MB/s), io=4015MiB (4210MB), run=300323-300323msec
> pre-throttling: bw=16.9MiB/s (17.7MB/s), 16.9MiB/s-16.9MiB/s (17.7MB/s-17.7MB/s), io=5068MiB (5314MB), run=300069-300069msec
> my patches: bw=18.0MiB/s (18.9MB/s), 18.0MiB/s-18.0MiB/s (18.9MB/s-18.9MB/s), io=5403MiB (5666MB), run=300001-300001msec
>
> Thanks,
>
> Josef
>
> Josef Bacik (9):
> btrfs: add a trace point for reserve tickets
> btrfs: improve preemptive background space flushing
> btrfs: rename need_do_async_reclaim
> btrfs: check reclaim_size in need_preemptive_reclaim
> btrfs: rework btrfs_calc_reclaim_metadata_size
> btrfs: simplify the logic in need_preemptive_flushing
> btrfs: implement space clamping for preemptive flushing
> btrfs: adjust the flush trace point to include the source
> btrfs: add a trace class for dumping the current ENOSPC state
>
> fs/btrfs/ctree.h | 1 +
> fs/btrfs/disk-io.c | 1 +
> fs/btrfs/space-info.c | 216 +++++++++++++++++++++++++++++------
> fs/btrfs/space-info.h | 3 +
> include/trace/events/btrfs.h | 112 +++++++++++++++++-
> 5 files changed, 292 insertions(+), 41 deletions(-)
>
Here are my results from testing misc-next/your patches/4.19 for buffered IO:
With your patches:
WRITE: bw=29.1MiB/s (30.5MB/s), 29.1MiB/s-29.1MiB/s (30.5MB/s-30.5MB/s), io=8192MiB (8590MB), run=281678-281678msec
WRITE: bw=31.0MiB/s (32.5MB/s), 31.0MiB/s-31.0MiB/s (32.5MB/s-32.5MB/s), io=8192MiB (8590MB), run=264337-264337msec
WRITE: bw=29.6MiB/s (31.1MB/s), 29.6MiB/s-29.6MiB/s (31.1MB/s-31.1MB/s), io=8192MiB (8590MB), run=276312-276312msec
WRITE: bw=29.8MiB/s (31.2MB/s), 29.8MiB/s-29.8MiB/s (31.2MB/s-31.2MB/s), io=8192MiB (8590MB), run=274916-274916msec
WRITE: bw=30.4MiB/s (31.9MB/s), 30.4MiB/s-30.4MiB/s (31.9MB/s-31.9MB/s), io=8192MiB (8590MB), run=269030-269030msec
Without the patches (misc-next):
WRITE: bw=20.2MiB/s (21.2MB/s), 20.2MiB/s-20.2MiB/s (21.2MB/s-21.2MB/s), io=8192MiB (8590MB), run=404831-404831msec
WRITE: bw=20.8MiB/s (21.8MB/s), 20.8MiB/s-20.8MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=394749-394749msec
WRITE: bw=20.8MiB/s (21.8MB/s), 20.8MiB/s-20.8MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=393291-393291msec
WRITE: bw=20.7MiB/s (21.8MB/s), 20.7MiB/s-20.7MiB/s (21.8MB/s-21.8MB/s), io=8192MiB (8590MB), run=394918-394918msec
WRITE: bw=21.1MiB/s (22.1MB/s), 21.1MiB/s-21.1MiB/s (22.1MB/s-22.1MB/s), io=8192MiB (8590MB), run=388499-388499msec
4.19.x:
WRITE: bw=23.3MiB/s (24.4MB/s), 23.3MiB/s-23.3MiB/s (24.4MB/s-24.4MB/s), io=6387MiB (6697MB), run=274460-274460msec
WRITE: bw=23.3MiB/s (24.5MB/s), 23.3MiB/s-23.3MiB/s (24.5MB/s-24.5MB/s), io=6643MiB (6966MB), run=284518-284518msec
WRITE: bw=23.4MiB/s (24.5MB/s), 23.4MiB/s-23.4MiB/s (24.5MB/s-24.5MB/s), io=6643MiB (6966MB), run=284372-284372msec
WRITE: bw=23.6MiB/s (24.7MB/s), 23.6MiB/s-23.6MiB/s (24.7MB/s-24.7MB/s), io=6387MiB (6697MB), run=271200-271200msec
WRITE: bw=23.4MiB/s (24.6MB/s), 23.4MiB/s-23.4MiB/s (24.6MB/s-24.6MB/s), io=6387MiB (6697MB), run=272670-272670msec
So there is indeed a net gain in performance. However, there is (expectedly, due to the increased number of transaction commits) a regression in DIO:
DIO-josef:
WRITE: bw=47.1MiB/s (49.4MB/s), 47.1MiB/s-47.1MiB/s (49.4MB/s-49.4MB/s), io=8192MiB (8590MB), run=174049-174049msec
WRITE: bw=48.5MiB/s (50.8MB/s), 48.5MiB/s-48.5MiB/s (50.8MB/s-50.8MB/s), io=8192MiB (8590MB), run=169045-169045msec
WRITE: bw=46.0MiB/s (48.2MB/s), 46.0MiB/s-46.0MiB/s (48.2MB/s-48.2MB/s), io=8192MiB (8590MB), run=178196-178196msec
WRITE: bw=46.1MiB/s (48.3MB/s), 46.1MiB/s-46.1MiB/s (48.3MB/s-48.3MB/s), io=8192MiB (8590MB), run=177861-177861msec
WRITE: bw=46.4MiB/s (48.7MB/s), 46.4MiB/s-46.4MiB/s (48.7MB/s-48.7MB/s), io=8192MiB (8590MB), run=176376-176376msec
DIO-misc-next:
WRITE: bw=50.1MiB/s (52.6MB/s), 50.1MiB/s-50.1MiB/s (52.6MB/s-52.6MB/s), io=8192MiB (8590MB), run=163365-163365msec
WRITE: bw=50.3MiB/s (52.8MB/s), 50.3MiB/s-50.3MiB/s (52.8MB/s-52.8MB/s), io=8192MiB (8590MB), run=162753-162753msec
WRITE: bw=50.6MiB/s (53.1MB/s), 50.6MiB/s-50.6MiB/s (53.1MB/s-53.1MB/s), io=8192MiB (8590MB), run=161766-161766msec
WRITE: bw=50.2MiB/s (52.7MB/s), 50.2MiB/s-50.2MiB/s (52.7MB/s-52.7MB/s), io=8192MiB (8590MB), run=163074-163074msec
WRITE: bw=50.5MiB/s (52.9MB/s), 50.5MiB/s-50.5MiB/s (52.9MB/s-52.9MB/s), io=8192MiB (8590MB), run=162252-162252msec
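
In rough numbers, averaging the 5 runs of each group above: buffered goes from
~20.7 MiB/s on plain misc-next (~23.4 MiB/s on 4.19) to ~30 MiB/s with your
patches, i.e. around a 45% improvement over misc-next, while DIO drops from
~50.3 MiB/s to ~46.8 MiB/s, i.e. roughly 7%.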
With that:
Tested-by: Nikolay Borisov <nborisov@suse.com>