public inbox for linux-btrfs@vger.kernel.org
From: David Sterba <dsterba@suse.cz>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 0/4] btrfs: improve stalls under sudden writeback
Date: Mon, 13 Apr 2026 20:41:43 +0200	[thread overview]
Message-ID: <20260413184143.GD12792@twin.jikos.cz> (raw)
In-Reply-To: <cover.1775756789.git.boris@bur.io>

On Thu, Apr 09, 2026 at 10:48:47AM -0700, Boris Burkov wrote:
> If you have a system with very large memory (TiBs) and a normal
> percentage-based dirty_ratio/dirty_background_ratio like the defaults of
> 20%/10%, then we can theoretically rack up 100s of GiB of dirty pages
> before doing any writeback. This is further exacerbated if we also see a
> sudden drop in free memory due to a large allocation. If we also have a
> large disk (relatively likely for a large-RAM system), we are unlikely
> to trigger much preemptive metadata reclaim either.
> 
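(For scale, the thresholds implied above grow linearly with memory. A
back-of-the-envelope sketch, using a hypothetical 2 TiB host; note the
kernel actually computes these against "dirtyable" memory, which is
somewhat less than total RAM:)

```python
# Rough dirty-page thresholds under percentage-based
# vm.dirty_ratio / vm.dirty_background_ratio (defaults 20% / 10%).
# The 2 TiB figure is a hypothetical large-memory machine, not from
# the report above.

GIB = 1 << 30
TIB = 1 << 40

def dirty_thresholds(mem_bytes, dirty_ratio=20, background_ratio=10):
    """Approximate byte thresholds at which writeback throttles
    (dirty_ratio) and background writeback starts (background_ratio)."""
    return (mem_bytes * dirty_ratio // 100,
            mem_bytes * background_ratio // 100)

hard, bg = dirty_thresholds(2 * TIB)
print(f"hard limit: {hard / GIB:.0f} GiB, background: {bg / GIB:.0f} GiB")
```

With 2 TiB of RAM and the defaults, that is roughly 410 GiB before
throttling and 205 GiB before background writeback even begins, which is
the "100s of GiB of dirty pages" regime described above.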
> Once we do start doing writeback with such a large supply, the results
> are somewhat ugly. The delalloc work generates a huge amount of delayed
> refs without proper reservations which sends the metadata space system
> into a tailspin trying to run yet more delalloc to free space.
> Ultimately, the system stalls waiting for huge amounts of ordered
> extents and delayed refs blocking all users in start_transaction() on
> tickets in reserve_space().
> 
> This patch series aims to address these issues in a relatively targeted
> way by improving our reservations for delalloc delayed refs and by doing
> some very basic smoothing of the work in flush_space(). Further work
> could be done to improve flush_space() heuristics and latency but this
> is already a big help on my observed workloads.
> 
> I was able to reproduce stalls on a more "modest" system with 264 GiB of
> RAM by using a somewhat silly 80% dirty_ratio.
> 
> I was unfortunately unable to reproduce any stalls on a yet smaller
> system with only 32 GiB of RAM.
> 
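(The repro knob mentioned above can be set as follows; a sketch, to be
run as root on a scratch test box only, since it changes system-wide
writeback behavior. The dirty_background_ratio value is an assumed
companion setting, not taken from the report:)

```shell
# Let dirty pages pile up before writeback kicks in (root required).
sysctl -w vm.dirty_ratio=80
sysctl -w vm.dirty_background_ratio=40   # assumed value, keep it below dirty_ratio

# Verify the current values:
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
```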
> The first 2 patches do the delayed_ref rsv accounting on btrfs_inode,
> mirroring inode->block_rsv.
> The 3rd patch is a cleanup of the types used for counting max extents.
> The 4th patch reduces the size of the unit of work in shrink_delalloc()
> to further reduce stalls.
> ---
> Changelog:
> v4:
> - Treat the extent tree data delayed ref as needing reservation for two cow
>   operations.

As this has been reviewed by Filipe, please add it to for-next.  Thanks.


Thread overview: 8+ messages
2026-04-09 17:48 [PATCH v4 0/4] btrfs: improve stalls under sudden writeback Boris Burkov
2026-04-09 17:48 ` [PATCH v4 1/4] btrfs: reserve space for delayed_refs in delalloc Boris Burkov
2026-04-10 16:07   ` Filipe Manana
2026-04-09 17:48 ` [PATCH v4 2/4] btrfs: account for compression in delalloc extent reservation Boris Burkov
2026-04-09 17:48 ` [PATCH v4 3/4] btrfs: make inode->outstanding_extents a u64 Boris Burkov
2026-04-13 18:43   ` David Sterba
2026-04-09 17:48 ` [PATCH v4 4/4] btrfs: cap shrink_delalloc iterations to 128M Boris Burkov
2026-04-13 18:41 ` David Sterba [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260413184143.GD12792@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=boris@bur.io \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox