Date: Mon, 18 Sep 2023 13:14:41 -0700
From: Boris Burkov
To: Josef Bacik
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: adjust overcommit logic when very close to full
Message-ID: <20230918201441.GA299788@zen>
On Mon, Sep 18, 2023 at 03:27:47PM -0400, Josef Bacik wrote:
> A user reported some unpleasant behavior with very small file systems.
> The reproducer is this:
>
>   mkfs.btrfs -f -m single -b 8g /dev/vdb
>   mount /dev/vdb /mnt/test
>   dd if=/dev/zero of=/mnt/test/testfile bs=512M count=20
>
> This will result in usage that looks like this:
>
> Overall:
>     Device size:                   8.00GiB
>     Device allocated:              8.00GiB
>     Device unallocated:            1.00MiB
>     Device missing:                  0.00B
>     Device slack:                  2.00GiB
>     Used:                          5.47GiB
>     Free (estimated):              2.52GiB      (min: 2.52GiB)
>     Free (statfs, df):               0.00B
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:                5.50MiB      (used: 0.00B)
>     Multiple profiles:                  no
>
> Data,single: Size:7.99GiB, Used:5.46GiB (68.41%)
>    /dev/vdb        7.99GiB
>
> Metadata,single: Size:8.00MiB, Used:5.77MiB (72.07%)
>    /dev/vdb        8.00MiB
>
> System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
>    /dev/vdb        4.00MiB
>
> Unallocated:
>    /dev/vdb        1.00MiB
>
> As you can see, we've gotten ourselves quite full with metadata, with
> all of the disk being allocated for data.
>
> On smaller file systems there's not a lot of time before we get full,
> so our overcommit behavior bites us here.  Generally speaking, data
> reservations result in chunk allocations, as we assume reservation ==
> actual use for data.  This means at any point we could end up with a
> chunk allocation for data, and if we're very close to full we could do
> this before we have a chance to figure out that we need another
> metadata chunk.
>
> Address this by adjusting the overcommit logic.  Simply put, we need to
> take away 1 chunk from the available chunk space in case of a data
> reservation.  This will allow us to stop overcommitting before we
> potentially lose this space to a data allocation.  With this fix in
> place we properly allocate a metadata chunk before we're completely
> full, allowing for enough slack space in metadata.

LGTM, this should help, and I've been kicking around the same idea in my
head for a while.

I do think this is kind of a band-aid, though.  It isn't hard to imagine
that you allocate data chunks up to the 1G, then allocate a metadata
chunk, then fragment/under-utilize the data to the point that you
actually fill up the metadata and get right back to this same point.
Long term, I think we still need more/smarter reclaim, but this should
be a good steam valve for the simple cases where we deterministically
gobble up all the unallocated space for data.
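To make the effect of the new clamp concrete, here's a toy userspace
model of just the subtraction the patch adds.  The inputs are made-up
illustrative values, and the real calc_available_free_space() does more
work before this point (e.g. dividing by the RAID profile factor, as the
quoted context in the diff below shows):

/* toy model of the patch's 1GiB clamp; illustrative, not kernel code */
#include <stdio.h>
#include <stdint.h>

#define SZ_1G (1024ULL * 1024 * 1024)

/* Hold back one max-sized (1GiB) data chunk from the overcommit budget. */
static uint64_t clamp_avail(uint64_t avail)
{
	if (avail < SZ_1G)
		return 0;
	return avail - SZ_1G;
}

int main(void)
{
	uint64_t nearly_full = 64ULL * 1024 * 1024;	/* 64MiB unallocated */
	uint64_t mostly_empty = 6 * SZ_1G;		/* 6GiB unallocated */

	/* Near full: avail collapses to 0, so overcommit stops entirely. */
	printf("nearly full:  %llu -> %llu\n",
	       (unsigned long long)nearly_full,
	       (unsigned long long)clamp_avail(nearly_full));
	/* Mostly empty: losing 1GiB barely dents the overcommit budget. */
	printf("mostly empty: %llu -> %llu\n",
	       (unsigned long long)mostly_empty,
	       (unsigned long long)clamp_avail(mostly_empty));
	return 0;
}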
>
> Signed-off-by: Josef Bacik

Reviewed-by: Boris Burkov

> ---
>  fs/btrfs/space-info.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index d7e8cd4f140c..7aa53058d893 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -365,6 +365,23 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
>  	factor = btrfs_bg_type_to_factor(profile);
>  	avail = div_u64(avail, factor);
>  
> +	/*
> +	 * Since data allocations immediately use block groups as part of the
> +	 * reservation, because we assume that data reservations will == actual
> +	 * usage, we could potentially overcommit and then immediately have that
> +	 * available space used by a data allocation, which could put us in a
> +	 * bind when we get close to filling the file system.
> +	 *
> +	 * To handle this simply remove 1G (which is our current maximum chunk
> +	 * allocation size) from the available space.  If we are relatively
> +	 * empty this won't affect our ability to overcommit much, and if we're
> +	 * very close to full it'll keep us from getting into a position where
> +	 * we've given ourselves very little metadata wiggle room.
> +	 */
> +	if (avail < SZ_1G)
> +		return 0;
> +	avail -= SZ_1G;
> +
>  	/*
>  	 * If we aren't flushing all things, let us overcommit up to
>  	 * 1/2th of the space.  If we can flush, don't let us overcommit
> -- 
> 2.41.0
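For anyone reading along without the tree handy: the value computed by
calc_available_free_space() feeds the overcommit decision in
btrfs_can_overcommit().  Paraphrasing its shape from memory (this is a
sketch, not the literal upstream code; check fs/btrfs/space-info.c in
the tree the patch targets):

/* sketch of the caller, paraphrased from memory; details may differ */
int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
			 struct btrfs_space_info *space_info, u64 bytes,
			 enum btrfs_reserve_flush_enum flush)
{
	u64 used = btrfs_space_info_used(space_info, true);
	u64 avail = calc_available_free_space(fs_info, space_info, flush);

	/*
	 * With the new clamp, avail hits 0 once less than 1GiB of chunk
	 * space remains, so this check stops admitting metadata
	 * overcommit before a data chunk allocation can eat the rest.
	 */
	return used + bytes < space_info->total_bytes + avail;
}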