From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 418CDC433F5 for ; Sat, 12 Mar 2022 03:07:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230176AbiCLDIj (ORCPT ); Fri, 11 Mar 2022 22:08:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229480AbiCLDIh (ORCPT ); Fri, 11 Mar 2022 22:08:37 -0500 X-Greylist: delayed 1451 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Fri, 11 Mar 2022 19:07:33 PST Received: from drax.kayaks.hungrycats.org (drax.kayaks.hungrycats.org [174.142.148.226]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4DB492713C7 for ; Fri, 11 Mar 2022 19:07:32 -0800 (PST) Received: by drax.kayaks.hungrycats.org (Postfix, from userid 1002) id 69010250280; Fri, 11 Mar 2022 21:43:18 -0500 (EST) Date: Fri, 11 Mar 2022 21:43:18 -0500 From: Zygo Blaxell To: Jan Ziak <0xe2.0x9a.0x9b@gmail.com> Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit Message-ID: References: <7fc9f5b4-ddb6-bd3b-bb02-2bd4af703e3b@gmx.com> <078f9f05-3f8f-eef1-8b0b-7d2a26bf1f97@gmx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote: > On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo wrote: > > As stated before, autodefrag is not really that useful for database. > > Do you realize that you are claiming that btrfs autodefrag should not > - by design - be effective in the case of high-fragmentation files? If > it isn't supposed to be useful for high-fragmentation files then where > is it supposed to be useful? Low-fragmentation files? IMHO it's best to deprecate the in-kernel autodefrag option, and start over with a better approach. The kernel is the wrong place to solve this problem, and the undesirable and unfixable things in autodefrag are a consequence of that early design error. As far as I can tell, in-kernel autodefrag's only purpose is to provide exposure to new and exciting bugs on each kernel release, and a lot of uncontrolled IO demands even when it's working perfectly. Inevitably, re-reading old fragments that are no longer in memory will consume RAM and iops during writeback activity, when memory and IO bandwidth is least available. If we avoid expensive re-reading of extents, then we don't get a useful rate of reduction of fragmentation, because we can't coalesce small new exists with small existing ones. If we try to fix these issues one at a time, the feature would inevitably grow a lot of complicated and brittle configuration knobs to turn it off selectively, because it's so awful without extensive filtering. All the above criticism applies to abstract ideal in-kernel autodefrag, _before_ considering whether a concrete implementation might have limitations or bugs which make it worse than the already-bad best case. 5.16 happened to have a lot of examples of these, but fixing the regressions can only restore autodefrag's relative harmlessness, not add utility within the constraints the kernel is under. The right place to do autodefrag is userspace. Interfaces already exist for userspace to 1) discover new extents and their neighbors, quickly and safely, across the entire filesystem; 2) invoke defrag_range on file extent ranges found in step 1; and 3) run a while (true) loop that periodically performs steps 1 and 2. Indeed, the existing kernel autodefrag implementation is already using the same back-end infrastructure for parts 1 and 2, so all that would be required for userspace is to reimplement (and start improving upon) part 3. A command-line utility or daemon can locate new extents immediately with tree_search queries, either at filesystem-wide scales, or directed at user-chosen file subsets. Tools can quickly assess whether new extents are good candidates for defrag, then coalesce them with their neighbors. The user can choose between different tools to decide basic policy questions like: whether to run once in a batch job or continuously in the background, what amounts of IO bandwidth and memory to consume, whether to recompress data with a more aggressive algorithm/level, which reference to a snapshot-shared extent should be preferred for defrag, file-type-specific layout optimizations to apply, or any custom or experimental selection, scheduling, or optimization logic desired. Implementations can be kept simple because it's not necessary for userspace tools to pile every possible option into a single implementation, and support every released option forever (as required for the kernel). A specialist implementation can discard existing code with impunity or start from scratch with an experimental algorithm, and spend its life in a fork of the main userspace autodefrag project with niche users who never have to cope with generic users' use cases and vice versa. This efficiently distributes development and maintenance costs. Userspace autodefrag can be implemented today in any programming language with btrfs ioctl support, and run on any kernel released in the last 6 years. Alas, I don't know of anybody who's released a userspace autodefrag tool yet, and it hasn't been important enough to me to build one myself (other than a few proof-of-concept prototypes). For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most severely fragmented files (top N list of files with the highest extent counts on the filesystem), and ignore fragmentation everywhere else. > -Jan