From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35307C47080 for ; Tue, 1 Jun 2021 06:42:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08BFC6139A for ; Tue, 1 Jun 2021 06:42:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230326AbhFAGoV (ORCPT ); Tue, 1 Jun 2021 02:44:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229984AbhFAGoV (ORCPT ); Tue, 1 Jun 2021 02:44:21 -0400 Received: from mail-pg1-x535.google.com (mail-pg1-x535.google.com [IPv6:2607:f8b0:4864:20::535]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8C0FC061574 for ; Mon, 31 May 2021 23:42:40 -0700 (PDT) Received: by mail-pg1-x535.google.com with SMTP id q15so9933712pgg.12 for ; Mon, 31 May 2021 23:42:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=references:user-agent:from:to:cc:subject:in-reply-to:date :message-id:mime-version; bh=B9flEdjtWT8sI6XmEHx5R1ue22RDHTwR5gHfh6ldPAc=; b=d9uO7/MO6ZFWafz90FkSUZWhaW5y36W4pZlouH7uvWi7iGLqfCM/PAQJw1My467s6U Dfqsjr/6RGumYyPqkkCBIhxGP/u+eNZ6PwlTVDOxLp9kv+nnCipHUA0k/Yc+0K9HGE/z VoWowplzWaWsD0IvPWypl0ltiP/UUm/MtF7SkMOhHwlu4W1Zx5vxdDptuf27RWIuHXRb VeUMILjlO3exLI6UfBj2CkgE0htaSi5WlH4puEOjCR/rqU8B86AUBTjC15GtldpCDGNZ 9WgFIs9PfVzoN9z4tJ87OWNVacQ9C0IYIlxD5BitaYqvQvs/gRNis/jsvB2Qpz8RKYow qg/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version; bh=B9flEdjtWT8sI6XmEHx5R1ue22RDHTwR5gHfh6ldPAc=; b=aDP2NBbWIqpcsLIVjrTJYjd8b5r2Mti7cIMtbM7OkupCKgfwvBnl/CFsKInxyRGSjm LDJ6Dl+8dZbDonzNr4ay0EBD6/EdtcM/YPhRn+5gm9iCCDoygy+MnuhDKzxYUL8j+N1F gBT0nQQioKmSQFOdcA/4attWgarsGgUyZNMDw5e1sw11jtfKj47fNwGqqkCxwMzieX2K iKGafNONOCURyDH9qsIdwuziPz1j4YJ6WTZn5LZzpwikisyhffTguK6/H6h4uVj8G58X QxrOARSgiMcqOYXeLAFnsJAzcB6hiZCCBnfQSrrz/jF/nWiXc9PdLLvn4pRxePiUxj9p 7uDg== X-Gm-Message-State: AOAM532Ah+ohgZJ0cSEqdx+otRQDUBh3nfgXjGnBHdw4gBLxuj6YMaRY gZjCGQpq6u7xWI3YXQxD8tG7ntPp/E3XOw== X-Google-Smtp-Source: ABdhPJxBMUAePdZWBmx1NbAhB4FprR3Lz0lphVlCwcDxuv+KKNS/sIbNk9adYA2AK7/Jld1hFX0G4A== X-Received: by 2002:a63:a54b:: with SMTP id r11mr12631238pgu.76.1622529760072; Mon, 31 May 2021 23:42:40 -0700 (PDT) Received: from garuda ([171.61.76.81]) by smtp.gmail.com with ESMTPSA id l20sm12294491pjq.38.2021.05.31.23.42.38 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Mon, 31 May 2021 23:42:39 -0700 (PDT) References: <20210527045202.1155628-1-david@fromorbit.com> <20210527045202.1155628-4-david@fromorbit.com> <87eednukpk.fsf@garuda> <87czt7uk7v.fsf@garuda> <20210531232820.GT664593@dread.disaster.area> User-agent: mu4e 1.0; emacs 26.1 From: Chandan Babu R To: Dave Chinner Cc: linux-xfs@vger.kernel.org Subject: Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation In-reply-to: <20210531232820.GT664593@dread.disaster.area> Date: Tue, 01 Jun 2021 12:12:36 +0530 Message-ID: <871r9mytk3.fsf@garuda> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On 01 Jun 2021 at 04:58, Dave Chinner wrote: > On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote: >> On 31 May 2021 at 18:25, Chandan Babu R wrote: >> > On 27 May 2021 at 10:21, Dave Chinner wrote: >> >> From: Dave Chinner >> >> >> >> Ever since we moved to freeing of extents by deferred operations, >> >> we've already freed extents via individual transactions. Hence the >> >> only limitation of how many extents we can mark for freeing in a >> >> single xfs_bunmapi() call bound only by how many deferrals we want >> >> to queue. >> >> >> >> That is xfs_bunmapi() doesn't actually do any AG based extent >> >> freeing, so there's no actually transaction reservation used up by >> >> calling bunmapi with a large count of extents to be freed. RT >> >> extents have always been freed directly by bunmapi, but that doesn't >> >> require modification of large number of blocks as there are no >> >> btrees to split. >> >> >> >> Some callers of xfs_bunmapi assume that the extent count being freed >> >> is bound by geometry (e.g. directories) and these can ask bunmapi to >> >> free up to 64 extents in a single call. These functions just work as >> >> tehy stand, so there's no reason for truncate to have a limit of >> >> just two extents per bunmapi call anymore. >> >> >> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents >> >> that can be deferred in a single loop to match what the directory >> >> code already uses. >> >> >> >> For realtime inodes, where xfs_bunmapi() directly frees extents, >> >> leave the limit at 2 extents per loop as this is all the space that >> >> the transaction reservation will cover. >> >> >> >> Signed-off-by: Dave Chinner >> >> --- >> >> fs/xfs/xfs_inode.c | 15 ++++++++++++--- >> >> 1 file changed, 12 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c >> >> index 0369eb22c1bb..db220eaa34b8 100644 >> >> --- a/fs/xfs/xfs_inode.c >> >> +++ b/fs/xfs/xfs_inode.c >> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone; >> >> >> >> /* >> >> * Used in xfs_itruncate_extents(). This is the maximum number of extents >> >> - * freed from a file in a single transaction. >> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi(). >> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound >> >> + * by transaction reservation size to 2 extents. >> >> */ >> >> -#define XFS_ITRUNC_MAX_EXTENTS 2 >> >> +static inline int >> >> +xfs_itrunc_max_extents( >> >> + struct xfs_inode *ip) >> >> +{ >> >> + if (XFS_IS_REALTIME_INODE(ip)) >> >> + return 2; >> >> + return 64; >> >> +} >> >> >> >> STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *); >> >> STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *); >> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags( >> >> while (unmap_len > 0) { >> >> ASSERT(tp->t_firstblock == NULLFSBLOCK); >> >> error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len, >> >> - flags, XFS_ITRUNC_MAX_EXTENTS); >> >> + flags, xfs_itrunc_max_extents(ip)); >> >> if (error) >> >> goto out; >> > >> > The list of free extent items at xfs_defer_pending->dfp_work could >> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case. > > Yes, but we do exactly this when freeing a large fragmented directly > block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless > of how many extents map that range. > > IOWs, the limitation in extent count placed in > xfs_itruncate_extents() doesn't actually address the underlying > problem - it's just a band-aid that has been placed over the easy to > trigger transaction overrun symptom that has always been present in > the underlying extent freeing code. > >> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for >> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block >> > size). >> >> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space >> required for freeing 4 extents ..." > > My point exactly - the code has always had a mismatch between > reservations and what we can stuff into an EFI. EFIs are unbound in > size, so having a fixed "4 extents per EFI" reservation limit has > never made any real sense given that unmaping a 64kB directory block > on a 1kb block size filesystem has a worst case of freeing 64 > extents in a single transaction. As I said above, the limitations > placed on xfs_itruncate_extents is largely a hack because it doesn't > address other avenues to the same overruns... > Ok. Thanks for the explanation. > This "4 extents per transaction" reservation makes even less sense > now that extents are freed by defer ops, not by the transaction that > unmaps them. We've completely decoupled extent freeing from the > higher level code that unmaps them, and so now the freeing > transactions are independent of the high level code that runs the > freeing operations. > > IOWs, having a reservation big enough to free a single extent is all > we should need now as we can break the extent freeing up into > individual transactions and still have them complete atomically even > after a crash. That's where I'm trying to get to here, but it's > clear that I need to refine the code a bit further such that we only > allow individual EFIs to queue up and free multiple extents within > an AG to be freed in the same transaction.... Ok. That should limit the number of CNTBT/BNOBT blocks being logged by a transaction which is freeing extents. > > FWIW, I suspect that the right thing to do here is make use of the > xfs_defer_finish_one() mechanism for relogging the remaining intents > in work list while it is processing that list. All we need is for > xfs_extent_free_finish_item() to return -EAGAIN instead of running > the free transaction, and we will log the completed EFDs and the > remaining EFIs to be run and roll the transaction. > > Hence we can control exactly when we roll the extent freeing > transaction (e.g. when the next extent to free is in a different AG > to the one we've just been freeing extents in) and as a result we > can define a fixed transaction reservation for extent freeing that > works in every situation without having to care about arbitrary > "how many extents can we unmap without overrun" concerns. > That sounds perfect. > That's the goal here - fixed, small reservation for extent freeing > that is decoupled from and independent of the number of extents that > need to be freed atomically by the high level operation... > > Hence I think that a minor amount of rework will allow the EFI code > to log large numbers of extents to be freed whilst still processing > them within single AG free space tree modification reservation > bounds... -- chandan