Subject: Re: [PATCH 11/42] btrfs: refactor btrfs_invalidatepage()
To: Qu Wenruo, linux-btrfs@vger.kernel.org
References: <20210415050448.267306-1-wqu@suse.com> <20210415050448.267306-12-wqu@suse.com>
From: Josef Bacik
Message-ID: <0babca4e-325a-88a7-bbfc-c810a5bedbeb@toxicpanda.com>
Date: Fri, 16 Apr 2021 10:42:11 -0400
In-Reply-To: <20210415050448.267306-12-wqu@suse.com>

On 4/15/21 1:04 AM, Qu Wenruo wrote:
> This patch will refactor btrfs_invalidatepage() for the incoming subpage
> support.
>
> The involved modifications are:
> - Use while() loop instead of "goto again;"
> - Use single variable to determine whether to delete extent states
>   Each branch will also have comments why we can or cannot delete the
>   extent states
> - Do qgroup free and extent states deletion per-loop
>   Current code can only work for PAGE_SIZE == sectorsize case.
>
> This refactor also makes it clear what we do for different sectors:
> - Sectors without ordered extent
>   We're completely safe to remove all extent states for the sector(s)
>
> - Sectors with ordered extent, but no Private2 bit
>   This means the endio has already been executed, we can't remove all
>   extent states for the sector(s).
>
> - Sectors with ordered extent, still has Private2 bit
>   This means we need to decrease the ordered extent accounting.
>   And then it comes to two different variants:
>   * We have finished and removed the ordered extent
>     Then it's the same as "sectors without ordered extent"
>   * We didn't finish the ordered extent
>     We can remove some extent states, but not all.
>
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/inode.c | 173 +++++++++++++++++++++++++----------------------
>  1 file changed, 94 insertions(+), 79 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4c894de2e813..93bb7c0482ba 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8320,15 +8320,12 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  {
>  	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
>  	struct extent_io_tree *tree = &inode->io_tree;
> -	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
>  	u64 page_start = page_offset(page);
>  	u64 page_end = page_start + PAGE_SIZE - 1;
> -	u64 start;
> -	u64 end;
> +	u64 cur;
> +	u32 sectorsize = inode->root->fs_info->sectorsize;
>  	int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
> -	bool found_ordered = false;
> -	bool completed_ordered = false;
>
>  	/*
>  	 * We have page locked so no new ordered extent can be created on
> @@ -8352,96 +8349,114 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>  	if (!inode_evicting)
>  		lock_extent_bits(tree, page_start, page_end, &cached_state);
>
> -	start = page_start;
> -again:
> -	ordered = btrfs_lookup_ordered_range(inode, start, page_end - start + 1);
> -	if (ordered) {
> -		found_ordered = true;
> -		end = min(page_end,
> -			  ordered->file_offset + ordered->num_bytes - 1);
> +	cur = page_start;
> +	while (cur < page_end) {
> +		struct btrfs_ordered_extent *ordered;
> +		bool delete_states = false;
> +		u64 range_end;
> +
> +		/*
> +		 * Here we can't pass "file_offset = cur" and
> +		 * "len = page_end + 1 - cur", as btrfs_lookup_ordered_range()
> +		 * may not return the first ordered extent after @file_offset.
> +		 *
> +		 * Here we want to iterate through the range in byte order.
> +		 * This is slower but definitely correct.
> +		 *
> +		 * TODO: Make btrfs_lookup_ordered_range() to return the
> +		 * first ordered extent in the range to reduce the number
> +		 * of loops.
> +		 */
> +		ordered = btrfs_lookup_ordered_range(inode, cur, sectorsize);

How does it not find the first ordered extent after file_offset?  Looking at the
code it just loops through and returns the first thing it finds that overlaps
our range.  Is there a bug in btrfs_lookup_ordered_range()?

We should add some self tests to make sure these helpers are doing the right
thing if there is in fact a bug.

> +		if (!ordered) {
> +			range_end = cur + sectorsize - 1;
> +			/*
> +			 * No ordered extent covering this sector, we are safe
> +			 * to delete all extent states in the range.
> +			 */
> +			delete_states = true;
> +			goto next;
> +		}
> +
> +		range_end = min(ordered->file_offset + ordered->num_bytes - 1,
> +				page_end);
> +		if (!PagePrivate2(page)) {
> +			/*
> +			 * If Private2 is cleared, it means endio has already
> +			 * been executed for the range.
> +			 * We can't delete the extent states as
> +			 * btrfs_finish_ordered_io() may still use some of them.
> +			 */
> +			delete_states = false;

delete_states is already false.

> +			goto next;
> +		}
> +		ClearPagePrivate2(page);
> +
>  		/*
>  		 * IO on this page will never be started, so we need to account
>  		 * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
>  		 * here, must leave that up for the ordered extent completion.
> +		 *
> +		 * This will also unlock the range for incoming
> +		 * btrfs_finish_ordered_io().
>  		 */
>  		if (!inode_evicting)
> -			clear_extent_bit(tree, start, end,
> +			clear_extent_bit(tree, cur, range_end,
>  					 EXTENT_DELALLOC |
>  					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
>  					 EXTENT_DEFRAG, 1, 0, &cached_state);
> +
> +		spin_lock_irq(&inode->ordered_tree.lock);
> +		set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
> +		ASSERT(cur - ordered->file_offset < U32_MAX);
> +		ordered->truncated_len = min_t(u32, ordered->truncated_len,
> +					       cur - ordered->file_offset);

I've realized my previous comment about this needing to be u64 was wrong, I'm
starting to wake up now.  However I still don't see the value in saving the
space, as we can just leave everything u64 and the math all works out cleanly.

> +		spin_unlock_irq(&inode->ordered_tree.lock);
> +
> +		ASSERT(range_end + 1 - cur < U32_MAX);

And we don't have to pollute the code with all of these checks.

> +		if (btrfs_dec_test_ordered_pending(inode, &ordered,
> +					cur, range_end + 1 - cur, 1)) {
> +			btrfs_finish_ordered_io(ordered);
> +			/*
> +			 * The ordered extent has finished, now we're again
> +			 * safe to delete all extent states of the range.
> +			 */
> +			delete_states = true;
> +		} else {
> +			/*
> +			 * btrfs_finish_ordered_io() will get executed by endio of
> +			 * other pages, thus we can't delete extent states any more
> +			 */
> +			delete_states = false;

This is already false.

Thanks,

Josef
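
Illustration of the u32 versus u64 point above, as a minimal userspace sketch
and not kernel code.  struct ordered_sketch, truncate_len_u64() and
truncate_len_u32() are hypothetical names made up for this example; only the
subtract-then-min arithmetic mirrors the quoted hunk.  The assumption is that
the range checks exist solely because the u64 difference gets narrowed to u32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ordered extent fields used below. */
struct ordered_sketch {
	uint64_t file_offset;
	uint64_t truncated_len;	/* assumed to stay 64-bit in this sketch */
};

/*
 * 64-bit variant: the difference is never narrowed, so no range check is
 * needed before taking the minimum.  Assumes cur >= o->file_offset.
 */
static void truncate_len_u64(struct ordered_sketch *o, uint64_t cur)
{
	uint64_t new_len = cur - o->file_offset;

	if (new_len < o->truncated_len)
		o->truncated_len = new_len;
}

/*
 * 32-bit variant: casting the 64-bit difference to 32 bits would silently
 * drop the high bits, so the caller has to be guarded by an assertion.
 */
static uint32_t truncate_len_u32(uint32_t truncated_len, uint64_t file_offset,
				 uint64_t cur)
{
	uint64_t diff = cur - file_offset;
	uint32_t new_len;

	assert(diff < UINT32_MAX);	/* the extra check the narrowing forces */
	new_len = (uint32_t)diff;
	return new_len < truncated_len ? new_len : truncated_len;
}
```

With 64-bit fields throughout, truncate_len_u64() can be called with any
offset at or beyond file_offset.  The 32-bit helper is only correct while its
assertion holds, which is the kind of check the review asks to avoid.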