From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B025C433EF for ; Tue, 8 Feb 2022 16:34:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383063AbiBHQer (ORCPT ); Tue, 8 Feb 2022 11:34:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343746AbiBHQeq (ORCPT ); Tue, 8 Feb 2022 11:34:46 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75F87C061576 for ; Tue, 8 Feb 2022 08:34:44 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E47D8B81C1B for ; Tue, 8 Feb 2022 16:34:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2CBC7C004E1; Tue, 8 Feb 2022 16:34:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1644338081; bh=5mvjgXaIFCDYYea50UQ+K2oyQGWaEbjMFh6iT31L9oU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ujQkBV23Rw7DhmA7Y7D+q0GuFxAaQaVLJnHOAluyxWh1fg0UC2mmElBycam7QUTDl oS0Ydc/60/DaDPjXxod5iamaKjUEBbW14nfyqnMkn7cNCPKBz52PG0p4EVfrKe3wZs bXOBtMMsJD3yEOIQbqT/1OLUsfukwgHjFvFR+GRNGl9ImkQxYMHtg5sAUdMZIH0E8d sZsr2EOU9kLe0XO37p0sO84jsLYStooUd24kORQx0jPjPm+taOFCyW4ql+LfAt1d53 bUtf6onyXoog1P4qiajOWx4yoEi3GeOzD2lcDfcxhNIIuzdjdpbCAmhCQjGoq7yuGl sPzXFJ7ZO0QDw== Date: Tue, 8 Feb 2022 16:34:37 +0000 From: Filipe Manana To: Qu Wenruo Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs: populate extent_map::generation when reading from disk Message-ID: References: <817e735ee9c225268f17bee906c871b1fd965c4f.1644051267.git.wqu@suse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Mon, Feb 07, 2022 at 07:39:06PM +0800, Qu Wenruo wrote: > > > On 2022/2/7 18:52, Filipe Manana wrote: > > On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote: > > > [WEIRD BEHAVIOR] > > > > > > When btrfs_get_extent() tries to get some file extent from disk, it > > > never populates extent_map::generation , leaving the value to be 0. > > > > > > On the other hand, for extent map generated by IO, it will get its > > > generation properly set at finish_ordered_io() > > > > > > finish_ordered_io() > > > |- unpin_extent_cache(gen = trans->transid) > > > |- em->generation = gen; > > > > > > [REGRESSION?] > > > I have no idea when such behavior is introduced, but at least in v5.15 > > > this incorrect behavior is already there. > > > > The extent map generation is basically only used by the fsync code, but > > as it deals only with modified extents, it always sees non-zero generation. > > > > > > > > [AFFECT] > > > Not really sure if there is any behavior really get affected. > > > > affect -> effect > > > > No, I don't think it affects anything. > > > > > > > > Sure there are locations like extent map merging, but there is no value > > > smaller than 0 for u64, thus it won't really cause a difference. > > > > > > For autodefrag, although it's checking em->generation to determine if we > > > need to defrag a range, but that @new_than value is always from IO, thus > > > > This is confusing. > > You mean the minimum generation threshold for autodefrag. Referring to > > a function parameter (and it's named "newer_than") out of context, is > > hard to follow. > > > > > > > all those extent maps with 0 generation will just be skipped, and that's > > > the expected behavior anyway. > > > > > > For manual defrag, @newer_than is 0, and our check is to skip generation > > > smaller than @newer_than, thus it still makes no difference. > > > > Same here, saying the minimum generation threshold for defrag is more > > informative than referring to the name of a function parameter. A function > > that is not even touched by the patch makes it hard to understand. > > > > > > > > [FIX] > > > To make things less weird, let us populate extent_map::generation in > > > btrfs_extent_item_to_extent_map(). > > > > Looks good. > > Though I don't think this fixes anything. > > I'm considering the following extreme corner case, which this patch may > make a difference (although to cause extra IO) > > E.g. > Root 5, last_trans = 500. > > In transaction 510, we write back some data for inode A of root 5, the > extent is at file offset 0 len 32K, triggering autodefrag to create an > inode_defrag structure with transid 500 (from root 5 last_trans). > > Then we do some fragmented writeback following inode A file offset 32K, > they all happen before cleaner get triggered. > > Before cleaner get triggered, fd for inode A is closed, and all cache is > dropped, including the extent map cache. > > Then autodefrag get triggered, it tries to get the extent map for offset > 0, and got an em, with generation 0. > > Since 0 (unpopulated em::generation) < 500 (inode_defrag::transid), the > extent doesn't need to be defragged. > > But in fact, these new extents at file offset 0 and onward all have > generation newer than 510, and can be defragged. > > > Although this is opposite what we're chasing, it will cause more IO, > instead of less... Right, when I said it didn't fix anything, it was in regards to 5.15 and previous kernels, since the only code that checks for generations in extent maps is the fsync code. But the extent maps it looks at are always in the list of modified extents, so their generation is never 0. Another quick look, and I can't find anything else relying on the generation of extent maps that could break a 0 value. Yes, this is the opposite of the problems being reported regarding excessive IO. With the new defrag code from 5.16 this will often cause even more IO. Looks fine to me, but I think right now the priority should be to find out why is so much more IO being triggered. I still think the change log should be phrased differently, specially parts like "Not really sure if there is any behavior really get affected.", as that doesn't sound very good. Once that's updated: Reviewed-by: Filipe Manana Thanks. > > Thanks, > Qu > > > > > As I pointed out in the other > > thread, the extent map generation is basically only I used by fsync, which > > doesn't use extent maps that are not the in the list of modified extents > > (and those always have a generation > 0). > > > > Thnaks. > > > > > > > > Signed-off-by: Qu Wenruo > > > --- > > > fs/btrfs/file-item.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c > > > index 90c5c38836ab..9a3de652ada8 100644 > > > --- a/fs/btrfs/file-item.c > > > +++ b/fs/btrfs/file-item.c > > > @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode, > > > extent_start = key.offset; > > > extent_end = btrfs_file_extent_end(path); > > > em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi); > > > + em->generation = btrfs_file_extent_generation(leaf, fi); > > > if (type == BTRFS_FILE_EXTENT_REG || > > > type == BTRFS_FILE_EXTENT_PREALLOC) { > > > em->start = extent_start; > > > -- > > > 2.35.0 > > >