All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: populate extent_map::generation when reading from disk
Date: Tue, 8 Feb 2022 16:34:37 +0000	[thread overview]
Message-ID: <YgKbnWu77WJlHDa1@debian9.Home> (raw)
In-Reply-To: <e15a84e9-6e22-7a64-0ee2-67b9c4b51e1c@gmx.com>

On Mon, Feb 07, 2022 at 07:39:06PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/2/7 18:52, Filipe Manana wrote:
> > On Sat, Feb 05, 2022 at 04:55:47PM +0800, Qu Wenruo wrote:
> > > [WEIRD BEHAVIOR]
> > > 
> > > When btrfs_get_extent() tries to get some file extent from disk, it
> > > never populates extent_map::generation , leaving the value to be 0.
> > > 
> > > On the other hand, for extent map generated by IO, it will get its
> > > generation properly set at finish_ordered_io()
> > > 
> > >   finish_ordered_io()
> > >   |- unpin_extent_cache(gen = trans->transid)
> > >      |- em->generation = gen;
> > > 
> > > [REGRESSION?]
> > > I have no idea when such behavior is introduced, but at least in v5.15
> > > this incorrect behavior is already there.
> > 
> > The extent map generation is basically only used by the fsync code, but
> > as it deals only with modified extents, it always sees non-zero generation.
> > 
> > > 
> > > [AFFECT]
> > > Not really sure if there is any behavior really get affected.
> > 
> > affect -> effect
> > 
> > No, I don't think it affects anything.
> > 
> > > 
> > > Sure there are locations like extent map merging, but there is no value
> > > smaller than 0 for u64, thus it won't really cause a difference.
> > > 
> > > For autodefrag, although it's checking em->generation to determine if we
> > > need to defrag a range, but that @new_than value is always from IO, thus
> > 
> > This is confusing.
> > You mean the minimum generation threshold for autodefrag. Referring to
> > a function parameter (and it's named "newer_than") out of context, is
> > hard to follow.
> > 
> > 
> > > all those extent maps with 0 generation will just be skipped, and that's
> > > the expected behavior anyway.
> > > 
> > > For manual defrag, @newer_than is 0, and our check is to skip generation
> > > smaller than @newer_than, thus it still makes no difference.
> > 
> > Same here, saying the minimum generation threshold for defrag is more
> > informative than referring to the name of a function parameter. A function
> > that is not even touched by the patch makes it hard to understand.
> > 
> > > 
> > > [FIX]
> > > To make things less weird, let us populate extent_map::generation in
> > > btrfs_extent_item_to_extent_map().
> > 
> > Looks good.
> > Though I don't think this fixes anything.
> 
> I'm considering the following extreme corner case, which this patch may
> make a difference (although to cause extra IO)
> 
> E.g.
> Root 5, last_trans = 500.
> 
> In transaction 510, we write back some data for inode A of root 5, the
> extent is at file offset 0 len 32K, triggering autodefrag to create an
> inode_defrag structure with transid 500 (from root 5 last_trans).
> 
> Then we do some fragmented writeback following inode A file offset 32K,
> they all happen before cleaner get triggered.
> 
> Before cleaner get triggered, fd for inode A is closed, and all cache is
> dropped, including the extent map cache.
> 
> Then autodefrag get triggered, it tries to get the extent map for offset
> 0, and got an em, with generation 0.
> 
> Since 0 (unpopulated em::generation) < 500 (inode_defrag::transid), the
> extent doesn't need to be defragged.
> 
> But in fact, these new extents at file offset 0 and onward all have
> generation newer than 510, and can be defragged.
> 
> 
> Although this is opposite what we're chasing, it will cause more IO,
> instead of less...

Right, when I said it didn't fix anything, it was in regards to 5.15 and
previous kernels, since the only code that checks for generations in
extent maps is the fsync code. But the extent maps it looks at are always
in the list of modified extents, so their generation is never 0.

Another quick look, and I can't find anything else relying on the generation
of extent maps that could break a 0 value.

Yes, this is the opposite of the problems being reported regarding excessive
IO. With the new defrag code from 5.16 this will often cause even more IO.

Looks fine to me, but I think right now the priority should be to find out
why is so much more IO being triggered.

I still think the change log should be phrased differently, specially parts
like "Not really sure if there is any behavior really get affected.", as
that doesn't sound very good.

Once that's updated:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.


> 
> Thanks,
> Qu
> 
> 
> 
> > As I pointed out in the other
> > thread, the extent map generation is basically only I used by fsync, which
> > doesn't use extent maps that are not the in the list of modified extents
> > (and those always have a generation > 0).
> > 
> > Thnaks.
> > 
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >   fs/btrfs/file-item.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> > > index 90c5c38836ab..9a3de652ada8 100644
> > > --- a/fs/btrfs/file-item.c
> > > +++ b/fs/btrfs/file-item.c
> > > @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
> > >   	extent_start = key.offset;
> > >   	extent_end = btrfs_file_extent_end(path);
> > >   	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
> > > +	em->generation = btrfs_file_extent_generation(leaf, fi);
> > >   	if (type == BTRFS_FILE_EXTENT_REG ||
> > >   	    type == BTRFS_FILE_EXTENT_PREALLOC) {
> > >   		em->start = extent_start;
> > > --
> > > 2.35.0
> > > 

  reply	other threads:[~2022-02-08 16:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-05  8:55 [PATCH] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
2022-02-07 10:52 ` Filipe Manana
2022-02-07 11:39   ` Qu Wenruo
2022-02-08 16:34     ` Filipe Manana [this message]
2022-02-09  0:41       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgKbnWu77WJlHDa1@debian9.Home \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.