All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: populate extent_map::generation when reading from disk
Date: Tue, 8 Feb 2022 16:39:45 +0000	[thread overview]
Message-ID: <YgKc0ax5LWnu/SM7@debian9.Home> (raw)
In-Reply-To: <f5c2cc5d57a7edc120b0e743aa5d82298595ae24.1644298193.git.wqu@suse.com>

On Tue, Feb 08, 2022 at 01:31:19PM +0800, Qu Wenruo wrote:
> [WEIRD BEHAVIOR]
> 
> When btrfs_get_extent() tries to get some file extent from disk, it
> never populates extent_map::generation , leaving the value to be 0.
> 
> On the other hand, for extent map generated by IO, it will get its
> generation properly set at finish_ordered_io()
> 
>  finish_ordered_io()
>  |- unpin_extent_cache(gen = trans->transid)
>     |- em->generation = gen;
> 
> [CAUSE]
> Since extent_map::generation is mostly used by fsync code, and for fsync
> they only care about modified extents, which all have their em::generation > 0.
> 
> Thus it's fine to not populate em read from disk for fsync.
> 
> [CORNER CASE]
> However autodefrag also relies on em::geneartion to determine if one extent
> needs to be defragged.

em::geneartion -> em::generation

> 
> This unpopulated extent_map::geneartion can prevent the following autodefrag
> case from working:

Same here.

> 
> 	mkfs.btrfs -f $dev
> 	mount $dev $mnt -o autodefrag
> 
> 	# initial write to queue the inode for autodefrag
> 	xfs_io -f -c "pwrite 0 4k" $mnt/file
> 	sync
> 
> 	# Real fragmented write
> 	xfs_io -f -s -c "pwrite -b 4096 0 32k" $mnt/file
> 	sync
> 	echo "=== before autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 
> 	# Drop cache to force em to be read from disk
> 	echo 3 > /proc/sys/vm/drop_caches
> 	mount -o remount,commit=1 $mnt
> 	sleep 3
> 	sync
> 
> 	echo "=== After autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 	umount $mnt
> 
> The result looks like this:
> 
>   === before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
> 
> This fragmented 32K will not be defragged by autodefrag.
> 
> [FIX]
> To make things less weird, just populate extent_map::generation when
> reading file extents from disk.
> 
> This would make above fragmented extents to be properly defragged:
> 
>   == before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..63]:         26688..26751        64   0x1
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I don't want to make you send yet another version because only of
a typo in the changelog, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
> Changelog:
> v2:
> - Update the commit message to include a reproducer
>   Although this is not what we want (to reduce autodefrag IO),
>   the behavior still worthy fixing anyway.
> ---
>  fs/btrfs/file-item.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 90c5c38836ab..9a3de652ada8 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>  	extent_start = key.offset;
>  	extent_end = btrfs_file_extent_end(path);
>  	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
> +	em->generation = btrfs_file_extent_generation(leaf, fi);
>  	if (type == BTRFS_FILE_EXTENT_REG ||
>  	    type == BTRFS_FILE_EXTENT_PREALLOC) {
>  		em->start = extent_start;
> -- 
> 2.35.0
> 

  reply	other threads:[~2022-02-08 16:39 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08  5:31 [PATCH v2] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
2022-02-08 16:39 ` Filipe Manana [this message]
2022-02-08 18:42 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgKc0ax5LWnu/SM7@debian9.Home \
    --to=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.