Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: more explaination on extent_map members
@ 2024-04-02  6:23 Qu Wenruo
  2024-04-02  6:23 ` [PATCH 1/2] btrfs: add extra comments " Qu Wenruo
  2024-04-02  6:23 ` [PATCH 2/2] btrfs: simplify the inline extent map creation Qu Wenruo
  0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-04-02  6:23 UTC (permalink / raw)
  To: linux-btrfs

Btrfs uses extent_map to represent a in-memory file extent.

There are severam members that are 1:1 mappe in on-disk file extent
items and extent maps:

- extent_map::start	==	key.offset
- extent_map::len	==	file_extent_num_bytes
- extent_map::ram_bytes	==	file_extent_ram_bytes

But that's all, the remaining are pretty different:

- Use block_start to indicate holes/inline extents
  Meanwhile btrfs on-disk file extent items go with a dedicated type for
  inline extents, and disk_bytenr 0 for holes.

- Extra members for fsync optimization
  I'm still not 100% sure how mod_start and mod_len really works though.

- Weird block_start/orig_block_len/orig_start
  In theory we can directly go with the same file_extent_disk_bytenr,
  file_extent_disk_num_bytes and file_extent_offset to calculate the
  remaining members (block_start/orig_start/orig_block_len/block_len).

  But for whatever reason, we didn't go that path and have a hell of
  weird and inconsistent calculation for them.

I do not have the confidence to handle the mess yet, but as the first
step, I would add comments for those members mostly according to
btrfs_extent_item_to_extent_map(), and hopefully we can improve the
situation in not-far-away future.

Qu Wenruo (2):
  btrfs: add extra comments on extent_map members
  btrfs: simplify the inline extent map creation

 fs/btrfs/extent_map.h | 62 ++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/file-item.c  | 14 ++++++----
 2 files changed, 70 insertions(+), 6 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: add extra comments on extent_map members
  2024-04-02  6:23 [PATCH 0/2] btrfs: more explaination on extent_map members Qu Wenruo
@ 2024-04-02  6:23 ` Qu Wenruo
  2024-04-02  7:33   ` Andrea Gelmini
  2024-04-02 15:45   ` Filipe Manana
  2024-04-02  6:23 ` [PATCH 2/2] btrfs: simplify the inline extent map creation Qu Wenruo
  1 sibling, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-04-02  6:23 UTC (permalink / raw)
  To: linux-btrfs

The extent_map structure is very critical to btrfs, as it is involved
for both read and write paths.

Unfortunately the structure is not properly explained, making it pretty
hard to understand nor to do further improvement.

This patch would add extra comments explaining the major numbers base on
my code reading.
Hopefully we can find more members to cleanup in the future.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_map.h | 62 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index c5a098c99cc6..30322defcd03 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -37,21 +37,81 @@ enum {
 };
 
 /*
+ * This extent_map structure is an in-memory representation of file extents,
+ * it would represent all file extents (including holes, no matter if we have
+ * hole file extents).
+ *
  * Keep this structure as compact as possible, as we can have really large
  * amounts of allocated extent maps at any time.
  */
 struct extent_map {
 	struct rb_node rb_node;
 
-	/* all of these are in bytes */
+	/* All of these are in bytes */
+
+	/*
+	 * File offset of the file extent. matching key.offset of
+	 * (INO EXTENT_DATA FILEPOS) key.
+	 */
 	u64 start;
+
+	/*
+	 * Length of the file extent.
+	 * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
+	 * For inlined file extents it's sectorsize. (as there is no reliable
+	 * btrfs_file_extent::num_bytes).
+	 */
 	u64 len;
+
+	/*
+	 * The modified range start/length, these are in-memory-only
+	 * members for fsync/logtree optimization.
+	 */
 	u64 mod_start;
 	u64 mod_len;
+
+	/*
+	 * The file offset of the original file extent before splitting.
+	 *
+	 * This is an in-memory-only member, mathcing
+	 * em::start - btrfs_file_extent_item::offset for regular/preallocated
+	 * extents. EXTENT_MAP_HOLE otherwise.
+	 */
 	u64 orig_start;
+
+	/*
+	 * The full on-disk extent length, matching
+	 * btrfs_file_extent_item::disk_num_bytes.
+	 */
 	u64 orig_block_len;
+
+	/*
+	 * The decompressed size of the whole on-disk extent, matching
+	 * btrfs_file_extent_item::ram_bytes.
+	 *
+	 * For non-compressed extents, this matches orig_block_len.
+	 */
 	u64 ram_bytes;
+
+	/*
+	 * The on-disk logical bytenr for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
+	 * For uncompressed extents it matches
+	 * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
+	 *
+	 * For hole extents it is EXTENT_MAP_HOLE and for inline extents it is
+	 * EXTENT_MAP_INLINE.
+	 */
 	u64 block_start;
+
+	/*
+	 * The on-disk length for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
+	 * For uncompressed extents it matches em::len.
+	 * Otherwise -1 (aka doesn't make much sense).
+	 */
 	u64 block_len;
 
 	/*
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: simplify the inline extent map creation
  2024-04-02  6:23 [PATCH 0/2] btrfs: more explaination on extent_map members Qu Wenruo
  2024-04-02  6:23 ` [PATCH 1/2] btrfs: add extra comments " Qu Wenruo
@ 2024-04-02  6:23 ` Qu Wenruo
  2024-04-02 15:50   ` Filipe Manana
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2024-04-02  6:23 UTC (permalink / raw)
  To: linux-btrfs

With the tree-checker ensuring all inline file extents starts at file
offset 0 and has a length no larger than sectorsize, we can simplify the
calculation to assigned those fixes values directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index e58fb5347e65..de3ccee38572 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1265,18 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	struct extent_buffer *leaf = path->nodes[0];
 	const int slot = path->slots[0];
 	struct btrfs_key key;
-	u64 extent_start, extent_end;
+	u64 extent_start;
 	u64 bytenr;
 	u8 type = btrfs_file_extent_type(leaf, fi);
 	int compress_type = btrfs_file_extent_compression(leaf, fi);
 
 	btrfs_item_key_to_cpu(leaf, &key, slot);
 	extent_start = key.offset;
-	extent_end = btrfs_file_extent_end(path);
 	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 	em->generation = btrfs_file_extent_generation(leaf, fi);
 	if (type == BTRFS_FILE_EXTENT_REG ||
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
+		u64 extent_end = btrfs_file_extent_end(path);
+
 		em->start = extent_start;
 		em->len = extent_end - extent_start;
 		em->orig_start = extent_start -
@@ -1299,9 +1300,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				em->flags |= EXTENT_FLAG_PREALLOC;
 		}
 	} else if (type == BTRFS_FILE_EXTENT_INLINE) {
+		/* Tree-checker has ensured this. */
+		ASSERT(extent_start == 0);
+
 		em->block_start = EXTENT_MAP_INLINE;
-		em->start = extent_start;
-		em->len = extent_end - extent_start;
+		em->start = 0;
+		em->len = fs_info->sectorsize;
 		/*
 		 * Initialize orig_start and block_len with the same values
 		 * as in inode.c:btrfs_get_extent().
@@ -1336,7 +1340,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
 
 	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
 		end = btrfs_file_extent_ram_bytes(leaf, fi);
-		end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
+		end = leaf->fs_info->sectorsize;
 	} else {
 		end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
 	}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: add extra comments on extent_map members
  2024-04-02  6:23 ` [PATCH 1/2] btrfs: add extra comments " Qu Wenruo
@ 2024-04-02  7:33   ` Andrea Gelmini
  2024-04-02  8:25     ` Qu Wenruo
  2024-04-02 15:45   ` Filipe Manana
  1 sibling, 1 reply; 7+ messages in thread
From: Andrea Gelmini @ 2024-04-02  7:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Il giorno mar 2 apr 2024 alle ore 08:24 Qu Wenruo <wqu@suse.com> ha scritto:
>
> +        * This is an in-memory-only member, matching

Just a stupid fix about "mathcing".

Thanks a lot Qu,
Gelma

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: add extra comments on extent_map members
  2024-04-02  7:33   ` Andrea Gelmini
@ 2024-04-02  8:25     ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-04-02  8:25 UTC (permalink / raw)
  To: Andrea Gelmini; +Cc: linux-btrfs



在 2024/4/2 18:03, Andrea Gelmini 写道:
> Il giorno mar 2 apr 2024 alle ore 08:24 Qu Wenruo <wqu@suse.com> ha scritto:
>>
>> +        * This is an in-memory-only member, matching
> 
> Just a stupid fix about "mathcing".

Thanks for pointing out, I guess David is already giving up on my grammar...

Thanks,
Qu
> 
> Thanks a lot Qu,
> Gelma

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: add extra comments on extent_map members
  2024-04-02  6:23 ` [PATCH 1/2] btrfs: add extra comments " Qu Wenruo
  2024-04-02  7:33   ` Andrea Gelmini
@ 2024-04-02 15:45   ` Filipe Manana
  1 sibling, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2024-04-02 15:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 2, 2024 at 7:24 AM Qu Wenruo <wqu@suse.com> wrote:
>
> The extent_map structure is very critical to btrfs, as it is involved
> for both read and write paths.
>
> Unfortunately the structure is not properly explained, making it pretty
> hard to understand nor to do further improvement.
>
> This patch would add extra comments explaining the major numbers base on

would add -> adds

And by "numbers" I think you wanted to say "members"?

base -> based

> my code reading.
> Hopefully we can find more members to cleanup in the future.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_map.h | 62 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> index c5a098c99cc6..30322defcd03 100644
> --- a/fs/btrfs/extent_map.h
> +++ b/fs/btrfs/extent_map.h
> @@ -37,21 +37,81 @@ enum {
>  };
>
>  /*
> + * This extent_map structure is an in-memory representation of file extents,
> + * it would represent all file extents (including holes, no matter if we have
> + * hole file extents).

This can simply be:

"This structure represents file extents and holes."

Which also fixes some grammar (it would represent -> it represents)
and the stuff between parentheses is confusing.

> + *
>   * Keep this structure as compact as possible, as we can have really large
>   * amounts of allocated extent maps at any time.
>   */
>  struct extent_map {
>         struct rb_node rb_node;
>
> -       /* all of these are in bytes */
> +       /* All of these are in bytes */

Please add punctuation as well.

> +
> +       /*
> +        * File offset of the file extent. matching key.offset of
> +        * (INO EXTENT_DATA FILEPOS) key.
> +        */

"File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key."

That's a lot more clear than using the shortened format from tree-dump.
Also note that "matching" should start with a capital letter
(beginning of sentence.

>         u64 start;
> +
> +       /*
> +        * Length of the file extent.
> +        * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
> +        * For inlined file extents it's sectorsize. (as there is no reliable
> +        * btrfs_file_extent::num_bytes).

Don't put whole sentences in parentheses after a punctuation mark.
This can also be rephrased in a more clear way:

"For inline extents it's sectorsize and
btrfs_file_extent_item::num_bytes has data and not a valid length,
because inline data starts at offsetof(struct btrfs_file_extent_item,
disk_bytenr)."

> +        */
>         u64 len;
> +
> +       /*
> +        * The modified range start/length, these are in-memory-only
> +        * members for fsync/logtree optimization.
> +        */

These were initially used to avoid logging the same csum ranges
multiple times when extent maps get merged.
But from a quick look and experiment we don't actually need them in
order to avoid that (we don't merge new modified extents).
I'll send a patch to remove them and update the fsync logic.

>         u64 mod_start;
>         u64 mod_len;
> +
> +       /*
> +        * The file offset of the original file extent before splitting.
> +        *
> +        * This is an in-memory-only member, mathcing

in-memory-only -> in-memory only
mathcing -> matching

> +        * em::start - btrfs_file_extent_item::offset for regular/preallocated

Instead of em, which is only a typical variable name we use for extent
maps, use 'extent_map' so that it's precise and leaves no room for
confusion.

> +        * extents. EXTENT_MAP_HOLE otherwise.
> +        */
>         u64 orig_start;
> +
> +       /*
> +        * The full on-disk extent length, matching
> +        * btrfs_file_extent_item::disk_num_bytes.
> +        */
>         u64 orig_block_len;
> +
> +       /*
> +        * The decompressed size of the whole on-disk extent, matching
> +        * btrfs_file_extent_item::ram_bytes.
> +        *
> +        * For non-compressed extents, this matches orig_block_len.
> +        */
>         u64 ram_bytes;
> +
> +       /*
> +        * The on-disk logical bytenr for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
> +        * For uncompressed extents it matches
> +        * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
> +        *
> +        * For hole extents it is EXTENT_MAP_HOLE and for inline extents it is
> +        * EXTENT_MAP_INLINE.
> +        */
>         u64 block_start;
> +
> +       /*
> +        * The on-disk length for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
> +        * For uncompressed extents it matches em::len.

Same as before, use 'extent_map::len' instead of 'em::len'.

> +        * Otherwise -1 (aka doesn't make much sense).

That is cryptic...
Should be something like:  "If the extent map represents a hole, then
it's -1 and shouldn't be used."

Thanks.



> +        */
>         u64 block_len;
>
>         /*
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: simplify the inline extent map creation
  2024-04-02  6:23 ` [PATCH 2/2] btrfs: simplify the inline extent map creation Qu Wenruo
@ 2024-04-02 15:50   ` Filipe Manana
  0 siblings, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2024-04-02 15:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 2, 2024 at 7:24 AM Qu Wenruo <wqu@suse.com> wrote:
>
> With the tree-checker ensuring all inline file extents starts at file
> offset 0 and has a length no larger than sectorsize, we can simplify the
> calculation to assigned those fixes values directly.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/file-item.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index e58fb5347e65..de3ccee38572 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1265,18 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>         struct extent_buffer *leaf = path->nodes[0];
>         const int slot = path->slots[0];
>         struct btrfs_key key;
> -       u64 extent_start, extent_end;
> +       u64 extent_start;
>         u64 bytenr;
>         u8 type = btrfs_file_extent_type(leaf, fi);
>         int compress_type = btrfs_file_extent_compression(leaf, fi);
>
>         btrfs_item_key_to_cpu(leaf, &key, slot);
>         extent_start = key.offset;
> -       extent_end = btrfs_file_extent_end(path);
>         em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>         em->generation = btrfs_file_extent_generation(leaf, fi);
>         if (type == BTRFS_FILE_EXTENT_REG ||
>             type == BTRFS_FILE_EXTENT_PREALLOC) {
> +               u64 extent_end = btrfs_file_extent_end(path);
> +

This can be made const or better, since it's only used in one place
now and fits within 80 characters, just use the expression directly
instead of using a variable.

>                 em->start = extent_start;
>                 em->len = extent_end - extent_start;

Which is here:   em->len = btrfs_file_extent_end(path) - extent_start;

>                 em->orig_start = extent_start -
> @@ -1299,9 +1300,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                                 em->flags |= EXTENT_FLAG_PREALLOC;
>                 }
>         } else if (type == BTRFS_FILE_EXTENT_INLINE) {
> +               /* Tree-checker has ensured this. */
> +               ASSERT(extent_start == 0);
> +
>                 em->block_start = EXTENT_MAP_INLINE;
> -               em->start = extent_start;
> -               em->len = extent_end - extent_start;
> +               em->start = 0;
> +               em->len = fs_info->sectorsize;
>                 /*
>                  * Initialize orig_start and block_len with the same values
>                  * as in inode.c:btrfs_get_extent().
> @@ -1336,7 +1340,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
>
>         if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
>                 end = btrfs_file_extent_ram_bytes(leaf, fi);
> -               end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
> +               end = leaf->fs_info->sectorsize;

So both assignments to "end" should be removed, not just the second
one, as the new one doesn't need the previous and makes it pointless.

Thanks.

>         } else {
>                 end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
>         }
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-04-02 15:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-02  6:23 [PATCH 0/2] btrfs: more explaination on extent_map members Qu Wenruo
2024-04-02  6:23 ` [PATCH 1/2] btrfs: add extra comments " Qu Wenruo
2024-04-02  7:33   ` Andrea Gelmini
2024-04-02  8:25     ` Qu Wenruo
2024-04-02 15:45   ` Filipe Manana
2024-04-02  6:23 ` [PATCH 2/2] btrfs: simplify the inline extent map creation Qu Wenruo
2024-04-02 15:50   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox