[PATCH v3 0/3] btrfs: more explaination on extent

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/3] btrfs: more explaination on extent_map members
@ 2024-04-05  3:27 Qu Wenruo
  2024-04-05  3:27 ` [PATCH v3 1/3] btrfs: add extra comments " Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-04-05  3:27 UTC (permalink / raw)
  To: linux-btrfs

[REPO]
https://github.com/adam900710/linux.git/ em_cleanup

[CHANGELOG]
v3:
- Rebased to latest for-next branch
- Further comments polishment
- Coding style update to follow the guideline

v2:
- Add Filipe's cleanup on mod_start/mod_len
  These two members are no longer utilized, saving me quite some time on
  digging into their usage.

- Update the comments of the extent_map structure
  To make them more readable and less confusing.

- Further cleanup for inline extent_map reading

- A new patch to do extra sanity checks for create_io_em()
  Firstly pure NOCOW writes should not call create_io_em(), secondly
  with the new knowledge of extent_map, it's easier to do extra sanity
  checks for the already pretty long parameter list.

Btrfs uses extent_map to represent a in-memory file extent.

There are severam members that are 1:1 mappe in on-disk file extent
items and extent maps:

- extent_map::start	==	key.offset
- extent_map::len	==	file_extent_num_bytes
- extent_map::ram_bytes	==	file_extent_ram_bytes

But that's all, the remaining are pretty different:

- Use block_start to indicate holes/inline extents
  Meanwhile btrfs on-disk file extent items go with a dedicated type for
  inline extents, and disk_bytenr 0 for holes.

- Weird block_start/orig_block_len/orig_start
  In theory we can directly go with the same file_extent_disk_bytenr,
  file_extent_disk_num_bytes and file_extent_offset to calculate the
  remaining members (block_start/orig_start/orig_block_len/block_len).

  But for whatever reason, we didn't go that path and have a hell of
  weird and inconsistent calculation for them.

I do not have the confidence to handle the mess yet, but as the first
step, I would add comments for those members mostly according to
btrfs_extent_item_to_extent_map(), and hopefully we can improve the
situation in not-far-away future.

Qu Wenruo (3):
  btrfs: add extra comments on extent_map members
  btrfs: simplify the inline extent map creation
  btrfs: add extra sanity checks for create_io_em()

 fs/btrfs/extent_map.h | 52 ++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/file-item.c  | 20 ++++++++---------
 fs/btrfs/inode.c      | 40 ++++++++++++++++++++++++++++++++-
 3 files changed, 100 insertions(+), 12 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] btrfs: add extra comments on extent_map members
  2024-04-05  3:27 [PATCH v3 0/3] btrfs: more explaination on extent_map members Qu Wenruo
@ 2024-04-05  3:27 ` Qu Wenruo
  2024-04-05 12:24   ` Filipe Manana
  2024-04-05  3:27 ` [PATCH v3 2/3] btrfs: simplify the inline extent map creation Qu Wenruo
  2024-04-05  3:27 ` [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-04-05  3:27 UTC (permalink / raw)
  To: linux-btrfs

The extent_map structure is very critical to btrfs, as it is involved
for both read and write paths.

Unfortunately the structure is not properly explained, making it pretty
hard to understand nor to do further improvement.

This patch adds extra comments explaining the major members based on
my code reading.
Hopefully we can find more members to cleanup in the future.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_map.h | 52 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 10e9491865c9..0b938e12cc78 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -35,19 +35,69 @@ enum {
 };
 
 /*
+ * This structure represents file extents and holes.
+ *
  * Keep this structure as compact as possible, as we can have really large
  * amounts of allocated extent maps at any time.
  */
 struct extent_map {
 	struct rb_node rb_node;
 
-	/* all of these are in bytes */
+	/* All of these are in bytes. */
+
+	/* File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key. */
 	u64 start;
+
+	/*
+	 * Length of the file extent.
+	 *
+	 * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
+	 * For inline extents it's sectorsize, since inline data starts at
+	 * offsetof(struct btrfs_file_extent_item, disk_bytenr) thus
+	 * btrfs_file_extent_item::num_bytes is not valid.
+	 */
 	u64 len;
+
+	/*
+	 * The file offset of the original file extent before splitting.
+	 *
+	 * This is an in-memory only member, matching
+	 * extent_map::start - btrfs_file_extent_item::offset for
+	 * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
+	 */
 	u64 orig_start;
+
+	/*
+	 * The full on-disk extent length, matching
+	 * btrfs_file_extent_item::disk_num_bytes.
+	 */
 	u64 orig_block_len;
+
+	/*
+	 * The decompressed size of the whole on-disk extent, matching
+	 * btrfs_file_extent_item::ram_bytes.
+	 */
 	u64 ram_bytes;
+
+	/*
+	 * The on-disk logical bytenr for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
+	 * For uncompressed extents it matches
+	 * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
+	 *
+	 * For holes it is EXTENT_MAP_HOLE and for inline extents it is
+	 * EXTENT_MAP_INLINE.
+	 */
 	u64 block_start;
+
+	/*
+	 * The on-disk length for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
+	 * For uncompressed extents it matches extent_map::len.
+	 * For holes and inline extents it's -1 and shouldn't be used.
+	 */
 	u64 block_len;
 
 	/*
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/3] btrfs: simplify the inline extent map creation
  2024-04-05  3:27 [PATCH v3 0/3] btrfs: more explaination on extent_map members Qu Wenruo
  2024-04-05  3:27 ` [PATCH v3 1/3] btrfs: add extra comments " Qu Wenruo
@ 2024-04-05  3:27 ` Qu Wenruo
  2024-04-05 12:25   ` Filipe Manana
  2024-04-05  3:27 ` [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-04-05  3:27 UTC (permalink / raw)
  To: linux-btrfs

With the tree-checker ensuring all inline file extents starts at file
offset 0 and has a length no larger than sectorsize, we can simplify the
calculation to assigned those fixes values directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index e58fb5347e65..844439f19949 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1265,20 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	struct extent_buffer *leaf = path->nodes[0];
 	const int slot = path->slots[0];
 	struct btrfs_key key;
-	u64 extent_start, extent_end;
+	u64 extent_start;
 	u64 bytenr;
 	u8 type = btrfs_file_extent_type(leaf, fi);
 	int compress_type = btrfs_file_extent_compression(leaf, fi);
 
 	btrfs_item_key_to_cpu(leaf, &key, slot);
 	extent_start = key.offset;
-	extent_end = btrfs_file_extent_end(path);
 	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 	em->generation = btrfs_file_extent_generation(leaf, fi);
 	if (type == BTRFS_FILE_EXTENT_REG ||
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
 		em->start = extent_start;
-		em->len = extent_end - extent_start;
+		em->len = btrfs_file_extent_end(path) - extent_start;
 		em->orig_start = extent_start -
 			btrfs_file_extent_offset(leaf, fi);
 		em->orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -1299,9 +1298,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				em->flags |= EXTENT_FLAG_PREALLOC;
 		}
 	} else if (type == BTRFS_FILE_EXTENT_INLINE) {
+		/* Tree-checker has ensured this. */
+		ASSERT(extent_start == 0);
+
 		em->block_start = EXTENT_MAP_INLINE;
-		em->start = extent_start;
-		em->len = extent_end - extent_start;
+		em->start = 0;
+		em->len = fs_info->sectorsize;
 		/*
 		 * Initialize orig_start and block_len with the same values
 		 * as in inode.c:btrfs_get_extent().
@@ -1334,12 +1336,10 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
 	ASSERT(key.type == BTRFS_EXTENT_DATA_KEY);
 	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
-	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
-		end = btrfs_file_extent_ram_bytes(leaf, fi);
-		end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
-	} else {
+	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE)
+		end = leaf->fs_info->sectorsize;
+	else
 		end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
-	}
 
 	return end;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em()
  2024-04-05  3:27 [PATCH v3 0/3] btrfs: more explaination on extent_map members Qu Wenruo
  2024-04-05  3:27 ` [PATCH v3 1/3] btrfs: add extra comments " Qu Wenruo
  2024-04-05  3:27 ` [PATCH v3 2/3] btrfs: simplify the inline extent map creation Qu Wenruo
@ 2024-04-05  3:27 ` Qu Wenruo
  2024-04-05 12:27   ` Filipe Manana
  2 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-04-05  3:27 UTC (permalink / raw)
  To: linux-btrfs

The function create_io_em() is called before we submit an IO, to update
the in-memory extent map for the involved range.

This patch changes the following aspects:

- Does not allow BTRFS_ORDERED_NOCOW type
  For real NOCOW (excluding NOCOW writes into preallocated ranges)
  writes, we never call create_io_em(), as we does not need to update
  the extent map at all.

  So remove the sanity check allowing BTRFS_ORDERED_NOCOW type.

- Add extra sanity checks
  * PREALLOC
    - @block_len == len
      For uncompressed writes.

  * REGULAR
    - @block_len == @orig_block_len == @ram_bytes == @len
      We're creating a new uncompressed extent, and referring all of it.

    - @orig_start == @start
      We haven no offset inside the extent.

  * COMPRESSED
    - valid @compress_type
    - @len <= @ram_bytes
      This is to co-operate with encoded writes, which can cause a new
      file extent referring only part of a uncompressed extent.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c6f2b5d1dee1..ced916f42bab 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7320,11 +7320,49 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 	struct extent_map *em;
 	int ret;
 
+	/*
+	 * Note the missing of NOCOW type.
+	 *
+	 * For pure NOCOW writes, we should not create an io extent map,
+	 * but just reusing the existing one.
+	 * Only PREALLOC writes (NOCOW write into preallocated range) can
+	 * create io extent map.
+	 */
 	ASSERT(type == BTRFS_ORDERED_PREALLOC ||
 	       type == BTRFS_ORDERED_COMPRESSED ||
-	       type == BTRFS_ORDERED_NOCOW ||
 	       type == BTRFS_ORDERED_REGULAR);
 
+	switch (type) {
+	case BTRFS_ORDERED_PREALLOC:
+		/* Uncompressed extents. */
+		ASSERT(block_len == len);
+
+		/* We're only referring part of a larger preallocated extent. */
+		ASSERT(block_len <= ram_bytes);
+		break;
+	case BTRFS_ORDERED_REGULAR:
+		/* Uncompressed extents. */
+		ASSERT(block_len == len);
+
+		/* COW results a new extent matching our file extent size. */
+		ASSERT(orig_block_len == len);
+		ASSERT(ram_bytes == len);
+
+		/* Since it's a new extent, we should not have any offset. */
+		ASSERT(orig_start == start);
+		break;
+	case BTRFS_ORDERED_COMPRESSED:
+		/* Must be compressed. */
+		ASSERT(compress_type != BTRFS_COMPRESS_NONE);
+
+		/*
+		 * Encoded write can make us to refer to part of the
+		 * uncompressed extent.
+		 */
+		ASSERT(len <= ram_bytes);
+		break;
+	}
+
 	em = alloc_extent_map();
 	if (!em)
 		return ERR_PTR(-ENOMEM);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] btrfs: add extra comments on extent_map members
  2024-04-05  3:27 ` [PATCH v3 1/3] btrfs: add extra comments " Qu Wenruo
@ 2024-04-05 12:24   ` Filipe Manana
  2024-04-05 21:36     ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2024-04-05 12:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 5, 2024 at 4:28 AM Qu Wenruo <wqu@suse.com> wrote:
>
> The extent_map structure is very critical to btrfs, as it is involved
> for both read and write paths.
>
> Unfortunately the structure is not properly explained, making it pretty
> hard to understand nor to do further improvement.
>
> This patch adds extra comments explaining the major members based on
> my code reading.
> Hopefully we can find more members to cleanup in the future.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_map.h | 52 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> index 10e9491865c9..0b938e12cc78 100644
> --- a/fs/btrfs/extent_map.h
> +++ b/fs/btrfs/extent_map.h
> @@ -35,19 +35,69 @@ enum {
>  };
>
>  /*
> + * This structure represents file extents and holes.

So I clearly forgot this before, but we should add the caveat that
it's guaranteed that it represents a single file extent only if the
extent is new and not persisted (in the list of modified extents and
not fsynced).
Otherwise it can represent 2 or more extents that were merged (to save
memory), which adds some caveats I mention below.

Holes can also be merged of course (e.g. read part of a hole, we
create an extent map for that part, read the remainder of the hole,
create another extent map for that remainder, which then merges with
the former).

> + *
>   * Keep this structure as compact as possible, as we can have really large
>   * amounts of allocated extent maps at any time.
>   */
>  struct extent_map {
>         struct rb_node rb_node;
>
> -       /* all of these are in bytes */
> +       /* All of these are in bytes. */
> +
> +       /* File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key. */
>         u64 start;
> +
> +       /*
> +        * Length of the file extent.
> +        *
> +        * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
> +        * For inline extents it's sectorsize, since inline data starts at
> +        * offsetof(struct btrfs_file_extent_item, disk_bytenr) thus
> +        * btrfs_file_extent_item::num_bytes is not valid.
> +        */
>         u64 len;
> +
> +       /*
> +        * The file offset of the original file extent before splitting.
> +        *
> +        * This is an in-memory only member, matching
> +        * extent_map::start - btrfs_file_extent_item::offset for
> +        * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
> +        */
>         u64 orig_start;
> +
> +       /*
> +        * The full on-disk extent length, matching
> +        * btrfs_file_extent_item::disk_num_bytes.
> +        */

So yes and no.
When merging extent maps, it's not updated, so it's tricky.
But that's ok because an extent map only needs to represent exactly
one file extent item if it's new and was not fsynced yet.

>         u64 orig_block_len;
> +
> +       /*
> +        * The decompressed size of the whole on-disk extent, matching
> +        * btrfs_file_extent_item::ram_bytes.
> +        */
>         u64 ram_bytes;

Same here regarding the merging.

Sorry I forgot this before.

> +
> +       /*
> +        * The on-disk logical bytenr for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
> +        * For uncompressed extents it matches
> +        * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
> +        *
> +        * For holes it is EXTENT_MAP_HOLE and for inline extents it is
> +        * EXTENT_MAP_INLINE.
> +        */
>         u64 block_start;
> +
> +       /*
> +        * The on-disk length for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
> +        * For uncompressed extents it matches extent_map::len.
> +        * For holes and inline extents it's -1 and shouldn't be used.
> +        */
>         u64 block_len;
>
>         /*
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/3] btrfs: simplify the inline extent map creation
  2024-04-05  3:27 ` [PATCH v3 2/3] btrfs: simplify the inline extent map creation Qu Wenruo
@ 2024-04-05 12:25   ` Filipe Manana
  0 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2024-04-05 12:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 5, 2024 at 4:28 AM Qu Wenruo <wqu@suse.com> wrote:
>
> With the tree-checker ensuring all inline file extents starts at file
> offset 0 and has a length no larger than sectorsize, we can simplify the
> calculation to assigned those fixes values directly.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good now, thanks.

> ---
>  fs/btrfs/file-item.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index e58fb5347e65..844439f19949 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1265,20 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>         struct extent_buffer *leaf = path->nodes[0];
>         const int slot = path->slots[0];
>         struct btrfs_key key;
> -       u64 extent_start, extent_end;
> +       u64 extent_start;
>         u64 bytenr;
>         u8 type = btrfs_file_extent_type(leaf, fi);
>         int compress_type = btrfs_file_extent_compression(leaf, fi);
>
>         btrfs_item_key_to_cpu(leaf, &key, slot);
>         extent_start = key.offset;
> -       extent_end = btrfs_file_extent_end(path);
>         em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>         em->generation = btrfs_file_extent_generation(leaf, fi);
>         if (type == BTRFS_FILE_EXTENT_REG ||
>             type == BTRFS_FILE_EXTENT_PREALLOC) {
>                 em->start = extent_start;
> -               em->len = extent_end - extent_start;
> +               em->len = btrfs_file_extent_end(path) - extent_start;
>                 em->orig_start = extent_start -
>                         btrfs_file_extent_offset(leaf, fi);
>                 em->orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
> @@ -1299,9 +1298,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                                 em->flags |= EXTENT_FLAG_PREALLOC;
>                 }
>         } else if (type == BTRFS_FILE_EXTENT_INLINE) {
> +               /* Tree-checker has ensured this. */
> +               ASSERT(extent_start == 0);
> +
>                 em->block_start = EXTENT_MAP_INLINE;
> -               em->start = extent_start;
> -               em->len = extent_end - extent_start;
> +               em->start = 0;
> +               em->len = fs_info->sectorsize;
>                 /*
>                  * Initialize orig_start and block_len with the same values
>                  * as in inode.c:btrfs_get_extent().
> @@ -1334,12 +1336,10 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
>         ASSERT(key.type == BTRFS_EXTENT_DATA_KEY);
>         fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
>
> -       if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
> -               end = btrfs_file_extent_ram_bytes(leaf, fi);
> -               end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
> -       } else {
> +       if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE)
> +               end = leaf->fs_info->sectorsize;
> +       else
>                 end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
> -       }
>
>         return end;
>  }
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em()
  2024-04-05  3:27 ` [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
@ 2024-04-05 12:27   ` Filipe Manana
  0 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2024-04-05 12:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 5, 2024 at 4:28 AM Qu Wenruo <wqu@suse.com> wrote:
>
> The function create_io_em() is called before we submit an IO, to update
> the in-memory extent map for the involved range.
>
> This patch changes the following aspects:
>
> - Does not allow BTRFS_ORDERED_NOCOW type
>   For real NOCOW (excluding NOCOW writes into preallocated ranges)
>   writes, we never call create_io_em(), as we does not need to update
>   the extent map at all.
>
>   So remove the sanity check allowing BTRFS_ORDERED_NOCOW type.
>
> - Add extra sanity checks
>   * PREALLOC
>     - @block_len == len
>       For uncompressed writes.
>
>   * REGULAR
>     - @block_len == @orig_block_len == @ram_bytes == @len
>       We're creating a new uncompressed extent, and referring all of it.
>
>     - @orig_start == @start
>       We haven no offset inside the extent.
>
>   * COMPRESSED
>     - valid @compress_type
>     - @len <= @ram_bytes
>       This is to co-operate with encoded writes, which can cause a new
>       file extent referring only part of a uncompressed extent.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good now, thanks.

> ---
>  fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c6f2b5d1dee1..ced916f42bab 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7320,11 +7320,49 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>         struct extent_map *em;
>         int ret;
>
> +       /*
> +        * Note the missing of NOCOW type.
> +        *
> +        * For pure NOCOW writes, we should not create an io extent map,
> +        * but just reusing the existing one.
> +        * Only PREALLOC writes (NOCOW write into preallocated range) can
> +        * create io extent map.
> +        */
>         ASSERT(type == BTRFS_ORDERED_PREALLOC ||
>                type == BTRFS_ORDERED_COMPRESSED ||
> -              type == BTRFS_ORDERED_NOCOW ||
>                type == BTRFS_ORDERED_REGULAR);
>
> +       switch (type) {
> +       case BTRFS_ORDERED_PREALLOC:
> +               /* Uncompressed extents. */
> +               ASSERT(block_len == len);
> +
> +               /* We're only referring part of a larger preallocated extent. */
> +               ASSERT(block_len <= ram_bytes);
> +               break;
> +       case BTRFS_ORDERED_REGULAR:
> +               /* Uncompressed extents. */
> +               ASSERT(block_len == len);
> +
> +               /* COW results a new extent matching our file extent size. */
> +               ASSERT(orig_block_len == len);
> +               ASSERT(ram_bytes == len);
> +
> +               /* Since it's a new extent, we should not have any offset. */
> +               ASSERT(orig_start == start);
> +               break;
> +       case BTRFS_ORDERED_COMPRESSED:
> +               /* Must be compressed. */
> +               ASSERT(compress_type != BTRFS_COMPRESS_NONE);
> +
> +               /*
> +                * Encoded write can make us to refer to part of the
> +                * uncompressed extent.
> +                */
> +               ASSERT(len <= ram_bytes);
> +               break;
> +       }
> +
>         em = alloc_extent_map();
>         if (!em)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] btrfs: add extra comments on extent_map members
  2024-04-05 12:24   ` Filipe Manana
@ 2024-04-05 21:36     ` Qu Wenruo
  2024-04-08 11:11       ` Filipe Manana
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-04-05 21:36 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



在 2024/4/5 22:54, Filipe Manana 写道:
> On Fri, Apr 5, 2024 at 4:28 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> The extent_map structure is very critical to btrfs, as it is involved
>> for both read and write paths.
>>
>> Unfortunately the structure is not properly explained, making it pretty
>> hard to understand nor to do further improvement.
>>
>> This patch adds extra comments explaining the major members based on
>> my code reading.
>> Hopefully we can find more members to cleanup in the future.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_map.h | 52 ++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
>> index 10e9491865c9..0b938e12cc78 100644
>> --- a/fs/btrfs/extent_map.h
>> +++ b/fs/btrfs/extent_map.h
>> @@ -35,19 +35,69 @@ enum {
>>   };
>>
>>   /*
>> + * This structure represents file extents and holes.
>
> So I clearly forgot this before, but we should add the caveat that
> it's guaranteed that it represents a single file extent only if the
> extent is new and not persisted (in the list of modified extents and
> not fsynced).
> Otherwise it can represent 2 or more extents that were merged (to save
> memory), which adds some caveats I mention below.

In fact I also wanted to address this, especially if I'm going to
introduce disk_bytenr/disk_num_bytes, as they can be merged, the merged
disk_bytenr/disk_num_bytes would not exist on-disk.

But on the other hand, it's a little too obvious, thus I didn't mention
through the whole series.

>
> Holes can also be merged of course (e.g. read part of a hole, we
> create an extent map for that part, read the remainder of the hole,
> create another extent map for that remainder, which then merges with
> the former).
>
[...]
>> +
>> +       /*
>> +        * The full on-disk extent length, matching
>> +        * btrfs_file_extent_item::disk_num_bytes.
>> +        */
>
> So yes and no.
> When merging extent maps, it's not updated, so it's tricky.

And that's already found in my sanity checks.

> But that's ok because an extent map only needs to represent exactly
> one file extent item if it's new and was not fsynced yet.

I can update the comments to add extra comments on merged behavior, and
fix the unexpected handling for merging/split.

>
>>          u64 orig_block_len;
>> +
>> +       /*
>> +        * The decompressed size of the whole on-disk extent, matching
>> +        * btrfs_file_extent_item::ram_bytes.
>> +        */
>>          u64 ram_bytes;
>
> Same here regarding the merging.
>
> Sorry I forgot this before.

No big deal, as my super strict sanity checks are crashing everywhere, I
have already experienced this quirk.

So I would add extra comments on the merging behaviors, but I'm afraid
since the current code doesn't handle certain members correctly (and a
lot of callers even do not populate things like ram_bytes), I'm afraid
those extra comments would only come after I fixed all of them.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] btrfs: add extra comments on extent_map members
  2024-04-05 21:36     ` Qu Wenruo
@ 2024-04-08 11:11       ` Filipe Manana
  2024-04-08 22:05         ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2024-04-08 11:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Fri, Apr 5, 2024 at 10:36 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/4/5 22:54, Filipe Manana 写道:
> > On Fri, Apr 5, 2024 at 4:28 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> The extent_map structure is very critical to btrfs, as it is involved
> >> for both read and write paths.
> >>
> >> Unfortunately the structure is not properly explained, making it pretty
> >> hard to understand nor to do further improvement.
> >>
> >> This patch adds extra comments explaining the major members based on
> >> my code reading.
> >> Hopefully we can find more members to cleanup in the future.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/extent_map.h | 52 ++++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 51 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> >> index 10e9491865c9..0b938e12cc78 100644
> >> --- a/fs/btrfs/extent_map.h
> >> +++ b/fs/btrfs/extent_map.h
> >> @@ -35,19 +35,69 @@ enum {
> >>   };
> >>
> >>   /*
> >> + * This structure represents file extents and holes.
> >
> > So I clearly forgot this before, but we should add the caveat that
> > it's guaranteed that it represents a single file extent only if the
> > extent is new and not persisted (in the list of modified extents and
> > not fsynced).
> > Otherwise it can represent 2 or more extents that were merged (to save
> > memory), which adds some caveats I mention below.
>
> In fact I also wanted to address this, especially if I'm going to
> introduce disk_bytenr/disk_num_bytes, as they can be merged, the merged
> disk_bytenr/disk_num_bytes would not exist on-disk.
>
> But on the other hand, it's a little too obvious, thus I didn't mention
> through the whole series.

Well, a lot of the comments this patch is adding are too obvious as well.
So I don't see why mentioning an extent map may represent merged
extents is too obvious, if anything, it's less obvious than the
comments for some of the fields the patch is adding.

>
> >
> > Holes can also be merged of course (e.g. read part of a hole, we
> > create an extent map for that part, read the remainder of the hole,
> > create another extent map for that remainder, which then merges with
> > the former).
> >
> [...]
> >> +
> >> +       /*
> >> +        * The full on-disk extent length, matching
> >> +        * btrfs_file_extent_item::disk_num_bytes.
> >> +        */
> >
> > So yes and no.
> > When merging extent maps, it's not updated, so it's tricky.
>
> And that's already found in my sanity checks.
>
> > But that's ok because an extent map only needs to represent exactly
> > one file extent item if it's new and was not fsynced yet.
>
> I can update the comments to add extra comments on merged behavior, and
> fix the unexpected handling for merging/split.

Fix what?
It's only important to be accurate for extents that need to be logged
(in the modified list of extents), and those are never merged or
split.

Otherwise it doesn't affect use cases other than fsync.

>
> >
> >>          u64 orig_block_len;
> >> +
> >> +       /*
> >> +        * The decompressed size of the whole on-disk extent, matching
> >> +        * btrfs_file_extent_item::ram_bytes.
> >> +        */
> >>          u64 ram_bytes;
> >
> > Same here regarding the merging.
> >
> > Sorry I forgot this before.
>
> No big deal, as my super strict sanity checks are crashing everywhere, I
> have already experienced this quirk.
>
> So I would add extra comments on the merging behaviors, but I'm afraid
> since the current code doesn't handle certain members correctly (and a
> lot of callers even do not populate things like ram_bytes), I'm afraid
> those extra comments would only come after I fixed all of them.

Well, as mentioned before it's ok for some fields to not be updated
after merging, and to a lesser extent, splitted.
Having all fields correct only makes sense when the mapping to a file
extent item is exactly 1 to 1 and the extent is new and needs to be
logged.

>
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] btrfs: add extra comments on extent_map members
  2024-04-08 11:11       ` Filipe Manana
@ 2024-04-08 22:05         ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:05 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs



在 2024/4/8 20:41, Filipe Manana 写道:
> On Fri, Apr 5, 2024 at 10:36 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>
>> But on the other hand, it's a little too obvious, thus I didn't mention
>> through the whole series.
>
> Well, a lot of the comments this patch is adding are too obvious as well.
> So I don't see why mentioning an extent map may represent merged
> extents is too obvious, if anything, it's less obvious than the
> comments for some of the fields the patch is adding.

Well, considering how many members are not properly populated (for tests
though), and other members like "orig_start" only makes sense for
non-mergeable extents (compressed), I strongly doubt if the extent_map
itself designed is that obvious.

>
>>
>>>
>>> Holes can also be merged of course (e.g. read part of a hole, we
>>> create an extent map for that part, read the remainder of the hole,
>>> create another extent map for that remainder, which then merges with
>>> the former).
>>>
>> [...]
>>>> +
>>>> +       /*
>>>> +        * The full on-disk extent length, matching
>>>> +        * btrfs_file_extent_item::disk_num_bytes.
>>>> +        */
>>>
>>> So yes and no.
>>> When merging extent maps, it's not updated, so it's tricky.
>>
>> And that's already found in my sanity checks.
>>
>>> But that's ok because an extent map only needs to represent exactly
>>> one file extent item if it's new and was not fsynced yet.
>>
>> I can update the comments to add extra comments on merged behavior, and
>> fix the unexpected handling for merging/split.
>
> Fix what?
> It's only important to be accurate for extents that need to be logged
> (in the modified list of extents), and those are never merged or
> split.

My bad, the "merged/split handling" is for my new sanity checks on the
extent map structure, which cross checks the old members
(orig_start/block_start/block_len) against the new members
(disk_bytenr/offset).

It turns out certain members do not really much sense after merging.
E.g. orig_start.

>
> Otherwise it doesn't affect use cases other than fsync.
>
>>
>>>
>>>>           u64 orig_block_len;
>>>> +
>>>> +       /*
>>>> +        * The decompressed size of the whole on-disk extent, matching
>>>> +        * btrfs_file_extent_item::ram_bytes.
>>>> +        */
>>>>           u64 ram_bytes;
>>>
>>> Same here regarding the merging.
>>>
>>> Sorry I forgot this before.
>>
>> No big deal, as my super strict sanity checks are crashing everywhere, I
>> have already experienced this quirk.
>>
>> So I would add extra comments on the merging behaviors, but I'm afraid
>> since the current code doesn't handle certain members correctly (and a
>> lot of callers even do not populate things like ram_bytes), I'm afraid
>> those extra comments would only come after I fixed all of them.
>
> Well, as mentioned before it's ok for some fields to not be updated
> after merging, and to a lesser extent, splitted.
> Having all fields correct only makes sense when the mapping to a file
> extent item is exactly 1 to 1 and the extent is new and needs to be
> logged.

I'd say, even only for the sake of consistence, we should populate every
member correctly no matter what.

That's why I'm adding strict sanity checks for all extent maps (only for
DEBUG build).

And it already exposed several bugs:

- The @block_start mis-calculation fix you just commented on
   Thankfully it's harmless

- Weird ram_bytes generated by g/311 test case
   We can create a file extent whose ram_bytes is double its
   disk_num_bytes, and is not compressed.

I'll spend time on pinning down the ram_bytes anomaly later, even it
doesn't cause data corruption.

Thanks,
Qu

>
>>
>> Thanks,
>> Qu

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-08 22:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-05  3:27 [PATCH v3 0/3] btrfs: more explaination on extent_map members Qu Wenruo
2024-04-05  3:27 ` [PATCH v3 1/3] btrfs: add extra comments " Qu Wenruo
2024-04-05 12:24   ` Filipe Manana
2024-04-05 21:36     ` Qu Wenruo
2024-04-08 11:11       ` Filipe Manana
2024-04-08 22:05         ` Qu Wenruo
2024-04-05  3:27 ` [PATCH v3 2/3] btrfs: simplify the inline extent map creation Qu Wenruo
2024-04-05 12:25   ` Filipe Manana
2024-04-05  3:27 ` [PATCH v3 3/3] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
2024-04-05 12:27   ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.