[PATCH v2 0/4] btrfs: more explaination on extent

Linux Btrfs filesystem development
 help / color / mirror / Atom feed

* [PATCH v2 0/4] btrfs: more explaination on extent_map members
@ 2024-04-03 23:41 Qu Wenruo
  2024-04-03 23:41 ` [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-04-03 23:41 UTC (permalink / raw)
  To: linux-btrfs

[REPO]
https://github.com/adam900710/linux.git/ em_cleanup

[CHANGELOG]
v2:
- Add Filipe's cleanup on mod_start/mod_len
  These two members are no longer utilized, saving me quite some time on
  digging into their usage.

- Update the comments of the extent_map structure
  To make them more readable and less confusing.

- Further cleanup for inline extent_map reading

- A new patch to do extra sanity checks for create_io_em()
  Firstly pure NOCOW writes should not call create_io_em(), secondly
  with the new knowledge of extent_map, it's easier to do extra sanity
  checks for the already pretty long parameter list.

Btrfs uses extent_map to represent a in-memory file extent.

There are severam members that are 1:1 mappe in on-disk file extent
items and extent maps:

- extent_map::start	==	key.offset
- extent_map::len	==	file_extent_num_bytes
- extent_map::ram_bytes	==	file_extent_ram_bytes

But that's all, the remaining are pretty different:

- Use block_start to indicate holes/inline extents
  Meanwhile btrfs on-disk file extent items go with a dedicated type for
  inline extents, and disk_bytenr 0 for holes.

- Extra members for fsync optimization
  I'm still not 100% sure how mod_start and mod_len really works though.

- Weird block_start/orig_block_len/orig_start
  In theory we can directly go with the same file_extent_disk_bytenr,
  file_extent_disk_num_bytes and file_extent_offset to calculate the
  remaining members (block_start/orig_start/orig_block_len/block_len).

  But for whatever reason, we didn't go that path and have a hell of
  weird and inconsistent calculation for them.

I do not have the confidence to handle the mess yet, but as the first
step, I would add comments for those members mostly according to
btrfs_extent_item_to_extent_map(), and hopefully we can improve the
situation in not-far-away future.

Filipe Manana (1):
  btrfs: remove not needed mod_start and mod_len from struct extent_map

Qu Wenruo (3):
  btrfs: add extra comments on extent_map members
  btrfs: simplify the inline extent map creation
  btrfs: add extra sanity checks for create_io_em()

 fs/btrfs/extent_map.c        | 18 -----------
 fs/btrfs/extent_map.h        | 58 ++++++++++++++++++++++++++++++++----
 fs/btrfs/file-item.c         | 15 +++++-----
 fs/btrfs/inode.c             | 44 ++++++++++++++++++++++++---
 fs/btrfs/tree-log.c          |  4 +--
 include/trace/events/btrfs.h |  3 +-
 6 files changed, 104 insertions(+), 38 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map
  2024-04-03 23:41 [PATCH v2 0/4] btrfs: more explaination on extent_map members Qu Wenruo
@ 2024-04-03 23:41 ` Qu Wenruo
  2024-04-04  9:58   ` Filipe Manana
  2024-04-03 23:41 ` [PATCH v2 2/4] btrfs: add extra comments on extent_map members Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-04-03 23:41 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana, David Sterba

From: Filipe Manana <fdmanana@suse.com>

The mod_start and mod_len fields of struct extent_map were introduced by
commit 4e2f84e63dc1 ("Btrfs: improve fsync by filtering extents that we
want") in order to avoid too low performance when fsyncing a file that
keeps getting extent maps merge, because it resulted in each fsync logging
again csum ranges that were already merged before.

We don't need this anymore as extent maps in the list of modified extents
are never merged with other extent maps and once we log an extent map we
remove it from the list of modified extent maps, so it's never logged
twice.

So remove the mod_start and mod_len fields from struct extent_map and use
instead the start and len fields when logging checksums in the fast fsync
path. This also makes EXTENT_FLAG_FILLING unused so remove it as well.

Running the reproducer from the commit mentioned before, with a larger
number of extents and against a null block device, so that IO is fast
and we can better see any impact from searching checksums items and
logging them, gave the following results from dd:

Before this change:

   409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s

After this change:

   409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s

So no changes in throughput.
The test was done in a release kernel (non-debug, Debian's default kernel
config) and its steps are the following:

   $ mkfs.btrfs -f /dev/nullb0
   $ mount /dev/sdb /mnt
   $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync
   $ umount /mnt

This also reduces the size of struct extent_map from 128 bytes down to 112
bytes, so now we can have 36 extents maps per 4K page instead of 32.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_map.c        | 18 ------------------
 fs/btrfs/extent_map.h        |  4 ----
 fs/btrfs/inode.c             |  4 +---
 fs/btrfs/tree-log.c          |  4 ++--
 include/trace/events/btrfs.h |  3 +--
 5 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 445f7716f1e2..471654cb65b0 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -252,8 +252,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 			em->len += merge->len;
 			em->block_len += merge->block_len;
 			em->block_start = merge->block_start;
-			em->mod_len = (em->mod_len + em->mod_start) - merge->mod_start;
-			em->mod_start = merge->mod_start;
 			em->generation = max(em->generation, merge->generation);
 			em->flags |= EXTENT_FLAG_MERGED;
 
@@ -271,7 +269,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 		em->block_len += merge->block_len;
 		rb_erase_cached(&merge->rb_node, &tree->map);
 		RB_CLEAR_NODE(&merge->rb_node);
-		em->mod_len = (merge->mod_start + merge->mod_len) - em->mod_start;
 		em->generation = max(em->generation, merge->generation);
 		em->flags |= EXTENT_FLAG_MERGED;
 		free_extent_map(merge);
@@ -300,7 +297,6 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
 	struct extent_map_tree *tree = &inode->extent_tree;
 	int ret = 0;
 	struct extent_map *em;
-	bool prealloc = false;
 
 	write_lock(&tree->lock);
 	em = lookup_extent_mapping(tree, start, len);
@@ -325,21 +321,9 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
 
 	em->generation = gen;
 	em->flags &= ~EXTENT_FLAG_PINNED;
-	em->mod_start = em->start;
-	em->mod_len = em->len;
-
-	if (em->flags & EXTENT_FLAG_FILLING) {
-		prealloc = true;
-		em->flags &= ~EXTENT_FLAG_FILLING;
-	}
 
 	try_merge_map(tree, em);
 
-	if (prealloc) {
-		em->mod_start = em->start;
-		em->mod_len = em->len;
-	}
-
 out:
 	write_unlock(&tree->lock);
 	free_extent_map(em);
@@ -361,8 +345,6 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
 					int modified)
 {
 	refcount_inc(&em->refs);
-	em->mod_start = em->start;
-	em->mod_len = em->len;
 
 	ASSERT(list_empty(&em->list));
 
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index c5a098c99cc6..10e9491865c9 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -30,8 +30,6 @@ enum {
 	ENUM_BIT(EXTENT_FLAG_PREALLOC),
 	/* Logging this extent */
 	ENUM_BIT(EXTENT_FLAG_LOGGING),
-	/* Filling in a preallocated extent */
-	ENUM_BIT(EXTENT_FLAG_FILLING),
 	/* This em is merged from two or more physically adjacent ems */
 	ENUM_BIT(EXTENT_FLAG_MERGED),
 };
@@ -46,8 +44,6 @@ struct extent_map {
 	/* all of these are in bytes */
 	u64 start;
 	u64 len;
-	u64 mod_start;
-	u64 mod_len;
 	u64 orig_start;
 	u64 orig_block_len;
 	u64 ram_bytes;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3442dedff53d..c6f2b5d1dee1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7338,9 +7338,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 	em->ram_bytes = ram_bytes;
 	em->generation = -1;
 	em->flags |= EXTENT_FLAG_PINNED;
-	if (type == BTRFS_ORDERED_PREALLOC)
-		em->flags |= EXTENT_FLAG_FILLING;
-	else if (type == BTRFS_ORDERED_COMPRESSED)
+	if (type == BTRFS_ORDERED_COMPRESSED)
 		extent_map_set_compression(em, compress_type);
 
 	ret = btrfs_replace_extent_map_range(inode, em, true);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 472918a5bc73..d9777649e170 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4574,8 +4574,8 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	struct btrfs_root *csum_root;
 	u64 csum_offset;
 	u64 csum_len;
-	u64 mod_start = em->mod_start;
-	u64 mod_len = em->mod_len;
+	u64 mod_start = em->start;
+	u64 mod_len = em->len;
 	LIST_HEAD(ordered_sums);
 	int ret = 0;
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 90b0222390e5..766cfd48386c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -277,8 +277,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 		{ EXTENT_FLAG_COMPRESS_LZO,	"COMPRESS_LZO"	},\
 		{ EXTENT_FLAG_COMPRESS_ZSTD,	"COMPRESS_ZSTD"	},\
 		{ EXTENT_FLAG_PREALLOC,		"PREALLOC"	},\
-		{ EXTENT_FLAG_LOGGING,		"LOGGING"	},\
-		{ EXTENT_FLAG_FILLING,		"FILLING"	})
+		{ EXTENT_FLAG_LOGGING,		"LOGGING"	})
 
 TRACE_EVENT_CONDITION(btrfs_get_extent,
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/4] btrfs: add extra comments on extent_map members
  2024-04-03 23:41 [PATCH v2 0/4] btrfs: more explaination on extent_map members Qu Wenruo
  2024-04-03 23:41 ` [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map Qu Wenruo
@ 2024-04-03 23:41 ` Qu Wenruo
  2024-04-04 10:33   ` Filipe Manana
  2024-04-03 23:42 ` [PATCH v2 3/4] btrfs: simplify the inline extent map creation Qu Wenruo
  2024-04-03 23:42 ` [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
  3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-04-03 23:41 UTC (permalink / raw)
  To: linux-btrfs

The extent_map structure is very critical to btrfs, as it is involved
for both read and write paths.

Unfortunately the structure is not properly explained, making it pretty
hard to understand nor to do further improvement.

This patch adds extra comments explaining the major members based on
my code reading.
Hopefully we can find more members to cleanup in the future.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_map.h | 54 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 10e9491865c9..82768288c6da 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -35,19 +35,71 @@ enum {
 };
 
 /*
+ * This structure represents file extents and holes.
+ *
  * Keep this structure as compact as possible, as we can have really large
  * amounts of allocated extent maps at any time.
  */
 struct extent_map {
 	struct rb_node rb_node;
 
-	/* all of these are in bytes */
+	/* All of these are in bytes. */
+
+	/* File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key. */
 	u64 start;
+
+	/*
+	 * Length of the file extent.
+	 *
+	 * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
+	 * For inline extents it's sectorsize, since inline data starts at
+	 * offsetof(struct , disk_bytenr) thus
+	 * btrfs_file_extent_item::num_bytes is not valid.
+	 */
 	u64 len;
+
+	/*
+	 * The file offset of the original file extent before splitting.
+	 *
+	 * This is an in-memory only member, matching
+	 * extent_map::start - btrfs_file_extent_item::offset for
+	 * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
+	 */
 	u64 orig_start;
+
+	/*
+	 * The full on-disk extent length, matching
+	 * btrfs_file_extent_item::disk_num_bytes.
+	 */
 	u64 orig_block_len;
+
+	/*
+	 * The decompressed size of the whole on-disk extent, matching
+	 * btrfs_file_extent_item::ram_bytes.
+	 *
+	 * For non-compressed extents, this matches orig_block_len.
+	 */
 	u64 ram_bytes;
+
+	/*
+	 * The on-disk logical bytenr for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
+	 * For uncompressed extents it matches
+	 * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
+	 *
+	 * For holes it is EXTENT_MAP_HOLE and for inline extents it is
+	 * EXTENT_MAP_INLINE.
+	 */
 	u64 block_start;
+
+	/*
+	 * The on-disk length for the file extent.
+	 *
+	 * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
+	 * For uncompressed extents it matches extent_map::len.
+	 * For holes and inline extents it's -1 and shouldn't be used.
+	 */
 	u64 block_len;
 
 	/*
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/4] btrfs: simplify the inline extent map creation
  2024-04-03 23:41 [PATCH v2 0/4] btrfs: more explaination on extent_map members Qu Wenruo
  2024-04-03 23:41 ` [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map Qu Wenruo
  2024-04-03 23:41 ` [PATCH v2 2/4] btrfs: add extra comments on extent_map members Qu Wenruo
@ 2024-04-03 23:42 ` Qu Wenruo
  2024-04-04 10:40   ` Filipe Manana
  2024-04-03 23:42 ` [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
  3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-04-03 23:42 UTC (permalink / raw)
  To: linux-btrfs

With the tree-checker ensuring all inline file extents starts at file
offset 0 and has a length no larger than sectorsize, we can simplify the
calculation to assigned those fixes values directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index e58fb5347e65..ad4761192d59 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1265,20 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	struct extent_buffer *leaf = path->nodes[0];
 	const int slot = path->slots[0];
 	struct btrfs_key key;
-	u64 extent_start, extent_end;
+	u64 extent_start;
 	u64 bytenr;
 	u8 type = btrfs_file_extent_type(leaf, fi);
 	int compress_type = btrfs_file_extent_compression(leaf, fi);
 
 	btrfs_item_key_to_cpu(leaf, &key, slot);
 	extent_start = key.offset;
-	extent_end = btrfs_file_extent_end(path);
 	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 	em->generation = btrfs_file_extent_generation(leaf, fi);
 	if (type == BTRFS_FILE_EXTENT_REG ||
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
 		em->start = extent_start;
-		em->len = extent_end - extent_start;
+		em->len = btrfs_file_extent_end(path) - extent_start;
 		em->orig_start = extent_start -
 			btrfs_file_extent_offset(leaf, fi);
 		em->orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -1299,9 +1298,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				em->flags |= EXTENT_FLAG_PREALLOC;
 		}
 	} else if (type == BTRFS_FILE_EXTENT_INLINE) {
+		/* Tree-checker has ensured this. */
+		ASSERT(extent_start == 0);
+
 		em->block_start = EXTENT_MAP_INLINE;
-		em->start = extent_start;
-		em->len = extent_end - extent_start;
+		em->start = 0;
+		em->len = fs_info->sectorsize;
 		/*
 		 * Initialize orig_start and block_len with the same values
 		 * as in inode.c:btrfs_get_extent().
@@ -1335,8 +1337,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
 	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
 	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
-		end = btrfs_file_extent_ram_bytes(leaf, fi);
-		end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
+		end = leaf->fs_info->sectorsize;
 	} else {
 		end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
 	}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em()
  2024-04-03 23:41 [PATCH v2 0/4] btrfs: more explaination on extent_map members Qu Wenruo
                   ` (2 preceding siblings ...)
  2024-04-03 23:42 ` [PATCH v2 3/4] btrfs: simplify the inline extent map creation Qu Wenruo
@ 2024-04-03 23:42 ` Qu Wenruo
  2024-04-04 10:55   ` Filipe Manana
  3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-04-03 23:42 UTC (permalink / raw)
  To: linux-btrfs

The function create_io_em() is called before we submit an IO, to update
the in-memory extent map for the involved range.

This patch changes the following aspects:

- Does not allow BTRFS_ORDERED_NOCOW type
  For real NOCOW (excluding NOCOW writes into preallocated ranges)
  writes, we never call create_io_em(), as we does not need to update
  the extent map at all.

  So remove the sanity check allowing BTRFS_ORDERED_NOCOW type.

- Add extra sanity checks
  * PREALLOC
    - @block_len == len
      For uncompressed writes.

  * REGULAR
    - @block_len == @orig_block_len == @ram_bytes == @len
      We're creating a new uncompressed extent, and referring all of it.

    - @orig_start == @start
      We haven no offset inside the extent.

  * COMPRESSED
    - valid @compress_type
    - @len <= @ram_bytes
      This is to co-operate with encoded writes, which can cause a new
      file extent referring only part of a uncompressed extent.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c6f2b5d1dee1..261fafc66533 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7320,11 +7320,49 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 	struct extent_map *em;
 	int ret;
 
+	/*
+	 * Note the missing of NOCOW type.
+	 *
+	 * For pure NOCOW writes, we should not create an io extent map,
+	 * but just reusing the existing one.
+	 * Only PREALLOC writes (NOCOW write into preallocated range) can
+	 * create io extent map.
+	 */
 	ASSERT(type == BTRFS_ORDERED_PREALLOC ||
 	       type == BTRFS_ORDERED_COMPRESSED ||
-	       type == BTRFS_ORDERED_NOCOW ||
 	       type == BTRFS_ORDERED_REGULAR);
 
+	if (type == BTRFS_ORDERED_PREALLOC) {
+		/* Uncompressed extents. */
+		ASSERT(block_len == len);
+
+		/* We're only referring part of a larger preallocated extent. */
+		ASSERT(block_len <= ram_bytes);
+	}
+
+	if (type == BTRFS_ORDERED_REGULAR) {
+		/* Uncompressed extents. */
+		ASSERT(block_len == len);
+
+		/* COW results a new extent matching our file extent size. */
+		ASSERT(orig_block_len == len);
+		ASSERT(ram_bytes == len);
+
+		/* Since it's a new extent, we should not have any offset. */
+		ASSERT(orig_start == start);
+	}
+
+	if (type == BTRFS_ORDERED_COMPRESSED) {
+		/* Must be compressed. */
+		ASSERT(compress_type != BTRFS_COMPRESS_NONE);
+
+		/*
+		 * Encoded write can make us to refer part of the
+		 * uncompressed extent.
+		 */
+		ASSERT(len <= ram_bytes);
+	}
+
 	em = alloc_extent_map();
 	if (!em)
 		return ERR_PTR(-ENOMEM);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map
  2024-04-03 23:41 ` [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map Qu Wenruo
@ 2024-04-04  9:58   ` Filipe Manana
  2024-04-04 10:22     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2024-04-04  9:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana, David Sterba

On Thu, Apr 4, 2024 at 12:46 AM Qu Wenruo <wqu@suse.com> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> The mod_start and mod_len fields of struct extent_map were introduced by
> commit 4e2f84e63dc1 ("Btrfs: improve fsync by filtering extents that we
> want") in order to avoid too low performance when fsyncing a file that
> keeps getting extent maps merge, because it resulted in each fsync logging
> again csum ranges that were already merged before.
>
> We don't need this anymore as extent maps in the list of modified extents
> are never merged with other extent maps and once we log an extent map we
> remove it from the list of modified extent maps, so it's never logged
> twice.
>
> So remove the mod_start and mod_len fields from struct extent_map and use
> instead the start and len fields when logging checksums in the fast fsync
> path. This also makes EXTENT_FLAG_FILLING unused so remove it as well.
>
> Running the reproducer from the commit mentioned before, with a larger
> number of extents and against a null block device, so that IO is fast
> and we can better see any impact from searching checksums items and
> logging them, gave the following results from dd:
>
> Before this change:
>
>    409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s
>
> After this change:
>
>    409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s
>
> So no changes in throughput.
> The test was done in a release kernel (non-debug, Debian's default kernel
> config) and its steps are the following:
>
>    $ mkfs.btrfs -f /dev/nullb0
>    $ mount /dev/sdb /mnt
>    $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync
>    $ umount /mnt
>
> This also reduces the size of struct extent_map from 128 bytes down to 112
> bytes, so now we can have 36 extents maps per 4K page instead of 32.
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Why are you resending this?
This was already in the for-next branch.

And given the SOB tag from David, your local for-next branch was up to date.

> ---
>  fs/btrfs/extent_map.c        | 18 ------------------
>  fs/btrfs/extent_map.h        |  4 ----
>  fs/btrfs/inode.c             |  4 +---
>  fs/btrfs/tree-log.c          |  4 ++--
>  include/trace/events/btrfs.h |  3 +--
>  5 files changed, 4 insertions(+), 29 deletions(-)
>
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 445f7716f1e2..471654cb65b0 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -252,8 +252,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>                         em->len += merge->len;
>                         em->block_len += merge->block_len;
>                         em->block_start = merge->block_start;
> -                       em->mod_len = (em->mod_len + em->mod_start) - merge->mod_start;
> -                       em->mod_start = merge->mod_start;
>                         em->generation = max(em->generation, merge->generation);
>                         em->flags |= EXTENT_FLAG_MERGED;
>
> @@ -271,7 +269,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>                 em->block_len += merge->block_len;
>                 rb_erase_cached(&merge->rb_node, &tree->map);
>                 RB_CLEAR_NODE(&merge->rb_node);
> -               em->mod_len = (merge->mod_start + merge->mod_len) - em->mod_start;
>                 em->generation = max(em->generation, merge->generation);
>                 em->flags |= EXTENT_FLAG_MERGED;
>                 free_extent_map(merge);
> @@ -300,7 +297,6 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
>         struct extent_map_tree *tree = &inode->extent_tree;
>         int ret = 0;
>         struct extent_map *em;
> -       bool prealloc = false;
>
>         write_lock(&tree->lock);
>         em = lookup_extent_mapping(tree, start, len);
> @@ -325,21 +321,9 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
>
>         em->generation = gen;
>         em->flags &= ~EXTENT_FLAG_PINNED;
> -       em->mod_start = em->start;
> -       em->mod_len = em->len;
> -
> -       if (em->flags & EXTENT_FLAG_FILLING) {
> -               prealloc = true;
> -               em->flags &= ~EXTENT_FLAG_FILLING;
> -       }
>
>         try_merge_map(tree, em);
>
> -       if (prealloc) {
> -               em->mod_start = em->start;
> -               em->mod_len = em->len;
> -       }
> -
>  out:
>         write_unlock(&tree->lock);
>         free_extent_map(em);
> @@ -361,8 +345,6 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
>                                         int modified)
>  {
>         refcount_inc(&em->refs);
> -       em->mod_start = em->start;
> -       em->mod_len = em->len;
>
>         ASSERT(list_empty(&em->list));
>
> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> index c5a098c99cc6..10e9491865c9 100644
> --- a/fs/btrfs/extent_map.h
> +++ b/fs/btrfs/extent_map.h
> @@ -30,8 +30,6 @@ enum {
>         ENUM_BIT(EXTENT_FLAG_PREALLOC),
>         /* Logging this extent */
>         ENUM_BIT(EXTENT_FLAG_LOGGING),
> -       /* Filling in a preallocated extent */
> -       ENUM_BIT(EXTENT_FLAG_FILLING),
>         /* This em is merged from two or more physically adjacent ems */
>         ENUM_BIT(EXTENT_FLAG_MERGED),
>  };
> @@ -46,8 +44,6 @@ struct extent_map {
>         /* all of these are in bytes */
>         u64 start;
>         u64 len;
> -       u64 mod_start;
> -       u64 mod_len;
>         u64 orig_start;
>         u64 orig_block_len;
>         u64 ram_bytes;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3442dedff53d..c6f2b5d1dee1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7338,9 +7338,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>         em->ram_bytes = ram_bytes;
>         em->generation = -1;
>         em->flags |= EXTENT_FLAG_PINNED;
> -       if (type == BTRFS_ORDERED_PREALLOC)
> -               em->flags |= EXTENT_FLAG_FILLING;
> -       else if (type == BTRFS_ORDERED_COMPRESSED)
> +       if (type == BTRFS_ORDERED_COMPRESSED)
>                 extent_map_set_compression(em, compress_type);
>
>         ret = btrfs_replace_extent_map_range(inode, em, true);
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 472918a5bc73..d9777649e170 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -4574,8 +4574,8 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
>         struct btrfs_root *csum_root;
>         u64 csum_offset;
>         u64 csum_len;
> -       u64 mod_start = em->mod_start;
> -       u64 mod_len = em->mod_len;
> +       u64 mod_start = em->start;
> +       u64 mod_len = em->len;
>         LIST_HEAD(ordered_sums);
>         int ret = 0;
>
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index 90b0222390e5..766cfd48386c 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -277,8 +277,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
>                 { EXTENT_FLAG_COMPRESS_LZO,     "COMPRESS_LZO"  },\
>                 { EXTENT_FLAG_COMPRESS_ZSTD,    "COMPRESS_ZSTD" },\
>                 { EXTENT_FLAG_PREALLOC,         "PREALLOC"      },\
> -               { EXTENT_FLAG_LOGGING,          "LOGGING"       },\
> -               { EXTENT_FLAG_FILLING,          "FILLING"       })
> +               { EXTENT_FLAG_LOGGING,          "LOGGING"       })
>
>  TRACE_EVENT_CONDITION(btrfs_get_extent,
>
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map
  2024-04-04  9:58   ` Filipe Manana
@ 2024-04-04 10:22     ` Qu Wenruo
  2024-04-04 10:37       ` Filipe Manana
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2024-04-04 10:22 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe Manana, David Sterba



在 2024/4/4 20:28, Filipe Manana 写道:
> On Thu, Apr 4, 2024 at 12:46 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> The mod_start and mod_len fields of struct extent_map were introduced by
>> commit 4e2f84e63dc1 ("Btrfs: improve fsync by filtering extents that we
>> want") in order to avoid too low performance when fsyncing a file that
>> keeps getting extent maps merge, because it resulted in each fsync logging
>> again csum ranges that were already merged before.
>>
>> We don't need this anymore as extent maps in the list of modified extents
>> are never merged with other extent maps and once we log an extent map we
>> remove it from the list of modified extent maps, so it's never logged
>> twice.
>>
>> So remove the mod_start and mod_len fields from struct extent_map and use
>> instead the start and len fields when logging checksums in the fast fsync
>> path. This also makes EXTENT_FLAG_FILLING unused so remove it as well.
>>
>> Running the reproducer from the commit mentioned before, with a larger
>> number of extents and against a null block device, so that IO is fast
>> and we can better see any impact from searching checksums items and
>> logging them, gave the following results from dd:
>>
>> Before this change:
>>
>>     409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s
>>
>> After this change:
>>
>>     409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s
>>
>> So no changes in throughput.
>> The test was done in a release kernel (non-debug, Debian's default kernel
>> config) and its steps are the following:
>>
>>     $ mkfs.btrfs -f /dev/nullb0
>>     $ mount /dev/sdb /mnt
>>     $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync
>>     $ umount /mnt
>>
>> This also reduces the size of struct extent_map from 128 bytes down to 112
>> bytes, so now we can have 36 extents maps per 4K page instead of 32.
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
> 
> Why are you resending this?
> This was already in the for-next branch.
> 
> And given the SOB tag from David, your local for-next branch was up to date.

Sorry, I still didn't really get how the whole btrfs/for-next branch 
should work.

Should all our patches based on the latest for-next? Since every member 
has write permission to the branch, and the branch can change in a daily 
base, rebasing it daily doesn't look solid to me.

And I still do not know what's the requirement to push a patch into 
misc-next.
Like just after fstests runs? Reviewed-by tag? or both or something more?

Thanks,
Qu
> 
>> ---
>>   fs/btrfs/extent_map.c        | 18 ------------------
>>   fs/btrfs/extent_map.h        |  4 ----
>>   fs/btrfs/inode.c             |  4 +---
>>   fs/btrfs/tree-log.c          |  4 ++--
>>   include/trace/events/btrfs.h |  3 +--
>>   5 files changed, 4 insertions(+), 29 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
>> index 445f7716f1e2..471654cb65b0 100644
>> --- a/fs/btrfs/extent_map.c
>> +++ b/fs/btrfs/extent_map.c
>> @@ -252,8 +252,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>>                          em->len += merge->len;
>>                          em->block_len += merge->block_len;
>>                          em->block_start = merge->block_start;
>> -                       em->mod_len = (em->mod_len + em->mod_start) - merge->mod_start;
>> -                       em->mod_start = merge->mod_start;
>>                          em->generation = max(em->generation, merge->generation);
>>                          em->flags |= EXTENT_FLAG_MERGED;
>>
>> @@ -271,7 +269,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>>                  em->block_len += merge->block_len;
>>                  rb_erase_cached(&merge->rb_node, &tree->map);
>>                  RB_CLEAR_NODE(&merge->rb_node);
>> -               em->mod_len = (merge->mod_start + merge->mod_len) - em->mod_start;
>>                  em->generation = max(em->generation, merge->generation);
>>                  em->flags |= EXTENT_FLAG_MERGED;
>>                  free_extent_map(merge);
>> @@ -300,7 +297,6 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
>>          struct extent_map_tree *tree = &inode->extent_tree;
>>          int ret = 0;
>>          struct extent_map *em;
>> -       bool prealloc = false;
>>
>>          write_lock(&tree->lock);
>>          em = lookup_extent_mapping(tree, start, len);
>> @@ -325,21 +321,9 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
>>
>>          em->generation = gen;
>>          em->flags &= ~EXTENT_FLAG_PINNED;
>> -       em->mod_start = em->start;
>> -       em->mod_len = em->len;
>> -
>> -       if (em->flags & EXTENT_FLAG_FILLING) {
>> -               prealloc = true;
>> -               em->flags &= ~EXTENT_FLAG_FILLING;
>> -       }
>>
>>          try_merge_map(tree, em);
>>
>> -       if (prealloc) {
>> -               em->mod_start = em->start;
>> -               em->mod_len = em->len;
>> -       }
>> -
>>   out:
>>          write_unlock(&tree->lock);
>>          free_extent_map(em);
>> @@ -361,8 +345,6 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
>>                                          int modified)
>>   {
>>          refcount_inc(&em->refs);
>> -       em->mod_start = em->start;
>> -       em->mod_len = em->len;
>>
>>          ASSERT(list_empty(&em->list));
>>
>> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
>> index c5a098c99cc6..10e9491865c9 100644
>> --- a/fs/btrfs/extent_map.h
>> +++ b/fs/btrfs/extent_map.h
>> @@ -30,8 +30,6 @@ enum {
>>          ENUM_BIT(EXTENT_FLAG_PREALLOC),
>>          /* Logging this extent */
>>          ENUM_BIT(EXTENT_FLAG_LOGGING),
>> -       /* Filling in a preallocated extent */
>> -       ENUM_BIT(EXTENT_FLAG_FILLING),
>>          /* This em is merged from two or more physically adjacent ems */
>>          ENUM_BIT(EXTENT_FLAG_MERGED),
>>   };
>> @@ -46,8 +44,6 @@ struct extent_map {
>>          /* all of these are in bytes */
>>          u64 start;
>>          u64 len;
>> -       u64 mod_start;
>> -       u64 mod_len;
>>          u64 orig_start;
>>          u64 orig_block_len;
>>          u64 ram_bytes;
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 3442dedff53d..c6f2b5d1dee1 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -7338,9 +7338,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>>          em->ram_bytes = ram_bytes;
>>          em->generation = -1;
>>          em->flags |= EXTENT_FLAG_PINNED;
>> -       if (type == BTRFS_ORDERED_PREALLOC)
>> -               em->flags |= EXTENT_FLAG_FILLING;
>> -       else if (type == BTRFS_ORDERED_COMPRESSED)
>> +       if (type == BTRFS_ORDERED_COMPRESSED)
>>                  extent_map_set_compression(em, compress_type);
>>
>>          ret = btrfs_replace_extent_map_range(inode, em, true);
>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>> index 472918a5bc73..d9777649e170 100644
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -4574,8 +4574,8 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
>>          struct btrfs_root *csum_root;
>>          u64 csum_offset;
>>          u64 csum_len;
>> -       u64 mod_start = em->mod_start;
>> -       u64 mod_len = em->mod_len;
>> +       u64 mod_start = em->start;
>> +       u64 mod_len = em->len;
>>          LIST_HEAD(ordered_sums);
>>          int ret = 0;
>>
>> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
>> index 90b0222390e5..766cfd48386c 100644
>> --- a/include/trace/events/btrfs.h
>> +++ b/include/trace/events/btrfs.h
>> @@ -277,8 +277,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
>>                  { EXTENT_FLAG_COMPRESS_LZO,     "COMPRESS_LZO"  },\
>>                  { EXTENT_FLAG_COMPRESS_ZSTD,    "COMPRESS_ZSTD" },\
>>                  { EXTENT_FLAG_PREALLOC,         "PREALLOC"      },\
>> -               { EXTENT_FLAG_LOGGING,          "LOGGING"       },\
>> -               { EXTENT_FLAG_FILLING,          "FILLING"       })
>> +               { EXTENT_FLAG_LOGGING,          "LOGGING"       })
>>
>>   TRACE_EVENT_CONDITION(btrfs_get_extent,
>>
>> --
>> 2.44.0
>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/4] btrfs: add extra comments on extent_map members
  2024-04-03 23:41 ` [PATCH v2 2/4] btrfs: add extra comments on extent_map members Qu Wenruo
@ 2024-04-04 10:33   ` Filipe Manana
  2024-04-04 21:11     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2024-04-04 10:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Apr 4, 2024 at 12:47 AM Qu Wenruo <wqu@suse.com> wrote:
>
> The extent_map structure is very critical to btrfs, as it is involved
> for both read and write paths.
>
> Unfortunately the structure is not properly explained, making it pretty
> hard to understand nor to do further improvement.
>
> This patch adds extra comments explaining the major members based on
> my code reading.
> Hopefully we can find more members to cleanup in the future.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_map.h | 54 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> index 10e9491865c9..82768288c6da 100644
> --- a/fs/btrfs/extent_map.h
> +++ b/fs/btrfs/extent_map.h
> @@ -35,19 +35,71 @@ enum {
>  };
>
>  /*
> + * This structure represents file extents and holes.
> + *
>   * Keep this structure as compact as possible, as we can have really large
>   * amounts of allocated extent maps at any time.
>   */
>  struct extent_map {
>         struct rb_node rb_node;
>
> -       /* all of these are in bytes */
> +       /* All of these are in bytes. */
> +
> +       /* File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key. */
>         u64 start;
> +
> +       /*
> +        * Length of the file extent.
> +        *
> +        * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
> +        * For inline extents it's sectorsize, since inline data starts at
> +        * offsetof(struct , disk_bytenr) thus

Missing the structure's name (btrfs_file_extent_item).

> +        * btrfs_file_extent_item::num_bytes is not valid.
> +        */
>         u64 len;
> +
> +       /*
> +        * The file offset of the original file extent before splitting.
> +        *
> +        * This is an in-memory only member, matching
> +        * extent_map::start - btrfs_file_extent_item::offset for
> +        * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
> +        */
>         u64 orig_start;
> +
> +       /*
> +        * The full on-disk extent length, matching
> +        * btrfs_file_extent_item::disk_num_bytes.
> +        */
>         u64 orig_block_len;
> +
> +       /*
> +        * The decompressed size of the whole on-disk extent, matching
> +        * btrfs_file_extent_item::ram_bytes.
> +        *
> +        * For non-compressed extents, this matches orig_block_len.

It always matches btrfs_file_extent_item::ram_bytes, regardless of compression.

Thanks.

> +        */
>         u64 ram_bytes;
> +
> +       /*
> +        * The on-disk logical bytenr for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
> +        * For uncompressed extents it matches
> +        * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
> +        *
> +        * For holes it is EXTENT_MAP_HOLE and for inline extents it is
> +        * EXTENT_MAP_INLINE.
> +        */
>         u64 block_start;
> +
> +       /*
> +        * The on-disk length for the file extent.
> +        *
> +        * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
> +        * For uncompressed extents it matches extent_map::len.
> +        * For holes and inline extents it's -1 and shouldn't be used.
> +        */
>         u64 block_len;
>
>         /*
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map
  2024-04-04 10:22     ` Qu Wenruo
@ 2024-04-04 10:37       ` Filipe Manana
  0 siblings, 0 replies; 12+ messages in thread
From: Filipe Manana @ 2024-04-04 10:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana, David Sterba

On Thu, Apr 4, 2024 at 11:23 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> 在 2024/4/4 20:28, Filipe Manana 写道:
> > On Thu, Apr 4, 2024 at 12:46 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> The mod_start and mod_len fields of struct extent_map were introduced by
> >> commit 4e2f84e63dc1 ("Btrfs: improve fsync by filtering extents that we
> >> want") in order to avoid too low performance when fsyncing a file that
> >> keeps getting extent maps merge, because it resulted in each fsync logging
> >> again csum ranges that were already merged before.
> >>
> >> We don't need this anymore as extent maps in the list of modified extents
> >> are never merged with other extent maps and once we log an extent map we
> >> remove it from the list of modified extent maps, so it's never logged
> >> twice.
> >>
> >> So remove the mod_start and mod_len fields from struct extent_map and use
> >> instead the start and len fields when logging checksums in the fast fsync
> >> path. This also makes EXTENT_FLAG_FILLING unused so remove it as well.
> >>
> >> Running the reproducer from the commit mentioned before, with a larger
> >> number of extents and against a null block device, so that IO is fast
> >> and we can better see any impact from searching checksums items and
> >> logging them, gave the following results from dd:
> >>
> >> Before this change:
> >>
> >>     409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s
> >>
> >> After this change:
> >>
> >>     409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s
> >>
> >> So no changes in throughput.
> >> The test was done in a release kernel (non-debug, Debian's default kernel
> >> config) and its steps are the following:
> >>
> >>     $ mkfs.btrfs -f /dev/nullb0
> >>     $ mount /dev/sdb /mnt
> >>     $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync
> >>     $ umount /mnt
> >>
> >> This also reduces the size of struct extent_map from 128 bytes down to 112
> >> bytes, so now we can have 36 extents maps per 4K page instead of 32.
> >>
> >> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >> Signed-off-by: David Sterba <dsterba@suse.com>
> >
> > Why are you resending this?
> > This was already in the for-next branch.
> >
> > And given the SOB tag from David, your local for-next branch was up to date.
>
> Sorry, I still didn't really get how the whole btrfs/for-next branch
> should work.

You use it like you used misc-next.
In this case it seems you were using it already, given David's SOB tag
and the amended changelog.

>
> Should all our patches based on the latest for-next? Since every member
> has write permission to the branch, and the branch can change in a daily
> base, rebasing it daily doesn't look solid to me.

What's the problem? misc-next could also be updated at any time.
It's the same here, and it's not like for-next is updated that often
anyway - we aren't that many and patches are only merged after review.

>
> And I still do not know what's the requirement to push a patch into
> misc-next.

This has been mentioned on slack and there's even some document in
workflow repo IIRC.
A patch needs one review at least (2+ if it's a new feature).

> Like just after fstests runs? Reviewed-by tag? or both or something more?

Ideally after running fstests, but I don't think everyone can afford
that and we also have the github CI tests that run after updating
for-next.

>
> Thanks,
> Qu
> >
> >> ---
> >>   fs/btrfs/extent_map.c        | 18 ------------------
> >>   fs/btrfs/extent_map.h        |  4 ----
> >>   fs/btrfs/inode.c             |  4 +---
> >>   fs/btrfs/tree-log.c          |  4 ++--
> >>   include/trace/events/btrfs.h |  3 +--
> >>   5 files changed, 4 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> >> index 445f7716f1e2..471654cb65b0 100644
> >> --- a/fs/btrfs/extent_map.c
> >> +++ b/fs/btrfs/extent_map.c
> >> @@ -252,8 +252,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
> >>                          em->len += merge->len;
> >>                          em->block_len += merge->block_len;
> >>                          em->block_start = merge->block_start;
> >> -                       em->mod_len = (em->mod_len + em->mod_start) - merge->mod_start;
> >> -                       em->mod_start = merge->mod_start;
> >>                          em->generation = max(em->generation, merge->generation);
> >>                          em->flags |= EXTENT_FLAG_MERGED;
> >>
> >> @@ -271,7 +269,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
> >>                  em->block_len += merge->block_len;
> >>                  rb_erase_cached(&merge->rb_node, &tree->map);
> >>                  RB_CLEAR_NODE(&merge->rb_node);
> >> -               em->mod_len = (merge->mod_start + merge->mod_len) - em->mod_start;
> >>                  em->generation = max(em->generation, merge->generation);
> >>                  em->flags |= EXTENT_FLAG_MERGED;
> >>                  free_extent_map(merge);
> >> @@ -300,7 +297,6 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
> >>          struct extent_map_tree *tree = &inode->extent_tree;
> >>          int ret = 0;
> >>          struct extent_map *em;
> >> -       bool prealloc = false;
> >>
> >>          write_lock(&tree->lock);
> >>          em = lookup_extent_mapping(tree, start, len);
> >> @@ -325,21 +321,9 @@ int unpin_extent_cache(struct btrfs_inode *inode, u64 start, u64 len, u64 gen)
> >>
> >>          em->generation = gen;
> >>          em->flags &= ~EXTENT_FLAG_PINNED;
> >> -       em->mod_start = em->start;
> >> -       em->mod_len = em->len;
> >> -
> >> -       if (em->flags & EXTENT_FLAG_FILLING) {
> >> -               prealloc = true;
> >> -               em->flags &= ~EXTENT_FLAG_FILLING;
> >> -       }
> >>
> >>          try_merge_map(tree, em);
> >>
> >> -       if (prealloc) {
> >> -               em->mod_start = em->start;
> >> -               em->mod_len = em->len;
> >> -       }
> >> -
> >>   out:
> >>          write_unlock(&tree->lock);
> >>          free_extent_map(em);
> >> @@ -361,8 +345,6 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
> >>                                          int modified)
> >>   {
> >>          refcount_inc(&em->refs);
> >> -       em->mod_start = em->start;
> >> -       em->mod_len = em->len;
> >>
> >>          ASSERT(list_empty(&em->list));
> >>
> >> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> >> index c5a098c99cc6..10e9491865c9 100644
> >> --- a/fs/btrfs/extent_map.h
> >> +++ b/fs/btrfs/extent_map.h
> >> @@ -30,8 +30,6 @@ enum {
> >>          ENUM_BIT(EXTENT_FLAG_PREALLOC),
> >>          /* Logging this extent */
> >>          ENUM_BIT(EXTENT_FLAG_LOGGING),
> >> -       /* Filling in a preallocated extent */
> >> -       ENUM_BIT(EXTENT_FLAG_FILLING),
> >>          /* This em is merged from two or more physically adjacent ems */
> >>          ENUM_BIT(EXTENT_FLAG_MERGED),
> >>   };
> >> @@ -46,8 +44,6 @@ struct extent_map {
> >>          /* all of these are in bytes */
> >>          u64 start;
> >>          u64 len;
> >> -       u64 mod_start;
> >> -       u64 mod_len;
> >>          u64 orig_start;
> >>          u64 orig_block_len;
> >>          u64 ram_bytes;
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 3442dedff53d..c6f2b5d1dee1 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -7338,9 +7338,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
> >>          em->ram_bytes = ram_bytes;
> >>          em->generation = -1;
> >>          em->flags |= EXTENT_FLAG_PINNED;
> >> -       if (type == BTRFS_ORDERED_PREALLOC)
> >> -               em->flags |= EXTENT_FLAG_FILLING;
> >> -       else if (type == BTRFS_ORDERED_COMPRESSED)
> >> +       if (type == BTRFS_ORDERED_COMPRESSED)
> >>                  extent_map_set_compression(em, compress_type);
> >>
> >>          ret = btrfs_replace_extent_map_range(inode, em, true);
> >> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> >> index 472918a5bc73..d9777649e170 100644
> >> --- a/fs/btrfs/tree-log.c
> >> +++ b/fs/btrfs/tree-log.c
> >> @@ -4574,8 +4574,8 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
> >>          struct btrfs_root *csum_root;
> >>          u64 csum_offset;
> >>          u64 csum_len;
> >> -       u64 mod_start = em->mod_start;
> >> -       u64 mod_len = em->mod_len;
> >> +       u64 mod_start = em->start;
> >> +       u64 mod_len = em->len;
> >>          LIST_HEAD(ordered_sums);
> >>          int ret = 0;
> >>
> >> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> >> index 90b0222390e5..766cfd48386c 100644
> >> --- a/include/trace/events/btrfs.h
> >> +++ b/include/trace/events/btrfs.h
> >> @@ -277,8 +277,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
> >>                  { EXTENT_FLAG_COMPRESS_LZO,     "COMPRESS_LZO"  },\
> >>                  { EXTENT_FLAG_COMPRESS_ZSTD,    "COMPRESS_ZSTD" },\
> >>                  { EXTENT_FLAG_PREALLOC,         "PREALLOC"      },\
> >> -               { EXTENT_FLAG_LOGGING,          "LOGGING"       },\
> >> -               { EXTENT_FLAG_FILLING,          "FILLING"       })
> >> +               { EXTENT_FLAG_LOGGING,          "LOGGING"       })
> >>
> >>   TRACE_EVENT_CONDITION(btrfs_get_extent,
> >>
> >> --
> >> 2.44.0
> >>
> >>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/4] btrfs: simplify the inline extent map creation
  2024-04-03 23:42 ` [PATCH v2 3/4] btrfs: simplify the inline extent map creation Qu Wenruo
@ 2024-04-04 10:40   ` Filipe Manana
  0 siblings, 0 replies; 12+ messages in thread
From: Filipe Manana @ 2024-04-04 10:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Apr 4, 2024 at 12:55 AM Qu Wenruo <wqu@suse.com> wrote:
>
> With the tree-checker ensuring all inline file extents starts at file
> offset 0 and has a length no larger than sectorsize, we can simplify the
> calculation to assigned those fixes values directly.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/file-item.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index e58fb5347e65..ad4761192d59 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1265,20 +1265,19 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>         struct extent_buffer *leaf = path->nodes[0];
>         const int slot = path->slots[0];
>         struct btrfs_key key;
> -       u64 extent_start, extent_end;
> +       u64 extent_start;
>         u64 bytenr;
>         u8 type = btrfs_file_extent_type(leaf, fi);
>         int compress_type = btrfs_file_extent_compression(leaf, fi);
>
>         btrfs_item_key_to_cpu(leaf, &key, slot);
>         extent_start = key.offset;
> -       extent_end = btrfs_file_extent_end(path);
>         em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>         em->generation = btrfs_file_extent_generation(leaf, fi);
>         if (type == BTRFS_FILE_EXTENT_REG ||
>             type == BTRFS_FILE_EXTENT_PREALLOC) {
>                 em->start = extent_start;
> -               em->len = extent_end - extent_start;
> +               em->len = btrfs_file_extent_end(path) - extent_start;
>                 em->orig_start = extent_start -
>                         btrfs_file_extent_offset(leaf, fi);
>                 em->orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
> @@ -1299,9 +1298,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                                 em->flags |= EXTENT_FLAG_PREALLOC;
>                 }
>         } else if (type == BTRFS_FILE_EXTENT_INLINE) {
> +               /* Tree-checker has ensured this. */
> +               ASSERT(extent_start == 0);
> +
>                 em->block_start = EXTENT_MAP_INLINE;
> -               em->start = extent_start;
> -               em->len = extent_end - extent_start;
> +               em->start = 0;
> +               em->len = fs_info->sectorsize;
>                 /*
>                  * Initialize orig_start and block_len with the same values
>                  * as in inode.c:btrfs_get_extent().
> @@ -1335,8 +1337,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path)
>         fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
>
>         if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
> -               end = btrfs_file_extent_ram_bytes(leaf, fi);
> -               end = ALIGN(key.offset + end, leaf->fs_info->sectorsize);
> +               end = leaf->fs_info->sectorsize;

After this, to follow the code style (and checkpatch.pl should
complain about it IIRC), the curly braces should go away, as there's a
single statement in the if and else branches.

Otherwise it looks good, thanks.


>         } else {
>                 end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
>         }
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em()
  2024-04-03 23:42 ` [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
@ 2024-04-04 10:55   ` Filipe Manana
  0 siblings, 0 replies; 12+ messages in thread
From: Filipe Manana @ 2024-04-04 10:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Apr 4, 2024 at 12:47 AM Qu Wenruo <wqu@suse.com> wrote:
>
> The function create_io_em() is called before we submit an IO, to update
> the in-memory extent map for the involved range.
>
> This patch changes the following aspects:
>
> - Does not allow BTRFS_ORDERED_NOCOW type
>   For real NOCOW (excluding NOCOW writes into preallocated ranges)
>   writes, we never call create_io_em(), as we does not need to update
>   the extent map at all.
>
>   So remove the sanity check allowing BTRFS_ORDERED_NOCOW type.
>
> - Add extra sanity checks
>   * PREALLOC
>     - @block_len == len
>       For uncompressed writes.
>
>   * REGULAR
>     - @block_len == @orig_block_len == @ram_bytes == @len
>       We're creating a new uncompressed extent, and referring all of it.
>
>     - @orig_start == @start
>       We haven no offset inside the extent.
>
>   * COMPRESSED
>     - valid @compress_type
>     - @len <= @ram_bytes
>       This is to co-operate with encoded writes, which can cause a new
>       file extent referring only part of a uncompressed extent.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c6f2b5d1dee1..261fafc66533 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7320,11 +7320,49 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>         struct extent_map *em;
>         int ret;
>
> +       /*
> +        * Note the missing of NOCOW type.
> +        *
> +        * For pure NOCOW writes, we should not create an io extent map,
> +        * but just reusing the existing one.
> +        * Only PREALLOC writes (NOCOW write into preallocated range) can
> +        * create io extent map.
> +        */
>         ASSERT(type == BTRFS_ORDERED_PREALLOC ||
>                type == BTRFS_ORDERED_COMPRESSED ||
> -              type == BTRFS_ORDERED_NOCOW ||
>                type == BTRFS_ORDERED_REGULAR);
>
> +       if (type == BTRFS_ORDERED_PREALLOC) {
> +               /* Uncompressed extents. */
> +               ASSERT(block_len == len);
> +
> +               /* We're only referring part of a larger preallocated extent. */
> +               ASSERT(block_len <= ram_bytes);
> +       }
> +
> +       if (type == BTRFS_ORDERED_REGULAR) {
> +               /* Uncompressed extents. */
> +               ASSERT(block_len == len);
> +
> +               /* COW results a new extent matching our file extent size. */
> +               ASSERT(orig_block_len == len);
> +               ASSERT(ram_bytes == len);
> +
> +               /* Since it's a new extent, we should not have any offset. */
> +               ASSERT(orig_start == start);
> +       }
> +
> +       if (type == BTRFS_ORDERED_COMPRESSED) {
> +               /* Must be compressed. */
> +               ASSERT(compress_type != BTRFS_COMPRESS_NONE);
> +
> +               /*
> +                * Encoded write can make us to refer part of the

refer part -> refer to part

> +                * uncompressed extent.
> +                */
> +               ASSERT(len <= ram_bytes);
> +       }

All these if statements to test the type should be combined into if -
else if - else if ... If we match a type, it's pointless to keep
comparing the type again.

Or use a switch statement.

Otherwise it looks good, thanks.

> +
>         em = alloc_extent_map();
>         if (!em)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.44.0
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/4] btrfs: add extra comments on extent_map members
  2024-04-04 10:33   ` Filipe Manana
@ 2024-04-04 21:11     ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2024-04-04 21:11 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



在 2024/4/4 21:03, Filipe Manana 写道:
> On Thu, Apr 4, 2024 at 12:47 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> The extent_map structure is very critical to btrfs, as it is involved
>> for both read and write paths.
>>
>> Unfortunately the structure is not properly explained, making it pretty
>> hard to understand nor to do further improvement.
>>
>> This patch adds extra comments explaining the major members based on
>> my code reading.
>> Hopefully we can find more members to cleanup in the future.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_map.h | 54 ++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
>> index 10e9491865c9..82768288c6da 100644
>> --- a/fs/btrfs/extent_map.h
>> +++ b/fs/btrfs/extent_map.h
>> @@ -35,19 +35,71 @@ enum {
>>   };
>>
>>   /*
>> + * This structure represents file extents and holes.
>> + *
>>    * Keep this structure as compact as possible, as we can have really large
>>    * amounts of allocated extent maps at any time.
>>    */
>>   struct extent_map {
>>          struct rb_node rb_node;
>>
>> -       /* all of these are in bytes */
>> +       /* All of these are in bytes. */
>> +
>> +       /* File offset matching the offset of a BTRFS_EXTENT_ITEM_KEY key. */
>>          u64 start;
>> +
>> +       /*
>> +        * Length of the file extent.
>> +        *
>> +        * For non-inlined file extents it's btrfs_file_extent_item::num_bytes.
>> +        * For inline extents it's sectorsize, since inline data starts at
>> +        * offsetof(struct , disk_bytenr) thus
>
> Missing the structure's name (btrfs_file_extent_item).
>
>> +        * btrfs_file_extent_item::num_bytes is not valid.
>> +        */
>>          u64 len;
>> +
>> +       /*
>> +        * The file offset of the original file extent before splitting.
>> +        *
>> +        * This is an in-memory only member, matching
>> +        * extent_map::start - btrfs_file_extent_item::offset for
>> +        * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
>> +        */
>>          u64 orig_start;
>> +
>> +       /*
>> +        * The full on-disk extent length, matching
>> +        * btrfs_file_extent_item::disk_num_bytes.
>> +        */
>>          u64 orig_block_len;
>> +
>> +       /*
>> +        * The decompressed size of the whole on-disk extent, matching
>> +        * btrfs_file_extent_item::ram_bytes.
>> +        *
>> +        * For non-compressed extents, this matches orig_block_len.
>
> It always matches btrfs_file_extent_item::ram_bytes, regardless of compression.

It always matches btrfs_file_extent_item::ram_bytes, but for
non-compressed extents it also matches em::orig_block_len.

Or should I just remove it as it doesn't provide extra info?

Thanks,
Qu
>
> Thanks.
>
>> +        */
>>          u64 ram_bytes;
>> +
>> +       /*
>> +        * The on-disk logical bytenr for the file extent.
>> +        *
>> +        * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
>> +        * For uncompressed extents it matches
>> +        * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
>> +        *
>> +        * For holes it is EXTENT_MAP_HOLE and for inline extents it is
>> +        * EXTENT_MAP_INLINE.
>> +        */
>>          u64 block_start;
>> +
>> +       /*
>> +        * The on-disk length for the file extent.
>> +        *
>> +        * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
>> +        * For uncompressed extents it matches extent_map::len.
>> +        * For holes and inline extents it's -1 and shouldn't be used.
>> +        */
>>          u64 block_len;
>>
>>          /*
>> --
>> 2.44.0
>>
>>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-04-04 21:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-03 23:41 [PATCH v2 0/4] btrfs: more explaination on extent_map members Qu Wenruo
2024-04-03 23:41 ` [PATCH v2 1/4] btrfs: remove not needed mod_start and mod_len from struct extent_map Qu Wenruo
2024-04-04  9:58   ` Filipe Manana
2024-04-04 10:22     ` Qu Wenruo
2024-04-04 10:37       ` Filipe Manana
2024-04-03 23:41 ` [PATCH v2 2/4] btrfs: add extra comments on extent_map members Qu Wenruo
2024-04-04 10:33   ` Filipe Manana
2024-04-04 21:11     ` Qu Wenruo
2024-04-03 23:42 ` [PATCH v2 3/4] btrfs: simplify the inline extent map creation Qu Wenruo
2024-04-04 10:40   ` Filipe Manana
2024-04-03 23:42 ` [PATCH v2 4/4] btrfs: add extra sanity checks for create_io_em() Qu Wenruo
2024-04-04 10:55   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox