Linux Btrfs filesystem development
* [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start
@ 2024-04-08 22:33 Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes Qu Wenruo
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

[REASON FOR RFC]
Not all sanity checks are implemented yet: the check for ram_bytes on
non-compressed extents is still missing, because even without this
series, generic/311 can generate a file extent with ram_bytes larger
than disk_num_bytes.

This seems harmless, but I still want to fix it and implement a full
version of the em sanity check.

[REPO]
https://github.com/adam900710/linux/tree/em_cleanup

This branch relies on previous extent map changes.

This series introduces two new members (disk_bytenr/offset) to
extent_map, removes three old members
(block_start/block_len/orig_start), and renames one member
(orig_block_len -> disk_num_bytes).

This should save us one u64 for extent_map.
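
The one-u64 saving can be illustrated with a simplified model. This is
only a sketch: the structs below (em_old, em_new) and the helper
em_bytes_saved() are hypothetical and list just the u64 members this
series touches, not the full kernel struct extent_map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified model of the members before the migration (after patch 1). */
struct em_old {
	u64 start;
	u64 len;
	u64 orig_start;		/* removed by patch 5 */
	u64 disk_num_bytes;	/* renamed from orig_block_len by patch 1 */
	u64 ram_bytes;
	u64 block_start;	/* removed by patch 7 */
	u64 block_len;		/* removed by patch 6 */
};

/* Simplified model of the members after the migration. */
struct em_new {
	u64 start;
	u64 len;
	u64 disk_bytenr;	/* new in patch 3 */
	u64 disk_num_bytes;
	u64 offset;		/* new in patch 3 */
	u64 ram_bytes;
};

/* Bytes saved per extent map in this simplified model. */
static inline size_t em_bytes_saved(void)
{
	return sizeof(struct em_old) - sizeof(struct em_new);
}
```

Three u64 members go away and two come in, so the model shrinks by
exactly one u64.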

But to make the migration safe, I introduce extra sanity checks for
extent_map, and cross-check the old and new members against each other.

The extra sanity checks already exposed one bug (thankfully harmless)
causing em::block_start to be incorrect.

There is another bug related to bad btrfs_file_extent_item::ram_bytes,
which can be larger than disk_num_bytes for non-compressed file extents.
(Generated by generic/311 test case, but it seems to be created on-disk
 first)

But so far, the patchset is fine for default fstests run.
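
As a rough sketch of what the missing ram_bytes check could look like:
the struct file_extent_info and the helper ram_bytes_is_sane() below
are hypothetical, and the rule itself (ram_bytes must not exceed
disk_num_bytes for non-compressed extents) is an assumption drawn from
the description above, not the final check of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical, minimal view of a regular file extent item. */
struct file_extent_info {
	u64 disk_num_bytes;
	u64 ram_bytes;
	bool compressed;
};

/*
 * Assumed rule: a non-compressed extent should not claim more
 * decompressed bytes than it occupies on disk.
 */
static inline bool ram_bytes_is_sane(const struct file_extent_info *fi)
{
	/* Compressed data may decompress to more than its on-disk size. */
	if (fi->compressed)
		return true;
	return fi->ram_bytes <= fi->disk_num_bytes;
}
```

A file extent like the one generic/311 produces (non-compressed, with
ram_bytes larger than disk_num_bytes) would fail this check.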

The patchset first does two renames as preparation.
Then it introduces the new members and the extra sanity checks.
Finally it does the migration by removing the old members one by one,
to make sure everything is fine.

Qu Wenruo (8):
  btrfs: rename extent_map::orig_block_len to disk_num_bytes
  btrfs: rename members of can_nocow_file_extent_args
  btrfs: introduce new members for extent_map
  btrfs: introduce extra sanity checks for extent maps
  btrfs: remove extent_map::orig_start member
  btrfs: remove extent_map::block_len member
  btrfs: remove extent_map::block_start member
  btrfs: reorder disk_bytenr/disk_num_bytes/ram_bytes/offset parameters

 fs/btrfs/btrfs_inode.h            |   5 +-
 fs/btrfs/compression.c            |   7 +-
 fs/btrfs/defrag.c                 |  14 +-
 fs/btrfs/extent_io.c              |  10 +-
 fs/btrfs/extent_map.c             | 187 ++++++++++++++++--------
 fs/btrfs/extent_map.h             |  51 +++----
 fs/btrfs/file-item.c              |  23 +--
 fs/btrfs/file.c                   |  18 +--
 fs/btrfs/inode.c                  | 234 ++++++++++++++++--------------
 fs/btrfs/relocation.c             |   5 +-
 fs/btrfs/tests/extent-map-tests.c | 114 ++++++++-------
 fs/btrfs/tests/inode-tests.c      | 177 +++++++++++-----------
 fs/btrfs/tree-log.c               |  27 ++--
 fs/btrfs/zoned.c                  |   4 +-
 include/trace/events/btrfs.h      |  20 +--
 15 files changed, 487 insertions(+), 409 deletions(-)

-- 
2.44.0



* [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-09 14:58   ` David Sterba
  2024-04-08 22:33 ` [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args Qu Wenruo
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

This makes it obvious that the member simply matches
btrfs_file_extent_item::disk_num_bytes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_map.c | 16 ++++++++--------
 fs/btrfs/extent_map.h |  2 +-
 fs/btrfs/file-item.c  |  4 ++--
 fs/btrfs/file.c       |  2 +-
 fs/btrfs/inode.c      | 10 +++++-----
 fs/btrfs/tree-log.c   |  6 +++---
 6 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 955ce300e5a1..dd51a21b6a76 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -759,14 +759,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 					split->block_len = em->block_len;
 				else
 					split->block_len = split->len;
-				split->orig_block_len = max(split->block_len,
-						em->orig_block_len);
+				split->disk_num_bytes = max(split->block_len,
+							    em->disk_num_bytes);
 				split->ram_bytes = em->ram_bytes;
 			} else {
 				split->orig_start = split->start;
 				split->block_len = 0;
 				split->block_start = em->block_start;
-				split->orig_block_len = 0;
+				split->disk_num_bytes = 0;
 				split->ram_bytes = split->len;
 			}
 
@@ -791,8 +791,8 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			split->generation = gen;
 
 			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
-				split->orig_block_len = max(em->block_len,
-						    em->orig_block_len);
+				split->disk_num_bytes = max(em->block_len,
+							    em->disk_num_bytes);
 
 				split->ram_bytes = em->ram_bytes;
 				if (compressed) {
@@ -809,7 +809,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 				split->ram_bytes = split->len;
 				split->orig_start = split->start;
 				split->block_len = 0;
-				split->orig_block_len = 0;
+				split->disk_num_bytes = 0;
 			}
 
 			if (extent_map_in_tree(em)) {
@@ -968,7 +968,7 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_pre->orig_start = split_pre->start;
 	split_pre->block_start = new_logical;
 	split_pre->block_len = split_pre->len;
-	split_pre->orig_block_len = split_pre->block_len;
+	split_pre->disk_num_bytes = split_pre->block_len;
 	split_pre->ram_bytes = split_pre->len;
 	split_pre->flags = flags;
 	split_pre->generation = em->generation;
@@ -986,7 +986,7 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_mid->orig_start = split_mid->start;
 	split_mid->block_start = em->block_start + pre;
 	split_mid->block_len = split_mid->len;
-	split_mid->orig_block_len = split_mid->block_len;
+	split_mid->disk_num_bytes = split_mid->block_len;
 	split_mid->ram_bytes = split_mid->len;
 	split_mid->flags = flags;
 	split_mid->generation = em->generation;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 0b938e12cc78..242a0c2e7a5e 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -71,7 +71,7 @@ struct extent_map {
 	 * The full on-disk extent length, matching
 	 * btrfs_file_extent_item::disk_num_bytes.
 	 */
-	u64 orig_block_len;
+	u64 disk_num_bytes;
 
 	/*
 	 * The decompressed size of the whole on-disk extent, matching
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 844439f19949..b552646a0ce6 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1280,7 +1280,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->len = btrfs_file_extent_end(path) - extent_start;
 		em->orig_start = extent_start -
 			btrfs_file_extent_offset(leaf, fi);
-		em->orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
+		em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
 		bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 		if (bytenr == 0) {
 			em->block_start = EXTENT_MAP_HOLE;
@@ -1289,7 +1289,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		if (compress_type != BTRFS_COMPRESS_NONE) {
 			extent_map_set_compression(em, compress_type);
 			em->block_start = bytenr;
-			em->block_len = em->orig_block_len;
+			em->block_len = em->disk_num_bytes;
 		} else {
 			bytenr += btrfs_file_extent_offset(leaf, fi);
 			em->block_start = bytenr;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0c23053951be..cdcd7e0785c1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2162,7 +2162,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 
 		hole_em->block_start = EXTENT_MAP_HOLE;
 		hole_em->block_len = 0;
-		hole_em->orig_block_len = 0;
+		hole_em->disk_num_bytes = 0;
 		hole_em->generation = trans->transid;
 
 		ret = btrfs_replace_extent_map_range(inode, hole_em, true);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ced916f42bab..2e0156943c7c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4991,7 +4991,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 
 			hole_em->block_start = EXTENT_MAP_HOLE;
 			hole_em->block_len = 0;
-			hole_em->orig_block_len = 0;
+			hole_em->disk_num_bytes = 0;
 			hole_em->ram_bytes = hole_size;
 			hole_em->generation = btrfs_get_fs_generation(fs_info);
 
@@ -7313,7 +7313,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 /* The callers of this must take lock_extent() */
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 orig_start, u64 block_start,
-				       u64 block_len, u64 orig_block_len,
+				       u64 block_len, u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type)
 {
@@ -7345,7 +7345,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 		ASSERT(block_len == len);
 
 		/* COW results a new extent matching our file extent size. */
-		ASSERT(orig_block_len == len);
+		ASSERT(disk_num_bytes == len);
 		ASSERT(ram_bytes == len);
 
 		/* Since it's a new extent, we should not have any offset. */
@@ -7372,7 +7372,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 	em->len = len;
 	em->block_len = block_len;
 	em->block_start = block_start;
-	em->orig_block_len = orig_block_len;
+	em->disk_num_bytes = disk_num_bytes;
 	em->ram_bytes = ram_bytes;
 	em->generation = -1;
 	em->flags |= EXTENT_FLAG_PINNED;
@@ -9776,7 +9776,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		em->len = ins.offset;
 		em->block_start = ins.objectid;
 		em->block_len = ins.offset;
-		em->orig_block_len = ins.offset;
+		em->disk_num_bytes = ins.offset;
 		em->ram_bytes = ins.offset;
 		em->flags |= EXTENT_FLAG_PREALLOC;
 		em->generation = trans->transid;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index d9777649e170..2a13ca1eb7c5 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2872,7 +2872,7 @@ static inline void btrfs_remove_log_ctx(struct btrfs_root *root,
 	mutex_unlock(&root->log_mutex);
 }
 
-/* 
+/*
  * Invoked in log mutex context, or be sure there is no other task which
  * can access the list.
  */
@@ -4645,7 +4645,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	/* If we're compressed we have to save the entire range of csums. */
 	if (extent_map_is_compressed(em)) {
 		csum_offset = 0;
-		csum_len = max(em->block_len, em->orig_block_len);
+		csum_len = max(em->block_len, em->disk_num_bytes);
 	} else {
 		csum_offset = mod_start - em->start;
 		csum_len = mod_len;
@@ -4694,7 +4694,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_stack_file_extent_type(&fi, BTRFS_FILE_EXTENT_REG);
 
-	block_len = max(em->block_len, em->orig_block_len);
+	block_len = max(em->block_len, em->disk_num_bytes);
 	compress_type = extent_map_compression(em);
 	if (compress_type != BTRFS_COMPRESS_NONE) {
 		btrfs_set_stack_file_extent_disk_bytenr(&fi, em->block_start);
-- 
2.44.0



* [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-11 14:46   ` Filipe Manana
  2024-04-08 22:33 ` [PATCH RFC 3/8] btrfs: introduce new members for extent_map Qu Wenruo
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

The structure can_nocow_file_extent_args is used to provide the
needed info for NOCOW writes.

However, some of its members are pretty confusing.
For example, @disk_bytenr is not btrfs_file_extent_item::disk_bytenr,
but that value plus an extra offset, so it works more like
extent_map::block_start.

This patch would:

- Rename members directly fetched from btrfs_file_extent_item
  The new name would have "orig_" prefix, with the same member name from
  btrfs_file_extent_item.

- For the old @disk_bytenr, rename it to @block_start
  As it's directly passed into create_io_em() as @block_start.

- Add extra comments explaining those members
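
To show why the old @disk_bytenr behaves like a block_start, here is a
standalone sketch of the arithmetic can_nocow_file_extent() performs.
The helper nocow_block_start() and its parameter names are
hypothetical; only the additions mirror the code in the diff below.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/*
 * disk_bytenr:   btrfs_file_extent_item::disk_bytenr
 * extent_offset: btrfs_file_extent_item::offset
 * key_offset:    file offset where the file extent item starts
 * write_start:   file offset where the NOCOW write starts
 */
static inline u64 nocow_block_start(u64 disk_bytenr, u64 extent_offset,
				    u64 key_offset, u64 write_start)
{
	u64 block_start = disk_bytenr;

	/* Skip the part of the data extent this file extent does not use. */
	block_start += extent_offset;
	/* Skip ahead to where the write begins inside the file extent. */
	block_start += write_start - key_offset;
	return block_start;
}
```

For example, with disk_bytenr 1048576, offset 4096, a file extent
starting at file offset 8192 and a write starting at 12288, the result
is 1048576 + 4096 + 4096 = 1056768.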

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 51 ++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2e0156943c7c..4d207c3b38d9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1847,11 +1847,20 @@ struct can_nocow_file_extent_args {
 	 */
 	bool free_path;
 
-	/* Output fields. Only set when can_nocow_file_extent() returns 1. */
+	/*
+	 * Output fields. Only set when can_nocow_file_extent() returns 1.
+	 *
+	 * @block_start:	The bytenr of the new nocow write should be at.
+	 * @orig_disk_bytenr:	The original data extent's disk_bytenr.
+	 * @orig_disk_num_bytes:The original data extent's disk_num_bytes.
+	 * @orig_offset:	The original offset inside the old data extent.
+	 *			Caller should calculate their own
+	 *			btrfs_file_extent_item::offset based on this.
+	 */
 
-	u64 disk_bytenr;
-	u64 disk_num_bytes;
-	u64 extent_offset;
+	u64 block_start;
+	u64 orig_disk_num_bytes;
+	u64 orig_offset;
 	/* Number of bytes that can be written to in NOCOW mode. */
 	u64 num_bytes;
 };
@@ -1887,9 +1896,9 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 		goto out;
 
 	/* Can't access these fields unless we know it's not an inline extent. */
-	args->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
-	args->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
-	args->extent_offset = btrfs_file_extent_offset(leaf, fi);
+	args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
+	args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
+	args->orig_offset = btrfs_file_extent_offset(leaf, fi);
 
 	if (!(inode->flags & BTRFS_INODE_NODATACOW) &&
 	    extent_type == BTRFS_FILE_EXTENT_REG)
@@ -1906,7 +1915,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 		goto out;
 
 	/* An explicit hole, must COW. */
-	if (args->disk_bytenr == 0)
+	if (args->block_start == 0)
 		goto out;
 
 	/* Compressed/encrypted/encoded extents must be COWed. */
@@ -1925,8 +1934,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	btrfs_release_path(path);
 
 	ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
-				    key->offset - args->extent_offset,
-				    args->disk_bytenr, args->strict, path);
+				    key->offset - args->orig_offset,
+				    args->block_start, args->strict, path);
 	WARN_ON_ONCE(ret > 0 && is_freespace_inode);
 	if (ret != 0)
 		goto out;
@@ -1947,15 +1956,15 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	    atomic_read(&root->snapshot_force_cow))
 		goto out;
 
-	args->disk_bytenr += args->extent_offset;
-	args->disk_bytenr += args->start - key->offset;
+	args->block_start += args->orig_offset;
+	args->block_start += args->start - key->offset;
 	args->num_bytes = min(args->end + 1, extent_end) - args->start;
 
 	/*
 	 * Force COW if csums exist in the range. This ensures that csums for a
 	 * given extent are either valid or do not exist.
 	 */
-	ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
+	ret = csum_exist_in_range(root->fs_info, args->block_start, args->num_bytes,
 				  nowait);
 	WARN_ON_ONCE(ret > 0 && is_freespace_inode);
 	if (ret != 0)
@@ -2112,7 +2121,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			goto must_cow;
 
 		ret = 0;
-		nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.disk_bytenr);
+		nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.block_start);
 		if (!nocow_bg) {
 must_cow:
 			/*
@@ -2151,14 +2160,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		nocow_end = cur_offset + nocow_args.num_bytes - 1;
 		is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
 		if (is_prealloc) {
-			u64 orig_start = found_key.offset - nocow_args.extent_offset;
+			u64 orig_start = found_key.offset - nocow_args.orig_offset;
 			struct extent_map *em;
 
 			em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
 					  orig_start,
-					  nocow_args.disk_bytenr, /* block_start */
+					  nocow_args.block_start, /* block_start */
 					  nocow_args.num_bytes, /* block_len */
-					  nocow_args.disk_num_bytes, /* orig_block_len */
+					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
 					  ram_bytes, BTRFS_COMPRESS_NONE,
 					  BTRFS_ORDERED_PREALLOC);
 			if (IS_ERR(em)) {
@@ -2171,7 +2180,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 		ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
 				nocow_args.num_bytes, nocow_args.num_bytes,
-				nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
+				nocow_args.block_start, nocow_args.num_bytes, 0,
 				is_prealloc
 				? (1 << BTRFS_ORDERED_PREALLOC)
 				: (1 << BTRFS_ORDERED_NOCOW),
@@ -7189,7 +7198,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	}
 
 	ret = 0;
-	if (btrfs_extent_readonly(fs_info, nocow_args.disk_bytenr))
+	if (btrfs_extent_readonly(fs_info, nocow_args.block_start))
 		goto out;
 
 	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
@@ -7206,9 +7215,9 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	}
 
 	if (orig_start)
-		*orig_start = key.offset - nocow_args.extent_offset;
+		*orig_start = key.offset - nocow_args.orig_offset;
 	if (orig_block_len)
-		*orig_block_len = nocow_args.disk_num_bytes;
+		*orig_block_len = nocow_args.orig_disk_num_bytes;
 
 	*len = nocow_args.num_bytes;
 	ret = 1;
-- 
2.44.0



* [PATCH RFC 3/8] btrfs: introduce new members for extent_map
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-11 14:56   ` Filipe Manana
  2024-04-08 22:33 ` [PATCH RFC 4/8] btrfs: introduce extra sanity checks for extent maps Qu Wenruo
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

Introduce two new members for extent_map:

- disk_bytenr
- offset

Both match the members of the same name in
btrfs_file_extent_item.

For now this patch only touches those members when:

- Reading btrfs_file_extent_items from disk
- Inserting new holes
- Merging two extent maps
  With the new disk_bytenr and disk_num_bytes, merging becomes a
  little more complex, as there are 3 different cases:

  * Both extent maps refer to the same data extent
  * The extent maps refer to different data extents, but those data
    extents are adjacent, and the extent maps sit at the tail/head
    of their respective data extents
  * One of the extent maps refers to a merged and larger data
    extent that covers both extent maps

  The 3rd case seems to be valid only in the selftests (test_case_3()),
  but the new helper merge_ondisk_extents() should be able to handle
  all of them.

- Add a new member for can_nocow_file_extent_args
  The new member is called "orig_disk_bytenr", to make fetching the
  original disk_bytenr easier.

- Update the new members when doing extent map split
  This is in fact a little simpler, as we only need to update
  offset/len.

- Update the new members when inserting new io extent map
  This involves quite a few NOCOW related functions, and adds two
  parameters to an already long parameter list.

  To avoid unexpected parameter change, the two new parameters,
  @disk_bytenr and @offset are all added to the end of the list.

  And they would be relocated when dropping the old
  @block_start/@block_len/@orig_start members.
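
The three merge cases listed above can be checked with a userspace
model of the merge arithmetic. This is a sketch only: struct ondisk and
merge_ondisk() are hypothetical names, and the model covers just
disk_bytenr/disk_num_bytes/offset of uncompressed regular extents,
following the calculation in merge_ondisk_extents() in the diff below.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical subset of extent_map used for the merge calculation. */
struct ondisk {
	u64 disk_bytenr;
	u64 disk_num_bytes;
	u64 offset;
};

static inline u64 min_u64(u64 a, u64 b) { return a < b ? a : b; }
static inline u64 max_u64(u64 a, u64 b) { return a > b ? a : b; }

/* Merge the data extents first, then recompute @prev's offset into them. */
static inline struct ondisk merge_ondisk(const struct ondisk *prev,
					 const struct ondisk *next)
{
	struct ondisk merged;
	u64 prev_end = prev->disk_bytenr + prev->disk_num_bytes;
	u64 next_end = next->disk_bytenr + next->disk_num_bytes;

	merged.disk_bytenr = min_u64(prev->disk_bytenr, next->disk_bytenr);
	merged.disk_num_bytes = max_u64(prev_end, next_end) - merged.disk_bytenr;
	/* The merged map starts where @prev's data started. */
	merged.offset = prev->disk_bytenr + prev->offset - merged.disk_bytenr;
	return merged;
}
```

With two adjacent 4K data extents at 1M and 1M + 4K, the result is one
8K data extent at 1M with offset 0. If both maps already point into the
same data extent, that data extent comes back unchanged.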

For now, the old members (block_start/block_len/orig_start)
co-exist with the new members (disk_bytenr/offset), while all the
critical code still uses the old members only.

The switch to new members would happen gradually to be bisect
friendly.
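
While both sets of members co-exist, the old ones can be derived from
the new ones for regular extents. The sketch below models that
relationship as read from btrfs_extent_item_to_extent_map(); the struct
em_model and the derived_*() helpers are hypothetical names, and holes
and inline extents are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical model of a regular (non-hole, non-inline) extent map. */
struct em_model {
	u64 start;
	u64 len;
	u64 disk_bytenr;
	u64 disk_num_bytes;
	u64 offset;
	bool compressed;
};

/* Compressed extents are read as a whole, so no offset is applied. */
static inline u64 derived_block_start(const struct em_model *em)
{
	return em->compressed ? em->disk_bytenr
			      : em->disk_bytenr + em->offset;
}

/* Compressed: the full on-disk size; otherwise just the mapped length. */
static inline u64 derived_block_len(const struct em_model *em)
{
	return em->compressed ? em->disk_num_bytes : em->len;
}

/* File offset at which the whole data extent would start. */
static inline u64 derived_orig_start(const struct em_model *em)
{
	return em->start - em->offset;
}
```

A cross-check along these lines compares each derived value against
the corresponding old member and complains on any mismatch.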

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h |  3 +-
 fs/btrfs/defrag.c      |  4 +++
 fs/btrfs/extent_map.c  | 75 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/extent_map.h  | 17 ++++++++++
 fs/btrfs/file-item.c   |  9 ++++-
 fs/btrfs/file.c        |  3 +-
 fs/btrfs/inode.c       | 56 +++++++++++++++++++++++--------
 7 files changed, 147 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 100020ca4658..ded36e065089 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -444,7 +444,8 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool nowait, bool strict);
+			      u64 *ram_bytes, bool nowait, bool strict,
+			      u64 *disk_bytenr_ret, u64 *extent_offset_ret);
 
 void btrfs_del_delalloc_inode(struct btrfs_inode *inode);
 struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index f015fa1b6301..5259fd556487 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -709,6 +709,10 @@ static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
 			em->start = start;
 			em->orig_start = start;
 			em->block_start = EXTENT_MAP_HOLE;
+			em->disk_bytenr = EXTENT_MAP_HOLE;
+			em->disk_num_bytes = 0;
+			em->ram_bytes = 0;
+			em->offset = 0;
 			em->len = key.offset - start;
 			break;
 		}
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index dd51a21b6a76..f59423897501 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -223,6 +223,58 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
 	return next->block_start == prev->block_start;
 }
 
+/*
+ * Handle the ondisk data extents merge for @prev and @next.
+ *
+ * Only touches disk_bytenr/disk_num_bytes/offset/ram_bytes.
+ * For now only uncompressed regular extent can be merged.
+ *
+ * @prev and @next will be both updated to point to the new merged range.
+ * Thus one of them should be removed by the caller.
+ */
+static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *next)
+{
+	u64 new_disk_bytenr;
+	u64 new_disk_num_bytes;
+	u64 new_offset;
+
+	/* @prev and @next should not be compressed. */
+	ASSERT(!extent_map_is_compressed(prev));
+	ASSERT(!extent_map_is_compressed(next));
+
+	/*
+	 * There are several different cases that @prev and @next can be merged.
+	 *
+	 * 1) They are referring to the same data extent
+	 * 2) Their ondisk data extents are adjacent and @prev is the tail
+	 *    and @next is the head of their data extents
+	 * 3) One of @prev/@next is referring to a larger merged data extent.
+	 *    (test_case_3 of extent maps tests).
+	 *
+	 * The calculation here always merge the data extents first, then update
+	 * @offset using the new data extents.
+	 *
+	 * For case 1), the merged data extent would be the same.
+	 * For case 2), we just merge the two data extents into one.
+	 * For case 3), we just got the larger data extent.
+	 */
+	new_disk_bytenr = min(prev->disk_bytenr, next->disk_bytenr);
+	new_disk_num_bytes = max(prev->disk_bytenr + prev->disk_num_bytes,
+				 next->disk_bytenr + next->disk_num_bytes) -
+			     new_disk_bytenr;
+	new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;
+
+	prev->disk_bytenr = new_disk_bytenr;
+	prev->disk_num_bytes = new_disk_num_bytes;
+	prev->ram_bytes = new_disk_num_bytes;
+	prev->offset = new_offset;
+
+	next->disk_bytenr = new_disk_bytenr;
+	next->disk_num_bytes = new_disk_num_bytes;
+	next->ram_bytes = new_disk_num_bytes;
+	next->offset = new_offset;
+}
+
 static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 {
 	struct extent_map *merge = NULL;
@@ -253,6 +305,9 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 			em->block_len += merge->block_len;
 			em->block_start = merge->block_start;
 			em->generation = max(em->generation, merge->generation);
+
+			if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
+				merge_ondisk_extents(merge, em);
 			em->flags |= EXTENT_FLAG_MERGED;
 
 			rb_erase_cached(&merge->rb_node, &tree->map);
@@ -267,6 +322,8 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 	if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
 		em->len += merge->len;
 		em->block_len += merge->block_len;
+		if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
+			merge_ondisk_extents(em, merge);
 		rb_erase_cached(&merge->rb_node, &tree->map);
 		RB_CLEAR_NODE(&merge->rb_node);
 		em->generation = max(em->generation, merge->generation);
@@ -541,6 +598,7 @@ static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
 	    !extent_map_is_compressed(em)) {
 		em->block_start += start_diff;
 		em->block_len = em->len;
+		em->offset += start_diff;
 	}
 	return add_extent_mapping(em_tree, em, 0);
 }
@@ -759,14 +817,18 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 					split->block_len = em->block_len;
 				else
 					split->block_len = split->len;
+				split->disk_bytenr = em->disk_bytenr;
 				split->disk_num_bytes = max(split->block_len,
 							    em->disk_num_bytes);
+				split->offset = em->offset;
 				split->ram_bytes = em->ram_bytes;
 			} else {
 				split->orig_start = split->start;
 				split->block_len = 0;
 				split->block_start = em->block_start;
+				split->disk_bytenr = em->disk_bytenr;
 				split->disk_num_bytes = 0;
+				split->offset = 0;
 				split->ram_bytes = split->len;
 			}
 
@@ -787,13 +849,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			split->start = end;
 			split->len = em_end - end;
 			split->block_start = em->block_start;
+			split->disk_bytenr = em->disk_bytenr;
 			split->flags = flags;
 			split->generation = gen;
 
 			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
 				split->disk_num_bytes = max(em->block_len,
 							    em->disk_num_bytes);
-
+				split->offset = em->offset + end - em->start;
 				split->ram_bytes = em->ram_bytes;
 				if (compressed) {
 					split->block_len = em->block_len;
@@ -806,10 +869,11 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 					split->orig_start = em->orig_start;
 				}
 			} else {
+				split->disk_num_bytes = 0;
+				split->offset = 0;
 				split->ram_bytes = split->len;
 				split->orig_start = split->start;
 				split->block_len = 0;
-				split->disk_num_bytes = 0;
 			}
 
 			if (extent_map_in_tree(em)) {
@@ -965,6 +1029,9 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	/* First, replace the em with a new extent_map starting from * em->start */
 	split_pre->start = em->start;
 	split_pre->len = pre;
+	split_pre->disk_bytenr = new_logical;
+	split_pre->disk_num_bytes = split_pre->len;
+	split_pre->offset = 0;
 	split_pre->orig_start = split_pre->start;
 	split_pre->block_start = new_logical;
 	split_pre->block_len = split_pre->len;
@@ -983,10 +1050,12 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	/* Insert the middle extent_map. */
 	split_mid->start = em->start + pre;
 	split_mid->len = em->len - pre;
+	split_mid->disk_bytenr = em->block_start + pre;
+	split_mid->disk_num_bytes = split_mid->len;
+	split_mid->offset = 0;
 	split_mid->orig_start = split_mid->start;
 	split_mid->block_start = em->block_start + pre;
 	split_mid->block_len = split_mid->len;
-	split_mid->disk_num_bytes = split_mid->block_len;
 	split_mid->ram_bytes = split_mid->len;
 	split_mid->flags = flags;
 	split_mid->generation = em->generation;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 242a0c2e7a5e..848b4a4ecd6a 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -67,12 +67,29 @@ struct extent_map {
 	 */
 	u64 orig_start;
 
+	/*
+	 * The bytenr of the full on-disk extent.
+	 *
+	 * For regular extents it's btrfs_file_extent_item::disk_bytenr.
+	 * For holes it's EXTENT_MAP_HOLE and for inline extents it's
+	 * EXTENT_MAP_INLINE.
+	 */
+	u64 disk_bytenr;
+
 	/*
 	 * The full on-disk extent length, matching
 	 * btrfs_file_extent_item::disk_num_bytes.
 	 */
 	u64 disk_num_bytes;
 
+	/*
+	 * Offset inside the decompressed extent.
+	 *
+	 * For regular extents it's btrfs_file_extent_item::offset.
+	 * For holes and inline extents it's 0.
+	 */
+	u64 offset;
+
 	/*
 	 * The decompressed size of the whole on-disk extent, matching
 	 * btrfs_file_extent_item::ram_bytes.
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b552646a0ce6..96486f82ab5d 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1280,12 +1280,17 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->len = btrfs_file_extent_end(path) - extent_start;
 		em->orig_start = extent_start -
 			btrfs_file_extent_offset(leaf, fi);
-		em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
 		bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 		if (bytenr == 0) {
 			em->block_start = EXTENT_MAP_HOLE;
+			em->disk_bytenr = EXTENT_MAP_HOLE;
+			em->disk_num_bytes = 0;
+			em->offset = 0;
 			return;
 		}
+		em->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
+		em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
+		em->offset = btrfs_file_extent_offset(leaf, fi);
 		if (compress_type != BTRFS_COMPRESS_NONE) {
 			extent_map_set_compression(em, compress_type);
 			em->block_start = bytenr;
@@ -1302,8 +1307,10 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		ASSERT(extent_start == 0);
 
 		em->block_start = EXTENT_MAP_INLINE;
+		em->disk_bytenr = EXTENT_MAP_INLINE;
 		em->start = 0;
 		em->len = fs_info->sectorsize;
+		em->offset = 0;
 		/*
 		 * Initialize orig_start and block_len with the same values
 		 * as in inode.c:btrfs_get_extent().
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cdcd7e0785c1..af6de3549901 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1094,7 +1094,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 						   &cached_state);
 	}
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, nowait, false);
+			NULL, NULL, NULL, nowait, false, NULL, NULL);
 	if (ret <= 0)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 	else
@@ -2161,6 +2161,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		hole_em->orig_start = offset;
 
 		hole_em->block_start = EXTENT_MAP_HOLE;
+		hole_em->disk_bytenr = EXTENT_MAP_HOLE;
 		hole_em->block_len = 0;
 		hole_em->disk_num_bytes = 0;
 		hole_em->generation = trans->transid;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4d207c3b38d9..69a7cdeef81e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -139,9 +139,9 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
 				     bool pages_dirty);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 orig_start, u64 block_start,
-				       u64 block_len, u64 orig_block_len,
+				       u64 block_len, u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
-				       int type);
+				       int type, u64 disk_bytenr, u64 offset);
 
 static int data_reloc_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 					  u64 root, void *warn_ctx)
@@ -1166,7 +1166,8 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
 			  ins.offset,			/* orig_block_len */
 			  async_extent->ram_size,	/* ram_bytes */
 			  async_extent->compress_type,
-			  BTRFS_ORDERED_COMPRESSED);
+			  BTRFS_ORDERED_COMPRESSED,
+			  ins.objectid, 0);
 	if (IS_ERR(em)) {
 		ret = PTR_ERR(em);
 		goto out_free_reserve;
@@ -1429,7 +1430,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 				  ins.offset, /* orig_block_len */
 				  ram_size, /* ram_bytes */
 				  BTRFS_COMPRESS_NONE, /* compress_type */
-				  BTRFS_ORDERED_REGULAR /* type */);
+				  BTRFS_ORDERED_REGULAR /* type */,
+				  ins.objectid, 0);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR(em);
 			goto out_reserve;
@@ -1859,6 +1861,7 @@ struct can_nocow_file_extent_args {
 	 */
 
 	u64 block_start;
+	u64 orig_disk_bytenr;
 	u64 orig_disk_num_bytes;
 	u64 orig_offset;
 	/* Number of bytes that can be written to in NOCOW mode. */
@@ -1897,6 +1900,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 
 	/* Can't access these fields unless we know it's not an inline extent. */
 	args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
+	args->orig_disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 	args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	args->orig_offset = btrfs_file_extent_offset(leaf, fi);
 
@@ -2169,7 +2173,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					  nocow_args.num_bytes, /* block_len */
 					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
 					  ram_bytes, BTRFS_COMPRESS_NONE,
-					  BTRFS_ORDERED_PREALLOC);
+					  BTRFS_ORDERED_PREALLOC,
+					  nocow_args.orig_disk_bytenr,
+					  cur_offset - found_key.offset +
+					  nocow_args.orig_offset);
 			if (IS_ERR(em)) {
 				btrfs_dec_nocow_writers(nocow_bg);
 				ret = PTR_ERR(em);
@@ -4999,6 +5006,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 			hole_em->orig_start = cur_offset;
 
 			hole_em->block_start = EXTENT_MAP_HOLE;
+			hole_em->disk_bytenr = EXTENT_MAP_HOLE;
 			hole_em->block_len = 0;
 			hole_em->disk_num_bytes = 0;
 			hole_em->ram_bytes = hole_size;
@@ -6860,6 +6868,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	}
 	em->start = EXTENT_MAP_HOLE;
 	em->orig_start = EXTENT_MAP_HOLE;
+	em->disk_bytenr = EXTENT_MAP_HOLE;
 	em->len = (u64)-1;
 	em->block_len = (u64)-1;
 
@@ -7025,7 +7034,9 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  const u64 block_len,
 						  const u64 orig_block_len,
 						  const u64 ram_bytes,
-						  const int type)
+						  const int type,
+						  const u64 disk_bytenr,
+						  const u64 offset)
 {
 	struct extent_map *em = NULL;
 	struct btrfs_ordered_extent *ordered;
@@ -7034,7 +7045,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 		em = create_io_em(inode, start, len, orig_start, block_start,
 				  block_len, orig_block_len, ram_bytes,
 				  BTRFS_COMPRESS_NONE, /* compress_type */
-				  type);
+				  type, disk_bytenr, offset);
 		if (IS_ERR(em))
 			goto out;
 	}
@@ -7085,7 +7096,8 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 
 	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset, start,
 				     ins.objectid, ins.offset, ins.offset,
-				     ins.offset, BTRFS_ORDERED_REGULAR);
+				     ins.offset, BTRFS_ORDERED_REGULAR,
+				     ins.objectid, 0);
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	if (IS_ERR(em))
 		btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset,
@@ -7129,7 +7141,8 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
  */
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool nowait, bool strict)
+			      u64 *ram_bytes, bool nowait, bool strict,
+			      u64 *disk_bytenr_ret, u64 *new_offset_ret)
 {
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	struct can_nocow_file_extent_args nocow_args = { 0 };
@@ -7218,6 +7231,11 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 		*orig_start = key.offset - nocow_args.orig_offset;
 	if (orig_block_len)
 		*orig_block_len = nocow_args.orig_disk_num_bytes;
+	if (disk_bytenr_ret)
+		*disk_bytenr_ret = nocow_args.orig_disk_bytenr;
+	if (new_offset_ret)
+		*new_offset_ret = offset - key.offset +
+				  nocow_args.orig_offset;
 
 	*len = nocow_args.num_bytes;
 	ret = 1;
@@ -7324,7 +7342,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 orig_start, u64 block_start,
 				       u64 block_len, u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
-				       int type)
+				       int type, u64 disk_bytenr, u64 offset)
 {
 	struct extent_map *em;
 	int ret;
@@ -7381,9 +7399,11 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 	em->len = len;
 	em->block_len = block_len;
 	em->block_start = block_start;
+	em->disk_bytenr = disk_bytenr;
 	em->disk_num_bytes = disk_num_bytes;
 	em->ram_bytes = ram_bytes;
 	em->generation = -1;
+	em->offset = offset;
 	em->flags |= EXTENT_FLAG_PINNED;
 	if (type == BTRFS_ORDERED_COMPRESSED)
 		extent_map_set_compression(em, compress_type);
@@ -7410,6 +7430,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	struct extent_map *em = *map;
 	int type;
 	u64 block_start, orig_start, orig_block_len, ram_bytes;
+	u64 disk_bytenr;
+	u64 new_offset;
 	struct btrfs_block_group *bg;
 	bool can_nocow = false;
 	bool space_reserved = false;
@@ -7437,7 +7459,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		block_start = em->block_start + (start - em->start);
 
 		if (can_nocow_extent(inode, start, &len, &orig_start,
-				     &orig_block_len, &ram_bytes, false, false) == 1) {
+				     &orig_block_len, &ram_bytes, false, false,
+				     &disk_bytenr, &new_offset) == 1) {
 			bg = btrfs_inc_nocow_writers(fs_info, block_start);
 			if (bg)
 				can_nocow = true;
@@ -7465,7 +7488,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
 					      orig_start, block_start,
 					      len, orig_block_len,
-					      ram_bytes, type);
+					      ram_bytes, type,
+					      disk_bytenr, new_offset);
 		btrfs_dec_nocow_writers(bg);
 		if (type == BTRFS_ORDERED_PREALLOC) {
 			free_extent_map(em);
@@ -9784,6 +9808,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		em->orig_start = cur_offset;
 		em->len = ins.offset;
 		em->block_start = ins.objectid;
+		em->disk_bytenr = ins.objectid;
+		em->offset = 0;
 		em->block_len = ins.offset;
 		em->disk_num_bytes = ins.offset;
 		em->ram_bytes = ins.offset;
@@ -10526,7 +10552,8 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	em = create_io_em(inode, start, num_bytes,
 			  start - encoded->unencoded_offset, ins.objectid,
 			  ins.offset, ins.offset, ram_bytes, compression,
-			  BTRFS_ORDERED_COMPRESSED);
+			  BTRFS_ORDERED_COMPRESSED, ins.objectid,
+			  encoded->unencoded_offset);
 	if (IS_ERR(em)) {
 		ret = PTR_ERR(em);
 		goto out_free_reserved;
@@ -10856,7 +10883,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		free_extent_map(em);
 		em = NULL;
 
-		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, false, true);
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL,
+				       false, true, NULL, NULL);
 		if (ret < 0) {
 			goto out;
 		} else if (ret) {
-- 
2.44.0



* [PATCH RFC 4/8] btrfs: introduce extra sanity checks for extent maps
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (2 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 3/8] btrfs: introduce new members for extent_map Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member Qu Wenruo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

Since the extent_map structure now has all the members needed to
represent a file extent directly, we can apply all the file extent
sanity checks to an extent map.

The new sanity checks cross-check the old members
(block_start/block_len/orig_start) against the new members
(disk_bytenr/disk_num_bytes/offset).

There is a special case for the offset/orig_start/start cross check. It
is only done for compressed extents, for two reasons:

- Only compressed reads and encoded writes really use orig_start
  This is shown by the later patch that removes orig_start.

- Merged data extents can lead to false alerts
  The problem is, with disk_bytenr/disk_num_bytes, if we're merging
  two extent maps like this:

    |<- data extent A -->|<-- data extent B -->|
              |<- em 1 ->|<- em 2 ->|

  Let's assume em2 has orig_offset of 0 and start of 0, and obviously
  offset 0.

  But after merging, the merged em takes the offset of em1, which breaks
  any cross check of @orig_start against @start.
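
  The offset shift in the merge described above can be made concrete
  with a small userspace model. This is only a sketch, not the kernel
  code: struct em_model, em_block_start() and em_merge() are hypothetical
  names, and the merge rule used here (take the union of the two on-disk
  ranges, then rebase offset onto that union) is an assumption about what
  merge_ondisk_extents() does for non-compressed extents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; only the fields needed for the example. */
struct em_model {
	uint64_t start;          /* file offset of the mapping */
	uint64_t len;            /* length of the mapping in the file */
	uint64_t disk_bytenr;    /* start of the referenced data extent */
	uint64_t disk_num_bytes; /* size of the referenced data extent */
	uint64_t offset;         /* offset of the mapping inside that extent */
};

/* The old-style block_start of a non-compressed extent. */
static inline uint64_t em_block_start(const struct em_model *em)
{
	return em->disk_bytenr + em->offset;
}

/*
 * Merge @next into @prev. Assumes both are non-compressed, adjacent in
 * the file and adjacent on disk (checked by the asserts below).
 */
static inline void em_merge(struct em_model *prev, const struct em_model *next)
{
	uint64_t prev_end = prev->disk_bytenr + prev->disk_num_bytes;
	uint64_t next_end = next->disk_bytenr + next->disk_num_bytes;
	uint64_t block_start = em_block_start(prev);
	uint64_t new_bytenr = prev->disk_bytenr < next->disk_bytenr ?
			      prev->disk_bytenr : next->disk_bytenr;
	uint64_t new_end = prev_end > next_end ? prev_end : next_end;

	assert(prev->start + prev->len == next->start);
	assert(em_block_start(prev) + prev->len == em_block_start(next));

	/* The merged map now refers to the union of both data extents. */
	prev->disk_bytenr = new_bytenr;
	prev->disk_num_bytes = new_end - new_bytenr;
	/* Rebase offset so disk_bytenr + offset keeps its old value. */
	prev->offset = block_start - new_bytenr;
	prev->len += next->len;
}
```

  With em1 covering the last 8K of a 16K extent A (offset 8K) and em2
  covering the first 8K of the adjacent extent B (offset 0), the merged
  map in this model refers to a 32K on-disk range with offset 8K, while
  disk_bytenr + offset still equals the old block_start of em1. The
  part that used to be em2 no longer has offset 0.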

The checks happen at the following points:

- add_extent_mapping()
  This is for newly added extent maps

- replace_extent_mapping()
  This is for btrfs_drop_extent_map_range() and split_extent_map()

- try_merge_map()

Since the checks are much stricter than before, the following code has
to be modified to pass them:

- extent-map-tests
  Previously the test cases never populated ram_bytes, not to mention
  the newly introduced disk_bytenr/disk_num_bytes.
  Populate these members, mostly following the existing
  block_start/block_len values.

  There are two special cases worth mentioning:
  - test_case_3()
    The test case is already so invalid that tree-checker would reject
    almost all of its extents.

    There is also a special unaligned regular extent with mismatched
    disk_num_bytes (4096) and ram_bytes (4096 - 1).
    Fix it by setting both disk_num_bytes and ram_bytes to 4096 - 1.

  - test_case_7()
    An extent is inserted with 16K length, but on-disk extent size is
    only 4K.
    This means it must be a compressed extent, so set the compressed flag
    for it.

- setup_relocation_extent_mapping()
  This is mostly utilized by relocation code to read the chunk like an
  inode.
  So populate the extent map using a regular non-compressed extent.
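
The test_case_7() change above follows directly from the size checks.
As an illustration, here is a minimal userspace sketch of the three size
rules that validate_extent_map() applies to regular extents. The names
struct em_sizes and em_check_sizes() are hypothetical, and the boolean
compressed field stands in for extent_map_is_compressed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model holding only the size-related members. */
struct em_sizes {
	uint64_t len;            /* length of the mapping in the file */
	uint64_t disk_num_bytes; /* size of the on-disk data extent */
	uint64_t ram_bytes;      /* uncompressed size of the data extent */
	uint64_t offset;         /* offset into the uncompressed extent */
	bool compressed;         /* stands in for extent_map_is_compressed() */
};

/* Return NULL if the sizes are sane, else the reason they are not. */
static inline const char *em_check_sizes(const struct em_sizes *em)
{
	assert(em != NULL);

	if (em->disk_num_bytes == 0)
		return "zero disk_num_bytes";
	if (em->offset + em->len > em->ram_bytes)
		return "ram_bytes too small";
	/* Only a compressed extent may be larger in the file than on disk. */
	if (em->offset + em->len > em->disk_num_bytes && !em->compressed)
		return "disk_num_bytes too small";
	return NULL;
}
```

A 16K mapping backed by a 4K on-disk extent with ram_bytes of 16K fails
the third rule while it is non-compressed and passes once the compressed
flag is set. Leaving ram_bytes at zero, as the tests used to do, trips
the second rule.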

In fact, the new cross checks already exposed a bug in
btrfs_drop_extent_map_range(), and caught many bugs in the assignment
of the new members.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_map.c             | 66 +++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c             |  4 ++
 fs/btrfs/tests/extent-map-tests.c | 56 +++++++++++++++++++++++++-
 fs/btrfs/tests/inode-tests.c      |  2 +-
 4 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index f59423897501..7fd92366810a 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -275,6 +275,66 @@ static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *nex
 	next->offset = new_offset;
 }
 
+static void dump_extent_map(const char *prefix, struct extent_map *em)
+{
+	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
+		return;
+	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu orig_start=%llu block_start=%llu block_len=%llu flags=0x%x\n",
+		prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
+		em->ram_bytes, em->offset, em->orig_start, em->block_start,
+		em->block_len, em->flags);
+	ASSERT(0);
+}
+
+/* Internal sanity checks for btrfs debug builds. */
+static void validate_extent_map(struct extent_map *em)
+{
+	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
+		return;
+	if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
+		if (em->disk_num_bytes == 0)
+			dump_extent_map("zero disk_num_bytes", em);
+		if (em->offset + em->len > em->ram_bytes)
+			dump_extent_map("ram_bytes too small", em);
+		if (em->offset + em->len > em->disk_num_bytes &&
+		    !extent_map_is_compressed(em))
+			dump_extent_map("disk_num_bytes too small", em);
+
+		if (extent_map_is_compressed(em)) {
+			if (em->block_start != em->disk_bytenr)
+				dump_extent_map(
+				"mismatch block_start/disk_bytenr/offset", em);
+			if (em->disk_num_bytes != em->block_len)
+				dump_extent_map(
+				"mismatch disk_num_bytes/block_len", em);
+			/*
+			 * Here we only check the start/orig_start/offset for
+			 * compressed extents.
+			 * This is because em::offset is always based on the
+			 * referred data extent, which can be merged.
+			 *
+			 * In that case, @offset would no longer match
+			 * em::start - em::orig_start, and cause false alert.
+			 *
+			 * Thankfully only compressed extent read/encoded write
+			 * really bothers @orig_start, so we can skip
+			 * the check for non-compressed extents.
+			 */
+			if (em->orig_start != em->start - em->offset)
+				dump_extent_map(
+				"mismatch orig_start/offset/start", em);
+
+		} else {
+			if (em->block_start != em->disk_bytenr + em->offset)
+				dump_extent_map(
+				"mismatch block_start/disk_bytenr/offset", em);
+		}
+	} else {
+		if (em->offset)
+			dump_extent_map("non-zero offset for hole/inline", em);
+	}
+}
+
 static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 {
 	struct extent_map *merge = NULL;
@@ -310,6 +370,7 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 				merge_ondisk_extents(merge, em);
 			em->flags |= EXTENT_FLAG_MERGED;
 
+			validate_extent_map(em);
 			rb_erase_cached(&merge->rb_node, &tree->map);
 			RB_CLEAR_NODE(&merge->rb_node);
 			free_extent_map(merge);
@@ -324,6 +385,7 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 		em->block_len += merge->block_len;
 		if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
 			merge_ondisk_extents(em, merge);
+		validate_extent_map(em);
 		rb_erase_cached(&merge->rb_node, &tree->map);
 		RB_CLEAR_NODE(&merge->rb_node);
 		em->generation = max(em->generation, merge->generation);
@@ -431,6 +493,7 @@ static int add_extent_mapping(struct extent_map_tree *tree,
 
 	lockdep_assert_held_write(&tree->lock);
 
+	validate_extent_map(em);
 	ret = tree_insert(&tree->map, em);
 	if (ret)
 		goto out;
@@ -529,6 +592,9 @@ static void replace_extent_mapping(struct extent_map_tree *tree,
 {
 	lockdep_assert_held_write(&tree->lock);
 
+	validate_extent_map(cur);
+	validate_extent_map(new);
+
 	WARN_ON(cur->flags & EXTENT_FLAG_PINNED);
 	ASSERT(extent_map_in_tree(cur));
 	if (!(cur->flags & EXTENT_FLAG_LOGGING))
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 5c9ef6717f84..9007064f619e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2955,9 +2955,13 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 		return -ENOMEM;
 
 	em->start = start;
+	em->orig_start = start;
 	em->len = end + 1 - start;
 	em->block_len = em->len;
 	em->block_start = block_start;
+	em->disk_bytenr = block_start;
+	em->disk_num_bytes = em->len;
+	em->ram_bytes = em->len;
 	em->flags |= EXTENT_FLAG_PINNED;
 
 	lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached_state);
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 80e71c5cb7ab..96be45454e36 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -72,6 +72,9 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -90,9 +93,13 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = SZ_16K;
+	em->orig_start = SZ_16K;
 	em->len = SZ_4K;
 	em->block_start = SZ_32K; /* avoid merging */
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_32K; /* avoid merging */
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -111,9 +118,13 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 
 	/* Add [0, 8K), should return [0, 16K) instead. */
 	em->start = start;
+	em->orig_start = start;
 	em->len = len;
 	em->block_start = start;
 	em->block_len = len;
+	em->disk_bytenr = start;
+	em->disk_num_bytes = len;
+	em->ram_bytes = len;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -165,6 +176,9 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
 	em->block_len = (u64)-1;
+	em->disk_bytenr = EXTENT_MAP_INLINE;
+	em->disk_num_bytes = 0;
+	em->ram_bytes = SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -183,9 +197,13 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_4K;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -207,6 +225,9 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
 	em->block_len = (u64)-1;
+	em->disk_bytenr = EXTENT_MAP_INLINE;
+	em->disk_num_bytes = 0;
+	em->ram_bytes = SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -249,9 +270,13 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 
 	/* Add [4K, 8K) */
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = SZ_4K;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_4K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -273,6 +298,9 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, start, len);
 	write_unlock(&em_tree->lock);
@@ -356,6 +384,9 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	em->len = SZ_8K;
 	em->block_start = 0;
 	em->block_len = SZ_8K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_8K;
+	em->ram_bytes = SZ_8K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -374,9 +405,13 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 
 	/* Add [8K, 32K) */
 	em->start = SZ_8K;
+	em->orig_start = SZ_8K;
 	em->len = 24 * SZ_1K;
 	em->block_start = SZ_16K; /* avoid merging */
 	em->block_len = 24 * SZ_1K;
+	em->disk_bytenr = SZ_16K; /* avoid merging */
+	em->disk_num_bytes = 24 * SZ_1K;
+	em->ram_bytes = 24 * SZ_1K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -394,9 +429,13 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	}
 	/* Add [0K, 32K) */
 	em->start = 0;
+	em->orig_start = 0;
 	em->len = SZ_32K;
 	em->block_start = 0;
 	em->block_len = SZ_32K;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_32K;
+	em->ram_bytes = SZ_32K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, start, len);
 	write_unlock(&em_tree->lock);
@@ -477,9 +516,13 @@ static int add_compressed_extent(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = start;
+	em->orig_start = start;
 	em->len = len;
 	em->block_start = block_start;
 	em->block_len = SZ_4K;
+	em->disk_bytenr = block_start;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = len;
 	em->flags |= EXTENT_FLAG_COMPRESS_ZLIB;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
@@ -701,9 +744,13 @@ static int test_case_6(struct btrfs_fs_info *fs_info, struct extent_map_tree *em
 	}
 
 	em->start = SZ_4K;
+	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_16K;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = SZ_16K;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, 0, SZ_8K);
 	write_unlock(&em_tree->lock);
@@ -763,7 +810,10 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 	em->len = SZ_16K;
 	em->block_start = 0;
 	em->block_len = SZ_4K;
-	em->flags |= EXTENT_FLAG_PINNED;
+	em->disk_bytenr = 0;
+	em->disk_num_bytes = SZ_4K;
+	em->ram_bytes = SZ_16K;
+	em->flags |= (EXTENT_FLAG_PINNED | EXTENT_FLAG_COMPRESS_ZLIB);
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
@@ -782,9 +832,13 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 
 	/* [32K, 48K), not pinned */
 	em->start = SZ_32K;
+	em->orig_start = SZ_32K;
 	em->len = SZ_16K;
 	em->block_start = SZ_32K;
 	em->block_len = SZ_16K;
+	em->disk_bytenr = SZ_32K;
+	em->disk_num_bytes = SZ_16K;
+	em->ram_bytes = SZ_16K;
 	write_lock(&em_tree->lock);
 	ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, em->len);
 	write_unlock(&em_tree->lock);
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 99da9d34b77a..0895c6e06812 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -117,7 +117,7 @@ static void setup_file_extents(struct btrfs_root *root, u32 sectorsize)
 
 	/* Now for a regular extent */
 	insert_extent(root, offset, sectorsize - 1, sectorsize - 1, 0,
-		      disk_bytenr, sectorsize, BTRFS_FILE_EXTENT_REG, 0, slot);
+		      disk_bytenr, sectorsize - 1, BTRFS_FILE_EXTENT_REG, 0, slot);
 	slot++;
 	disk_bytenr += sectorsize;
 	offset += sectorsize - 1;
-- 
2.44.0



* [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (3 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 4/8] btrfs: introduce extra sanity checks for extent maps Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-09 14:59   ` David Sterba
  2024-04-08 22:33 ` [PATCH RFC 6/8] btrfs: remove extent_map::block_len member Qu Wenruo
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

Since we have extent_map::offset, the old extent_map::orig_start is just
extent_map::start - extent_map::offset for non-hole/inline extents.

And since the new extent_map::offset is already verified by
validate_extent_map() while the old orig_start is not, let's just
remove the old member from all call sites.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h            |  2 +-
 fs/btrfs/compression.c            |  2 +-
 fs/btrfs/defrag.c                 |  1 -
 fs/btrfs/extent_map.c             | 29 +----------
 fs/btrfs/extent_map.h             |  9 ----
 fs/btrfs/file-item.c              |  5 +-
 fs/btrfs/file.c                   |  3 +-
 fs/btrfs/inode.c                  | 37 +++++---------
 fs/btrfs/relocation.c             |  1 -
 fs/btrfs/tests/extent-map-tests.c |  9 ----
 fs/btrfs/tests/inode-tests.c      | 84 +++++++++++++------------------
 fs/btrfs/tree-log.c               |  2 +-
 include/trace/events/btrfs.h      | 14 ++----
 13 files changed, 60 insertions(+), 138 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ded36e065089..f4514ee273ce 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -443,7 +443,7 @@ int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
-			      u64 *orig_start, u64 *orig_block_len,
+			      u64 *orig_block_len,
 			      u64 *ram_bytes, bool nowait, bool strict,
 			      u64 *disk_bytenr_ret, u64 *extent_offset_ret);
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index c981903c8cd7..24993be16333 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -590,7 +590,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 	cb = alloc_compressed_bio(inode, file_offset, REQ_OP_READ,
 				  end_bbio_comprssed_read);
 
-	cb->start = em->orig_start;
+	cb->start = em->start - em->offset;
 	em_len = em->len;
 	em_start = em->start;
 
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 5259fd556487..47fb2afb1513 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -707,7 +707,6 @@ static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
 		 */
 		if (key.offset > start) {
 			em->start = start;
-			em->orig_start = start;
 			em->block_start = EXTENT_MAP_HOLE;
 			em->disk_bytenr = EXTENT_MAP_HOLE;
 			em->disk_num_bytes = 0;
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 7fd92366810a..03d1d791bdca 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -279,9 +279,9 @@ static void dump_extent_map(const char *prefix, struct extent_map *em)
 {
 	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
 		return;
-	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu orig_start=%llu block_start=%llu block_len=%llu flags=0x%x\n",
+	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu block_start=%llu block_len=%llu flags=0x%x\n",
 		prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
-		em->ram_bytes, em->offset, em->orig_start, em->block_start,
+		em->ram_bytes, em->offset, em->block_start,
 		em->block_len, em->flags);
 	ASSERT(0);
 }
@@ -307,23 +307,6 @@ static void validate_extent_map(struct extent_map *em)
 			if (em->disk_num_bytes != em->block_len)
 				dump_extent_map(
 				"mismatch disk_num_bytes/block_len", em);
-			/*
-			 * Here we only check the start/orig_start/offset for
-			 * compressed extents.
-			 * This is because em::offset is always based on the
-			 * referred data extent, which can be merged.
-			 *
-			 * In that case, @offset would no longer match
-			 * em::start - em::orig_start, and cause false alert.
-			 *
-			 * Thankfully only compressed extent read/encoded write
-			 * really bothers @orig_start, so we can skip
-			 * the check for non-compressed extents.
-			 */
-			if (em->orig_start != em->start - em->offset)
-				dump_extent_map(
-				"mismatch orig_start/offset/start", em);
-
 		} else {
 			if (em->block_start != em->disk_bytenr + em->offset)
 				dump_extent_map(
@@ -360,7 +343,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 			merge = rb_entry(rb, struct extent_map, rb_node);
 		if (rb && can_merge_extent_map(merge) && mergeable_maps(merge, em)) {
 			em->start = merge->start;
-			em->orig_start = merge->orig_start;
 			em->len += merge->len;
 			em->block_len += merge->block_len;
 			em->block_start = merge->block_start;
@@ -876,7 +858,6 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			split->len = start - em->start;
 
 			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
-				split->orig_start = em->orig_start;
 				split->block_start = em->block_start;
 
 				if (compressed)
@@ -889,7 +870,6 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 				split->offset = em->offset;
 				split->ram_bytes = em->ram_bytes;
 			} else {
-				split->orig_start = split->start;
 				split->block_len = 0;
 				split->block_start = em->block_start;
 				split->disk_bytenr = em->disk_bytenr;
@@ -926,19 +906,16 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 				split->ram_bytes = em->ram_bytes;
 				if (compressed) {
 					split->block_len = em->block_len;
-					split->orig_start = em->orig_start;
 				} else {
 					const u64 diff = end - em->start;
 
 					split->block_len = split->len;
 					split->block_start += diff;
-					split->orig_start = em->orig_start;
 				}
 			} else {
 				split->disk_num_bytes = 0;
 				split->offset = 0;
 				split->ram_bytes = split->len;
-				split->orig_start = split->start;
 				split->block_len = 0;
 			}
 
@@ -1098,7 +1075,6 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_pre->disk_bytenr = new_logical;
 	split_pre->disk_num_bytes = split_pre->len;
 	split_pre->offset = 0;
-	split_pre->orig_start = split_pre->start;
 	split_pre->block_start = new_logical;
 	split_pre->block_len = split_pre->len;
 	split_pre->disk_num_bytes = split_pre->block_len;
@@ -1119,7 +1095,6 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_mid->disk_bytenr = em->block_start + pre;
 	split_mid->disk_num_bytes = split_mid->len;
 	split_mid->offset = 0;
-	split_mid->orig_start = split_mid->start;
 	split_mid->block_start = em->block_start + pre;
 	split_mid->block_len = split_mid->len;
 	split_mid->ram_bytes = split_mid->len;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 848b4a4ecd6a..31a39751429e 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -58,15 +58,6 @@ struct extent_map {
 	 */
 	u64 len;
 
-	/*
-	 * The file offset of the original file extent before splitting.
-	 *
-	 * This is an in-memory only member, matching
-	 * extent_map::start - btrfs_file_extent_item::offset for
-	 * regular/preallocated extents. EXTENT_MAP_HOLE otherwise.
-	 */
-	u64 orig_start;
-
 	/*
 	 * The bytenr for of the full on-disk extent.
 	 *
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 96486f82ab5d..70698ff04200 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1278,8 +1278,6 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
 		em->start = extent_start;
 		em->len = btrfs_file_extent_end(path) - extent_start;
-		em->orig_start = extent_start -
-			btrfs_file_extent_offset(leaf, fi);
 		bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 		if (bytenr == 0) {
 			em->block_start = EXTENT_MAP_HOLE;
@@ -1312,10 +1310,9 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->len = fs_info->sectorsize;
 		em->offset = 0;
 		/*
-		 * Initialize orig_start and block_len with the same values
+		 * Initialize block_len with the same values
 		 * as in inode.c:btrfs_get_extent().
 		 */
-		em->orig_start = EXTENT_MAP_HOLE;
 		em->block_len = (u64)-1;
 		extent_map_set_compression(em, compress_type);
 	} else {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index af6de3549901..a90b9e1aa982 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1094,7 +1094,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 						   &cached_state);
 	}
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, nowait, false, NULL, NULL);
+			NULL, NULL, nowait, false, NULL, NULL);
 	if (ret <= 0)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 	else
@@ -2158,7 +2158,6 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		hole_em->start = offset;
 		hole_em->len = end - offset;
 		hole_em->ram_bytes = hole_em->len;
-		hole_em->orig_start = offset;
 
 		hole_em->block_start = EXTENT_MAP_HOLE;
 		hole_em->disk_bytenr = EXTENT_MAP_HOLE;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 69a7cdeef81e..24c11a1f1a93 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -138,7 +138,7 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
 				     u64 end, struct writeback_control *wbc,
 				     bool pages_dirty);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
-				       u64 len, u64 orig_start, u64 block_start,
+				       u64 len, u64 block_start,
 				       u64 block_len, u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset);
@@ -1160,7 +1160,6 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
 	/* Here we're doing allocation and writeback of the compressed pages */
 	em = create_io_em(inode, start,
 			  async_extent->ram_size,	/* len */
-			  start,			/* orig_start */
 			  ins.objectid,			/* block_start */
 			  ins.offset,			/* block_len */
 			  ins.offset,			/* orig_block_len */
@@ -1424,7 +1423,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 		ram_size = ins.offset;
 		em = create_io_em(inode, start, ins.offset, /* len */
-				  start, /* orig_start */
 				  ins.objectid, /* block_start */
 				  ins.offset, /* block_len */
 				  ins.offset, /* orig_block_len */
@@ -2164,11 +2162,9 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		nocow_end = cur_offset + nocow_args.num_bytes - 1;
 		is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
 		if (is_prealloc) {
-			u64 orig_start = found_key.offset - nocow_args.orig_offset;
 			struct extent_map *em;
 
 			em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
-					  orig_start,
 					  nocow_args.block_start, /* block_start */
 					  nocow_args.num_bytes, /* block_len */
 					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
@@ -5003,7 +4999,6 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 			}
 			hole_em->start = cur_offset;
 			hole_em->len = hole_size;
-			hole_em->orig_start = cur_offset;
 
 			hole_em->block_start = EXTENT_MAP_HOLE;
 			hole_em->disk_bytenr = EXTENT_MAP_HOLE;
@@ -6867,7 +6862,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		goto out;
 	}
 	em->start = EXTENT_MAP_HOLE;
-	em->orig_start = EXTENT_MAP_HOLE;
 	em->disk_bytenr = EXTENT_MAP_HOLE;
 	em->len = (u64)-1;
 	em->block_len = (u64)-1;
@@ -6960,7 +6954,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 
 		/* New extent overlaps with existing one */
 		em->start = start;
-		em->orig_start = start;
 		em->len = found_key.offset - start;
 		em->block_start = EXTENT_MAP_HOLE;
 		goto insert;
@@ -6996,7 +6989,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	}
 not_found:
 	em->start = start;
-	em->orig_start = start;
 	em->len = len;
 	em->block_start = EXTENT_MAP_HOLE;
 insert:
@@ -7029,7 +7021,6 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  struct btrfs_dio_data *dio_data,
 						  const u64 start,
 						  const u64 len,
-						  const u64 orig_start,
 						  const u64 block_start,
 						  const u64 block_len,
 						  const u64 orig_block_len,
@@ -7042,7 +7033,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 	struct btrfs_ordered_extent *ordered;
 
 	if (type != BTRFS_ORDERED_NOCOW) {
-		em = create_io_em(inode, start, len, orig_start, block_start,
+		em = create_io_em(inode, start, len, block_start,
 				  block_len, orig_block_len, ram_bytes,
 				  BTRFS_COMPRESS_NONE, /* compress_type */
 				  type, disk_bytenr, offset);
@@ -7094,7 +7085,7 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 	if (ret)
 		return ERR_PTR(ret);
 
-	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset, start,
+	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset,
 				     ins.objectid, ins.offset, ins.offset,
 				     ins.offset, BTRFS_ORDERED_REGULAR,
 				     ins.objectid, 0);
@@ -7140,7 +7131,7 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
  *	 any ordered extents.
  */
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
-			      u64 *orig_start, u64 *orig_block_len,
+			      u64 *orig_block_len,
 			      u64 *ram_bytes, bool nowait, bool strict,
 			      u64 *disk_bytenr_ret, u64 *new_offset_ret)
 {
@@ -7227,8 +7218,6 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 		}
 	}
 
-	if (orig_start)
-		*orig_start = key.offset - nocow_args.orig_offset;
 	if (orig_block_len)
 		*orig_block_len = nocow_args.orig_disk_num_bytes;
 	if (disk_bytenr_ret)
@@ -7339,7 +7328,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 
 /* The callers of this must take lock_extent() */
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
-				       u64 len, u64 orig_start, u64 block_start,
+				       u64 len, u64 block_start,
 				       u64 block_len, u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset)
@@ -7376,7 +7365,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 		ASSERT(ram_bytes == len);
 
 		/* Since it's a new extent, we should not have any offset. */
-		ASSERT(orig_start == start);
+		ASSERT(offset == 0);
 		break;
 	case BTRFS_ORDERED_COMPRESSED:
 		/* Must be compressed. */
@@ -7395,7 +7384,6 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 		return ERR_PTR(-ENOMEM);
 
 	em->start = start;
-	em->orig_start = orig_start;
 	em->len = len;
 	em->block_len = block_len;
 	em->block_start = block_start;
@@ -7429,7 +7417,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	struct extent_map *em = *map;
 	int type;
-	u64 block_start, orig_start, orig_block_len, ram_bytes;
+	u64 block_start, orig_block_len, ram_bytes;
 	u64 disk_bytenr;
 	u64 new_offset;
 	struct btrfs_block_group *bg;
@@ -7458,7 +7446,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		len = min(len, em->len - (start - em->start));
 		block_start = em->block_start + (start - em->start);
 
-		if (can_nocow_extent(inode, start, &len, &orig_start,
+		if (can_nocow_extent(inode, start, &len,
 				     &orig_block_len, &ram_bytes, false, false,
 				     &disk_bytenr, &new_offset) == 1) {
 			bg = btrfs_inc_nocow_writers(fs_info, block_start);
@@ -7486,7 +7474,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		space_reserved = true;
 
 		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
-					      orig_start, block_start,
+					      block_start,
 					      len, orig_block_len,
 					      ram_bytes, type,
 					      disk_bytenr, new_offset);
@@ -9805,7 +9793,6 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		}
 
 		em->start = cur_offset;
-		em->orig_start = cur_offset;
 		em->len = ins.offset;
 		em->block_start = ins.objectid;
 		em->disk_bytenr = ins.objectid;
@@ -10314,7 +10301,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 		disk_io_size = em->block_len;
 		count = em->block_len;
 		encoded->unencoded_len = em->ram_bytes;
-		encoded->unencoded_offset = iocb->ki_pos - em->orig_start;
+		encoded->unencoded_offset = iocb->ki_pos - em->start + em->offset;
 		ret = btrfs_encoded_io_compression_from_extent(fs_info,
 							       extent_map_compression(em));
 		if (ret < 0)
@@ -10550,7 +10537,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	extent_reserved = true;
 
 	em = create_io_em(inode, start, num_bytes,
-			  start - encoded->unencoded_offset, ins.objectid,
+			  ins.objectid,
 			  ins.offset, ins.offset, ram_bytes, compression,
 			  BTRFS_ORDERED_COMPRESSED, ins.objectid,
 			  encoded->unencoded_offset);
@@ -10883,7 +10870,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		free_extent_map(em);
 		em = NULL;
 
-		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL,
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL,
 				       false, true, NULL, NULL);
 		if (ret < 0) {
 			goto out;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9007064f619e..2dfb197c2a96 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2955,7 +2955,6 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 		return -ENOMEM;
 
 	em->start = start;
-	em->orig_start = start;
 	em->len = end + 1 - start;
 	em->block_len = em->len;
 	em->block_start = block_start;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 96be45454e36..a55adecd5955 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -93,7 +93,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = SZ_16K;
-	em->orig_start = SZ_16K;
 	em->len = SZ_4K;
 	em->block_start = SZ_32K; /* avoid merging */
 	em->block_len = SZ_4K;
@@ -118,7 +117,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 
 	/* Add [0, 8K), should return [0, 16K) instead. */
 	em->start = start;
-	em->orig_start = start;
 	em->len = len;
 	em->block_start = start;
 	em->block_len = len;
@@ -197,7 +195,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = SZ_4K;
-	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
@@ -270,7 +267,6 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 
 	/* Add [4K, 8K) */
 	em->start = SZ_4K;
-	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
 	em->block_len = SZ_4K;
@@ -405,7 +401,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 
 	/* Add [8K, 32K) */
 	em->start = SZ_8K;
-	em->orig_start = SZ_8K;
 	em->len = 24 * SZ_1K;
 	em->block_start = SZ_16K; /* avoid merging */
 	em->block_len = 24 * SZ_1K;
@@ -429,7 +424,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	}
 	/* Add [0K, 32K) */
 	em->start = 0;
-	em->orig_start = 0;
 	em->len = SZ_32K;
 	em->block_start = 0;
 	em->block_len = SZ_32K;
@@ -516,7 +510,6 @@ static int add_compressed_extent(struct btrfs_fs_info *fs_info,
 	}
 
 	em->start = start;
-	em->orig_start = start;
 	em->len = len;
 	em->block_start = block_start;
 	em->block_len = SZ_4K;
@@ -744,7 +737,6 @@ static int test_case_6(struct btrfs_fs_info *fs_info, struct extent_map_tree *em
 	}
 
 	em->start = SZ_4K;
-	em->orig_start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_16K;
 	em->block_len = SZ_16K;
@@ -832,7 +824,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 
 	/* [32K, 48K), not pinned */
 	em->start = SZ_32K;
-	em->orig_start = SZ_32K;
 	em->len = SZ_16K;
 	em->block_start = SZ_32K;
 	em->block_len = SZ_16K;
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 0895c6e06812..1b8c39edfc18 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -358,9 +358,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -386,9 +385,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	disk_bytenr = em->block_start;
@@ -437,9 +435,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != orig_start) {
-		test_err("wrong orig offset, want %llu, have %llu",
-			 orig_start, em->orig_start);
+	if (em->start - em->offset != orig_start) {
+		test_err("wrong offset, want %llu, have %llu",
+			 em->start - orig_start, em->offset);
 		goto out;
 	}
 	disk_bytenr += (em->start - orig_start);
@@ -472,9 +470,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 prealloc_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -501,9 +498,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 prealloc_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	disk_bytenr = em->block_start;
@@ -530,15 +526,14 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != orig_start) {
-		test_err("unexpected orig offset, wanted %llu, have %llu",
-			 orig_start, em->orig_start);
+	if (em->start - em->offset != orig_start) {
+		test_err("unexpected offset, wanted %llu, have %llu",
+			 em->start - orig_start, em->offset);
 		goto out;
 	}
-	if (em->block_start != (disk_bytenr + (em->start - em->orig_start))) {
+	if (em->block_start != disk_bytenr + em->offset) {
 		test_err("unexpected block start, wanted %llu, have %llu",
-			 disk_bytenr + (em->start - em->orig_start),
-			 em->block_start);
+			 disk_bytenr + em->offset, em->block_start);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -564,15 +559,14 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 prealloc_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != orig_start) {
-		test_err("wrong orig offset, want %llu, have %llu", orig_start,
-			 em->orig_start);
+	if (em->start - em->offset != orig_start) {
+		test_err("wrong offset, want %llu, have %llu",
+			 em->start - orig_start, em->offset);
 		goto out;
 	}
-	if (em->block_start != (disk_bytenr + (em->start - em->orig_start))) {
+	if (em->block_start != disk_bytenr + em->offset) {
 		test_err("unexpected block start, wanted %llu, have %llu",
-			 disk_bytenr + (em->start - em->orig_start),
-			 em->block_start);
+			 disk_bytenr + em->offset, em->block_start);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -599,9 +593,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 compressed_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu",
-			 em->start, em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	if (extent_map_compression(em) != BTRFS_COMPRESS_ZLIB) {
@@ -633,9 +626,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 compressed_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu",
-			 em->start, em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	if (extent_map_compression(em) != BTRFS_COMPRESS_ZLIB) {
@@ -667,9 +659,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -696,9 +687,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 compressed_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != orig_start) {
-		test_err("wrong orig offset, want %llu, have %llu",
-			 em->start, orig_start);
+	if (em->start - em->offset != orig_start) {
+		test_err("wrong offset, want %llu, have %llu",
+			 em->start - orig_start, em->offset);
 		goto out;
 	}
 	if (extent_map_compression(em) != BTRFS_COMPRESS_ZLIB) {
@@ -729,9 +720,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -762,9 +752,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 vacancy_only, em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -789,9 +778,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("unexpected flags set, want 0 have %u", em->flags);
 		goto out;
 	}
-	if (em->orig_start != em->start) {
-		test_err("wrong orig offset, want %llu, have %llu", em->start,
-			 em->orig_start);
+	if (em->offset != 0) {
+		test_err("wrong orig offset, want 0, have %llu", em->offset);
 		goto out;
 	}
 	ret = 0;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 2a13ca1eb7c5..e43c0128a39f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4684,7 +4684,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
 	enum btrfs_compression_type compress_type;
-	u64 extent_offset = em->start - em->orig_start;
+	u64 extent_offset = em->offset;
 	u64 block_len;
 	int ret;
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 766cfd48386c..7dcc28cd1699 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -293,7 +293,6 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__field(	u64,  ino		)
 		__field(	u64,  start		)
 		__field(	u64,  len		)
-		__field(	u64,  orig_start	)
 		__field(	u64,  block_start	)
 		__field(	u64,  block_len		)
 		__field(	u32,  flags		)
@@ -305,7 +304,6 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__entry->ino		= btrfs_ino(inode);
 		__entry->start		= map->start;
 		__entry->len		= map->len;
-		__entry->orig_start	= map->orig_start;
 		__entry->block_start	= map->block_start;
 		__entry->block_len	= map->block_len;
 		__entry->flags		= map->flags;
@@ -313,13 +311,11 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 	),
 
 	TP_printk_btrfs("root=%llu(%s) ino=%llu start=%llu len=%llu "
-		  "orig_start=%llu block_start=%llu(%s) "
-		  "block_len=%llu flags=%s refs=%u",
+		  "block_start=%llu(%s) block_len=%llu flags=%s refs=%u",
 		  show_root_type(__entry->root_objectid),
 		  __entry->ino,
 		  __entry->start,
 		  __entry->len,
-		  __entry->orig_start,
 		  show_map_type(__entry->block_start),
 		  __entry->block_len,
 		  show_map_flags(__entry->flags),
@@ -863,7 +859,7 @@ TRACE_EVENT(btrfs_add_block_group,
 		{ BTRFS_DROP_DELAYED_REF,   "DROP_DELAYED_REF" },	\
 		{ BTRFS_ADD_DELAYED_EXTENT, "ADD_DELAYED_EXTENT" }, 	\
 		{ BTRFS_UPDATE_DELAYED_HEAD, "UPDATE_DELAYED_HEAD" })
-			
+
 
 DECLARE_EVENT_CLASS(btrfs_delayed_tree_ref,
 
@@ -877,7 +873,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_tree_ref,
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,  bytenr		)
 		__field(	u64,  num_bytes		)
-		__field(	int,  action		) 
+		__field(	int,  action		)
 		__field(	u64,  parent		)
 		__field(	u64,  ref_root		)
 		__field(	int,  level		)
@@ -940,7 +936,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_data_ref,
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,  bytenr		)
 		__field(	u64,  num_bytes		)
-		__field(	int,  action		) 
+		__field(	int,  action		)
 		__field(	u64,  parent		)
 		__field(	u64,  ref_root		)
 		__field(	u64,  owner		)
@@ -1006,7 +1002,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_ref_head,
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,  bytenr		)
 		__field(	u64,  num_bytes		)
-		__field(	int,  action		) 
+		__field(	int,  action		)
 		__field(	int,  is_data		)
 	),
 
-- 
2.44.0
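
The substitution this patch makes throughout can be checked with a small
userspace sketch. It is not kernel code: struct em_offset_sketch and both
helpers are hypothetical names used only for illustration, and the
identity is assumed to hold only for regular/preallocated extents, which
is the case the removed orig_start comment describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct extent_map. */
struct em_offset_sketch {
	uint64_t start;		/* file offset where this em begins */
	uint64_t offset;	/* bytes into the full (decoded) extent */
};

/* Value the removed orig_start member held for regular/prealloc extents. */
static inline uint64_t sketch_orig_start(const struct em_offset_sketch *em)
{
	return em->start - em->offset;
}

/* Same shape as the new unencoded_offset expression in btrfs_encoded_read(). */
static inline uint64_t sketch_unencoded_offset(const struct em_offset_sketch *em,
					       uint64_t pos)
{
	return pos - em->start + em->offset;
}
```

With start = 12K and offset = 4K, the old "pos - orig_start" and the new
"pos - start + offset" give the same result for any pos inside the em.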



* [PATCH RFC 6/8] btrfs: remove extent_map::block_len member
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (4 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 7/8] btrfs: remove extent_map::block_start member Qu Wenruo
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

extent_map::block_len is always either extent_map::len (non-compressed
extent) or extent_map::disk_num_bytes (compressed extent).

Since the sanity checks already cross-check the old and new members, we
can now drop the old extent_map::block_len.

Most call sites can pick extent_map::len or extent_map::disk_num_bytes
directly, since most if not all of them have already checked whether
the extent is compressed.
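
As a rough userspace illustration of the rule above (not kernel code):
struct em_sketch is a hypothetical stand-in for extent_map, and its
"compressed" field is an assumed replacement for what
extent_map_is_compressed() reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct extent_map. */
struct em_sketch {
	uint64_t len;			/* file range covered by this em */
	uint64_t disk_num_bytes;	/* size of the full on-disk extent */
	bool compressed;		/* stand-in for extent_map_is_compressed() */
};

/* What block_len used to cache, derived on demand instead. */
static inline uint64_t em_sketch_block_len(const struct em_sketch *em)
{
	return em->compressed ? em->disk_num_bytes : em->len;
}
```

For example, 128K of file data compressed into a 4K on-disk extent gives
4K, while an 8K slice of a 16K uncompressed extent gives 8K.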

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c            |  2 +-
 fs/btrfs/extent_map.c             | 41 +++++++++++--------------------
 fs/btrfs/extent_map.h             |  9 -------
 fs/btrfs/file-item.c              |  7 ------
 fs/btrfs/file.c                   |  1 -
 fs/btrfs/inode.c                  | 36 +++++++++------------------
 fs/btrfs/relocation.c             |  1 -
 fs/btrfs/tests/extent-map-tests.c | 41 ++++++++++---------------------
 fs/btrfs/tree-log.c               |  4 +--
 include/trace/events/btrfs.h      |  5 +---
 10 files changed, 42 insertions(+), 105 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 24993be16333..0a97a0e39731 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -585,7 +585,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 	}
 
 	ASSERT(extent_map_is_compressed(em));
-	compressed_len = em->block_len;
+	compressed_len = em->disk_num_bytes;
 
 	cb = alloc_compressed_bio(inode, file_offset, REQ_OP_READ,
 				  end_bbio_comprssed_read);
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 03d1d791bdca..932f5cb791b0 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -177,11 +177,18 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 offset,
 	return NULL;
 }
 
+static inline u64 extent_map_block_len(const struct extent_map *em)
+{
+	if (extent_map_is_compressed(em))
+		return em->disk_num_bytes;
+	return em->len;
+}
+
 static inline u64 extent_map_block_end(const struct extent_map *em)
 {
-	if (em->block_start + em->block_len < em->block_start)
+	if (em->block_start + extent_map_block_len(em) < em->block_start)
 		return (u64)-1;
-	return em->block_start + em->block_len;
+	return em->block_start + extent_map_block_len(em);
 }
 
 static bool can_merge_extent_map(const struct extent_map *em)
@@ -279,10 +286,10 @@ static void dump_extent_map(const char *prefix, struct extent_map *em)
 {
 	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
 		return;
-	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu block_start=%llu block_len=%llu flags=0x%x\n",
+	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu block_start=%llu flags=0x%x\n",
 		prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
 		em->ram_bytes, em->offset, em->block_start,
-		em->block_len, em->flags);
+		em->flags);
 	ASSERT(0);
 }
 
@@ -304,9 +311,6 @@ static void validate_extent_map(struct extent_map *em)
 			if (em->block_start != em->disk_bytenr)
 				dump_extent_map(
 				"mismatch block_start/disk_bytenr/offset", em);
-			if (em->disk_num_bytes != em->block_len)
-				dump_extent_map(
-				"mismatch disk_num_bytes/block_len", em);
 		} else {
 			if (em->block_start != em->disk_bytenr + em->offset)
 				dump_extent_map(
@@ -344,7 +348,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 		if (rb && can_merge_extent_map(merge) && mergeable_maps(merge, em)) {
 			em->start = merge->start;
 			em->len += merge->len;
-			em->block_len += merge->block_len;
 			em->block_start = merge->block_start;
 			em->generation = max(em->generation, merge->generation);
 
@@ -364,7 +367,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 		merge = rb_entry(rb, struct extent_map, rb_node);
 	if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
 		em->len += merge->len;
-		em->block_len += merge->block_len;
 		if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
 			merge_ondisk_extents(em, merge);
 		validate_extent_map(em);
@@ -645,7 +647,6 @@ static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
 	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
 	    !extent_map_is_compressed(em)) {
 		em->block_start += start_diff;
-		em->block_len = em->len;
 		em->offset += start_diff;
 	}
 	return add_extent_mapping(em_tree, em, 0);
@@ -860,17 +861,11 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
 				split->block_start = em->block_start;
 
-				if (compressed)
-					split->block_len = em->block_len;
-				else
-					split->block_len = split->len;
 				split->disk_bytenr = em->disk_bytenr;
-				split->disk_num_bytes = max(split->block_len,
-							    em->disk_num_bytes);
+				split->disk_num_bytes = em->disk_num_bytes;
 				split->offset = em->offset;
 				split->ram_bytes = em->ram_bytes;
 			} else {
-				split->block_len = 0;
 				split->block_start = em->block_start;
 				split->disk_bytenr = em->disk_bytenr;
 				split->disk_num_bytes = 0;
@@ -900,23 +895,18 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			split->generation = gen;
 
 			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
-				split->disk_num_bytes = max(em->block_len,
-							    em->disk_num_bytes);
+				split->disk_num_bytes = em->disk_num_bytes;
 				split->offset = em->offset + end - em->start;
 				split->ram_bytes = em->ram_bytes;
-				if (compressed) {
-					split->block_len = em->block_len;
-				} else {
+				if (!compressed) {
 					const u64 diff = end - em->start;
 
-					split->block_len = split->len;
 					split->block_start += diff;
 				}
 			} else {
 				split->disk_num_bytes = 0;
 				split->offset = 0;
 				split->ram_bytes = split->len;
-				split->block_len = 0;
 			}
 
 			if (extent_map_in_tree(em)) {
@@ -1076,8 +1066,6 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_pre->disk_num_bytes = split_pre->len;
 	split_pre->offset = 0;
 	split_pre->block_start = new_logical;
-	split_pre->block_len = split_pre->len;
-	split_pre->disk_num_bytes = split_pre->block_len;
 	split_pre->ram_bytes = split_pre->len;
 	split_pre->flags = flags;
 	split_pre->generation = em->generation;
@@ -1096,7 +1084,6 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_mid->disk_num_bytes = split_mid->len;
 	split_mid->offset = 0;
 	split_mid->block_start = em->block_start + pre;
-	split_mid->block_len = split_mid->len;
 	split_mid->ram_bytes = split_mid->len;
 	split_mid->flags = flags;
 	split_mid->generation = em->generation;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 31a39751429e..bb7681bb7dba 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -99,15 +99,6 @@ struct extent_map {
 	 */
 	u64 block_start;
 
-	/*
-	 * The on-disk length for the file extent.
-	 *
-	 * For compressed extents it matches btrfs_file_extent_item::disk_num_bytes.
-	 * For uncompressed extents it matches extent_map::len.
-	 * For holes and inline extents it's -1 and shouldn't be used.
-	 */
-	u64 block_len;
-
 	/*
 	 * Generation of the extent map, for merged em it's the highest
 	 * generation of all merged ems.
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 70698ff04200..fd1e0e431e76 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1292,11 +1292,9 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		if (compress_type != BTRFS_COMPRESS_NONE) {
 			extent_map_set_compression(em, compress_type);
 			em->block_start = bytenr;
-			em->block_len = em->disk_num_bytes;
 		} else {
 			bytenr += btrfs_file_extent_offset(leaf, fi);
 			em->block_start = bytenr;
-			em->block_len = em->len;
 			if (type == BTRFS_FILE_EXTENT_PREALLOC)
 				em->flags |= EXTENT_FLAG_PREALLOC;
 		}
@@ -1309,11 +1307,6 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->start = 0;
 		em->len = fs_info->sectorsize;
 		em->offset = 0;
-		/*
-		 * Initialize block_len with the same values
-		 * as in inode.c:btrfs_get_extent().
-		 */
-		em->block_len = (u64)-1;
 		extent_map_set_compression(em, compress_type);
 	} else {
 		btrfs_err(fs_info,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a90b9e1aa982..cbb0263f5a18 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2161,7 +2161,6 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 
 		hole_em->block_start = EXTENT_MAP_HOLE;
 		hole_em->disk_bytenr = EXTENT_MAP_HOLE;
-		hole_em->block_len = 0;
 		hole_em->disk_num_bytes = 0;
 		hole_em->generation = trans->transid;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 24c11a1f1a93..7dbc0c163316 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -139,7 +139,7 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
 				     bool pages_dirty);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 block_start,
-				       u64 block_len, u64 disk_num_bytes,
+				       u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset);
 
@@ -1161,7 +1161,6 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
 	em = create_io_em(inode, start,
 			  async_extent->ram_size,	/* len */
 			  ins.objectid,			/* block_start */
-			  ins.offset,			/* block_len */
 			  ins.offset,			/* orig_block_len */
 			  async_extent->ram_size,	/* ram_bytes */
 			  async_extent->compress_type,
@@ -1424,7 +1423,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 		ram_size = ins.offset;
 		em = create_io_em(inode, start, ins.offset, /* len */
 				  ins.objectid, /* block_start */
-				  ins.offset, /* block_len */
 				  ins.offset, /* orig_block_len */
 				  ram_size, /* ram_bytes */
 				  BTRFS_COMPRESS_NONE, /* compress_type */
@@ -2166,7 +2164,6 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 
 			em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
 					  nocow_args.block_start, /* block_start */
-					  nocow_args.num_bytes, /* block_len */
 					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
 					  ram_bytes, BTRFS_COMPRESS_NONE,
 					  BTRFS_ORDERED_PREALLOC,
@@ -5002,7 +4999,6 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 
 			hole_em->block_start = EXTENT_MAP_HOLE;
 			hole_em->disk_bytenr = EXTENT_MAP_HOLE;
-			hole_em->block_len = 0;
 			hole_em->disk_num_bytes = 0;
 			hole_em->ram_bytes = hole_size;
 			hole_em->generation = btrfs_get_fs_generation(fs_info);
@@ -6864,7 +6860,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	em->start = EXTENT_MAP_HOLE;
 	em->disk_bytenr = EXTENT_MAP_HOLE;
 	em->len = (u64)-1;
-	em->block_len = (u64)-1;
 
 	path = btrfs_alloc_path();
 	if (!path) {
@@ -7022,7 +7017,6 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  const u64 start,
 						  const u64 len,
 						  const u64 block_start,
-						  const u64 block_len,
 						  const u64 orig_block_len,
 						  const u64 ram_bytes,
 						  const int type,
@@ -7034,14 +7028,14 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 
 	if (type != BTRFS_ORDERED_NOCOW) {
 		em = create_io_em(inode, start, len, block_start,
-				  block_len, orig_block_len, ram_bytes,
+				  orig_block_len, ram_bytes,
 				  BTRFS_COMPRESS_NONE, /* compress_type */
 				  type, disk_bytenr, offset);
 		if (IS_ERR(em))
 			goto out;
 	}
 	ordered = btrfs_alloc_ordered_extent(inode, start, len, len,
-					     block_start, block_len, 0,
+					     block_start, len, 0,
 					     (1 << type) |
 					     (1 << BTRFS_ORDERED_DIRECT),
 					     BTRFS_COMPRESS_NONE);
@@ -7086,7 +7080,7 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 		return ERR_PTR(ret);
 
 	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset,
-				     ins.objectid, ins.offset, ins.offset,
+				     ins.objectid, ins.offset,
 				     ins.offset, BTRFS_ORDERED_REGULAR,
 				     ins.objectid, 0);
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
@@ -7329,7 +7323,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 /* The callers of this must take lock_extent() */
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len, u64 block_start,
-				       u64 block_len, u64 disk_num_bytes,
+				       u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset)
 {
@@ -7350,16 +7344,10 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 
 	switch (type) {
 	case BTRFS_ORDERED_PREALLOC:
-		/* Uncompressed extents. */
-		ASSERT(block_len == len);
-
 		/* We're only referring part of a larger preallocated extent. */
-		ASSERT(block_len <= ram_bytes);
+		ASSERT(len <= ram_bytes);
 		break;
 	case BTRFS_ORDERED_REGULAR:
-		/* Uncompressed extents. */
-		ASSERT(block_len == len);
-
 		/* COW results a new extent matching our file extent size. */
 		ASSERT(disk_num_bytes == len);
 		ASSERT(ram_bytes == len);
@@ -7385,7 +7373,6 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 
 	em->start = start;
 	em->len = len;
-	em->block_len = block_len;
 	em->block_start = block_start;
 	em->disk_bytenr = disk_bytenr;
 	em->disk_num_bytes = disk_num_bytes;
@@ -7475,7 +7462,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 
 		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
 					      block_start,
-					      len, orig_block_len,
+					      orig_block_len,
 					      ram_bytes, type,
 					      disk_bytenr, new_offset);
 		btrfs_dec_nocow_writers(bg);
@@ -9797,7 +9784,6 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		em->block_start = ins.objectid;
 		em->disk_bytenr = ins.objectid;
 		em->offset = 0;
-		em->block_len = ins.offset;
 		em->disk_num_bytes = ins.offset;
 		em->ram_bytes = ins.offset;
 		em->flags |= EXTENT_FLAG_PREALLOC;
@@ -10294,12 +10280,12 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 		 * Bail if the buffer isn't large enough to return the whole
 		 * compressed extent.
 		 */
-		if (em->block_len > count) {
+		if (em->disk_num_bytes > count) {
 			ret = -ENOBUFS;
 			goto out_em;
 		}
-		disk_io_size = em->block_len;
-		count = em->block_len;
+		disk_io_size = em->disk_num_bytes;
+		count = em->disk_num_bytes;
 		encoded->unencoded_len = em->ram_bytes;
 		encoded->unencoded_offset = iocb->ki_pos - em->start + em->offset;
 		ret = btrfs_encoded_io_compression_from_extent(fs_info,
@@ -10538,7 +10524,7 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 
 	em = create_io_em(inode, start, num_bytes,
 			  ins.objectid,
-			  ins.offset, ins.offset, ram_bytes, compression,
+			  ins.offset, ram_bytes, compression,
 			  BTRFS_ORDERED_COMPRESSED, ins.objectid,
 			  encoded->unencoded_offset);
 	if (IS_ERR(em)) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2dfb197c2a96..95a8588dcf8e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2956,7 +2956,6 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 
 	em->start = start;
 	em->len = end + 1 - start;
-	em->block_len = em->len;
 	em->block_start = block_start;
 	em->disk_bytenr = block_start;
 	em->disk_num_bytes = em->len;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index a55adecd5955..dc14907c65f9 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -25,9 +25,10 @@ static void free_extent_map_tree(struct extent_map_tree *em_tree)
 #ifdef CONFIG_BTRFS_DEBUG
 		if (refcount_read(&em->refs) != 1) {
 			test_err(
-"em leak: em (start %llu len %llu block_start %llu block_len %llu) refs %d",
+"em leak: em (start %llu len %llu block_start %llu disk_num_bytes %llu offset %llu) refs %d",
 				 em->start, em->len, em->block_start,
-				 em->block_len, refcount_read(&em->refs));
+				 em->disk_num_bytes, em->offset,
+				 refcount_read(&em->refs));
 
 			refcount_set(&em->refs, 1);
 		}
@@ -71,7 +72,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_16K;
 	em->block_start = 0;
-	em->block_len = SZ_16K;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -95,7 +95,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	em->start = SZ_16K;
 	em->len = SZ_4K;
 	em->block_start = SZ_32K; /* avoid merging */
-	em->block_len = SZ_4K;
 	em->disk_bytenr = SZ_32K; /* avoid merging */
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -119,7 +118,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	em->start = start;
 	em->len = len;
 	em->block_start = start;
-	em->block_len = len;
 	em->disk_bytenr = start;
 	em->disk_num_bytes = len;
 	em->ram_bytes = len;
@@ -137,11 +135,11 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 	if (em->start != 0 || extent_map_end(em) != SZ_16K ||
-	    em->block_start != 0 || em->block_len != SZ_16K) {
+	    em->block_start != 0 || em->disk_num_bytes != SZ_16K) {
 		test_err(
-"case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu",
+"case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu disk_num_bytes %llu",
 			 start, start + len, ret, em->start, em->len,
-			 em->block_start, em->block_len);
+			 em->block_start, em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -173,7 +171,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
-	em->block_len = (u64)-1;
 	em->disk_bytenr = EXTENT_MAP_INLINE;
 	em->disk_num_bytes = 0;
 	em->ram_bytes = SZ_1K;
@@ -197,7 +194,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	em->start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
-	em->block_len = SZ_4K;
 	em->disk_bytenr = SZ_4K;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -221,7 +217,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_1K;
 	em->block_start = EXTENT_MAP_INLINE;
-	em->block_len = (u64)-1;
 	em->disk_bytenr = EXTENT_MAP_INLINE;
 	em->disk_num_bytes = 0;
 	em->ram_bytes = SZ_1K;
@@ -238,11 +233,10 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 	if (em->start != 0 || extent_map_end(em) != SZ_1K ||
-	    em->block_start != EXTENT_MAP_INLINE || em->block_len != (u64)-1) {
+	    em->block_start != EXTENT_MAP_INLINE) {
 		test_err(
-"case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu block_len %llu",
-			 ret, em->start, em->len, em->block_start,
-			 em->block_len);
+"case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu",
+			 ret, em->start, em->len, em->block_start);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -269,7 +263,6 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	em->start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_4K;
-	em->block_len = SZ_4K;
 	em->disk_bytenr = SZ_4K;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -293,7 +286,6 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_16K;
 	em->block_start = 0;
-	em->block_len = SZ_16K;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -316,11 +308,11 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	 * em->start.
 	 */
 	if (start < em->start || start + len > extent_map_end(em) ||
-	    em->start != em->block_start || em->len != em->block_len) {
+	    em->start != em->block_start) {
 		test_err(
-"case3 [%llu %llu): ret %d em (start %llu len %llu block_start %llu block_len %llu)",
+"case3 [%llu %llu): ret %d em (start %llu len %llu block_start %llu disk_num_bytes %llu)",
 			 start, start + len, ret, em->start, em->len,
-			 em->block_start, em->block_len);
+			 em->block_start, em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -379,7 +371,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_8K;
 	em->block_start = 0;
-	em->block_len = SZ_8K;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_8K;
 	em->ram_bytes = SZ_8K;
@@ -403,7 +394,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	em->start = SZ_8K;
 	em->len = 24 * SZ_1K;
 	em->block_start = SZ_16K; /* avoid merging */
-	em->block_len = 24 * SZ_1K;
 	em->disk_bytenr = SZ_16K; /* avoid merging */
 	em->disk_num_bytes = 24 * SZ_1K;
 	em->ram_bytes = 24 * SZ_1K;
@@ -426,7 +416,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	em->start = 0;
 	em->len = SZ_32K;
 	em->block_start = 0;
-	em->block_len = SZ_32K;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_32K;
 	em->ram_bytes = SZ_32K;
@@ -446,9 +435,9 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	}
 	if (start < em->start || start + len > extent_map_end(em)) {
 		test_err(
-"case4 [%llu %llu): ret %d, added wrong em (start %llu len %llu block_start %llu block_len %llu)",
+"case4 [%llu %llu): ret %d, added wrong em (start %llu len %llu block_start %llu disk_num_bytes %llu)",
 			 start, start + len, ret, em->start, em->len, em->block_start,
-			 em->block_len);
+			 em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -512,7 +501,6 @@ static int add_compressed_extent(struct btrfs_fs_info *fs_info,
 	em->start = start;
 	em->len = len;
 	em->block_start = block_start;
-	em->block_len = SZ_4K;
 	em->disk_bytenr = block_start;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = len;
@@ -739,7 +727,6 @@ static int test_case_6(struct btrfs_fs_info *fs_info, struct extent_map_tree *em
 	em->start = SZ_4K;
 	em->len = SZ_4K;
 	em->block_start = SZ_16K;
-	em->block_len = SZ_16K;
 	em->disk_bytenr = SZ_16K;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -801,7 +788,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 	em->start = 0;
 	em->len = SZ_16K;
 	em->block_start = 0;
-	em->block_len = SZ_4K;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_16K;
@@ -826,7 +812,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 	em->start = SZ_32K;
 	em->len = SZ_16K;
 	em->block_start = SZ_32K;
-	em->block_len = SZ_16K;
 	em->disk_bytenr = SZ_32K;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e43c0128a39f..5ca7f2623b56 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4645,7 +4645,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	/* If we're compressed we have to save the entire range of csums. */
 	if (extent_map_is_compressed(em)) {
 		csum_offset = 0;
-		csum_len = max(em->block_len, em->disk_num_bytes);
+		csum_len = em->disk_num_bytes;
 	} else {
 		csum_offset = mod_start - em->start;
 		csum_len = mod_len;
@@ -4694,7 +4694,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_stack_file_extent_type(&fi, BTRFS_FILE_EXTENT_REG);
 
-	block_len = max(em->block_len, em->disk_num_bytes);
+	block_len = em->disk_num_bytes;
 	compress_type = extent_map_compression(em);
 	if (compress_type != BTRFS_COMPRESS_NONE) {
 		btrfs_set_stack_file_extent_disk_bytenr(&fi, em->block_start);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 7dcc28cd1699..0d0775bde14c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -294,7 +294,6 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__field(	u64,  start		)
 		__field(	u64,  len		)
 		__field(	u64,  block_start	)
-		__field(	u64,  block_len		)
 		__field(	u32,  flags		)
 		__field(	int,  refs		)
 	),
@@ -305,19 +304,17 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__entry->start		= map->start;
 		__entry->len		= map->len;
 		__entry->block_start	= map->block_start;
-		__entry->block_len	= map->block_len;
 		__entry->flags		= map->flags;
 		__entry->refs		= refcount_read(&map->refs);
 	),
 
 	TP_printk_btrfs("root=%llu(%s) ino=%llu start=%llu len=%llu "
-		  "block_start=%llu(%s) block_len=%llu flags=%s refs=%u",
+		  "block_start=%llu(%s) flags=%s refs=%u",
 		  show_root_type(__entry->root_objectid),
 		  __entry->ino,
 		  __entry->start,
 		  __entry->len,
 		  show_map_type(__entry->block_start),
-		  __entry->block_len,
 		  show_map_flags(__entry->flags),
 		  __entry->refs)
 );
-- 
2.44.0


* [PATCH RFC 7/8] btrfs: remove extent_map::block_start member
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (5 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 6/8] btrfs: remove extent_map::block_len member Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-08 22:33 ` [PATCH RFC 8/8] btrfs: reorder disk_bytenr/disk_num_bytes/ram_bytes/offset parameters Qu Wenruo
  2024-04-09 14:57 ` [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start David Sterba
  8 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

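The member extent_map::block_start can be derived from the other
members. For uncompressed regular extents it is
extent_map::disk_bytenr + extent_map::offset. For compressed extents,
holes and inline extents it is just extent_map::disk_bytenr.

This relationship is already cross-checked by validate_extent_map().

So remove the member and compute the value on demand through the new
helper extent_map_block_start().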
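
For illustration only, the derivation can be sketched in userspace C.
This is a minimal model, not the kernel code: struct em_model, the EM_*
constants and em_model_block_start() are hypothetical names, the
"compressed" flag stands in for extent_map_is_compressed(), and the
sentinel values are assumed to mirror those in fs/btrfs/extent_map.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel values, assumed to mirror fs/btrfs/extent_map.h. */
#define EM_LAST_BYTE ((uint64_t)-4)
#define EM_HOLE      ((uint64_t)-3)
#define EM_INLINE    ((uint64_t)-2)

/* Hypothetical, trimmed-down stand-in for struct extent_map. */
struct em_model {
	uint64_t disk_bytenr;	/* start of the on-disk extent, or a sentinel */
	uint64_t offset;	/* offset into the uncompressed extent */
	bool compressed;	/* models extent_map_is_compressed() */
};

/*
 * Derive the value that used to be cached in block_start.
 * Only uncompressed regular extents add the offset; compressed
 * extents, holes and inline extents return disk_bytenr unchanged.
 */
static inline uint64_t em_model_block_start(const struct em_model *em)
{
	if (em->disk_bytenr < EM_LAST_BYTE && !em->compressed)
		return em->disk_bytenr + em->offset;
	return em->disk_bytenr;
}

/* Small self-check covering the three cases described above. */
static inline void em_model_self_check(void)
{
	struct em_model regular = { 1048576, 4096, false };
	struct em_model compressed = { 1048576, 4096, true };
	struct em_model hole = { EM_HOLE, 0, false };

	assert(em_model_block_start(&regular) == 1048576 + 4096);
	assert(em_model_block_start(&compressed) == 1048576);
	assert(em_model_block_start(&hole) == EM_HOLE);
}
```

Because the value is now computed from disk_bytenr and offset, code
that splits or trims an extent map only has to adjust offset, and the
derived start follows automatically.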

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c            |  3 +-
 fs/btrfs/defrag.c                 |  9 ++-
 fs/btrfs/extent_io.c              | 10 ++--
 fs/btrfs/extent_map.c             | 56 +++++------------
 fs/btrfs/extent_map.h             | 22 ++++---
 fs/btrfs/file-item.c              |  4 --
 fs/btrfs/file.c                   | 11 ++--
 fs/btrfs/inode.c                  | 61 ++++++++-----------
 fs/btrfs/relocation.c             |  1 -
 fs/btrfs/tests/extent-map-tests.c | 48 ++++++---------
 fs/btrfs/tests/inode-tests.c      | 99 ++++++++++++++++---------------
 fs/btrfs/tree-log.c               | 19 +++---
 fs/btrfs/zoned.c                  |  4 +-
 include/trace/events/btrfs.h      |  5 +-
 14 files changed, 149 insertions(+), 203 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0a97a0e39731..88328e321360 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -507,7 +507,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		 */
 		if (!em || cur < em->start ||
 		    (cur + fs_info->sectorsize > extent_map_end(em)) ||
-		    (em->block_start >> SECTOR_SHIFT) != orig_bio->bi_iter.bi_sector) {
+		    (extent_map_block_start(em) >> SECTOR_SHIFT) !=
+		    orig_bio->bi_iter.bi_sector) {
 			free_extent_map(em);
 			unlock_extent(tree, cur, page_end, NULL);
 			unlock_page(page);
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 47fb2afb1513..233933519d6a 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -707,7 +707,6 @@ static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
 		 */
 		if (key.offset > start) {
 			em->start = start;
-			em->block_start = EXTENT_MAP_HOLE;
 			em->disk_bytenr = EXTENT_MAP_HOLE;
 			em->disk_num_bytes = 0;
 			em->ram_bytes = 0;
@@ -828,7 +827,7 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 	 */
 	next = defrag_lookup_extent(inode, em->start + em->len, newer_than, locked);
 	/* No more em or hole */
-	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
+	if (!next || next->disk_bytenr >= EXTENT_MAP_LAST_BYTE)
 		goto out;
 	if (next->flags & EXTENT_FLAG_PREALLOC)
 		goto out;
@@ -995,12 +994,12 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		 * This is for users who want to convert inline extents to
 		 * regular ones through max_inline= mount option.
 		 */
-		if (em->block_start == EXTENT_MAP_INLINE &&
+		if (em->disk_bytenr == EXTENT_MAP_INLINE &&
 		    em->len <= inode->root->fs_info->max_inline)
 			goto next;
 
 		/* Skip holes and preallocated extents. */
-		if (em->block_start == EXTENT_MAP_HOLE ||
+		if (em->disk_bytenr == EXTENT_MAP_HOLE ||
 		    (em->flags & EXTENT_FLAG_PREALLOC))
 			goto next;
 
@@ -1065,7 +1064,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		 * So if an inline extent passed all above checks, just add it
 		 * for defrag, and be converted to regular extents.
 		 */
-		if (em->block_start == EXTENT_MAP_INLINE)
+		if (em->disk_bytenr == EXTENT_MAP_INLINE)
 			goto add;
 
 		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bbdcb7475cea..1f9970fb2d08 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1093,10 +1093,10 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		iosize = min(extent_map_end(em) - cur, end - cur + 1);
 		iosize = ALIGN(iosize, blocksize);
 		if (compress_type != BTRFS_COMPRESS_NONE)
-			disk_bytenr = em->block_start;
+			disk_bytenr = em->disk_bytenr;
 		else
-			disk_bytenr = em->block_start + extent_offset;
-		block_start = em->block_start;
+			disk_bytenr = extent_map_block_start(em) + extent_offset;
+		block_start = extent_map_block_start(em);
 		if (em->flags & EXTENT_FLAG_PREALLOC)
 			block_start = EXTENT_MAP_HOLE;
 
@@ -1415,8 +1415,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
 		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
 
-		block_start = em->block_start;
-		disk_bytenr = em->block_start + extent_offset;
+		block_start = extent_map_block_start(em);
+		disk_bytenr = extent_map_block_start(em) + extent_offset;
 
 		ASSERT(!extent_map_is_compressed(em));
 		ASSERT(block_start != EXTENT_MAP_HOLE);
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 932f5cb791b0..ddeed01251e0 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -186,9 +186,10 @@ static inline u64 extent_map_block_len(const struct extent_map *em)
 
 static inline u64 extent_map_block_end(const struct extent_map *em)
 {
-	if (em->block_start + extent_map_block_len(em) < em->block_start)
+	if (extent_map_block_start(em) + extent_map_block_len(em) <
+	    extent_map_block_start(em))
 		return (u64)-1;
-	return em->block_start + extent_map_block_len(em);
+	return extent_map_block_start(em) + extent_map_block_len(em);
 }
 
 static bool can_merge_extent_map(const struct extent_map *em)
@@ -223,11 +224,11 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
 	if (prev->flags != next->flags)
 		return false;
 
-	if (next->block_start < EXTENT_MAP_LAST_BYTE - 1)
-		return next->block_start == extent_map_block_end(prev);
+	if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1)
+		return extent_map_block_start(next) == extent_map_block_end(prev);
 
 	/* HOLES and INLINE extents. */
-	return next->block_start == prev->block_start;
+	return next->disk_bytenr == prev->disk_bytenr;
 }
 
 /*
@@ -286,10 +287,9 @@ static void dump_extent_map(const char *prefix, struct extent_map *em)
 {
 	if (!IS_ENABLED(CONFIG_BTRFS_DEBUG))
 		return;
-	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu block_start=%llu flags=0x%x\n",
+	pr_crit("%s, start=%llu len=%llu disk_bytenr=%llu disk_num_bytes=%llu ram_bytes=%llu offset=%llu flags=0x%x\n",
 		prefix, em->start, em->len, em->disk_bytenr, em->disk_num_bytes,
-		em->ram_bytes, em->offset, em->block_start,
-		em->flags);
+		em->ram_bytes, em->offset, em->flags);
 	ASSERT(0);
 }
 
@@ -306,16 +306,6 @@ static void validate_extent_map(struct extent_map *em)
 		if (em->offset + em->len > em->disk_num_bytes &&
 		    !extent_map_is_compressed(em))
 			dump_extent_map("disk_num_bytes too small", em);
-
-		if (extent_map_is_compressed(em)) {
-			if (em->block_start != em->disk_bytenr)
-				dump_extent_map(
-				"mismatch block_start/disk_bytenr/offset", em);
-		} else {
-			if (em->block_start != em->disk_bytenr + em->offset)
-				dump_extent_map(
-				"mismatch block_start/disk_bytenr/offset", em);
-		}
 	} else {
 		if (em->offset)
 			dump_extent_map("non-zero offset for hole/inline", em);
@@ -348,7 +338,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
 		if (rb && can_merge_extent_map(merge) && mergeable_maps(merge, em)) {
 			em->start = merge->start;
 			em->len += merge->len;
-			em->block_start = merge->block_start;
 			em->generation = max(em->generation, merge->generation);
 
 			if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
@@ -644,11 +633,9 @@ static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
 	start_diff = start - em->start;
 	em->start = start;
 	em->len = end - start;
-	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
-	    !extent_map_is_compressed(em)) {
-		em->block_start += start_diff;
+	if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE &&
+	    !extent_map_is_compressed(em))
 		em->offset += start_diff;
-	}
 	return add_extent_mapping(em_tree, em, 0);
 }
 
@@ -684,7 +671,7 @@ int btrfs_add_extent_mapping(struct btrfs_fs_info *fs_info,
 	 * Tree-checker should have rejected any inline extent with non-zero
 	 * file offset. Here just do a sanity check.
 	 */
-	if (em->block_start == EXTENT_MAP_INLINE)
+	if (em->disk_bytenr == EXTENT_MAP_INLINE)
 		ASSERT(em->start == 0);
 
 	ret = add_extent_mapping(em_tree, em, 0);
@@ -812,7 +799,6 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 		u64 gen;
 		unsigned long flags;
 		bool modified;
-		bool compressed;
 
 		if (em_end < end) {
 			next_em = next_extent_map(em);
@@ -846,7 +832,6 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			goto remove_em;
 
 		gen = em->generation;
-		compressed = extent_map_is_compressed(em);
 
 		if (em->start < start) {
 			if (!split) {
@@ -858,15 +843,12 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			split->start = em->start;
 			split->len = start - em->start;
 
-			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
-				split->block_start = em->block_start;
-
+			if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
 				split->disk_bytenr = em->disk_bytenr;
 				split->disk_num_bytes = em->disk_num_bytes;
 				split->offset = em->offset;
 				split->ram_bytes = em->ram_bytes;
 			} else {
-				split->block_start = em->block_start;
 				split->disk_bytenr = em->disk_bytenr;
 				split->disk_num_bytes = 0;
 				split->offset = 0;
@@ -889,20 +871,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 			}
 			split->start = end;
 			split->len = em_end - end;
-			split->block_start = em->block_start;
 			split->disk_bytenr = em->disk_bytenr;
 			split->flags = flags;
 			split->generation = gen;
 
-			if (em->block_start < EXTENT_MAP_LAST_BYTE) {
+			if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
 				split->disk_num_bytes = em->disk_num_bytes;
 				split->offset = em->offset + end - em->start;
 				split->ram_bytes = em->ram_bytes;
-				if (!compressed) {
-					const u64 diff = end - em->start;
-
-					split->block_start += diff;
-				}
 			} else {
 				split->disk_num_bytes = 0;
 				split->offset = 0;
@@ -1051,7 +1027,7 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 
 	ASSERT(em->len == len);
 	ASSERT(!extent_map_is_compressed(em));
-	ASSERT(em->block_start < EXTENT_MAP_LAST_BYTE);
+	ASSERT(em->disk_bytenr < EXTENT_MAP_LAST_BYTE);
 	ASSERT(em->flags & EXTENT_FLAG_PINNED);
 	ASSERT(!(em->flags & EXTENT_FLAG_LOGGING));
 	ASSERT(!list_empty(&em->list));
@@ -1065,7 +1041,6 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	split_pre->disk_bytenr = new_logical;
 	split_pre->disk_num_bytes = split_pre->len;
 	split_pre->offset = 0;
-	split_pre->block_start = new_logical;
 	split_pre->ram_bytes = split_pre->len;
 	split_pre->flags = flags;
 	split_pre->generation = em->generation;
@@ -1080,10 +1055,9 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
 	/* Insert the middle extent_map. */
 	split_mid->start = em->start + pre;
 	split_mid->len = em->len - pre;
-	split_mid->disk_bytenr = em->block_start + pre;
+	split_mid->disk_bytenr = extent_map_block_start(em) + pre;
 	split_mid->disk_num_bytes = split_mid->len;
 	split_mid->offset = 0;
-	split_mid->block_start = em->block_start + pre;
 	split_mid->ram_bytes = split_mid->len;
 	split_mid->flags = flags;
 	split_mid->generation = em->generation;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index bb7681bb7dba..da5bf10f911d 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -87,18 +87,6 @@ struct extent_map {
 	 */
 	u64 ram_bytes;
 
-	/*
-	 * The on-disk logical bytenr for the file extent.
-	 *
-	 * For compressed extents it matches btrfs_file_extent_item::disk_bytenr.
-	 * For uncompressed extents it matches
-	 * btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset
-	 *
-	 * For holes it is EXTENT_MAP_HOLE and for inline extents it is
-	 * EXTENT_MAP_INLINE.
-	 */
-	u64 block_start;
-
 	/*
 	 * Generation of the extent map, for merged em it's the highest
 	 * generation of all merged ems.
@@ -159,6 +147,16 @@ static inline int extent_map_in_tree(const struct extent_map *em)
 	return !RB_EMPTY_NODE(&em->rb_node);
 }
 
+static inline u64 extent_map_block_start(const struct extent_map *em)
+{
+	if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
+		if (extent_map_is_compressed(em))
+			return em->disk_bytenr;
+		return em->disk_bytenr + em->offset;
+	}
+	return em->disk_bytenr;
+}
+
 static inline u64 extent_map_end(const struct extent_map *em)
 {
 	if (em->start + em->len < em->start)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index fd1e0e431e76..bdc914219215 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1280,7 +1280,6 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->len = btrfs_file_extent_end(path) - extent_start;
 		bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 		if (bytenr == 0) {
-			em->block_start = EXTENT_MAP_HOLE;
 			em->disk_bytenr = EXTENT_MAP_HOLE;
 			em->disk_num_bytes = 0;
 			em->offset = 0;
@@ -1291,10 +1290,8 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		em->offset = btrfs_file_extent_offset(leaf, fi);
 		if (compress_type != BTRFS_COMPRESS_NONE) {
 			extent_map_set_compression(em, compress_type);
-			em->block_start = bytenr;
 		} else {
 			bytenr += btrfs_file_extent_offset(leaf, fi);
-			em->block_start = bytenr;
 			if (type == BTRFS_FILE_EXTENT_PREALLOC)
 				em->flags |= EXTENT_FLAG_PREALLOC;
 		}
@@ -1302,7 +1299,6 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 		/* Tree-checker has ensured this. */
 		ASSERT(extent_start == 0);
 
-		em->block_start = EXTENT_MAP_INLINE;
 		em->disk_bytenr = EXTENT_MAP_INLINE;
 		em->start = 0;
 		em->len = fs_info->sectorsize;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cbb0263f5a18..965d2ba34aeb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2159,7 +2159,6 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		hole_em->len = end - offset;
 		hole_em->ram_bytes = hole_em->len;
 
-		hole_em->block_start = EXTENT_MAP_HOLE;
 		hole_em->disk_bytenr = EXTENT_MAP_HOLE;
 		hole_em->disk_num_bytes = 0;
 		hole_em->generation = trans->transid;
@@ -2192,7 +2191,7 @@ static int find_first_non_hole(struct btrfs_inode *inode, u64 *start, u64 *len)
 		return PTR_ERR(em);
 
 	/* Hole or vacuum extent(only exists in no-hole mode) */
-	if (em->block_start == EXTENT_MAP_HOLE) {
+	if (em->disk_bytenr == EXTENT_MAP_HOLE) {
 		ret = 1;
 		*len = em->start + em->len > *start + *len ?
 		       0 : *start + *len - em->start - em->len;
@@ -2848,7 +2847,7 @@ static int btrfs_zero_range_check_range_boundary(struct btrfs_inode *inode,
 	if (IS_ERR(em))
 		return PTR_ERR(em);
 
-	if (em->block_start == EXTENT_MAP_HOLE)
+	if (em->disk_bytenr == EXTENT_MAP_HOLE)
 		ret = RANGE_BOUNDARY_HOLE;
 	else if (em->flags & EXTENT_FLAG_PREALLOC)
 		ret = RANGE_BOUNDARY_PREALLOC_EXTENT;
@@ -2912,7 +2911,7 @@ static int btrfs_zero_range(struct inode *inode,
 		ASSERT(IS_ALIGNED(alloc_start, sectorsize));
 		len = offset + len - alloc_start;
 		offset = alloc_start;
-		alloc_hint = em->block_start + em->len;
+		alloc_hint = extent_map_block_start(em) + em->len;
 	}
 	free_extent_map(em);
 
@@ -2930,7 +2929,7 @@ static int btrfs_zero_range(struct inode *inode,
 							   mode);
 			goto out;
 		}
-		if (len < sectorsize && em->block_start != EXTENT_MAP_HOLE) {
+		if (len < sectorsize && em->disk_bytenr != EXTENT_MAP_HOLE) {
 			free_extent_map(em);
 			ret = btrfs_truncate_block(BTRFS_I(inode), offset, len,
 						   0);
@@ -3143,7 +3142,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 		last_byte = min(extent_map_end(em), alloc_end);
 		actual_end = min_t(u64, extent_map_end(em), offset + len);
 		last_byte = ALIGN(last_byte, blocksize);
-		if (em->block_start == EXTENT_MAP_HOLE ||
+		if (em->disk_bytenr == EXTENT_MAP_HOLE ||
 		    (cur_offset >= inode->i_size &&
 		     !(em->flags & EXTENT_FLAG_PREALLOC))) {
 			const u64 range_len = last_byte - cur_offset;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7dbc0c163316..cbc40d291d76 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -138,7 +138,7 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
 				     u64 end, struct writeback_control *wbc,
 				     bool pages_dirty);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
-				       u64 len, u64 block_start,
+				       u64 len,
 				       u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset);
@@ -1160,7 +1160,6 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
 	/* Here we're doing allocation and writeback of the compressed pages */
 	em = create_io_em(inode, start,
 			  async_extent->ram_size,	/* len */
-			  ins.objectid,			/* block_start */
 			  ins.offset,			/* orig_block_len */
 			  async_extent->ram_size,	/* ram_bytes */
 			  async_extent->compress_type,
@@ -1238,15 +1237,15 @@ static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
 		 * first block in this inode and use that as a hint.  If that
 		 * block is also bogus then just don't worry about it.
 		 */
-		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
+		if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
 			free_extent_map(em);
 			em = search_extent_mapping(em_tree, 0, 0);
-			if (em && em->block_start < EXTENT_MAP_LAST_BYTE)
-				alloc_hint = em->block_start;
+			if (em && em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
+				alloc_hint = extent_map_block_start(em);
 			if (em)
 				free_extent_map(em);
 		} else {
-			alloc_hint = em->block_start;
+			alloc_hint = extent_map_block_start(em);
 			free_extent_map(em);
 		}
 	}
@@ -1422,7 +1421,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 		ram_size = ins.offset;
 		em = create_io_em(inode, start, ins.offset, /* len */
-				  ins.objectid, /* block_start */
 				  ins.offset, /* orig_block_len */
 				  ram_size, /* ram_bytes */
 				  BTRFS_COMPRESS_NONE, /* compress_type */
@@ -2163,7 +2161,6 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			struct extent_map *em;
 
 			em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
-					  nocow_args.block_start, /* block_start */
 					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
 					  ram_bytes, BTRFS_COMPRESS_NONE,
 					  BTRFS_ORDERED_PREALLOC,
@@ -2668,7 +2665,7 @@ static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
 		if (IS_ERR(em))
 			return PTR_ERR(em);
 
-		if (em->block_start != EXTENT_MAP_HOLE)
+		if (extent_map_block_start(em) != EXTENT_MAP_HOLE)
 			goto next;
 
 		em_len = em->len;
@@ -4997,7 +4994,6 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 			hole_em->start = cur_offset;
 			hole_em->len = hole_size;
 
-			hole_em->block_start = EXTENT_MAP_HOLE;
 			hole_em->disk_bytenr = EXTENT_MAP_HOLE;
 			hole_em->disk_num_bytes = 0;
 			hole_em->ram_bytes = hole_size;
@@ -6847,7 +6843,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 	if (em) {
 		if (em->start > start || em->start + em->len <= start)
 			free_extent_map(em);
-		else if (em->block_start == EXTENT_MAP_INLINE && page)
+		else if (em->disk_bytenr == EXTENT_MAP_INLINE && page)
 			free_extent_map(em);
 		else
 			goto out;
@@ -6950,7 +6946,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		/* New extent overlaps with existing one */
 		em->start = start;
 		em->len = found_key.offset - start;
-		em->block_start = EXTENT_MAP_HOLE;
+		em->disk_bytenr = EXTENT_MAP_HOLE;
 		goto insert;
 	}
 
@@ -6974,7 +6970,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		 *
 		 * Other members are not utilized for inline extents.
 		 */
-		ASSERT(em->block_start == EXTENT_MAP_INLINE);
+		ASSERT(em->disk_bytenr == EXTENT_MAP_INLINE);
 		ASSERT(em->len == fs_info->sectorsize);
 
 		ret = read_inline_extent(inode, path, page);
@@ -6985,7 +6981,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 not_found:
 	em->start = start;
 	em->len = len;
-	em->block_start = EXTENT_MAP_HOLE;
+	em->disk_bytenr = EXTENT_MAP_HOLE;
 insert:
 	ret = 0;
 	btrfs_release_path(path);
@@ -7016,7 +7012,6 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  struct btrfs_dio_data *dio_data,
 						  const u64 start,
 						  const u64 len,
-						  const u64 block_start,
 						  const u64 orig_block_len,
 						  const u64 ram_bytes,
 						  const int type,
@@ -7027,7 +7022,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 	struct btrfs_ordered_extent *ordered;
 
 	if (type != BTRFS_ORDERED_NOCOW) {
-		em = create_io_em(inode, start, len, block_start,
+		em = create_io_em(inode, start, len,
 				  orig_block_len, ram_bytes,
 				  BTRFS_COMPRESS_NONE, /* compress_type */
 				  type, disk_bytenr, offset);
@@ -7035,7 +7030,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 			goto out;
 	}
 	ordered = btrfs_alloc_ordered_extent(inode, start, len, len,
-					     block_start, len, 0,
+					     disk_bytenr + offset, len, 0,
 					     (1 << type) |
 					     (1 << BTRFS_ORDERED_DIRECT),
 					     BTRFS_COMPRESS_NONE);
@@ -7080,7 +7075,7 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 		return ERR_PTR(ret);
 
 	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset,
-				     ins.objectid, ins.offset,
+				     ins.offset,
 				     ins.offset, BTRFS_ORDERED_REGULAR,
 				     ins.objectid, 0);
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
@@ -7322,7 +7317,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 
 /* The callers of this must take lock_extent() */
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
-				       u64 len, u64 block_start,
+				       u64 len,
 				       u64 disk_num_bytes,
 				       u64 ram_bytes, int compress_type,
 				       int type, u64 disk_bytenr, u64 offset)
@@ -7373,7 +7368,6 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 
 	em->start = start;
 	em->len = len;
-	em->block_start = block_start;
 	em->disk_bytenr = disk_bytenr;
 	em->disk_num_bytes = disk_num_bytes;
 	em->ram_bytes = ram_bytes;
@@ -7425,13 +7419,13 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	 */
 	if ((em->flags & EXTENT_FLAG_PREALLOC) ||
 	    ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
-	     em->block_start != EXTENT_MAP_HOLE)) {
+	     em->disk_bytenr != EXTENT_MAP_HOLE)) {
 		if (em->flags & EXTENT_FLAG_PREALLOC)
 			type = BTRFS_ORDERED_PREALLOC;
 		else
 			type = BTRFS_ORDERED_NOCOW;
 		len = min(len, em->len - (start - em->start));
-		block_start = em->block_start + (start - em->start);
+		block_start = extent_map_block_start(em) + (start - em->start);
 
 		if (can_nocow_extent(inode, start, &len,
 				     &orig_block_len, &ram_bytes, false, false,
@@ -7461,7 +7455,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		space_reserved = true;
 
 		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
-					      block_start,
 					      orig_block_len,
 					      ram_bytes, type,
 					      disk_bytenr, new_offset);
@@ -7664,7 +7657,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	 * the generic code.
 	 */
 	if (extent_map_is_compressed(em) ||
-	    em->block_start == EXTENT_MAP_INLINE) {
+	    em->disk_bytenr == EXTENT_MAP_INLINE) {
 		free_extent_map(em);
 		/*
 		 * If we are in a NOWAIT context, return -EAGAIN in order to
@@ -7758,12 +7751,12 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	 * We trim the extents (and move the addr) even though iomap code does
 	 * that, since we have locked only the parts we are performing I/O in.
 	 */
-	if ((em->block_start == EXTENT_MAP_HOLE) ||
+	if ((em->disk_bytenr == EXTENT_MAP_HOLE) ||
 	    ((em->flags & EXTENT_FLAG_PREALLOC) && !write)) {
 		iomap->addr = IOMAP_NULL_ADDR;
 		iomap->type = IOMAP_HOLE;
 	} else {
-		iomap->addr = em->block_start + (start - em->start);
+		iomap->addr = extent_map_block_start(em) + (start - em->start);
 		iomap->type = IOMAP_MAPPED;
 	}
 	iomap->offset = start;
@@ -9781,7 +9774,6 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 
 		em->start = cur_offset;
 		em->len = ins.offset;
-		em->block_start = ins.objectid;
 		em->disk_bytenr = ins.objectid;
 		em->offset = 0;
 		em->disk_num_bytes = ins.offset;
@@ -10247,7 +10239,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 		goto out_unlock_extent;
 	}
 
-	if (em->block_start == EXTENT_MAP_INLINE) {
+	if (em->disk_bytenr == EXTENT_MAP_INLINE) {
 		u64 extent_start = em->start;
 
 		/*
@@ -10268,14 +10260,14 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 	 */
 	encoded->len = min_t(u64, extent_map_end(em),
 			     inode->vfs_inode.i_size) - iocb->ki_pos;
-	if (em->block_start == EXTENT_MAP_HOLE ||
+	if (em->disk_bytenr == EXTENT_MAP_HOLE ||
 	    (em->flags & EXTENT_FLAG_PREALLOC)) {
 		disk_bytenr = EXTENT_MAP_HOLE;
 		count = min_t(u64, count, encoded->len);
 		encoded->len = count;
 		encoded->unencoded_len = count;
 	} else if (extent_map_is_compressed(em)) {
-		disk_bytenr = em->block_start;
+		disk_bytenr = em->disk_bytenr;
 		/*
 		 * Bail if the buffer isn't large enough to return the whole
 		 * compressed extent.
@@ -10294,7 +10286,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 			goto out_em;
 		encoded->compression = ret;
 	} else {
-		disk_bytenr = em->block_start + (start - em->start);
+		disk_bytenr = extent_map_block_start(em) + (start - em->start);
 		if (encoded->len > count)
 			encoded->len = count;
 		/*
@@ -10523,7 +10515,6 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	extent_reserved = true;
 
 	em = create_io_em(inode, start, num_bytes,
-			  ins.objectid,
 			  ins.offset, ram_bytes, compression,
 			  BTRFS_ORDERED_COMPRESSED, ins.objectid,
 			  encoded->unencoded_offset);
@@ -10828,12 +10819,12 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 			goto out;
 		}
 
-		if (em->block_start == EXTENT_MAP_HOLE) {
+		if (em->disk_bytenr == EXTENT_MAP_HOLE) {
 			btrfs_warn(fs_info, "swapfile must not have holes");
 			ret = -EINVAL;
 			goto out;
 		}
-		if (em->block_start == EXTENT_MAP_INLINE) {
+		if (em->disk_bytenr == EXTENT_MAP_INLINE) {
 			/*
 			 * It's unlikely we'll ever actually find ourselves
 			 * here, as a file small enough to fit inline won't be
@@ -10851,7 +10842,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 			goto out;
 		}
 
-		logical_block_start = em->block_start + (start - em->start);
+		logical_block_start = extent_map_block_start(em) + (start - em->start);
 		len = min(len, em->len - (start - em->start));
 		free_extent_map(em);
 		em = NULL;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 95a8588dcf8e..e82f44a465c3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2956,7 +2956,6 @@ static noinline_for_stack int setup_relocation_extent_mapping(struct inode *inod
 
 	em->start = start;
 	em->len = end + 1 - start;
-	em->block_start = block_start;
 	em->disk_bytenr = block_start;
 	em->disk_num_bytes = em->len;
 	em->ram_bytes = em->len;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index dc14907c65f9..3cefadf370e7 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -25,8 +25,8 @@ static void free_extent_map_tree(struct extent_map_tree *em_tree)
 #ifdef CONFIG_BTRFS_DEBUG
 		if (refcount_read(&em->refs) != 1) {
 			test_err(
-"em leak: em (start %llu len %llu block_start %llu disk_num_bytes %llu offset %llu) refs %d",
-				 em->start, em->len, em->block_start,
+"em leak: em (start %llu len %llu disk_bytenr %llu disk_num_bytes %llu offset %llu) refs %d",
+				 em->start, em->len, em->disk_bytenr,
 				 em->disk_num_bytes, em->offset,
 				 refcount_read(&em->refs));
 
@@ -71,7 +71,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	/* Add [0, 16K) */
 	em->start = 0;
 	em->len = SZ_16K;
-	em->block_start = 0;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -94,7 +93,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 
 	em->start = SZ_16K;
 	em->len = SZ_4K;
-	em->block_start = SZ_32K; /* avoid merging */
 	em->disk_bytenr = SZ_32K; /* avoid merging */
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -117,7 +115,6 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 	/* Add [0, 8K), should return [0, 16K) instead. */
 	em->start = start;
 	em->len = len;
-	em->block_start = start;
 	em->disk_bytenr = start;
 	em->disk_num_bytes = len;
 	em->ram_bytes = len;
@@ -135,11 +132,11 @@ static int test_case_1(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 	if (em->start != 0 || extent_map_end(em) != SZ_16K ||
-	    em->block_start != 0 || em->disk_num_bytes != SZ_16K) {
+	    em->disk_bytenr != 0 || em->disk_num_bytes != SZ_16K) {
 		test_err(
-"case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu block_start %llu disk_num_bytes %llu",
+"case1 [%llu %llu]: ret %d return a wrong em (start %llu len %llu disk_bytenr %llu disk_num_bytes %llu",
 			 start, start + len, ret, em->start, em->len,
-			 em->block_start, em->disk_num_bytes);
+			 em->disk_bytenr, em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -170,7 +167,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	/* Add [0, 1K) */
 	em->start = 0;
 	em->len = SZ_1K;
-	em->block_start = EXTENT_MAP_INLINE;
 	em->disk_bytenr = EXTENT_MAP_INLINE;
 	em->disk_num_bytes = 0;
 	em->ram_bytes = SZ_1K;
@@ -193,7 +189,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 
 	em->start = SZ_4K;
 	em->len = SZ_4K;
-	em->block_start = SZ_4K;
 	em->disk_bytenr = SZ_4K;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -216,7 +211,6 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 	/* Add [0, 1K) */
 	em->start = 0;
 	em->len = SZ_1K;
-	em->block_start = EXTENT_MAP_INLINE;
 	em->disk_bytenr = EXTENT_MAP_INLINE;
 	em->disk_num_bytes = 0;
 	em->ram_bytes = SZ_1K;
@@ -233,10 +227,10 @@ static int test_case_2(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 	if (em->start != 0 || extent_map_end(em) != SZ_1K ||
-	    em->block_start != EXTENT_MAP_INLINE) {
+	    em->disk_bytenr != EXTENT_MAP_INLINE) {
 		test_err(
-"case2 [0 1K]: ret %d return a wrong em (start %llu len %llu block_start %llu",
-			 ret, em->start, em->len, em->block_start);
+"case2 [0 1K]: ret %d return a wrong em (start %llu len %llu disk_bytenr %llu",
+			 ret, em->start, em->len, em->disk_bytenr);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -262,7 +256,6 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	/* Add [4K, 8K) */
 	em->start = SZ_4K;
 	em->len = SZ_4K;
-	em->block_start = SZ_4K;
 	em->disk_bytenr = SZ_4K;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_4K;
@@ -285,7 +278,6 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	/* Add [0, 16K) */
 	em->start = 0;
 	em->len = SZ_16K;
-	em->block_start = 0;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -308,11 +300,11 @@ static int __test_case_3(struct btrfs_fs_info *fs_info,
 	 * em->start.
 	 */
 	if (start < em->start || start + len > extent_map_end(em) ||
-	    em->start != em->block_start) {
+	    em->start != extent_map_block_start(em)) {
 		test_err(
-"case3 [%llu %llu): ret %d em (start %llu len %llu block_start %llu block_len %llu)",
+"case3 [%llu %llu): ret %d em (start %llu len %llu disk_bytenr %llu disk_num_bytes %llu)",
 			 start, start + len, ret, em->start, em->len,
-			 em->block_start, em->disk_num_bytes);
+			 em->disk_bytenr, em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -370,7 +362,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	/* Add [0K, 8K) */
 	em->start = 0;
 	em->len = SZ_8K;
-	em->block_start = 0;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_8K;
 	em->ram_bytes = SZ_8K;
@@ -393,7 +384,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	/* Add [8K, 32K) */
 	em->start = SZ_8K;
 	em->len = 24 * SZ_1K;
-	em->block_start = SZ_16K; /* avoid merging */
 	em->disk_bytenr = SZ_16K; /* avoid merging */
 	em->disk_num_bytes = 24 * SZ_1K;
 	em->ram_bytes = 24 * SZ_1K;
@@ -415,7 +405,6 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	/* Add [0K, 32K) */
 	em->start = 0;
 	em->len = SZ_32K;
-	em->block_start = 0;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_32K;
 	em->ram_bytes = SZ_32K;
@@ -435,9 +424,9 @@ static int __test_case_4(struct btrfs_fs_info *fs_info,
 	}
 	if (start < em->start || start + len > extent_map_end(em)) {
 		test_err(
-"case4 [%llu %llu): ret %d, added wrong em (start %llu len %llu block_start %llu disk_num_bytes %llu)",
-			 start, start + len, ret, em->start, em->len, em->block_start,
-			 em->disk_num_bytes);
+"case4 [%llu %llu): ret %d, added wrong em (start %llu len %llu disk_bytenr %llu disk_num_bytes %llu)",
+			 start, start + len, ret, em->start, em->len,
+			 em->disk_bytenr, em->disk_num_bytes);
 		ret = -EINVAL;
 	}
 	free_extent_map(em);
@@ -500,7 +489,6 @@ static int add_compressed_extent(struct btrfs_fs_info *fs_info,
 
 	em->start = start;
 	em->len = len;
-	em->block_start = block_start;
 	em->disk_bytenr = block_start;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = len;
@@ -726,7 +714,6 @@ static int test_case_6(struct btrfs_fs_info *fs_info, struct extent_map_tree *em
 
 	em->start = SZ_4K;
 	em->len = SZ_4K;
-	em->block_start = SZ_16K;
 	em->disk_bytenr = SZ_16K;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -787,7 +774,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 	/* [0, 16K), pinned */
 	em->start = 0;
 	em->len = SZ_16K;
-	em->block_start = 0;
 	em->disk_bytenr = 0;
 	em->disk_num_bytes = SZ_4K;
 	em->ram_bytes = SZ_16K;
@@ -811,7 +797,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 	/* [32K, 48K), not pinned */
 	em->start = SZ_32K;
 	em->len = SZ_16K;
-	em->block_start = SZ_32K;
 	em->disk_bytenr = SZ_32K;
 	em->disk_num_bytes = SZ_16K;
 	em->ram_bytes = SZ_16K;
@@ -876,8 +861,9 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
-	if (em->block_start != SZ_32K + SZ_4K) {
-		test_err("em->block_start is %llu, expected 36K", em->block_start);
+	if (extent_map_block_start(em) != SZ_32K + SZ_4K) {
+		test_err("em block start is %llu, expected 36K",
+			 extent_map_block_start(em));
 		goto out;
 	}
 
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 1b8c39edfc18..d6fd1978934a 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -264,8 +264,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_HOLE) {
-		test_err("expected a hole, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_HOLE) {
+		test_err("expected a hole, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	free_extent_map(em);
@@ -283,8 +283,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_INLINE) {
-		test_err("expected an inline, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_INLINE) {
+		test_err("expected an inline, got %llu", em->disk_bytenr);
 		goto out;
 	}
 
@@ -321,8 +321,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_HOLE) {
-		test_err("expected a hole, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_HOLE) {
+		test_err("expected a hole, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != 4) {
@@ -344,8 +344,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize - 1) {
@@ -371,8 +371,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -389,7 +389,7 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
-	disk_bytenr = em->block_start;
+	disk_bytenr = extent_map_block_start(em);
 	orig_start = em->start;
 	offset = em->start + em->len;
 	free_extent_map(em);
@@ -399,8 +399,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_HOLE) {
-		test_err("expected a hole, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_HOLE) {
+		test_err("expected a hole, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -421,8 +421,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != 2 * sectorsize) {
@@ -441,9 +441,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		goto out;
 	}
 	disk_bytenr += (em->start - orig_start);
-	if (em->block_start != disk_bytenr) {
+	if (extent_map_block_start(em) != disk_bytenr) {
 		test_err("wrong block start, want %llu, have %llu",
-			 disk_bytenr, em->block_start);
+			 disk_bytenr, extent_map_block_start(em));
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -455,8 +455,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -483,8 +483,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -502,7 +502,7 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("wrong offset, want 0, have %llu", em->offset);
 		goto out;
 	}
-	disk_bytenr = em->block_start;
+	disk_bytenr = extent_map_block_start(em);
 	orig_start = em->start;
 	offset = em->start + em->len;
 	free_extent_map(em);
@@ -512,8 +512,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_HOLE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_HOLE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -531,9 +531,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 em->start - orig_start, em->offset);
 		goto out;
 	}
-	if (em->block_start != disk_bytenr + em->offset) {
+	if (extent_map_block_start(em) != disk_bytenr + em->offset) {
 		test_err("unexpected block start, wanted %llu, have %llu",
-			 disk_bytenr + em->offset, em->block_start);
+			 disk_bytenr + em->offset, extent_map_block_start(em));
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -544,8 +544,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != 2 * sectorsize) {
@@ -564,9 +564,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 em->start - orig_start, em->offset);
 		goto out;
 	}
-	if (em->block_start != disk_bytenr + em->offset) {
+	if (extent_map_block_start(em) != disk_bytenr + em->offset) {
 		test_err("unexpected block start, wanted %llu, have %llu",
-			 disk_bytenr + em->offset, em->block_start);
+			 disk_bytenr + em->offset, extent_map_block_start(em));
 		goto out;
 	}
 	offset = em->start + em->len;
@@ -578,8 +578,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != 2 * sectorsize) {
@@ -611,8 +611,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -635,7 +635,7 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 			 BTRFS_COMPRESS_ZLIB, extent_map_compression(em));
 		goto out;
 	}
-	disk_bytenr = em->block_start;
+	disk_bytenr = extent_map_block_start(em);
 	orig_start = em->start;
 	offset = em->start + em->len;
 	free_extent_map(em);
@@ -645,8 +645,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -671,9 +671,9 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != disk_bytenr) {
+	if (extent_map_block_start(em) != disk_bytenr) {
 		test_err("block start does not match, want %llu got %llu",
-			 disk_bytenr, em->block_start);
+			 disk_bytenr, extent_map_block_start(em));
 		goto out;
 	}
 	if (em->start != offset || em->len != 2 * sectorsize) {
@@ -706,8 +706,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -732,8 +732,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_HOLE) {
-		test_err("expected a hole extent, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_HOLE) {
+		test_err("expected a hole extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	/*
@@ -764,8 +764,8 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (em->disk_bytenr >= EXTENT_MAP_LAST_BYTE) {
+		test_err("expected a real extent, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != offset || em->len != sectorsize) {
@@ -843,8 +843,8 @@ static int test_hole_first(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != EXTENT_MAP_HOLE) {
-		test_err("expected a hole, got %llu", em->block_start);
+	if (em->disk_bytenr != EXTENT_MAP_HOLE) {
+		test_err("expected a hole, got %llu", em->disk_bytenr);
 		goto out;
 	}
 	if (em->start != 0 || em->len != sectorsize) {
@@ -865,8 +865,9 @@ static int test_hole_first(u32 sectorsize, u32 nodesize)
 		test_err("got an error when we shouldn't have");
 		goto out;
 	}
-	if (em->block_start != sectorsize) {
-		test_err("expected a real extent, got %llu", em->block_start);
+	if (extent_map_block_start(em) != sectorsize) {
+		test_err("expected a real extent, got %llu",
+			 extent_map_block_start(em));
 		goto out;
 	}
 	if (em->start != sectorsize || em->len != sectorsize) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 5ca7f2623b56..62211d83c586 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4572,6 +4572,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_ordered_extent *ordered;
 	struct btrfs_root *csum_root;
+	u64 block_start;
 	u64 csum_offset;
 	u64 csum_len;
 	u64 mod_start = em->start;
@@ -4581,7 +4582,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 
 	if (inode->flags & BTRFS_INODE_NODATASUM ||
 	    (em->flags & EXTENT_FLAG_PREALLOC) ||
-	    em->block_start == EXTENT_MAP_HOLE)
+	    em->disk_bytenr == EXTENT_MAP_HOLE)
 		return 0;
 
 	list_for_each_entry(ordered, &ctx->ordered_extents, log_list) {
@@ -4652,10 +4653,11 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	}
 
 	/* block start is already adjusted for the file extent offset. */
-	csum_root = btrfs_csum_root(trans->fs_info, em->block_start);
-	ret = btrfs_lookup_csums_list(csum_root, em->block_start + csum_offset,
-				      em->block_start + csum_offset +
-				      csum_len - 1, &ordered_sums, 0, false);
+	block_start = extent_map_block_start(em);
+	csum_root = btrfs_csum_root(trans->fs_info, block_start);
+	ret = btrfs_lookup_csums_list(csum_root, block_start + csum_offset,
+				      block_start + csum_offset + csum_len - 1,
+				      &ordered_sums, 0, false);
 	if (ret)
 		return ret;
 
@@ -4685,6 +4687,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	enum btrfs_compression_type compress_type;
 	u64 extent_offset = em->offset;
+	u64 block_start = extent_map_block_start(em);
 	u64 block_len;
 	int ret;
 
@@ -4697,10 +4700,10 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
 	block_len = em->disk_num_bytes;
 	compress_type = extent_map_compression(em);
 	if (compress_type != BTRFS_COMPRESS_NONE) {
-		btrfs_set_stack_file_extent_disk_bytenr(&fi, em->block_start);
+		btrfs_set_stack_file_extent_disk_bytenr(&fi, block_start);
 		btrfs_set_stack_file_extent_disk_num_bytes(&fi, block_len);
-	} else if (em->block_start < EXTENT_MAP_LAST_BYTE) {
-		btrfs_set_stack_file_extent_disk_bytenr(&fi, em->block_start -
+	} else if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE) {
+		btrfs_set_stack_file_extent_disk_bytenr(&fi, block_start -
 							extent_offset);
 		btrfs_set_stack_file_extent_disk_num_bytes(&fi, block_len);
 	}
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4cba80b34387..d84720054229 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1769,7 +1769,9 @@ static void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered,
 	write_lock(&em_tree->lock);
 	em = search_extent_mapping(em_tree, ordered->file_offset,
 				   ordered->num_bytes);
-	em->block_start = logical;
+	/* The em should be a new COW extent, thus it should not have an offset. */
+	ASSERT(em->offset == 0);
+	em->disk_bytenr = logical;
 	free_extent_map(em);
 	write_unlock(&em_tree->lock);
 }
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0d0775bde14c..7a84bc26f7e1 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -293,7 +293,6 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__field(	u64,  ino		)
 		__field(	u64,  start		)
 		__field(	u64,  len		)
-		__field(	u64,  block_start	)
 		__field(	u32,  flags		)
 		__field(	int,  refs		)
 	),
@@ -303,18 +302,16 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 		__entry->ino		= btrfs_ino(inode);
 		__entry->start		= map->start;
 		__entry->len		= map->len;
-		__entry->block_start	= map->block_start;
 		__entry->flags		= map->flags;
 		__entry->refs		= refcount_read(&map->refs);
 	),
 
 	TP_printk_btrfs("root=%llu(%s) ino=%llu start=%llu len=%llu "
-		  "block_start=%llu(%s) flags=%s refs=%u",
+		  "flags=%s refs=%u",
 		  show_root_type(__entry->root_objectid),
 		  __entry->ino,
 		  __entry->start,
 		  __entry->len,
-		  show_map_type(__entry->block_start),
 		  show_map_flags(__entry->flags),
 		  __entry->refs)
 );
-- 
2.44.0



* [PATCH RFC 8/8] btrfs: reorder disk_bytenr/disk_num_bytes/ram_bytes/offset parameters
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (6 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 7/8] btrfs: remove extent_map::block_start member Qu Wenruo
@ 2024-04-08 22:33 ` Qu Wenruo
  2024-04-09 14:57 ` [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start David Sterba
  8 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-08 22:33 UTC (permalink / raw)
  To: linux-btrfs

Since the old block_start/block_len/orig_start/orig_block_len members
have been cleaned up, we can reorder the above parameters into a more
common order.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h |  6 +--
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode.c       | 95 ++++++++++++++++++++++++------------------
 3 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index f4514ee273ce..696e095e7111 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -443,9 +443,9 @@ int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
-			      u64 *orig_block_len,
-			      u64 *ram_bytes, bool nowait, bool strict,
-			      u64 *disk_bytenr_ret, u64 *extent_offset_ret);
+			      u64 *disk_bytenr, u64 *disk_num_bytes,
+			      u64 *ram_bytes, u64 *new_offset_ret,
+			      bool nowait, bool strict);
 
 void btrfs_del_delalloc_inode(struct btrfs_inode *inode);
 struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 965d2ba34aeb..b3ba2d4f2b8b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1094,7 +1094,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 						   &cached_state);
 	}
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, nowait, false, NULL, NULL);
+			NULL, NULL, NULL, NULL, nowait, false);
 	if (ret <= 0)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 	else
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cbc40d291d76..0d9d743719e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -139,9 +139,10 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
 				     bool pages_dirty);
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len,
+				       u64 disk_bytenr,
 				       u64 disk_num_bytes,
-				       u64 ram_bytes, int compress_type,
-				       int type, u64 disk_bytenr, u64 offset);
+				       u64 ram_bytes, u64 offset, int compress_type,
+				       int type);
 
 static int data_reloc_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 					  u64 root, void *warn_ctx)
@@ -1160,11 +1161,12 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
 	/* Here we're doing allocation and writeback of the compressed pages */
 	em = create_io_em(inode, start,
 			  async_extent->ram_size,	/* len */
-			  ins.offset,			/* orig_block_len */
+			  ins.objectid,			/* disk_bytenr */
+			  ins.offset,			/* disk_num_bytes */
 			  async_extent->ram_size,	/* ram_bytes */
+			  0,				/* offset */
 			  async_extent->compress_type,
-			  BTRFS_ORDERED_COMPRESSED,
-			  ins.objectid, 0);
+			  BTRFS_ORDERED_COMPRESSED);
 	if (IS_ERR(em)) {
 		ret = PTR_ERR(em);
 		goto out_free_reserve;
@@ -1421,11 +1423,12 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 		ram_size = ins.offset;
 		em = create_io_em(inode, start, ins.offset, /* len */
+				  ins.objectid, /* disk_bytenr */
 				  ins.offset, /* orig_block_len */
 				  ram_size, /* ram_bytes */
+				  0, /* offset */
 				  BTRFS_COMPRESS_NONE, /* compress_type */
-				  BTRFS_ORDERED_REGULAR /* type */,
-				  ins.objectid, 0);
+				  BTRFS_ORDERED_REGULAR /* type */);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR(em);
 			goto out_reserve;
@@ -2161,12 +2164,13 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			struct extent_map *em;
 
 			em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
-					  nocow_args.orig_disk_num_bytes, /* orig_block_len */
-					  ram_bytes, BTRFS_COMPRESS_NONE,
-					  BTRFS_ORDERED_PREALLOC,
-					  nocow_args.orig_disk_bytenr,
+					  nocow_args.orig_disk_bytenr, /* disk_bytenr */
+					  nocow_args.orig_disk_num_bytes, /* disk_num_bytes */
+					  ram_bytes,
 					  cur_offset - found_key.offset +
-					  nocow_args.orig_offset);
+					  nocow_args.orig_offset, /* offset */
+					  BTRFS_COMPRESS_NONE,
+					  BTRFS_ORDERED_PREALLOC);
 			if (IS_ERR(em)) {
 				btrfs_dec_nocow_writers(nocow_bg);
 				ret = PTR_ERR(em);
@@ -7012,20 +7016,20 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  struct btrfs_dio_data *dio_data,
 						  const u64 start,
 						  const u64 len,
-						  const u64 orig_block_len,
-						  const u64 ram_bytes,
-						  const int type,
 						  const u64 disk_bytenr,
-						  const u64 offset)
+						  const u64 disk_num_bytes,
+						  const u64 ram_bytes,
+						  const u64 offset,
+						  const int type)
 {
 	struct extent_map *em = NULL;
 	struct btrfs_ordered_extent *ordered;
 
 	if (type != BTRFS_ORDERED_NOCOW) {
 		em = create_io_em(inode, start, len,
-				  orig_block_len, ram_bytes,
+				  disk_bytenr, disk_num_bytes, ram_bytes, offset,
 				  BTRFS_COMPRESS_NONE, /* compress_type */
-				  type, disk_bytenr, offset);
+				  type);
 		if (IS_ERR(em))
 			goto out;
 	}
@@ -7074,10 +7078,13 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 	if (ret)
 		return ERR_PTR(ret);
 
-	em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset,
-				     ins.offset,
-				     ins.offset, BTRFS_ORDERED_REGULAR,
-				     ins.objectid, 0);
+	em = btrfs_create_dio_extent(inode, dio_data,
+				     start, ins.offset,
+				     ins.objectid, /* disk_bytenr */
+				     ins.offset, /* disk_num_bytes */
+				     ins.offset, /* ram_bytes */
+				     0, /* offset */
+				     BTRFS_ORDERED_REGULAR);
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	if (IS_ERR(em))
 		btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset,
@@ -7120,9 +7127,9 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
  *	 any ordered extents.
  */
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
-			      u64 *orig_block_len,
-			      u64 *ram_bytes, bool nowait, bool strict,
-			      u64 *disk_bytenr_ret, u64 *new_offset_ret)
+			      u64 *disk_bytenr, u64 *disk_num_bytes,
+			      u64 *ram_bytes, u64 *new_offset_ret,
+			      bool nowait, bool strict)
 {
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	struct can_nocow_file_extent_args nocow_args = { 0 };
@@ -7207,10 +7214,10 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 		}
 	}
 
-	if (orig_block_len)
-		*orig_block_len = nocow_args.orig_disk_num_bytes;
-	if (disk_bytenr_ret)
-		*disk_bytenr_ret = nocow_args.orig_disk_bytenr;
+	if (disk_bytenr)
+		*disk_bytenr = nocow_args.orig_disk_bytenr;
+	if (disk_num_bytes)
+		*disk_num_bytes = nocow_args.orig_disk_num_bytes;
 	if (new_offset_ret)
 		*new_offset_ret = offset - key.offset +
 				  nocow_args.orig_offset;
@@ -7318,9 +7325,12 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend,
 /* The callers of this must take lock_extent() */
 static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
 				       u64 len,
+				       u64 disk_bytenr,
 				       u64 disk_num_bytes,
-				       u64 ram_bytes, int compress_type,
-				       int type, u64 disk_bytenr, u64 offset)
+				       u64 ram_bytes,
+				       u64 offset,
+				       int compress_type,
+				       int type)
 {
 	struct extent_map *em;
 	int ret;
@@ -7428,8 +7438,9 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		block_start = extent_map_block_start(em) + (start - em->start);
 
 		if (can_nocow_extent(inode, start, &len,
-				     &orig_block_len, &ram_bytes, false, false,
-				     &disk_bytenr, &new_offset) == 1) {
+				     &disk_bytenr, &orig_block_len,
+				     &ram_bytes, &new_offset,
+				     false, false) == 1) {
 			bg = btrfs_inc_nocow_writers(fs_info, block_start);
 			if (bg)
 				can_nocow = true;
@@ -7455,9 +7466,11 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		space_reserved = true;
 
 		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
-					      orig_block_len,
-					      ram_bytes, type,
-					      disk_bytenr, new_offset);
+					      disk_bytenr, /* disk_bytenr. */
+					      orig_block_len, /* disk_num_bytes */
+					      ram_bytes, /* ram_bytes */
+					      new_offset, /* offset */
+					      type);
 		btrfs_dec_nocow_writers(bg);
 		if (type == BTRFS_ORDERED_PREALLOC) {
 			free_extent_map(em);
@@ -10515,9 +10528,9 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	extent_reserved = true;
 
 	em = create_io_em(inode, start, num_bytes,
-			  ins.offset, ram_bytes, compression,
-			  BTRFS_ORDERED_COMPRESSED, ins.objectid,
-			  encoded->unencoded_offset);
+			  ins.objectid, ins.offset, ram_bytes,
+			  encoded->unencoded_offset, compression,
+			  BTRFS_ORDERED_COMPRESSED);
 	if (IS_ERR(em)) {
 		ret = PTR_ERR(em);
 		goto out_free_reserved;
@@ -10847,8 +10860,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		free_extent_map(em);
 		em = NULL;
 
-		ret = can_nocow_extent(inode, start, &len, NULL, NULL,
-				       false, true, NULL, NULL);
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, NULL,
+				       false, true);
 		if (ret < 0) {
 			goto out;
 		} else if (ret) {
-- 
2.44.0



* Re: [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start
  2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
                   ` (7 preceding siblings ...)
  2024-04-08 22:33 ` [PATCH RFC 8/8] btrfs: reorder disk_bytenr/disk_num_bytes/ram_bytes/offset parameters Qu Wenruo
@ 2024-04-09 14:57 ` David Sterba
  2024-04-09 21:40   ` Qu Wenruo
  8 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2024-04-09 14:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 09, 2024 at 08:03:39AM +0930, Qu Wenruo wrote:
> [REASON FOR RFC]
> Not all sanity checks are implemented, there is a missing check for
> ram_bytes on non-compressed extent.
> Because even without this series, generic/311 can generate a file extent
> with ram_bytes larger than disk_num_bytes.
> 
> This seems harmless, but I still want to fix it and implement a full
> version of the em sanity check.
> 
> [REPO]
> https://github.com/adam900710/linux/tree/em_cleanup
> 
> Which relies on previous changes on extent maps.
> 
> This series introduce two new members (disk_bytenr/offset) to
> extent_map, and removes three old members
> (block_start/block_len/offset), finally rename one member
> (orig_block_len -> disk_num_bytes).
> 
> This should save us one u64 for extent_map.
> 
> But to make things safe to migrate, I introduce extra sanity checks for
> extent_map, and do cross check for both old and new members.
> 
> The extra sanity checks already exposed one bug (thankfully harmless)
> causing em::block_start to be incorrect.
> 
> There is another bug related to bad btrfs_file_extent_item::ram_bytes,
> which can be larger than disk_num_bytes for non-compressed file extents.
> (Generated by generic/311 test case, but it seems to be created on-disk
>  first)
> 
> But so far, the patchset is fine for default fstests run.

I've only paged through the patches. Besides the renames, there are so
many changes that it's hard to spot a subtle bug, but overall it looks
OK.


* Re: [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes
  2024-04-08 22:33 ` [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes Qu Wenruo
@ 2024-04-09 14:58   ` David Sterba
  2024-04-09 21:38     ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2024-04-09 14:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 09, 2024 at 08:03:40AM +0930, Qu Wenruo wrote:
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2872,7 +2872,7 @@ static inline void btrfs_remove_log_ctx(struct btrfs_root *root,
>  	mutex_unlock(&root->log_mutex);
>  }
>  
> -/* 
> +/*

There is a whitespace change here. Please check that your patches
don't contain such fixups unless they are in the modified code. Thanks.


* Re: [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member
  2024-04-08 22:33 ` [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member Qu Wenruo
@ 2024-04-09 14:59   ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2024-04-09 14:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 09, 2024 at 08:03:44AM +0930, Qu Wenruo wrote:
> 
> @@ -877,7 +873,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_tree_ref,
>  	TP_STRUCT__entry_btrfs(
>  		__field(	u64,  bytenr		)
>  		__field(	u64,  num_bytes		)
> -		__field(	int,  action		) 
> +		__field(	int,  action		)

Also here and in the following hunks: a trailing space fixup, please drop it.

>  		__field(	u64,  parent		)
>  		__field(	u64,  ref_root		)
>  		__field(	int,  level		)
> @@ -940,7 +936,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_data_ref,
>  	TP_STRUCT__entry_btrfs(
>  		__field(	u64,  bytenr		)
>  		__field(	u64,  num_bytes		)
> -		__field(	int,  action		) 
> +		__field(	int,  action		)
>  		__field(	u64,  parent		)
>  		__field(	u64,  ref_root		)
>  		__field(	u64,  owner		)
> @@ -1006,7 +1002,7 @@ DECLARE_EVENT_CLASS(btrfs_delayed_ref_head,
>  	TP_STRUCT__entry_btrfs(
>  		__field(	u64,  bytenr		)
>  		__field(	u64,  num_bytes		)
> -		__field(	int,  action		) 
> +		__field(	int,  action		)
>  		__field(	int,  is_data		)
>  	),
>  
> -- 
> 2.44.0
> 


* Re: [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes
  2024-04-09 14:58   ` David Sterba
@ 2024-04-09 21:38     ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-09 21:38 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2024/4/10 00:28, David Sterba wrote:
> On Tue, Apr 09, 2024 at 08:03:40AM +0930, Qu Wenruo wrote:
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -2872,7 +2872,7 @@ static inline void btrfs_remove_log_ctx(struct btrfs_root *root,
>>   	mutex_unlock(&root->log_mutex);
>>   }
>>
>> -/*
>> +/*
>
> There is a whitespace change here. Please check that your patches
> don't contain such fixups unless they are in the modified code. Thanks.
>
It turns out to be the LSP server.

Every time I modify a file, clangd seems to fix all the whitespace.

I'll need to find the option to turn that off.

Thanks,
Qu


* Re: [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start
  2024-04-09 14:57 ` [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start David Sterba
@ 2024-04-09 21:40   ` Qu Wenruo
  2024-04-09 22:18     ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-09 21:40 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2024/4/10 00:27, David Sterba wrote:
> On Tue, Apr 09, 2024 at 08:03:39AM +0930, Qu Wenruo wrote:
>> [REASON FOR RFC]
>> Not all sanity checks are implemented, there is a missing check for
>> ram_bytes on non-compressed extent.
>> Because even without this series, generic/311 can generate a file extent
>> with ram_bytes larger than disk_num_bytes.
>>
>> This seems harmless, but I still want to fix it and implement a full
>> version of the em sanity check.
>>
>> [REPO]
>> https://github.com/adam900710/linux/tree/em_cleanup
>>
>> Which relies on previous changes on extent maps.
>>
>> This series introduce two new members (disk_bytenr/offset) to
>> extent_map, and removes three old members
>> (block_start/block_len/offset), finally rename one member
>> (orig_block_len -> disk_num_bytes).
>>
>> This should save us one u64 for extent_map.
>>
>> But to make things safe to migrate, I introduce extra sanity checks for
>> extent_map, and do cross check for both old and new members.
>>
>> The extra sanity checks already exposed one bug (thankfully harmless)
>> causing em::block_start to be incorrect.
>>
>> There is another bug related to bad btrfs_file_extent_item::ram_bytes,
>> which can be larger than disk_num_bytes for non-compressed file extents.
>> (Generated by generic/311 test case, but it seems to be created on-disk
>>   first)
>>
>> But so far, the patchset is fine for default fstests run.
>
> I've only paged through the patches. Besides the renames, there are so
> many changes that it's hard to spot a subtle bug, but overall it looks
> OK.
>

That's why this time I added the sanity/cross checks immediately after
adding the new members, to make sure the behavior is not changed.

So even though it's pretty hard to review, if something obviously wrong
happens, I'll hit a crash.

You would be surprised by how many crashes it triggered during
development.
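
To give a rough idea of what such a cross check looks like, here is a
minimal userspace sketch, not the code from the series. The struct
em_model type and the model_*() helpers are hypothetical names made up
for illustration. The rule itself (block_start equals disk_bytenr for
compressed extents, and disk_bytenr + offset otherwise) is inferred from
how log_one_extent() converts between the two in patch 7, and special
values such as holes are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct extent_map mid-migration. */
struct em_model {
	uint64_t block_start;	/* old member, still maintained */
	uint64_t disk_bytenr;	/* new member */
	uint64_t offset;	/* new member */
	bool compressed;
};

/* Derive the old block_start value from the new members only. */
static inline uint64_t model_block_start(const struct em_model *em)
{
	/* A compressed extent is always read as a whole, so no offset. */
	if (em->compressed)
		return em->disk_bytenr;
	return em->disk_bytenr + em->offset;
}

/* Return true if the old member agrees with the new members. */
static inline bool model_cross_check(const struct em_model *em)
{
	return em->block_start == model_block_start(em);
}

/* Abort loudly on a mismatch instead of silently carrying on. */
static inline void model_validate(const struct em_model *em)
{
	assert(model_cross_check(em));
}
```

Calling model_validate() wherever an extent map is created or modified
means a mismatch between the old and new members trips the assert(),
which is the userspace analogue of the crash mentioned above.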

Thanks,
Qu


* Re: [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start
  2024-04-09 21:40   ` Qu Wenruo
@ 2024-04-09 22:18     ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2024-04-09 22:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Wed, Apr 10, 2024 at 07:10:08AM +0930, Qu Wenruo wrote:
> On 2024/4/10 00:27, David Sterba wrote:
> >>
> >> The extra sanity checks already exposed one bug (thankfully harmless)
> >> causing em::block_start to be incorrect.
> >>
> >> There is another bug related to bad btrfs_file_extent_item::ram_bytes,
> >> which can be larger than disk_num_bytes for non-compressed file extents.
> >> (Generated by generic/311 test case, but it seems to be created on-disk
> >>   first)
> >>
> >> But so far, the patchset is fine for default fstests run.
> >
> > I've only paged through the patches. Besides the renames, there are so
> > many changes that it's hard to spot a subtle bug, but overall it looks
> > OK.
> 
> That's why this time I added the sanity/cross checks immediately after
> adding the new members, to make sure the behavior is not changed.
> 
> So even though it's pretty hard to review, if something obviously wrong
> happens, I'll hit a crash.
> 
> You would be surprised by how many crashes it triggered during
> development.

Yeah, I can imagine. If you make it work in the end then it's good. The
self tests can verify the boundary conditions without having to create a
complete filesystem, focusing on just some API, so here it's easier.

The review I'd like to see is by somebody familiar with the extent maps,
to check that it all makes sense and that there are no missing test cases.

This series depends on another one that looks almost finished, except
for some comments on the patch that only adds code comments (so no
functional changes). This kind of change would be good to get enough
testing in for-next, so please go ahead and add it unless there's
something really serious outstanding. We can do some minor tweaks later.


* Re: [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-08 22:33 ` [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args Qu Wenruo
@ 2024-04-11 14:46   ` Filipe Manana
  2024-04-11 22:03     ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2024-04-11 14:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Apr 8, 2024 at 11:34 PM Qu Wenruo <wqu@suse.com> wrote:
>
> The structure can_nocow_file_extent_args is utilized to provide the
> needed info for a NOCOW writes.
>
> However some of its members are pretty confusing.
> For example, @disk_bytenr is not btrfs_file_extent_item::disk_bytenr,
> but with extra offset, thus it works more like extent_map::block_start.
>
> This patch would:
>
> - Rename members directly fetched from btrfs_file_extent_item
>   The new name would have "orig_" prefix, with the same member name from
>   btrfs_file_extent_item.
>
> - For the old @disk_bytenr, rename it to @block_start
>   As it's directly passed into create_io_em() as @block_start.

I actually find these new names more confusing.

The existing names reflect fields from struct btrfs_file_extent_item,
because NOCOW checks are always done against the range of a file extent
item.

Sometimes the check is against the whole range of the extent item,
sometimes only a part of it, in which case disk_bytenr is incremented
by offsets.

This is the same logic as with struct btrfs_ordered_extent: for a NOCOW
write, disk_bytenr either matches the disk_bytenr of an existing file
extent item or is adjusted by some offset in case it covers only part
of the extent item.

So currently we are consistent with btrfs_ordered_extent, on top of the
fact that the NOCOW checks are done against a file extent item.

I find block_start particularly unintuitive - block? Is it a block
number? What's the size of the block? Etc.
disk_bytenr is a lot clearer - it's a disk address in bytes.
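
As a rough sketch of that adjustment (a userspace model, not kernel
code): struct fi_model and the two helpers below are hypothetical names
used only for illustration. The arithmetic mirrors what
can_nocow_file_extent() does today when it bumps disk_bytenr by
extent_offset and by (start - key->offset), and when it clamps
num_bytes to the end of the extent item.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the file extent item a NOCOW check runs against. */
struct fi_model {
	uint64_t key_offset;	/* file offset where the item starts */
	uint64_t disk_bytenr;	/* start of the data extent on disk */
	uint64_t offset;	/* offset of the item into the data extent */
	uint64_t num_bytes;	/* length of the file range the item covers */
};

/* Disk address, in bytes, that a NOCOW write at file offset @start hits. */
static inline uint64_t nocow_target_bytenr(const struct fi_model *fi,
					   uint64_t start)
{
	/* The write must begin inside the range covered by the item. */
	assert(start >= fi->key_offset &&
	       start < fi->key_offset + fi->num_bytes);
	return fi->disk_bytenr + fi->offset + (start - fi->key_offset);
}

/* Bytes writable in NOCOW mode for the inclusive range [@start, @end]. */
static inline uint64_t nocow_num_bytes(const struct fi_model *fi,
				       uint64_t start, uint64_t end)
{
	uint64_t extent_end = fi->key_offset + fi->num_bytes;
	uint64_t last = (end + 1 < extent_end) ? end + 1 : extent_end;

	return last - start;
}
```

When the item has no offset and the write starts at key_offset, the
result is exactly the item's disk_bytenr. In every other case it is
still a disk address in bytes, just shifted, which is why keeping the
disk_bytenr name reads naturally.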

>
> - Add extra comments explaining those members
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 51 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 30 insertions(+), 21 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2e0156943c7c..4d207c3b38d9 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1847,11 +1847,20 @@ struct can_nocow_file_extent_args {
>          */
>         bool free_path;
>
> -       /* Output fields. Only set when can_nocow_file_extent() returns 1. */
> +       /*
> +        * Output fields. Only set when can_nocow_file_extent() returns 1.
> +        *
> +        * @block_start:        The bytenr of the new nocow write should be at.
> +        * @orig_disk_bytenr:   The original data extent's disk_bytenr.

This orig_disk_bytenr field is not defined anywhere in this patch.

Thanks.

> +        * @orig_disk_num_bytes:The original data extent's disk_num_bytes.
> +        * @orig_offset:        The original offset inside the old data extent.
> +        *                      Caller should calculate their own
> +        *                      btrfs_file_extent_item::offset base on this.
> +        */
>
> -       u64 disk_bytenr;
> -       u64 disk_num_bytes;
> -       u64 extent_offset;
> +       u64 block_start;
> +       u64 orig_disk_num_bytes;
> +       u64 orig_offset;
>         /* Number of bytes that can be written to in NOCOW mode. */
>         u64 num_bytes;
>  };
> @@ -1887,9 +1896,9 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>                 goto out;
>
>         /* Can't access these fields unless we know it's not an inline extent. */
> -       args->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> -       args->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> -       args->extent_offset = btrfs_file_extent_offset(leaf, fi);
> +       args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
> +       args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> +       args->orig_offset = btrfs_file_extent_offset(leaf, fi);
>
>         if (!(inode->flags & BTRFS_INODE_NODATACOW) &&
>             extent_type == BTRFS_FILE_EXTENT_REG)
> @@ -1906,7 +1915,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>                 goto out;
>
>         /* An explicit hole, must COW. */
> -       if (args->disk_bytenr == 0)
> +       if (args->block_start == 0)
>                 goto out;
>
>         /* Compressed/encrypted/encoded extents must be COWed. */
> @@ -1925,8 +1934,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>         btrfs_release_path(path);
>
>         ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
> -                                   key->offset - args->extent_offset,
> -                                   args->disk_bytenr, args->strict, path);
> +                                   key->offset - args->orig_offset,
> +                                   args->block_start, args->strict, path);
>         WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>         if (ret != 0)
>                 goto out;
> @@ -1947,15 +1956,15 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>             atomic_read(&root->snapshot_force_cow))
>                 goto out;
>
> -       args->disk_bytenr += args->extent_offset;
> -       args->disk_bytenr += args->start - key->offset;
> +       args->block_start += args->orig_offset;
> +       args->block_start += args->start - key->offset;
>         args->num_bytes = min(args->end + 1, extent_end) - args->start;
>
>         /*
>          * Force COW if csums exist in the range. This ensures that csums for a
>          * given extent are either valid or do not exist.
>          */
> -       ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
> +       ret = csum_exist_in_range(root->fs_info, args->block_start, args->num_bytes,
>                                   nowait);
>         WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>         if (ret != 0)
> @@ -2112,7 +2121,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>                         goto must_cow;
>
>                 ret = 0;
> -               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.disk_bytenr);
> +               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.block_start);
>                 if (!nocow_bg) {
>  must_cow:
>                         /*
> @@ -2151,14 +2160,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>                 nocow_end = cur_offset + nocow_args.num_bytes - 1;
>                 is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
>                 if (is_prealloc) {
> -                       u64 orig_start = found_key.offset - nocow_args.extent_offset;
> +                       u64 orig_start = found_key.offset - nocow_args.orig_offset;
>                         struct extent_map *em;
>
>                         em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
>                                           orig_start,
> -                                         nocow_args.disk_bytenr, /* block_start */
> +                                         nocow_args.block_start, /* block_start */
>                                           nocow_args.num_bytes, /* block_len */
> -                                         nocow_args.disk_num_bytes, /* orig_block_len */
> +                                         nocow_args.orig_disk_num_bytes, /* orig_block_len */
>                                           ram_bytes, BTRFS_COMPRESS_NONE,
>                                           BTRFS_ORDERED_PREALLOC);
>                         if (IS_ERR(em)) {
> @@ -2171,7 +2180,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>
>                 ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
>                                 nocow_args.num_bytes, nocow_args.num_bytes,
> -                               nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
> +                               nocow_args.block_start, nocow_args.num_bytes, 0,
>                                 is_prealloc
>                                 ? (1 << BTRFS_ORDERED_PREALLOC)
>                                 : (1 << BTRFS_ORDERED_NOCOW),
> @@ -7189,7 +7198,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>         }
>
>         ret = 0;
> -       if (btrfs_extent_readonly(fs_info, nocow_args.disk_bytenr))
> +       if (btrfs_extent_readonly(fs_info, nocow_args.block_start))
>                 goto out;
>
>         if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
> @@ -7206,9 +7215,9 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>         }
>
>         if (orig_start)
> -               *orig_start = key.offset - nocow_args.extent_offset;
> +               *orig_start = key.offset - nocow_args.orig_offset;
>         if (orig_block_len)
> -               *orig_block_len = nocow_args.disk_num_bytes;
> +               *orig_block_len = nocow_args.orig_disk_num_bytes;
>
>         *len = nocow_args.num_bytes;
>         ret = 1;
> --
> 2.44.0
>
>


* Re: [PATCH RFC 3/8] btrfs: introduce new members for extent_map
  2024-04-08 22:33 ` [PATCH RFC 3/8] btrfs: introduce new members for extent_map Qu Wenruo
@ 2024-04-11 14:56   ` Filipe Manana
  2024-04-11 21:52     ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2024-04-11 14:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Apr 8, 2024 at 11:34 PM Qu Wenruo <wqu@suse.com> wrote:
>
> Introduce two new members for extent_map:
>
> - disk_bytenr
> - offset
>
> Both are matching the members with the same name inside
> btrfs_file_extent_items.
>
> For now this patch only touches those members when:
>
> - Reading btrfs_file_extent_items from disk
> - Inserting new holes
> - Merging two extent maps
>   With the new disk_bytenr and disk_num_bytes, doing merging would be a
>   little complex, as we have 3 different cases:
>
>   * Both extent maps are referring to the same data extent
>   * Both extent maps are referring to different data extents, but
>     those data extents are adjacent, and extent maps are at head/tail
>     of each data extents
>   * One of the extent map is referring to an merged and larger data
>     extent that covers both extent maps
>
>   The 3rd case seems only valid in selftest (test_case_3()), but
>   a new helper merge_ondisk_extents() should be able to handle all of
>   them.
>
> - Add a new member for can_nocow_file_extent_args
>   The new member is called "orig_disk_bytenr", for easier fetching the
>   old disk_bytenr.
>
> - Update the new members when doing extent map split
>   This is in fact a little simpler, as we only need to update
>   offset/len.
>
> - Update the new members when inserting new io extent map
>   This involves quite some NOCOW related functions, and adding two
>   parameters to a already long parameter list.
>
>   To avoid unexpected parameter change, the two new parameters,
>   @disk_bytenr and @offset are all added to the end of the list.
>
>   And they would be relocated when dropping the old
>   @block_start/@block_len/@orig_start members.
>
> For now, both the old members (block_start/block_len/orig_start) are
> co-existing with the new members (disk_bytenr/offset), meanwhile all the
> critical code is still using the old members only.
>
> The switch to new members would happen gradually to be bisect
> friendly.

I don't see why it is more bisect friendly.

If there's a bug, the bisection will point to the patch that does the
switch, while the bug is very likely in this patch, which adds the
fields and does all the computations.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/btrfs_inode.h |  3 +-
>  fs/btrfs/defrag.c      |  4 +++
>  fs/btrfs/extent_map.c  | 75 ++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/extent_map.h  | 17 ++++++++++
>  fs/btrfs/file-item.c   |  9 ++++-
>  fs/btrfs/file.c        |  3 +-
>  fs/btrfs/inode.c       | 56 +++++++++++++++++++++++--------
>  7 files changed, 147 insertions(+), 20 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 100020ca4658..ded36e065089 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -444,7 +444,8 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
>                         u32 bio_offset, struct bio_vec *bv);
>  noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>                               u64 *orig_start, u64 *orig_block_len,
> -                             u64 *ram_bytes, bool nowait, bool strict);
> +                             u64 *ram_bytes, bool nowait, bool strict,
> +                             u64 *disk_bytenr_ret, u64 *extent_offset_ret);
>
>  void btrfs_del_delalloc_inode(struct btrfs_inode *inode);
>  struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
> index f015fa1b6301..5259fd556487 100644
> --- a/fs/btrfs/defrag.c
> +++ b/fs/btrfs/defrag.c
> @@ -709,6 +709,10 @@ static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
>                         em->start = start;
>                         em->orig_start = start;
>                         em->block_start = EXTENT_MAP_HOLE;
> +                       em->disk_bytenr = EXTENT_MAP_HOLE;
> +                       em->disk_num_bytes = 0;
> +                       em->ram_bytes = 0;
> +                       em->offset = 0;
>                         em->len = key.offset - start;
>                         break;
>                 }
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index dd51a21b6a76..f59423897501 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -223,6 +223,58 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
>         return next->block_start == prev->block_start;
>  }
>
> +/*
> + * Handle the ondisk data extents merge for @prev and @next.
> + *
> + * Only touches disk_bytenr/disk_num_bytes/offset/ram_bytes.
> + * For now only uncompressed regular extent can be merged.
> + *
> + * @prev and @next will be both updated to point to the new merged range.
> + * Thus one of them should be removed by the caller.
> + */
> +static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *next)
> +{
> +       u64 new_disk_bytenr;
> +       u64 new_disk_num_bytes;
> +       u64 new_offset;
> +
> +       /* @prev and @next should not be compressed. */
> +       ASSERT(!extent_map_is_compressed(prev));
> +       ASSERT(!extent_map_is_compressed(next));
> +
> +       /*
> +        * There are several different cases that @prev and @next can be merged.

that -> where

> +        *
> +        * 1) They are referring to the same data extent
> +        * 2) Their ondisk data extents are adjacent and @prev is the tail
> +        *    and @next is the head of their data extents
> +        * 3) One of @prev/@next is referrring to a larger merged data extent.

referrring -> referring

> +        *    (test_case_3 of extent maps tests).
> +        *
> +        * The calculation here always merge the data extents first, then update
> +        * @offset using the new data extents.
> +        *
> +        * For case 1), the merged data extent would be the same.
> +        * For case 2), we just merge the two data extents into one.
> +        * For case 3), we just got the larger data extent.
> +        */
> +       new_disk_bytenr = min(prev->disk_bytenr, next->disk_bytenr);
> +       new_disk_num_bytes = max(prev->disk_bytenr + prev->disk_num_bytes,
> +                                next->disk_bytenr + next->disk_num_bytes) -
> +                            new_disk_bytenr;
> +       new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;
> +
> +       prev->disk_bytenr = new_disk_bytenr;
> +       prev->disk_num_bytes = new_disk_num_bytes;
> +       prev->ram_bytes = new_disk_num_bytes;
> +       prev->offset = new_offset;
> +
> +       next->disk_bytenr = new_disk_bytenr;
> +       next->disk_num_bytes = new_disk_num_bytes;
> +       next->ram_bytes = new_disk_num_bytes;
> +       next->offset = new_offset;
> +}
> +
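
To make the three cases concrete, here is a minimal userspace sketch of
the same arithmetic. struct em_disk and merge_sketch() are hypothetical
names used only for illustration, not kernel code; only the min/max
computation mirrors the helper quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four extent_map members the helper touches. */
struct em_disk {
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t ram_bytes;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/*
 * Merge the on-disk ranges of @prev and @next (both uncompressed).
 * The merged data extent spans from the lowest start to the highest end,
 * and the new offset is the distance from that start to the first byte
 * @prev actually references.
 */
static void merge_sketch(struct em_disk *prev, struct em_disk *next)
{
	uint64_t start = min_u64(prev->disk_bytenr, next->disk_bytenr);
	uint64_t end = max_u64(prev->disk_bytenr + prev->disk_num_bytes,
			       next->disk_bytenr + next->disk_num_bytes);
	uint64_t offset = prev->disk_bytenr + prev->offset - start;

	assert(end >= start);

	prev->disk_bytenr = next->disk_bytenr = start;
	prev->disk_num_bytes = next->disk_num_bytes = end - start;
	prev->ram_bytes = next->ram_bytes = end - start;
	prev->offset = next->offset = offset;
}
```

For example, merging two adjacent 4K data extents at 1048576 and
1052672 (case 2), each with offset 0, leaves both maps with disk_bytenr
1048576, disk_num_bytes 8192 and offset 0. If both maps already refer to
the same data extent (case 1), disk_bytenr and disk_num_bytes are
unchanged and the offset is simply that of @prev.
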
>  static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>  {
>         struct extent_map *merge = NULL;
> @@ -253,6 +305,9 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>                         em->block_len += merge->block_len;
>                         em->block_start = merge->block_start;
>                         em->generation = max(em->generation, merge->generation);
> +
> +                       if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
> +                               merge_ondisk_extents(merge, em);
>                         em->flags |= EXTENT_FLAG_MERGED;
>
>                         rb_erase_cached(&merge->rb_node, &tree->map);
> @@ -267,6 +322,8 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>         if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
>                 em->len += merge->len;
>                 em->block_len += merge->block_len;
> +               if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
> +                       merge_ondisk_extents(em, merge);
>                 rb_erase_cached(&merge->rb_node, &tree->map);
>                 RB_CLEAR_NODE(&merge->rb_node);
>                 em->generation = max(em->generation, merge->generation);
> @@ -541,6 +598,7 @@ static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
>             !extent_map_is_compressed(em)) {
>                 em->block_start += start_diff;
>                 em->block_len = em->len;
> +               em->offset += start_diff;
>         }
>         return add_extent_mapping(em_tree, em, 0);
>  }
> @@ -759,14 +817,18 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>                                         split->block_len = em->block_len;
>                                 else
>                                         split->block_len = split->len;
> +                               split->disk_bytenr = em->disk_bytenr;
>                                 split->disk_num_bytes = max(split->block_len,
>                                                             em->disk_num_bytes);
> +                               split->offset = em->offset;
>                                 split->ram_bytes = em->ram_bytes;
>                         } else {
>                                 split->orig_start = split->start;
>                                 split->block_len = 0;
>                                 split->block_start = em->block_start;
> +                               split->disk_bytenr = em->disk_bytenr;
>                                 split->disk_num_bytes = 0;
> +                               split->offset = 0;
>                                 split->ram_bytes = split->len;
>                         }
>
> @@ -787,13 +849,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>                         split->start = end;
>                         split->len = em_end - end;
>                         split->block_start = em->block_start;
> +                       split->disk_bytenr = em->disk_bytenr;
>                         split->flags = flags;
>                         split->generation = gen;
>
>                         if (em->block_start < EXTENT_MAP_LAST_BYTE) {
>                                 split->disk_num_bytes = max(em->block_len,
>                                                             em->disk_num_bytes);
> -
> +                               split->offset = em->offset + end - em->start;
>                                 split->ram_bytes = em->ram_bytes;
>                                 if (compressed) {
>                                         split->block_len = em->block_len;
> @@ -806,10 +869,11 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>                                         split->orig_start = em->orig_start;
>                                 }
>                         } else {
> +                               split->disk_num_bytes = 0;
> +                               split->offset = 0;
>                                 split->ram_bytes = split->len;
>                                 split->orig_start = split->start;
>                                 split->block_len = 0;
> -                               split->disk_num_bytes = 0;
>                         }
>
>                         if (extent_map_in_tree(em)) {
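
The tail-split update in the hunk above can be sketched in userspace as
follows. struct em_range and sketch_split_tail() are hypothetical names;
the sketch only covers regular extents (not holes) and only the
start/len/disk_bytenr/offset members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of extent_map, enough to show the offset update. */
struct em_range {
	uint64_t start;       /* file offset */
	uint64_t len;         /* length in the file */
	uint64_t disk_bytenr; /* start of the full on-disk extent */
	uint64_t offset;      /* offset of @start inside that extent */
};

/*
 * Build the map for the part of @em that survives after dropping
 * everything before file offset @end. The data extent is unchanged,
 * so disk_bytenr is copied and only offset moves forward by the
 * number of bytes cut from the front.
 */
static struct em_range sketch_split_tail(const struct em_range *em,
					 uint64_t end)
{
	struct em_range tail;

	assert(end > em->start && end < em->start + em->len);

	tail.start = end;
	tail.len = em->start + em->len - end;
	tail.disk_bytenr = em->disk_bytenr;
	tail.offset = em->offset + end - em->start;
	return tail;
}
```

For a 16K map at file offset 0 with offset 0, cutting at end = 12288
gives a tail with start 12288, len 4096 and offset 12288. Note that
start - offset is the same before and after the split, which is what the
old orig_start member used to record.
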
> @@ -965,6 +1029,9 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
>         /* First, replace the em with a new extent_map starting from * em->start */
>         split_pre->start = em->start;
>         split_pre->len = pre;
> +       split_pre->disk_bytenr = new_logical;
> +       split_pre->disk_num_bytes = split_pre->len;
> +       split_pre->offset = 0;
>         split_pre->orig_start = split_pre->start;
>         split_pre->block_start = new_logical;
>         split_pre->block_len = split_pre->len;
> @@ -983,10 +1050,12 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
>         /* Insert the middle extent_map. */
>         split_mid->start = em->start + pre;
>         split_mid->len = em->len - pre;
> +       split_mid->disk_bytenr = em->block_start + pre;
> +       split_mid->disk_num_bytes = split_mid->len;
> +       split_mid->offset = 0;
>         split_mid->orig_start = split_mid->start;
>         split_mid->block_start = em->block_start + pre;
>         split_mid->block_len = split_mid->len;
> -       split_mid->disk_num_bytes = split_mid->block_len;
>         split_mid->ram_bytes = split_mid->len;
>         split_mid->flags = flags;
>         split_mid->generation = em->generation;
> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
> index 242a0c2e7a5e..848b4a4ecd6a 100644
> --- a/fs/btrfs/extent_map.h
> +++ b/fs/btrfs/extent_map.h
> @@ -67,12 +67,29 @@ struct extent_map {
>          */
>         u64 orig_start;
>
> +       /*
> +        * The bytenr for of the full on-disk extent.

"for of" should be just "of".

I've only skimmed through the patch, but it seems ok.

Thanks.

> +        *
> +        * For regular extents it's btrfs_file_extent_item::disk_bytenr.
> +        * For holes it's EXTENT_MAP_HOLE and for inline extents it's
> +        * EXTENT_MAP_INLINE.
> +        */
> +       u64 disk_bytenr;
> +
>         /*
>          * The full on-disk extent length, matching
>          * btrfs_file_extent_item::disk_num_bytes.
>          */
>         u64 disk_num_bytes;
>
> +       /*
> +        * Offset inside the decompressed extent.
> +        *
> +        * For regular extents it's btrfs_file_extent_item::offset.
> +        * For holes and inline extents it's 0.
> +        */
> +       u64 offset;
> +
>         /*
>          * The decompressed size of the whole on-disk extent, matching
>          * btrfs_file_extent_item::ram_bytes.
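
As a rough illustration of why the old members become redundant, the
sketch below derives them from the new ones for a regular (non-hole,
non-inline) extent. All names are hypothetical, and the compressed vs.
uncompressed handling is my reading of how
btrfs_extent_item_to_extent_map() fills the old members, so treat it as
an assumption rather than the final helpers of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of extent_map with only the new-style members. */
struct em_sketch {
	uint64_t start;          /* file offset */
	uint64_t len;            /* length in the file */
	uint64_t disk_bytenr;    /* start of the full on-disk extent */
	uint64_t disk_num_bytes; /* length of the full on-disk extent */
	uint64_t offset;         /* offset inside the decompressed extent */
	bool compressed;
};

/* Old orig_start: file offset where the full extent would begin. */
static uint64_t sketch_orig_start(const struct em_sketch *em)
{
	return em->start - em->offset;
}

/*
 * Old block_start: for uncompressed extents it points at the first
 * referenced byte on disk; for compressed extents the whole on-disk
 * extent has to be read, so it is the extent start.
 */
static uint64_t sketch_block_start(const struct em_sketch *em)
{
	return em->compressed ? em->disk_bytenr
			      : em->disk_bytenr + em->offset;
}

/* Old block_len: on-disk length that has to be read for this map. */
static uint64_t sketch_block_len(const struct em_sketch *em)
{
	return em->compressed ? em->disk_num_bytes : em->len;
}
```

With start = 8192, len = 4096, disk_bytenr = 1048576,
disk_num_bytes = 16384 and offset = 4096, an uncompressed map yields
orig_start 4096, block_start 1052672 and block_len 4096, while the same
map flagged as compressed yields block_start 1048576 and block_len
16384.
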
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index b552646a0ce6..96486f82ab5d 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1280,12 +1280,17 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                 em->len = btrfs_file_extent_end(path) - extent_start;
>                 em->orig_start = extent_start -
>                         btrfs_file_extent_offset(leaf, fi);
> -               em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>                 bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>                 if (bytenr == 0) {
>                         em->block_start = EXTENT_MAP_HOLE;
> +                       em->disk_bytenr = EXTENT_MAP_HOLE;
> +                       em->disk_num_bytes = 0;
> +                       em->offset = 0;
>                         return;
>                 }
> +               em->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> +               em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> +               em->offset = btrfs_file_extent_offset(leaf, fi);
>                 if (compress_type != BTRFS_COMPRESS_NONE) {
>                         extent_map_set_compression(em, compress_type);
>                         em->block_start = bytenr;
> @@ -1302,8 +1307,10 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>                 ASSERT(extent_start == 0);
>
>                 em->block_start = EXTENT_MAP_INLINE;
> +               em->disk_bytenr = EXTENT_MAP_INLINE;
>                 em->start = 0;
>                 em->len = fs_info->sectorsize;
> +               em->offset = 0;
>                 /*
>                  * Initialize orig_start and block_len with the same values
>                  * as in inode.c:btrfs_get_extent().
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cdcd7e0785c1..af6de3549901 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1094,7 +1094,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>                                                    &cached_state);
>         }
>         ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
> -                       NULL, NULL, NULL, nowait, false);
> +                       NULL, NULL, NULL, nowait, false, NULL, NULL);
>         if (ret <= 0)
>                 btrfs_drew_write_unlock(&root->snapshot_lock);
>         else
> @@ -2161,6 +2161,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>                 hole_em->orig_start = offset;
>
>                 hole_em->block_start = EXTENT_MAP_HOLE;
> +               hole_em->disk_bytenr = EXTENT_MAP_HOLE;
>                 hole_em->block_len = 0;
>                 hole_em->disk_num_bytes = 0;
>                 hole_em->generation = trans->transid;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4d207c3b38d9..69a7cdeef81e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -139,9 +139,9 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
>                                      bool pages_dirty);
>  static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>                                        u64 len, u64 orig_start, u64 block_start,
> -                                      u64 block_len, u64 orig_block_len,
> +                                      u64 block_len, u64 disk_num_bytes,
>                                        u64 ram_bytes, int compress_type,
> -                                      int type);
> +                                      int type, u64 disk_bytenr, u64 offset);
>
>  static int data_reloc_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
>                                           u64 root, void *warn_ctx)
> @@ -1166,7 +1166,8 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
>                           ins.offset,                   /* orig_block_len */
>                           async_extent->ram_size,       /* ram_bytes */
>                           async_extent->compress_type,
> -                         BTRFS_ORDERED_COMPRESSED);
> +                         BTRFS_ORDERED_COMPRESSED,
> +                         ins.objectid, 0);
>         if (IS_ERR(em)) {
>                 ret = PTR_ERR(em);
>                 goto out_free_reserve;
> @@ -1429,7 +1430,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>                                   ins.offset, /* orig_block_len */
>                                   ram_size, /* ram_bytes */
>                                   BTRFS_COMPRESS_NONE, /* compress_type */
> -                                 BTRFS_ORDERED_REGULAR /* type */);
> +                                 BTRFS_ORDERED_REGULAR /* type */,
> +                                 ins.objectid, 0);
>                 if (IS_ERR(em)) {
>                         ret = PTR_ERR(em);
>                         goto out_reserve;
> @@ -1859,6 +1861,7 @@ struct can_nocow_file_extent_args {
>          */
>
>         u64 block_start;
> +       u64 orig_disk_bytenr;
>         u64 orig_disk_num_bytes;
>         u64 orig_offset;
>         /* Number of bytes that can be written to in NOCOW mode. */
> @@ -1897,6 +1900,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>
>         /* Can't access these fields unless we know it's not an inline extent. */
>         args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
> +       args->orig_disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>         args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>         args->orig_offset = btrfs_file_extent_offset(leaf, fi);
>
> @@ -2169,7 +2173,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>                                           nocow_args.num_bytes, /* block_len */
>                                           nocow_args.orig_disk_num_bytes, /* orig_block_len */
>                                           ram_bytes, BTRFS_COMPRESS_NONE,
> -                                         BTRFS_ORDERED_PREALLOC);
> +                                         BTRFS_ORDERED_PREALLOC,
> +                                         nocow_args.orig_disk_bytenr,
> +                                         cur_offset - found_key.offset +
> +                                         nocow_args.orig_offset);
>                         if (IS_ERR(em)) {
>                                 btrfs_dec_nocow_writers(nocow_bg);
>                                 ret = PTR_ERR(em);
> @@ -4999,6 +5006,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
>                         hole_em->orig_start = cur_offset;
>
>                         hole_em->block_start = EXTENT_MAP_HOLE;
> +                       hole_em->disk_bytenr = EXTENT_MAP_HOLE;
>                         hole_em->block_len = 0;
>                         hole_em->disk_num_bytes = 0;
>                         hole_em->ram_bytes = hole_size;
> @@ -6860,6 +6868,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
>         }
>         em->start = EXTENT_MAP_HOLE;
>         em->orig_start = EXTENT_MAP_HOLE;
> +       em->disk_bytenr = EXTENT_MAP_HOLE;
>         em->len = (u64)-1;
>         em->block_len = (u64)-1;
>
> @@ -7025,7 +7034,9 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
>                                                   const u64 block_len,
>                                                   const u64 orig_block_len,
>                                                   const u64 ram_bytes,
> -                                                 const int type)
> +                                                 const int type,
> +                                                 const u64 disk_bytenr,
> +                                                 const u64 offset)
>  {
>         struct extent_map *em = NULL;
>         struct btrfs_ordered_extent *ordered;
> @@ -7034,7 +7045,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
>                 em = create_io_em(inode, start, len, orig_start, block_start,
>                                   block_len, orig_block_len, ram_bytes,
>                                   BTRFS_COMPRESS_NONE, /* compress_type */
> -                                 type);
> +                                 type, disk_bytenr, offset);
>                 if (IS_ERR(em))
>                         goto out;
>         }
> @@ -7085,7 +7096,8 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
>
>         em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset, start,
>                                      ins.objectid, ins.offset, ins.offset,
> -                                    ins.offset, BTRFS_ORDERED_REGULAR);
> +                                    ins.offset, BTRFS_ORDERED_REGULAR,
> +                                    ins.objectid, 0);
>         btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>         if (IS_ERR(em))
>                 btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset,
> @@ -7129,7 +7141,8 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
>   */
>  noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>                               u64 *orig_start, u64 *orig_block_len,
> -                             u64 *ram_bytes, bool nowait, bool strict)
> +                             u64 *ram_bytes, bool nowait, bool strict,
> +                             u64 *disk_bytenr_ret, u64 *new_offset_ret)
>  {
>         struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>         struct can_nocow_file_extent_args nocow_args = { 0 };
> @@ -7218,6 +7231,11 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>                 *orig_start = key.offset - nocow_args.orig_offset;
>         if (orig_block_len)
>                 *orig_block_len = nocow_args.orig_disk_num_bytes;
> +       if (disk_bytenr_ret)
> +               *disk_bytenr_ret = nocow_args.orig_disk_bytenr;
> +       if (new_offset_ret)
> +               *new_offset_ret = offset - key.offset +
> +                                 nocow_args.orig_offset;
>
>         *len = nocow_args.num_bytes;
>         ret = 1;
> @@ -7324,7 +7342,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>                                        u64 len, u64 orig_start, u64 block_start,
>                                        u64 block_len, u64 disk_num_bytes,
>                                        u64 ram_bytes, int compress_type,
> -                                      int type)
> +                                      int type, u64 disk_bytenr, u64 offset)
>  {
>         struct extent_map *em;
>         int ret;
> @@ -7381,9 +7399,11 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>         em->len = len;
>         em->block_len = block_len;
>         em->block_start = block_start;
> +       em->disk_bytenr = disk_bytenr;
>         em->disk_num_bytes = disk_num_bytes;
>         em->ram_bytes = ram_bytes;
>         em->generation = -1;
> +       em->offset = offset;
>         em->flags |= EXTENT_FLAG_PINNED;
>         if (type == BTRFS_ORDERED_COMPRESSED)
>                 extent_map_set_compression(em, compress_type);
> @@ -7410,6 +7430,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>         struct extent_map *em = *map;
>         int type;
>         u64 block_start, orig_start, orig_block_len, ram_bytes;
> +       u64 disk_bytenr;
> +       u64 new_offset;
>         struct btrfs_block_group *bg;
>         bool can_nocow = false;
>         bool space_reserved = false;
> @@ -7437,7 +7459,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>                 block_start = em->block_start + (start - em->start);
>
>                 if (can_nocow_extent(inode, start, &len, &orig_start,
> -                                    &orig_block_len, &ram_bytes, false, false) == 1) {
> +                                    &orig_block_len, &ram_bytes, false, false,
> +                                    &disk_bytenr, &new_offset) == 1) {
>                         bg = btrfs_inc_nocow_writers(fs_info, block_start);
>                         if (bg)
>                                 can_nocow = true;
> @@ -7465,7 +7488,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>                 em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
>                                               orig_start, block_start,
>                                               len, orig_block_len,
> -                                             ram_bytes, type);
> +                                             ram_bytes, type,
> +                                             disk_bytenr, new_offset);
>                 btrfs_dec_nocow_writers(bg);
>                 if (type == BTRFS_ORDERED_PREALLOC) {
>                         free_extent_map(em);
> @@ -9784,6 +9808,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>                 em->orig_start = cur_offset;
>                 em->len = ins.offset;
>                 em->block_start = ins.objectid;
> +               em->disk_bytenr = ins.objectid;
> +               em->offset = 0;
>                 em->block_len = ins.offset;
>                 em->disk_num_bytes = ins.offset;
>                 em->ram_bytes = ins.offset;
> @@ -10526,7 +10552,8 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
>         em = create_io_em(inode, start, num_bytes,
>                           start - encoded->unencoded_offset, ins.objectid,
>                           ins.offset, ins.offset, ram_bytes, compression,
> -                         BTRFS_ORDERED_COMPRESSED);
> +                         BTRFS_ORDERED_COMPRESSED, ins.objectid,
> +                         encoded->unencoded_offset);
>         if (IS_ERR(em)) {
>                 ret = PTR_ERR(em);
>                 goto out_free_reserved;
> @@ -10856,7 +10883,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>                 free_extent_map(em);
>                 em = NULL;
>
> -               ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, false, true);
> +               ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL,
> +                                      false, true, NULL, NULL);
>                 if (ret < 0) {
>                         goto out;
>                 } else if (ret) {
> --
> 2.44.0
>
>

* Re: [PATCH RFC 3/8] btrfs: introduce new members for extent_map
  2024-04-11 14:56   ` Filipe Manana
@ 2024-04-11 21:52     ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-11 21:52 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2024/4/12 00:26, Filipe Manana wrote:
> On Mon, Apr 8, 2024 at 11:34 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Introduce two new members for extent_map:
>>
>> - disk_bytenr
>> - offset
>>
>> Both are matching the members with the same name inside
>> btrfs_file_extent_items.
>>
>> For now this patch only touches those members when:
>>
>> - Reading btrfs_file_extent_items from disk
>> - Inserting new holes
>> - Merging two extent maps
>>    With the new disk_bytenr and disk_num_bytes, doing merging would be a
>>    little complex, as we have 3 different cases:
>>
>>    * Both extent maps are referring to the same data extent
>>    * Both extent maps are referring to different data extents, but
>>      those data extents are adjacent, and extent maps are at head/tail
>>      of each data extents
>>    * One of the extent map is referring to an merged and larger data
>>      extent that covers both extent maps
>>
>>    The 3rd case seems only valid in selftest (test_case_3()), but
>>    a new helper merge_ondisk_extents() should be able to handle all of
>>    them.
>>
>> - Add a new member for can_nocow_file_extent_args
>>    The new member is called "orig_disk_bytenr", for easier fetching the
>>    old disk_bytenr.
>>
>> - Update the new members when doing extent map split
>>    This is in fact a little simpler, as we only need to update
>>    offset/len.
>>
>> - Update the new members when inserting new io extent map
>>    This involves quite some NOCOW related functions, and adding two
>>    parameters to a already long parameter list.
>>
>>    To avoid unexpected parameter change, the two new parameters,
>>    @disk_bytenr and @offset are all added to the end of the list.
>>
>>    And they would be relocated when dropping the old
>>    @block_start/@block_len/@orig_start members.
>>
>> For now, both the old members (block_start/block_len/orig_start) are
>> co-existing with the new members (disk_bytenr/offset), meanwhile all the
>> critical code is still using the old members only.
>>
>> The switch to new members would happen gradually to be bisect
>> friendly.
>
> I don't see why it is more bisect friendly.
>
> If there's a bug the bisection will point to the patch that does the
> switch, while the bug is very likely in the patch (this one) which is
> adding the field and doing all its computations.

You're right.

Especially with the next sanity check patch, all crashes would happen at
that patch.

I'll just remove the mention of being bisect friendly.

Thanks,
Qu
>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/btrfs_inode.h |  3 +-
>>   fs/btrfs/defrag.c      |  4 +++
>>   fs/btrfs/extent_map.c  | 75 ++++++++++++++++++++++++++++++++++++++++--
>>   fs/btrfs/extent_map.h  | 17 ++++++++++
>>   fs/btrfs/file-item.c   |  9 ++++-
>>   fs/btrfs/file.c        |  3 +-
>>   fs/btrfs/inode.c       | 56 +++++++++++++++++++++++--------
>>   7 files changed, 147 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
>> index 100020ca4658..ded36e065089 100644
>> --- a/fs/btrfs/btrfs_inode.h
>> +++ b/fs/btrfs/btrfs_inode.h
>> @@ -444,7 +444,8 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
>>                          u32 bio_offset, struct bio_vec *bv);
>>   noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>                                u64 *orig_start, u64 *orig_block_len,
>> -                             u64 *ram_bytes, bool nowait, bool strict);
>> +                             u64 *ram_bytes, bool nowait, bool strict,
>> +                             u64 *disk_bytenr_ret, u64 *extent_offset_ret);
>>
>>   void btrfs_del_delalloc_inode(struct btrfs_inode *inode);
>>   struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
>> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
>> index f015fa1b6301..5259fd556487 100644
>> --- a/fs/btrfs/defrag.c
>> +++ b/fs/btrfs/defrag.c
>> @@ -709,6 +709,10 @@ static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
>>                          em->start = start;
>>                          em->orig_start = start;
>>                          em->block_start = EXTENT_MAP_HOLE;
>> +                       em->disk_bytenr = EXTENT_MAP_HOLE;
>> +                       em->disk_num_bytes = 0;
>> +                       em->ram_bytes = 0;
>> +                       em->offset = 0;
>>                          em->len = key.offset - start;
>>                          break;
>>                  }
>> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
>> index dd51a21b6a76..f59423897501 100644
>> --- a/fs/btrfs/extent_map.c
>> +++ b/fs/btrfs/extent_map.c
>> @@ -223,6 +223,58 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
>>          return next->block_start == prev->block_start;
>>   }
>>
>> +/*
>> + * Handle the ondisk data extents merge for @prev and @next.
>> + *
>> + * Only touches disk_bytenr/disk_num_bytes/offset/ram_bytes.
>> + * For now only uncompressed regular extent can be merged.
>> + *
>> + * @prev and @next will be both updated to point to the new merged range.
>> + * Thus one of them should be removed by the caller.
>> + */
>> +static void merge_ondisk_extents(struct extent_map *prev, struct extent_map *next)
>> +{
>> +       u64 new_disk_bytenr;
>> +       u64 new_disk_num_bytes;
>> +       u64 new_offset;
>> +
>> +       /* @prev and @next should not be compressed. */
>> +       ASSERT(!extent_map_is_compressed(prev));
>> +       ASSERT(!extent_map_is_compressed(next));
>> +
>> +       /*
>> +        * There are several different cases that @prev and @next can be merged.
>
> that -> where
>
>> +        *
>> +        * 1) They are referring to the same data extent
>> +        * 2) Their ondisk data extents are adjacent and @prev is the tail
>> +        *    and @next is the head of their data extents
>> +        * 3) One of @prev/@next is referrring to a larger merged data extent.
>
>   referrring ->  referring
>
>> +        *    (test_case_3 of extent maps tests).
>> +        *
>> +        * The calculation here always merge the data extents first, then update
>> +        * @offset using the new data extents.
>> +        *
>> +        * For case 1), the merged data extent would be the same.
>> +        * For case 2), we just merge the two data extents into one.
>> +        * For case 3), we just got the larger data extent.
>> +        */
>> +       new_disk_bytenr = min(prev->disk_bytenr, next->disk_bytenr);
>> +       new_disk_num_bytes = max(prev->disk_bytenr + prev->disk_num_bytes,
>> +                                next->disk_bytenr + next->disk_num_bytes) -
>> +                            new_disk_bytenr;
>> +       new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;
>> +
>> +       prev->disk_bytenr = new_disk_bytenr;
>> +       prev->disk_num_bytes = new_disk_num_bytes;
>> +       prev->ram_bytes = new_disk_num_bytes;
>> +       prev->offset = new_offset;
>> +
>> +       next->disk_bytenr = new_disk_bytenr;
>> +       next->disk_num_bytes = new_disk_num_bytes;
>> +       next->ram_bytes = new_disk_num_bytes;
>> +       next->offset = new_offset;
>> +}
>> +
>>   static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>>   {
>>          struct extent_map *merge = NULL;
>> @@ -253,6 +305,9 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>>                          em->block_len += merge->block_len;
>>                          em->block_start = merge->block_start;
>>                          em->generation = max(em->generation, merge->generation);
>> +
>> +                       if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
>> +                               merge_ondisk_extents(merge, em);
>>                          em->flags |= EXTENT_FLAG_MERGED;
>>
>>                          rb_erase_cached(&merge->rb_node, &tree->map);
>> @@ -267,6 +322,8 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em)
>>          if (rb && can_merge_extent_map(merge) && mergeable_maps(em, merge)) {
>>                  em->len += merge->len;
>>                  em->block_len += merge->block_len;
>> +               if (em->disk_bytenr < EXTENT_MAP_LAST_BYTE)
>> +                       merge_ondisk_extents(em, merge);
>>                  rb_erase_cached(&merge->rb_node, &tree->map);
>>                  RB_CLEAR_NODE(&merge->rb_node);
>>                  em->generation = max(em->generation, merge->generation);
>> @@ -541,6 +598,7 @@ static noinline int merge_extent_mapping(struct extent_map_tree *em_tree,
>>              !extent_map_is_compressed(em)) {
>>                  em->block_start += start_diff;
>>                  em->block_len = em->len;
>> +               em->offset += start_diff;
>>          }
>>          return add_extent_mapping(em_tree, em, 0);
>>   }
>> @@ -759,14 +817,18 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>>                                          split->block_len = em->block_len;
>>                                  else
>>                                          split->block_len = split->len;
>> +                               split->disk_bytenr = em->disk_bytenr;
>>                                  split->disk_num_bytes = max(split->block_len,
>>                                                              em->disk_num_bytes);
>> +                               split->offset = em->offset;
>>                                  split->ram_bytes = em->ram_bytes;
>>                          } else {
>>                                  split->orig_start = split->start;
>>                                  split->block_len = 0;
>>                                  split->block_start = em->block_start;
>> +                               split->disk_bytenr = em->disk_bytenr;
>>                                  split->disk_num_bytes = 0;
>> +                               split->offset = 0;
>>                                  split->ram_bytes = split->len;
>>                          }
>>
>> @@ -787,13 +849,14 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>>                          split->start = end;
>>                          split->len = em_end - end;
>>                          split->block_start = em->block_start;
>> +                       split->disk_bytenr = em->disk_bytenr;
>>                          split->flags = flags;
>>                          split->generation = gen;
>>
>>                          if (em->block_start < EXTENT_MAP_LAST_BYTE) {
>>                                  split->disk_num_bytes = max(em->block_len,
>>                                                              em->disk_num_bytes);
>> -
>> +                               split->offset = em->offset + end - em->start;
>>                                  split->ram_bytes = em->ram_bytes;
>>                                  if (compressed) {
>>                                          split->block_len = em->block_len;
>> @@ -806,10 +869,11 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>>                                          split->orig_start = em->orig_start;
>>                                  }
>>                          } else {
>> +                               split->disk_num_bytes = 0;
>> +                               split->offset = 0;
>>                                  split->ram_bytes = split->len;
>>                                  split->orig_start = split->start;
>>                                  split->block_len = 0;
>> -                               split->disk_num_bytes = 0;
>>                          }
>>
>>                          if (extent_map_in_tree(em)) {
>> @@ -965,6 +1029,9 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
>>          /* First, replace the em with a new extent_map starting from * em->start */
>>          split_pre->start = em->start;
>>          split_pre->len = pre;
>> +       split_pre->disk_bytenr = new_logical;
>> +       split_pre->disk_num_bytes = split_pre->len;
>> +       split_pre->offset = 0;
>>          split_pre->orig_start = split_pre->start;
>>          split_pre->block_start = new_logical;
>>          split_pre->block_len = split_pre->len;
>> @@ -983,10 +1050,12 @@ int split_extent_map(struct btrfs_inode *inode, u64 start, u64 len, u64 pre,
>>          /* Insert the middle extent_map. */
>>          split_mid->start = em->start + pre;
>>          split_mid->len = em->len - pre;
>> +       split_mid->disk_bytenr = em->block_start + pre;
>> +       split_mid->disk_num_bytes = split_mid->len;
>> +       split_mid->offset = 0;
>>          split_mid->orig_start = split_mid->start;
>>          split_mid->block_start = em->block_start + pre;
>>          split_mid->block_len = split_mid->len;
>> -       split_mid->disk_num_bytes = split_mid->block_len;
>>          split_mid->ram_bytes = split_mid->len;
>>          split_mid->flags = flags;
>>          split_mid->generation = em->generation;
>> diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
>> index 242a0c2e7a5e..848b4a4ecd6a 100644
>> --- a/fs/btrfs/extent_map.h
>> +++ b/fs/btrfs/extent_map.h
>> @@ -67,12 +67,29 @@ struct extent_map {
>>           */
>>          u64 orig_start;
>>
>> +       /*
>> +        * The bytenr for of the full on-disk extent.
>
> "for of" should be just "of".
>
> I've only skimmed through the patch, but it seems ok.
>
> Thanks.
>
>> +        *
>> +        * For regular extents it's btrfs_file_extent_item::disk_bytenr.
>> +        * For holes it's EXTENT_MAP_HOLE and for inline extents it's
>> +        * EXTENT_MAP_INLINE.
>> +        */
>> +       u64 disk_bytenr;
>> +
>>          /*
>>           * The full on-disk extent length, matching
>>           * btrfs_file_extent_item::disk_num_bytes.
>>           */
>>          u64 disk_num_bytes;
>>
>> +       /*
>> +        * Offset inside the decompressed extent.
>> +        *
>> +        * For regular extents it's btrfs_file_extent_item::offset.
>> +        * For holes and inline extents it's 0.
>> +        */
>> +       u64 offset;
>> +
>>          /*
>>           * The decompressed size of the whole on-disk extent, matching
>>           * btrfs_file_extent_item::ram_bytes.
>> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
>> index b552646a0ce6..96486f82ab5d 100644
>> --- a/fs/btrfs/file-item.c
>> +++ b/fs/btrfs/file-item.c
>> @@ -1280,12 +1280,17 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>>                  em->len = btrfs_file_extent_end(path) - extent_start;
>>                  em->orig_start = extent_start -
>>                          btrfs_file_extent_offset(leaf, fi);
>> -               em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>>                  bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>>                  if (bytenr == 0) {
>>                          em->block_start = EXTENT_MAP_HOLE;
>> +                       em->disk_bytenr = EXTENT_MAP_HOLE;
>> +                       em->disk_num_bytes = 0;
>> +                       em->offset = 0;
>>                          return;
>>                  }
>> +               em->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>> +               em->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>> +               em->offset = btrfs_file_extent_offset(leaf, fi);
>>                  if (compress_type != BTRFS_COMPRESS_NONE) {
>>                          extent_map_set_compression(em, compress_type);
>>                          em->block_start = bytenr;
>> @@ -1302,8 +1307,10 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>>                  ASSERT(extent_start == 0);
>>
>>                  em->block_start = EXTENT_MAP_INLINE;
>> +               em->disk_bytenr = EXTENT_MAP_INLINE;
>>                  em->start = 0;
>>                  em->len = fs_info->sectorsize;
>> +               em->offset = 0;
>>                  /*
>>                   * Initialize orig_start and block_len with the same values
>>                   * as in inode.c:btrfs_get_extent().
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index cdcd7e0785c1..af6de3549901 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1094,7 +1094,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>                                                     &cached_state);
>>          }
>>          ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
>> -                       NULL, NULL, NULL, nowait, false);
>> +                       NULL, NULL, NULL, nowait, false, NULL, NULL);
>>          if (ret <= 0)
>>                  btrfs_drew_write_unlock(&root->snapshot_lock);
>>          else
>> @@ -2161,6 +2161,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>>                  hole_em->orig_start = offset;
>>
>>                  hole_em->block_start = EXTENT_MAP_HOLE;
>> +               hole_em->disk_bytenr = EXTENT_MAP_HOLE;
>>                  hole_em->block_len = 0;
>>                  hole_em->disk_num_bytes = 0;
>>                  hole_em->generation = trans->transid;
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 4d207c3b38d9..69a7cdeef81e 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -139,9 +139,9 @@ static noinline int run_delalloc_cow(struct btrfs_inode *inode,
>>                                       bool pages_dirty);
>>   static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>>                                         u64 len, u64 orig_start, u64 block_start,
>> -                                      u64 block_len, u64 orig_block_len,
>> +                                      u64 block_len, u64 disk_num_bytes,
>>                                         u64 ram_bytes, int compress_type,
>> -                                      int type);
>> +                                      int type, u64 disk_bytenr, u64 offset);
>>
>>   static int data_reloc_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
>>                                            u64 root, void *warn_ctx)
>> @@ -1166,7 +1166,8 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
>>                            ins.offset,                   /* orig_block_len */
>>                            async_extent->ram_size,       /* ram_bytes */
>>                            async_extent->compress_type,
>> -                         BTRFS_ORDERED_COMPRESSED);
>> +                         BTRFS_ORDERED_COMPRESSED,
>> +                         ins.objectid, 0);
>>          if (IS_ERR(em)) {
>>                  ret = PTR_ERR(em);
>>                  goto out_free_reserve;
>> @@ -1429,7 +1430,8 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>>                                    ins.offset, /* orig_block_len */
>>                                    ram_size, /* ram_bytes */
>>                                    BTRFS_COMPRESS_NONE, /* compress_type */
>> -                                 BTRFS_ORDERED_REGULAR /* type */);
>> +                                 BTRFS_ORDERED_REGULAR /* type */,
>> +                                 ins.objectid, 0);
>>                  if (IS_ERR(em)) {
>>                          ret = PTR_ERR(em);
>>                          goto out_reserve;
>> @@ -1859,6 +1861,7 @@ struct can_nocow_file_extent_args {
>>           */
>>
>>          u64 block_start;
>> +       u64 orig_disk_bytenr;
>>          u64 orig_disk_num_bytes;
>>          u64 orig_offset;
>>          /* Number of bytes that can be written to in NOCOW mode. */
>> @@ -1897,6 +1900,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>
>>          /* Can't access these fields unless we know it's not an inline extent. */
>>          args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>> +       args->orig_disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>>          args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>>          args->orig_offset = btrfs_file_extent_offset(leaf, fi);
>>
>> @@ -2169,7 +2173,10 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>                                            nocow_args.num_bytes, /* block_len */
>>                                            nocow_args.orig_disk_num_bytes, /* orig_block_len */
>>                                            ram_bytes, BTRFS_COMPRESS_NONE,
>> -                                         BTRFS_ORDERED_PREALLOC);
>> +                                         BTRFS_ORDERED_PREALLOC,
>> +                                         nocow_args.orig_disk_bytenr,
>> +                                         cur_offset - found_key.offset +
>> +                                         nocow_args.orig_offset);
>>                          if (IS_ERR(em)) {
>>                                  btrfs_dec_nocow_writers(nocow_bg);
>>                                  ret = PTR_ERR(em);
>> @@ -4999,6 +5006,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
>>                          hole_em->orig_start = cur_offset;
>>
>>                          hole_em->block_start = EXTENT_MAP_HOLE;
>> +                       hole_em->disk_bytenr = EXTENT_MAP_HOLE;
>>                          hole_em->block_len = 0;
>>                          hole_em->disk_num_bytes = 0;
>>                          hole_em->ram_bytes = hole_size;
>> @@ -6860,6 +6868,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
>>          }
>>          em->start = EXTENT_MAP_HOLE;
>>          em->orig_start = EXTENT_MAP_HOLE;
>> +       em->disk_bytenr = EXTENT_MAP_HOLE;
>>          em->len = (u64)-1;
>>          em->block_len = (u64)-1;
>>
>> @@ -7025,7 +7034,9 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
>>                                                    const u64 block_len,
>>                                                    const u64 orig_block_len,
>>                                                    const u64 ram_bytes,
>> -                                                 const int type)
>> +                                                 const int type,
>> +                                                 const u64 disk_bytenr,
>> +                                                 const u64 offset)
>>   {
>>          struct extent_map *em = NULL;
>>          struct btrfs_ordered_extent *ordered;
>> @@ -7034,7 +7045,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
>>                  em = create_io_em(inode, start, len, orig_start, block_start,
>>                                    block_len, orig_block_len, ram_bytes,
>>                                    BTRFS_COMPRESS_NONE, /* compress_type */
>> -                                 type);
>> +                                 type, disk_bytenr, offset);
>>                  if (IS_ERR(em))
>>                          goto out;
>>          }
>> @@ -7085,7 +7096,8 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
>>
>>          em = btrfs_create_dio_extent(inode, dio_data, start, ins.offset, start,
>>                                       ins.objectid, ins.offset, ins.offset,
>> -                                    ins.offset, BTRFS_ORDERED_REGULAR);
>> +                                    ins.offset, BTRFS_ORDERED_REGULAR,
>> +                                    ins.objectid, 0);
>>          btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>>          if (IS_ERR(em))
>>                  btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset,
>> @@ -7129,7 +7141,8 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
>>    */
>>   noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>                                u64 *orig_start, u64 *orig_block_len,
>> -                             u64 *ram_bytes, bool nowait, bool strict)
>> +                             u64 *ram_bytes, bool nowait, bool strict,
>> +                             u64 *disk_bytenr_ret, u64 *new_offset_ret)
>>   {
>>          struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>>          struct can_nocow_file_extent_args nocow_args = { 0 };
>> @@ -7218,6 +7231,11 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>                  *orig_start = key.offset - nocow_args.orig_offset;
>>          if (orig_block_len)
>>                  *orig_block_len = nocow_args.orig_disk_num_bytes;
>> +       if (disk_bytenr_ret)
>> +               *disk_bytenr_ret = nocow_args.orig_disk_bytenr;
>> +       if (new_offset_ret)
>> +               *new_offset_ret = offset - key.offset +
>> +                                 nocow_args.orig_offset;
>>
>>          *len = nocow_args.num_bytes;
>>          ret = 1;
>> @@ -7324,7 +7342,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>>                                         u64 len, u64 orig_start, u64 block_start,
>>                                         u64 block_len, u64 disk_num_bytes,
>>                                         u64 ram_bytes, int compress_type,
>> -                                      int type)
>> +                                      int type, u64 disk_bytenr, u64 offset)
>>   {
>>          struct extent_map *em;
>>          int ret;
>> @@ -7381,9 +7399,11 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start,
>>          em->len = len;
>>          em->block_len = block_len;
>>          em->block_start = block_start;
>> +       em->disk_bytenr = disk_bytenr;
>>          em->disk_num_bytes = disk_num_bytes;
>>          em->ram_bytes = ram_bytes;
>>          em->generation = -1;
>> +       em->offset = offset;
>>          em->flags |= EXTENT_FLAG_PINNED;
>>          if (type == BTRFS_ORDERED_COMPRESSED)
>>                  extent_map_set_compression(em, compress_type);
>> @@ -7410,6 +7430,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>>          struct extent_map *em = *map;
>>          int type;
>>          u64 block_start, orig_start, orig_block_len, ram_bytes;
>> +       u64 disk_bytenr;
>> +       u64 new_offset;
>>          struct btrfs_block_group *bg;
>>          bool can_nocow = false;
>>          bool space_reserved = false;
>> @@ -7437,7 +7459,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>>                  block_start = em->block_start + (start - em->start);
>>
>>                  if (can_nocow_extent(inode, start, &len, &orig_start,
>> -                                    &orig_block_len, &ram_bytes, false, false) == 1) {
>> +                                    &orig_block_len, &ram_bytes, false, false,
>> +                                    &disk_bytenr, &new_offset) == 1) {
>>                          bg = btrfs_inc_nocow_writers(fs_info, block_start);
>>                          if (bg)
>>                                  can_nocow = true;
>> @@ -7465,7 +7488,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>>                  em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
>>                                                orig_start, block_start,
>>                                                len, orig_block_len,
>> -                                             ram_bytes, type);
>> +                                             ram_bytes, type,
>> +                                             disk_bytenr, new_offset);
>>                  btrfs_dec_nocow_writers(bg);
>>                  if (type == BTRFS_ORDERED_PREALLOC) {
>>                          free_extent_map(em);
>> @@ -9784,6 +9808,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>>                  em->orig_start = cur_offset;
>>                  em->len = ins.offset;
>>                  em->block_start = ins.objectid;
>> +               em->disk_bytenr = ins.objectid;
>> +               em->offset = 0;
>>                  em->block_len = ins.offset;
>>                  em->disk_num_bytes = ins.offset;
>>                  em->ram_bytes = ins.offset;
>> @@ -10526,7 +10552,8 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
>>          em = create_io_em(inode, start, num_bytes,
>>                            start - encoded->unencoded_offset, ins.objectid,
>>                            ins.offset, ins.offset, ram_bytes, compression,
>> -                         BTRFS_ORDERED_COMPRESSED);
>> +                         BTRFS_ORDERED_COMPRESSED, ins.objectid,
>> +                         encoded->unencoded_offset);
>>          if (IS_ERR(em)) {
>>                  ret = PTR_ERR(em);
>>                  goto out_free_reserved;
>> @@ -10856,7 +10883,8 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>>                  free_extent_map(em);
>>                  em = NULL;
>>
>> -               ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, false, true);
>> +               ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL,
>> +                                      false, true, NULL, NULL);
>>                  if (ret < 0) {
>>                          goto out;
>>                  } else if (ret) {
>> --
>> 2.44.0
>>
>>
>
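
A small userspace sketch may help check the arithmetic in
merge_ondisk_extents() quoted above. It is not the kernel code: struct
em_sketch, min_u64()/max_u64() and merge_adjacent_example() are hypothetical
names used only for illustration, and the byte values are made up. It walks
through case 2) from the comment, where @prev is the tail of its data extent
and @next is the head of the adjacent one:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four extent_map members the merge touches. */
struct em_sketch {
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t ram_bytes;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Mirrors the arithmetic of merge_ondisk_extents() from the quoted patch. */
static void merge_ondisk_sketch(struct em_sketch *prev, struct em_sketch *next)
{
	/* Merge the data extents first: lowest start, highest end. */
	uint64_t new_disk_bytenr = min_u64(prev->disk_bytenr, next->disk_bytenr);
	uint64_t new_end = max_u64(prev->disk_bytenr + prev->disk_num_bytes,
				   next->disk_bytenr + next->disk_num_bytes);
	uint64_t new_disk_num_bytes = new_end - new_disk_bytenr;
	/* Then recompute @offset relative to the merged data extent. */
	uint64_t new_offset = prev->disk_bytenr + prev->offset - new_disk_bytenr;

	/* Both maps end up describing the merged range (uncompressed only). */
	prev->disk_bytenr = next->disk_bytenr = new_disk_bytenr;
	prev->disk_num_bytes = next->disk_num_bytes = new_disk_num_bytes;
	prev->ram_bytes = next->ram_bytes = new_disk_num_bytes;
	prev->offset = next->offset = new_offset;
}

/* Case 2): two adjacent data extents, with made-up addresses and sizes. */
static struct em_sketch merge_adjacent_example(void)
{
	/* @prev covers the last 4K of an 8K data extent at 1M. */
	struct em_sketch prev = {
		.disk_bytenr = 1048576, .disk_num_bytes = 8192,
		.offset = 4096, .ram_bytes = 8192,
	};
	/* @next covers the head of a 4K data extent right after it. */
	struct em_sketch next = {
		.disk_bytenr = 1048576 + 8192, .disk_num_bytes = 4096,
		.offset = 0, .ram_bytes = 4096,
	};

	merge_ondisk_sketch(&prev, &next);
	return prev;
}
```

With these made-up inputs the merged data extent starts at 1048576 and is
12288 bytes long, and @offset stays 4096, since @prev still begins 4096 bytes
into the (now larger) data extent.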

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-11 14:46   ` Filipe Manana
@ 2024-04-11 22:03     ` Qu Wenruo
  2024-04-12 13:21       ` Filipe Manana
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-11 22:03 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2024/4/12 00:16, Filipe Manana wrote:
> On Mon, Apr 8, 2024 at 11:34 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> The structure can_nocow_file_extent_args is utilized to provide the
>> needed info for NOCOW writes.
>>
>> However some of its members are pretty confusing.
>> For example, @disk_bytenr is not btrfs_file_extent_item::disk_bytenr,
>> but with extra offset, thus it works more like extent_map::block_start.
>>
>> This patch would:
>>
>> - Rename members directly fetched from btrfs_file_extent_item
>>    The new name would have "orig_" prefix, with the same member name from
>>    btrfs_file_extent_item.
>>
>> - For the old @disk_bytenr, rename it to @block_start
>>    As it's directly passed into create_io_em() as @block_start.
>
> So I find these new names more confusing actually.
>
> So the existing names reflect fields from struct
> btrfs_file_extent_item, because NOCOW checks are always done against
> the range of a file extent item, therefore the existing naming.

That's true for @extent_offset, but not for @disk_bytenr.

It's calculated as file_extent_item::disk_bytenr + file_extent_item::offset.

That's why I find the old @disk_bytenr very confusing (and it caused
several crashes in my sanity checks).
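
To make the difference concrete, here is a minimal userspace sketch of the
two values, following the arithmetic in the quoted diffs (the two
"args->block_start +=" lines in can_nocow_file_extent(), and the
"new_offset_ret" calculation from the later patch in this series). The
struct and function names are hypothetical and the numbers are made up; it
assumes an uncompressed extent:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the values read from one file extent item. */
struct file_extent_sketch {
	uint64_t key_offset;	/* File offset where the item starts (key.offset). */
	uint64_t disk_bytenr;	/* btrfs_file_extent_item::disk_bytenr */
	uint64_t offset;	/* btrfs_file_extent_item::offset */
};

/*
 * Where the NOCOW write really lands on disk: the data extent start, plus
 * the item's offset into that data extent, plus how far into the item the
 * write begins. This is what the old @disk_bytenr output ended up holding.
 */
static uint64_t nocow_block_start(const struct file_extent_sketch *fi,
				  uint64_t write_start)
{
	return fi->disk_bytenr + fi->offset + (write_start - fi->key_offset);
}

/* Offset of the write inside the data extent, as a new em::offset would be. */
static uint64_t nocow_em_offset(const struct file_extent_sketch *fi,
				uint64_t write_start)
{
	return write_start - fi->key_offset + fi->offset;
}
```

For an item at file offset 65536 with disk_bytenr 1048576 and offset 4096, a
write starting at file offset 73728 gives a block start of 1060864, which is
disk_bytenr plus the 12288 returned by nocow_em_offset(). So the pair
(disk_bytenr, offset) carries the same information as the single adjusted
value.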

>
> Sometimes it may be against the whole range of the extent item,
> sometimes only a part of it, in which case disk_bytenr is incremented
> by offsets.
>
> This is the same logic with struct btrfs_ordered_extent: for a NOCOW
> write disk_bytenr may either match the disk_bytenr of an existing file
> extent item or it's adjusted by some offset in case it covers only
> part of the extent item.

The NOCOW ordered extent skips the file extent map updates, which is
why it doesn't really need a super accurate disk_bytenr/disk_num_bytes
to match data extents.

>
> So currently we are both consistent with btrfs_ordered_extent besides
> the fact the NOCOW checks are done against a file extent item.
>
> I particularly find block_start not intuitive - block? Is it a block
> number? What's the size of the block? Etc.
> disk_bytenr is a lot more clear - it's a disk address in bytes.

Well, the new @block_start matches the old extent_map::block_start.

I have to say, we do not have a solid definition of "disk_bytenr" in the
first place.

Should it always match ondisk file_extent_item::disk_bytenr, or should
it act like "block_start" of the old extent_map?

And if we have separate definitions, one to always match file extent
item disk_bytenr, and one to match the real IO start bytenr, what should
be their names?

I hope we can find good names to resolve the confusion. Any good ideas?

Thanks,
Qu
>
>>
>> - Add extra comments explaining those members
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/inode.c | 51 ++++++++++++++++++++++++++++--------------------
>>   1 file changed, 30 insertions(+), 21 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 2e0156943c7c..4d207c3b38d9 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -1847,11 +1847,20 @@ struct can_nocow_file_extent_args {
>>           */
>>          bool free_path;
>>
>> -       /* Output fields. Only set when can_nocow_file_extent() returns 1. */
>> +       /*
>> +        * Output fields. Only set when can_nocow_file_extent() returns 1.
>> +        *
>> +        * @block_start:        The bytenr of the new nocow write should be at.
>> +        * @orig_disk_bytenr:   The original data extent's disk_bytenr.
>
> This orig_disk_bytenr field is not defined anywhere in this patch.
>
> Thanks.
>
>> +        * @orig_disk_num_bytes:The original data extent's disk_num_bytes.
>> +        * @orig_offset:        The original offset inside the old data extent.
>> +        *                      Caller should calculate their own
>> +        *                      btrfs_file_extent_item::offset base on this.
>> +        */
>>
>> -       u64 disk_bytenr;
>> -       u64 disk_num_bytes;
>> -       u64 extent_offset;
>> +       u64 block_start;
>> +       u64 orig_disk_num_bytes;
>> +       u64 orig_offset;
>>          /* Number of bytes that can be written to in NOCOW mode. */
>>          u64 num_bytes;
>>   };
>> @@ -1887,9 +1896,9 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>                  goto out;
>>
>>          /* Can't access these fields unless we know it's not an inline extent. */
>> -       args->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>> -       args->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>> -       args->extent_offset = btrfs_file_extent_offset(leaf, fi);
>> +       args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>> +       args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>> +       args->orig_offset = btrfs_file_extent_offset(leaf, fi);
>>
>>          if (!(inode->flags & BTRFS_INODE_NODATACOW) &&
>>              extent_type == BTRFS_FILE_EXTENT_REG)
>> @@ -1906,7 +1915,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>                  goto out;
>>
>>          /* An explicit hole, must COW. */
>> -       if (args->disk_bytenr == 0)
>> +       if (args->block_start == 0)
>>                  goto out;
>>
>>          /* Compressed/encrypted/encoded extents must be COWed. */
>> @@ -1925,8 +1934,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>          btrfs_release_path(path);
>>
>>          ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
>> -                                   key->offset - args->extent_offset,
>> -                                   args->disk_bytenr, args->strict, path);
>> +                                   key->offset - args->orig_offset,
>> +                                   args->block_start, args->strict, path);
>>          WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>>          if (ret != 0)
>>                  goto out;
>> @@ -1947,15 +1956,15 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>              atomic_read(&root->snapshot_force_cow))
>>                  goto out;
>>
>> -       args->disk_bytenr += args->extent_offset;
>> -       args->disk_bytenr += args->start - key->offset;
>> +       args->block_start += args->orig_offset;
>> +       args->block_start += args->start - key->offset;
>>          args->num_bytes = min(args->end + 1, extent_end) - args->start;
>>
>>          /*
>>           * Force COW if csums exist in the range. This ensures that csums for a
>>           * given extent are either valid or do not exist.
>>           */
>> -       ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
>> +       ret = csum_exist_in_range(root->fs_info, args->block_start, args->num_bytes,
>>                                    nowait);
>>          WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>>          if (ret != 0)
>> @@ -2112,7 +2121,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>                          goto must_cow;
>>
>>                  ret = 0;
>> -               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.disk_bytenr);
>> +               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.block_start);
>>                  if (!nocow_bg) {
>>   must_cow:
>>                          /*
>> @@ -2151,14 +2160,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>                  nocow_end = cur_offset + nocow_args.num_bytes - 1;
>>                  is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
>>                  if (is_prealloc) {
>> -                       u64 orig_start = found_key.offset - nocow_args.extent_offset;
>> +                       u64 orig_start = found_key.offset - nocow_args.orig_offset;
>>                          struct extent_map *em;
>>
>>                          em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
>>                                            orig_start,
>> -                                         nocow_args.disk_bytenr, /* block_start */
>> +                                         nocow_args.block_start, /* block_start */
>>                                            nocow_args.num_bytes, /* block_len */
>> -                                         nocow_args.disk_num_bytes, /* orig_block_len */
>> +                                         nocow_args.orig_disk_num_bytes, /* orig_block_len */
>>                                            ram_bytes, BTRFS_COMPRESS_NONE,
>>                                            BTRFS_ORDERED_PREALLOC);
>>                          if (IS_ERR(em)) {
>> @@ -2171,7 +2180,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>
>>                  ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
>>                                  nocow_args.num_bytes, nocow_args.num_bytes,
>> -                               nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
>> +                               nocow_args.block_start, nocow_args.num_bytes, 0,
>>                                  is_prealloc
>>                                  ? (1 << BTRFS_ORDERED_PREALLOC)
>>                                  : (1 << BTRFS_ORDERED_NOCOW),
>> @@ -7189,7 +7198,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>          }
>>
>>          ret = 0;
>> -       if (btrfs_extent_readonly(fs_info, nocow_args.disk_bytenr))
>> +       if (btrfs_extent_readonly(fs_info, nocow_args.block_start))
>>                  goto out;
>>
>>          if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
>> @@ -7206,9 +7215,9 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>          }
>>
>>          if (orig_start)
>> -               *orig_start = key.offset - nocow_args.extent_offset;
>> +               *orig_start = key.offset - nocow_args.orig_offset;
>>          if (orig_block_len)
>> -               *orig_block_len = nocow_args.disk_num_bytes;
>> +               *orig_block_len = nocow_args.orig_disk_num_bytes;
>>
>>          *len = nocow_args.num_bytes;
>>          ret = 1;
>> --
>> 2.44.0
>>
>>
>


* Re: [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-11 22:03     ` Qu Wenruo
@ 2024-04-12 13:21       ` Filipe Manana
  2024-04-12 22:00         ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2024-04-12 13:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Thu, Apr 11, 2024 at 11:03 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2024/4/12 00:16, Filipe Manana wrote:
> > On Mon, Apr 8, 2024 at 11:34 PM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> The structure can_nocow_file_extent_args is utilized to provide the
> >> needed info for a NOCOW writes.
> >>
> >> However some of its members are pretty confusing.
> >> For example, @disk_bytenr is not btrfs_file_extent_item::disk_bytenr,
> >> but with extra offset, thus it works more like extent_map::block_start.
> >>
> >> This patch would:
> >>
> >> - Rename members directly fetched from btrfs_file_extent_item
> >>    The new name would have "orig_" prefix, with the same member name from
> >>    btrfs_file_extent_item.
> >>
> >> - For the old @disk_bytenr, rename it to @block_start
> >>    As it's directly passed into create_io_em() as @block_start.
> >
> > So I find these new names more confusing actually.
> >
> > So the existing names reflect fields from struct
> > btrfs_file_extent_item, because NOCOW checks are always done against
> > the range of a file extent item, therefore the existing naming.
>
> It's true for @extent_offset, but @disk_bytenr is not the case.
>
> It's calculated by file_extent_item::disk_bytenr + file_extent_item::offset.
>
> That's why I find the old @disk_bytenr very confusing (and caused
> several crashes in my sanity checks).
>
> >
> > Sometimes it may be against the whole range of the extent item,
> > sometimes only a part of it, in which case disk_bytenr is incremented
> > by offsets.
> >
> > This is the same logic with struct btrfs_ordered_extent: for a NOCOW
> > write disk_bytenr may either match the disk_bytenr of an existing file
> > extent item or it's adjusted by some offset in case it covers only
> > part of the extent item.
>
> The NOCOW ordered extent would skip the file extent map updates, that's
> why it doesn't really need a super accurate disk_bytenr/disk_num_bytes
> to match data extents.
>
> >
> > So currently we are both consistent with btrfs_ordered_extent besides
> > the fact the NOCOW checks are done against a file extent item.
> >
> > I particularly find block_start not intuitive - block? Is it a block
> > number? What's the size of the block? Etc.
> > disk_bytenr is a lot more clear - it's a disk address in bytes.
>
> Well, the new @block_start matches the old extent_map::block_start.

So it becomes a single exception, different from everywhere else.
Doesn't seem like a good thing in general.

>
> I have to say, we do not have a solid definition on "disk_bytenr" in the
> first place.

Well I find the name clear, it is a disk location measured by a byte address.
block_start is not so clear for anyone not familiar with btrfs'
internals, it makes me think of a block number and wonder what's the
block size, etc.

>
> Should it always match ondisk file_extent_item::disk_bytenr, or should
> it act like "block_start" of the old extent_map?

It's always about a range of a file extent item, be it the whole range
or just a part of it.
I don't see why it's confusing to use disk_bytenr, etc.
I find it more confusing to use something else, or at least what's
being proposed in this patch.
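
To make the "whole range or just a part of it" case concrete, here is a
minimal standalone sketch of the arithmetic that can_nocow_file_extent()
performs in the quoted diff. It is not kernel code: struct fe_item,
struct nocow_range and nocow_range_for() are hypothetical names used only
for illustration, and the extent is assumed to be a regular,
non-compressed one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields read from a file extent item. */
struct fe_item {
	uint64_t key_offset;	/* file offset where the item starts (key->offset) */
	uint64_t disk_bytenr;	/* start of the data extent on disk */
	uint64_t offset;	/* offset into the data extent */
	uint64_t num_bytes;	/* length of the file range the item covers */
};

/* Hypothetical result: where a NOCOW write would land and how long it may be. */
struct nocow_range {
	uint64_t disk_bytenr;
	uint64_t num_bytes;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Compute the on-disk start and length for a write to the inclusive file
 * range [start, end]. The write must begin inside the file extent item.
 */
static struct nocow_range nocow_range_for(const struct fe_item *fe,
					  uint64_t start, uint64_t end)
{
	uint64_t extent_end = fe->key_offset + fe->num_bytes;
	struct nocow_range r;

	assert(start >= fe->key_offset && start < extent_end);
	assert(end >= start);

	/* Same two additions as in the diff: item offset, then range offset. */
	r.disk_bytenr = fe->disk_bytenr + fe->offset + (start - fe->key_offset);
	/* Clamp the length to the end of the file extent item. */
	r.num_bytes = min_u64(end + 1, extent_end) - start;
	return r;
}
```

When the write covers the whole item and the item's offset is 0, the
result equals the item's own disk_bytenr; in every other case it is
shifted by the offsets, which is the adjusted value under discussion.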

>
> And if we have separate definitions, one to always match file extent
> item disk_bytenr, and one to match the real IO start bytenr, what should
> be their names?
>
> I hope we can get a good naming to solve the confusion, any good idea?

For me the current naming is fine and I don't find it confusing... So,
I'm not sure what to tell you.

>
> Thanks,
> Qu
> >
> >>
> >> - Add extra comments explaining those members
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/inode.c | 51 ++++++++++++++++++++++++++++--------------------
> >>   1 file changed, 30 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 2e0156943c7c..4d207c3b38d9 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -1847,11 +1847,20 @@ struct can_nocow_file_extent_args {
> >>           */
> >>          bool free_path;
> >>
> >> -       /* Output fields. Only set when can_nocow_file_extent() returns 1. */
> >> +       /*
> >> +        * Output fields. Only set when can_nocow_file_extent() returns 1.
> >> +        *
> >> +        * @block_start:        The bytenr of the new nocow write should be at.
> >> +        * @orig_disk_bytenr:   The original data extent's disk_bytenr.
> >
> > This orig_disk_bytenr field is not defined anywhere in this patch.
> >
> > Thanks.
> >
> >> +        * @orig_disk_num_bytes:The original data extent's disk_num_bytes.
> >> +        * @orig_offset:        The original offset inside the old data extent.
> >> +        *                      Caller should calculate their own
> >> +        *                      btrfs_file_extent_item::offset base on this.
> >> +        */
> >>
> >> -       u64 disk_bytenr;
> >> -       u64 disk_num_bytes;
> >> -       u64 extent_offset;
> >> +       u64 block_start;
> >> +       u64 orig_disk_num_bytes;
> >> +       u64 orig_offset;
> >>          /* Number of bytes that can be written to in NOCOW mode. */
> >>          u64 num_bytes;
> >>   };
> >> @@ -1887,9 +1896,9 @@ static int can_nocow_file_extent(struct btrfs_path *path,
> >>                  goto out;
> >>
> >>          /* Can't access these fields unless we know it's not an inline extent. */
> >> -       args->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> >> -       args->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> >> -       args->extent_offset = btrfs_file_extent_offset(leaf, fi);
> >> +       args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
> >> +       args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> >> +       args->orig_offset = btrfs_file_extent_offset(leaf, fi);
> >>
> >>          if (!(inode->flags & BTRFS_INODE_NODATACOW) &&
> >>              extent_type == BTRFS_FILE_EXTENT_REG)
> >> @@ -1906,7 +1915,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
> >>                  goto out;
> >>
> >>          /* An explicit hole, must COW. */
> >> -       if (args->disk_bytenr == 0)
> >> +       if (args->block_start == 0)
> >>                  goto out;
> >>
> >>          /* Compressed/encrypted/encoded extents must be COWed. */
> >> @@ -1925,8 +1934,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
> >>          btrfs_release_path(path);
> >>
> >>          ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
> >> -                                   key->offset - args->extent_offset,
> >> -                                   args->disk_bytenr, args->strict, path);
> >> +                                   key->offset - args->orig_offset,
> >> +                                   args->block_start, args->strict, path);
> >>          WARN_ON_ONCE(ret > 0 && is_freespace_inode);
> >>          if (ret != 0)
> >>                  goto out;
> >> @@ -1947,15 +1956,15 @@ static int can_nocow_file_extent(struct btrfs_path *path,
> >>              atomic_read(&root->snapshot_force_cow))
> >>                  goto out;
> >>
> >> -       args->disk_bytenr += args->extent_offset;
> >> -       args->disk_bytenr += args->start - key->offset;
> >> +       args->block_start += args->orig_offset;
> >> +       args->block_start += args->start - key->offset;
> >>          args->num_bytes = min(args->end + 1, extent_end) - args->start;
> >>
> >>          /*
> >>           * Force COW if csums exist in the range. This ensures that csums for a
> >>           * given extent are either valid or do not exist.
> >>           */
> >> -       ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
> >> +       ret = csum_exist_in_range(root->fs_info, args->block_start, args->num_bytes,
> >>                                    nowait);
> >>          WARN_ON_ONCE(ret > 0 && is_freespace_inode);
> >>          if (ret != 0)
> >> @@ -2112,7 +2121,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> >>                          goto must_cow;
> >>
> >>                  ret = 0;
> >> -               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.disk_bytenr);
> >> +               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.block_start);
> >>                  if (!nocow_bg) {
> >>   must_cow:
> >>                          /*
> >> @@ -2151,14 +2160,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> >>                  nocow_end = cur_offset + nocow_args.num_bytes - 1;
> >>                  is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
> >>                  if (is_prealloc) {
> >> -                       u64 orig_start = found_key.offset - nocow_args.extent_offset;
> >> +                       u64 orig_start = found_key.offset - nocow_args.orig_offset;
> >>                          struct extent_map *em;
> >>
> >>                          em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
> >>                                            orig_start,
> >> -                                         nocow_args.disk_bytenr, /* block_start */
> >> +                                         nocow_args.block_start, /* block_start */
> >>                                            nocow_args.num_bytes, /* block_len */
> >> -                                         nocow_args.disk_num_bytes, /* orig_block_len */
> >> +                                         nocow_args.orig_disk_num_bytes, /* orig_block_len */
> >>                                            ram_bytes, BTRFS_COMPRESS_NONE,
> >>                                            BTRFS_ORDERED_PREALLOC);
> >>                          if (IS_ERR(em)) {
> >> @@ -2171,7 +2180,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
> >>
> >>                  ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
> >>                                  nocow_args.num_bytes, nocow_args.num_bytes,
> >> -                               nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
> >> +                               nocow_args.block_start, nocow_args.num_bytes, 0,
> >>                                  is_prealloc
> >>                                  ? (1 << BTRFS_ORDERED_PREALLOC)
> >>                                  : (1 << BTRFS_ORDERED_NOCOW),
> >> @@ -7189,7 +7198,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
> >>          }
> >>
> >>          ret = 0;
> >> -       if (btrfs_extent_readonly(fs_info, nocow_args.disk_bytenr))
> >> +       if (btrfs_extent_readonly(fs_info, nocow_args.block_start))
> >>                  goto out;
> >>
> >>          if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
> >> @@ -7206,9 +7215,9 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
> >>          }
> >>
> >>          if (orig_start)
> >> -               *orig_start = key.offset - nocow_args.extent_offset;
> >> +               *orig_start = key.offset - nocow_args.orig_offset;
> >>          if (orig_block_len)
> >> -               *orig_block_len = nocow_args.disk_num_bytes;
> >> +               *orig_block_len = nocow_args.orig_disk_num_bytes;
> >>
> >>          *len = nocow_args.num_bytes;
> >>          ret = 1;
> >> --
> >> 2.44.0
> >>
> >>
> >


* Re: [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-12 13:21       ` Filipe Manana
@ 2024-04-12 22:00         ` Qu Wenruo
  2024-04-12 22:12           ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-04-12 22:00 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs



On 2024/4/12 22:51, Filipe Manana wrote:
> On Thu, Apr 11, 2024 at 11:03 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>
>> Well, the new @block_start matches the old extent_map::block_start.
>
> So it becomes a single exception, different from everywhere else.
> Doesn't seem like a good thing in general.

OK, I can get rid of the @block_start name.

>
>>
>> I have to say, we do not have a solid definition on "disk_bytenr" in the
>> first place.
>
> Well I find the name clear, it is a disk location measured by a byte address.
> block_start is not so clear for anyone not familiar with btrfs'
> internals, it makes me think of a block number and wonder what's the
> block size, etc.
>
>>
>> Should it always match ondisk file_extent_item::disk_bytenr, or should
>> it act like "block_start" of the old extent_map?
>
> It's always about a range of a file extent item, be it the whole range
> or just a part of it.
> I don't see why it's confusing to use disk_bytenr, etc.
> I find it more confusing to use something else, or at least what's
> being proposed in this patch.

Well, IMHO since we take the name @disk_bytenr from btrfs file extent
item, and btrfs file extent uses @disk_bytenr to uniquely locate a data
extent, then we should also follow it to use @disk_bytenr for the same
purpose.

So that every time we see the name @disk_bytenr, we know it can be used
to locate a data extent, without any need for weird offset calculation.

That's why I'm strongly against adding any offset into @disk_bytenr.
And I believe that's the biggest difference in our points of view.
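
As a rough illustration of that convention (a standalone sketch, not the
actual extent_map layout; struct extent_ref and both helpers are
hypothetical names), disk_bytenr stays fixed at the start of the data
extent, the offset is carried separately, and the real IO start is only
derived when needed. Non-compressed extents are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference to part of a data extent. */
struct extent_ref {
	uint64_t disk_bytenr;	/* always the start of the data extent */
	uint64_t offset;	/* offset into that data extent */
	uint64_t num_bytes;	/* length of the referenced range */
};

/* Derive the real IO start on demand instead of storing it in disk_bytenr. */
static uint64_t extent_ref_io_start(const struct extent_ref *ref)
{
	return ref->disk_bytenr + ref->offset;
}

/*
 * Return the part of @ref that begins @at bytes after its start.
 * Only offset and num_bytes change, so disk_bytenr still locates
 * the same data extent.
 */
static struct extent_ref extent_ref_tail(const struct extent_ref *ref,
					 uint64_t at)
{
	struct extent_ref tail = *ref;

	assert(at <= ref->num_bytes);
	tail.offset += at;
	tail.num_bytes -= at;
	return tail;
}
```

With this shape, a lookup keyed on disk_bytenr keeps working for any
sub-range, and nothing has to subtract an offset back out before passing
the values on.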

Although in this particular case, I can use some extra prefixes like
"orig_" or "fe_" (for file extent), so that those members can be later
directly passed to create_io_em() without extra offset calculation.

Would that be an acceptable trade-off?


Another solution would be to just drop this patch and do the extra
calculation, resulting in something like this:

	create_io_em(...,
		     disk_bytenr - whatever_offset, /* disk_bytenr */
		     offset - whatever_offset, /* offset */
		     PREALLOC, ...);

At least that does not sound sane to me, and can be bug prone.
You won't believe how many different crashes I hit just due to the weird
disk_bytenr calculation here, and that's the biggest reason I have

Thanks,
Qu


>
>>
>> And if we have separate definitions, one to always match file extent
>> item disk_bytenr, and one to match the real IO start bytenr, what should
>> be their names?
>>
>> I hope we can get a good naming to solve the confusion, any good idea?
>
> For me the current naming is fine and I don't find it confusing... So,
> I'm not sure what to tell you.
>
>>
>> Thanks,
>> Qu
>>>
>>>>
>>>> - Add extra comments explaining those members
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>    fs/btrfs/inode.c | 51 ++++++++++++++++++++++++++++--------------------
>>>>    1 file changed, 30 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index 2e0156943c7c..4d207c3b38d9 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -1847,11 +1847,20 @@ struct can_nocow_file_extent_args {
>>>>            */
>>>>           bool free_path;
>>>>
>>>> -       /* Output fields. Only set when can_nocow_file_extent() returns 1. */
>>>> +       /*
>>>> +        * Output fields. Only set when can_nocow_file_extent() returns 1.
>>>> +        *
>>>> +        * @block_start:        The bytenr of the new nocow write should be at.
>>>> +        * @orig_disk_bytenr:   The original data extent's disk_bytenr.
>>>
>>> This orig_disk_bytenr field is not defined anywhere in this patch.
>>>
>>> Thanks.
>>>
>>>> +        * @orig_disk_num_bytes:The original data extent's disk_num_bytes.
>>>> +        * @orig_offset:        The original offset inside the old data extent.
>>>> +        *                      Caller should calculate their own
>>>> +        *                      btrfs_file_extent_item::offset base on this.
>>>> +        */
>>>>
>>>> -       u64 disk_bytenr;
>>>> -       u64 disk_num_bytes;
>>>> -       u64 extent_offset;
>>>> +       u64 block_start;
>>>> +       u64 orig_disk_num_bytes;
>>>> +       u64 orig_offset;
>>>>           /* Number of bytes that can be written to in NOCOW mode. */
>>>>           u64 num_bytes;
>>>>    };
>>>> @@ -1887,9 +1896,9 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>>>                   goto out;
>>>>
>>>>           /* Can't access these fields unless we know it's not an inline extent. */
>>>> -       args->disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>>>> -       args->disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>>>> -       args->extent_offset = btrfs_file_extent_offset(leaf, fi);
>>>> +       args->block_start = btrfs_file_extent_disk_bytenr(leaf, fi);
>>>> +       args->orig_disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
>>>> +       args->orig_offset = btrfs_file_extent_offset(leaf, fi);
>>>>
>>>>           if (!(inode->flags & BTRFS_INODE_NODATACOW) &&
>>>>               extent_type == BTRFS_FILE_EXTENT_REG)
>>>> @@ -1906,7 +1915,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>>>                   goto out;
>>>>
>>>>           /* An explicit hole, must COW. */
>>>> -       if (args->disk_bytenr == 0)
>>>> +       if (args->block_start == 0)
>>>>                   goto out;
>>>>
>>>>           /* Compressed/encrypted/encoded extents must be COWed. */
>>>> @@ -1925,8 +1934,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>>>           btrfs_release_path(path);
>>>>
>>>>           ret = btrfs_cross_ref_exist(root, btrfs_ino(inode),
>>>> -                                   key->offset - args->extent_offset,
>>>> -                                   args->disk_bytenr, args->strict, path);
>>>> +                                   key->offset - args->orig_offset,
>>>> +                                   args->block_start, args->strict, path);
>>>>           WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>>>>           if (ret != 0)
>>>>                   goto out;
>>>> @@ -1947,15 +1956,15 @@ static int can_nocow_file_extent(struct btrfs_path *path,
>>>>               atomic_read(&root->snapshot_force_cow))
>>>>                   goto out;
>>>>
>>>> -       args->disk_bytenr += args->extent_offset;
>>>> -       args->disk_bytenr += args->start - key->offset;
>>>> +       args->block_start += args->orig_offset;
>>>> +       args->block_start += args->start - key->offset;
>>>>           args->num_bytes = min(args->end + 1, extent_end) - args->start;
>>>>
>>>>           /*
>>>>            * Force COW if csums exist in the range. This ensures that csums for a
>>>>            * given extent are either valid or do not exist.
>>>>            */
>>>> -       ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
>>>> +       ret = csum_exist_in_range(root->fs_info, args->block_start, args->num_bytes,
>>>>                                     nowait);
>>>>           WARN_ON_ONCE(ret > 0 && is_freespace_inode);
>>>>           if (ret != 0)
>>>> @@ -2112,7 +2121,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>>>                           goto must_cow;
>>>>
>>>>                   ret = 0;
>>>> -               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.disk_bytenr);
>>>> +               nocow_bg = btrfs_inc_nocow_writers(fs_info, nocow_args.block_start);
>>>>                   if (!nocow_bg) {
>>>>    must_cow:
>>>>                           /*
>>>> @@ -2151,14 +2160,14 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>>>                   nocow_end = cur_offset + nocow_args.num_bytes - 1;
>>>>                   is_prealloc = extent_type == BTRFS_FILE_EXTENT_PREALLOC;
>>>>                   if (is_prealloc) {
>>>> -                       u64 orig_start = found_key.offset - nocow_args.extent_offset;
>>>> +                       u64 orig_start = found_key.offset - nocow_args.orig_offset;
>>>>                           struct extent_map *em;
>>>>
>>>>                           em = create_io_em(inode, cur_offset, nocow_args.num_bytes,
>>>>                                             orig_start,
>>>> -                                         nocow_args.disk_bytenr, /* block_start */
>>>> +                                         nocow_args.block_start, /* block_start */
>>>>                                             nocow_args.num_bytes, /* block_len */
>>>> -                                         nocow_args.disk_num_bytes, /* orig_block_len */
>>>> +                                         nocow_args.orig_disk_num_bytes, /* orig_block_len */
>>>>                                             ram_bytes, BTRFS_COMPRESS_NONE,
>>>>                                             BTRFS_ORDERED_PREALLOC);
>>>>                           if (IS_ERR(em)) {
>>>> @@ -2171,7 +2180,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
>>>>
>>>>                   ordered = btrfs_alloc_ordered_extent(inode, cur_offset,
>>>>                                   nocow_args.num_bytes, nocow_args.num_bytes,
>>>> -                               nocow_args.disk_bytenr, nocow_args.num_bytes, 0,
>>>> +                               nocow_args.block_start, nocow_args.num_bytes, 0,
>>>>                                   is_prealloc
>>>>                                   ? (1 << BTRFS_ORDERED_PREALLOC)
>>>>                                   : (1 << BTRFS_ORDERED_NOCOW),
>>>> @@ -7189,7 +7198,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>>>           }
>>>>
>>>>           ret = 0;
>>>> -       if (btrfs_extent_readonly(fs_info, nocow_args.disk_bytenr))
>>>> +       if (btrfs_extent_readonly(fs_info, nocow_args.block_start))
>>>>                   goto out;
>>>>
>>>>           if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
>>>> @@ -7206,9 +7215,9 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
>>>>           }
>>>>
>>>>           if (orig_start)
>>>> -               *orig_start = key.offset - nocow_args.extent_offset;
>>>> +               *orig_start = key.offset - nocow_args.orig_offset;
>>>>           if (orig_block_len)
>>>> -               *orig_block_len = nocow_args.disk_num_bytes;
>>>> +               *orig_block_len = nocow_args.orig_disk_num_bytes;
>>>>
>>>>           *len = nocow_args.num_bytes;
>>>>           ret = 1;
>>>> --
>>>> 2.44.0
>>>>
>>>>
>>>
>


* Re: [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args
  2024-04-12 22:00         ` Qu Wenruo
@ 2024-04-12 22:12           ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-04-12 22:12 UTC (permalink / raw)
  To: Qu Wenruo, Filipe Manana; +Cc: linux-btrfs



On 2024/4/13 07:30, Qu Wenruo wrote:
> 
> 
> On 2024/4/12 22:51, Filipe Manana wrote:
>> On Thu, Apr 11, 2024 at 11:03 PM Qu Wenruo <quwenruo.btrfs@gmx.com> 
>> wrote:
> [...]
>>>
>>> Well, the new @block_start matches the old extent_map::block_start.
>>
>> So it becomes a single exception, different from everywhere else.
>> Doesn't seem like a good thing in general.
> 
> OK, I can get rid of the @block_start name.
> 
>>
>>>
>>> I have to say, we do not have a solid definition on "disk_bytenr" in the
>>> first place.
>>
>> Well I find the name clear, it is a disk location measured by a byte 
>> address.
>> block_start is not so clear for anyone not familiar with btrfs'
>> internals, it makes me think of a block number and wonder what's the
>> block size, etc.
>>
>>>
>>> Should it always match ondisk file_extent_item::disk_bytenr, or should
>>> it act like "block_start" of the old extent_map?
>>
>> It's always about a range of a file extent item, be it the whole range
>> or just a part of it.
>> I don't see why it's confusing to use disk_bytenr, etc.
>> I find it more confusing to use something else, or at least what's
>> being proposed in this patch.
> 
> Well, IMHO since we take the name @disk_bytenr from btrfs file extent
> item, and btrfs file extent uses @disk_bytenr to uniquely locate a data
> extent, then we should also follow it to use @disk_bytenr for the same
> purpose.
> 
> So that every time we see the name @disk_bytenr, we know it can be used
> to locate a data extent, without any need for weird offset calculation.
> 
> That's why I'm strongly against adding any offset into @disk_bytenr.
> And I believe that's the biggest difference in our points of view.
> 
> Although in this particular case, I can use some extra prefixes like
> "orig_" or "fe_" (for file extent), so that those members can be later
> directly passed to create_io_em() without extra offset calculation.
> 
> Would that be an acceptable trade-off?
> 
> 
> Another solution would be to just drop this patch and do the extra
> calculation, resulting in something like this:
> 
>      create_io_em(...,
>               disk_bytenr - whatever_offset, /* disk_bytenr */
>               offset - whatever_offset, /* offset */
>               PREALLOC, ...);
> 
> At least that does not sound sane to me, and can be bug prone.
> You won't believe how many different crashes I hit just due to the weird
> disk_bytenr calculation here, and that's the biggest reason I have

the extra sanity checks.

> 
> Thanks,
> Qu
> 



Thread overview: 22+ messages
2024-04-08 22:33 [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 1/8] btrfs: rename extent_map::orig_block_len to disk_num_bytes Qu Wenruo
2024-04-09 14:58   ` David Sterba
2024-04-09 21:38     ` Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 2/8] btrfs: rename members of can_nocow_file_extent_args Qu Wenruo
2024-04-11 14:46   ` Filipe Manana
2024-04-11 22:03     ` Qu Wenruo
2024-04-12 13:21       ` Filipe Manana
2024-04-12 22:00         ` Qu Wenruo
2024-04-12 22:12           ` Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 3/8] btrfs: introduce new members for extent_map Qu Wenruo
2024-04-11 14:56   ` Filipe Manana
2024-04-11 21:52     ` Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 4/8] btrfs: introduce extra sanity checks for extent maps Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 5/8] btrfs: remove extent_map::orig_start member Qu Wenruo
2024-04-09 14:59   ` David Sterba
2024-04-08 22:33 ` [PATCH RFC 6/8] btrfs: remove extent_map::block_len member Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 7/8] btrfs: remove extent_map::block_start member Qu Wenruo
2024-04-08 22:33 ` [PATCH RFC 8/8] btrfs: reorder disk_bytenr/disk_num_bytes/ram_bytes/offset parameters Qu Wenruo
2024-04-09 14:57 ` [PATCH 0/8] btrfs: extent-map: use disk_bytenr/offset to replace block_start/block_len/orig_start David Sterba
2024-04-09 21:40   ` Qu Wenruo
2024-04-09 22:18     ` David Sterba
