[PATCH v2 0/5] fiemap extension for more physical information

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/5] fiemap extension for more physical information
@ 2024-03-28  1:22 Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

For many years, various btrfs users have written programs to discover
the actual disk space used by files, using root-only interfaces.
However, this information is a great fit for fiemap: it is inherently
tied to extent information, all filesystems can use it, and the
capabilities required for FIEMAP make sense for this additional
information also.

Hence, this patchset adds various additional information to fiemap,
and extends btrfs to return it.  This uses some of the reserved padding
in the fiemap extent structure, so programs unaware of the changes
will be unaffected.

This is based on next-20240325. I've tested the btrfs part of this with
the standard btrfs testing matrix locally, and verified that the physical extent
information returned there is correct.

Changelog:

v2:
 - Adopted PHYS_LEN flag and COMPRESSED flag from the previous version,
   as per Andreas Dilger' comment.
   https://patchwork.ozlabs.org/project/linux-ext4/patch/4f8d5dc5b51a43efaf16c39398c23a6276e40a30.1386778303.git.dsterba@suse.cz/

v1: https://lore.kernel.org/linux-fsdevel/20240315030334.GQ6184@frogsfrogsfrogs/T/#t

Sweet Tea Dorminy (5):
  fs: fiemap: add physical_length field to extents
  fs: fiemap: update fiemap_fill_next_extent() signature
  fs: fiemap: add new COMPRESSED extent state
  btrfs: fiemap: emit new COMPRESSED fiemap state.
  btrfs: fiemap: return extent physical size

 Documentation/filesystems/fiemap.rst | 35 ++++++++++----
 fs/bcachefs/fs.c                     |  7 +--
 fs/btrfs/extent_io.c                 | 72 ++++++++++++++++++----------
 fs/ext4/extents.c                    |  1 +
 fs/f2fs/data.c                       |  8 ++--
 fs/f2fs/inline.c                     |  3 +-
 fs/ioctl.c                           | 11 +++--
 fs/iomap/fiemap.c                    |  2 +-
 fs/nilfs2/inode.c                    |  6 +--
 fs/ntfs3/frecord.c                   |  7 +--
 fs/ocfs2/extent_map.c                |  4 +-
 fs/smb/client/smb2ops.c              |  1 +
 include/linux/fiemap.h               |  2 +-
 include/uapi/linux/fiemap.h          | 34 ++++++++++---
 14 files changed, 130 insertions(+), 63 deletions(-)

base-commit: 1fdad13606e104ff103ca19d2d660830cb36d43e
-- 
2.43.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/5] fs: fiemap: add physical_length field to extents
  2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
@ 2024-03-28  1:22 ` Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

Some filesystems support compressed extents which have a larger logical
size than physical, and for those filesystems, it can be useful for
userspace to know how much space those extents actually use. For
instance, the compsize [1] tool for btrfs currently uses btrfs-internal,
root-only ioctl to find the actual disk space used by a file; it would
be better and more useful for this information to require fewer
privileges and to be usable on more filesystems. Therefore, use one of
the padding u64s in the fiemap extent structure to return the actual
physical length; and, for now, return this as equal to the logical
length.

[1] https://github.com/kilobyte/compsize

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 Documentation/filesystems/fiemap.rst | 28 +++++++++++++++++-------
 fs/ioctl.c                           |  3 ++-
 include/uapi/linux/fiemap.h          | 32 ++++++++++++++++++++++------
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/fiemap.rst b/Documentation/filesystems/fiemap.rst
index 93fc96f760aa..c2bfa107c8d7 100644
--- a/Documentation/filesystems/fiemap.rst
+++ b/Documentation/filesystems/fiemap.rst
@@ -80,14 +80,24 @@ Each extent is described by a single fiemap_extent structure as
 returned in fm_extents::
 
     struct fiemap_extent {
-	    __u64	fe_logical;  /* logical offset in bytes for the start of
-				* the extent */
-	    __u64	fe_physical; /* physical offset in bytes for the start
-				* of the extent */
-	    __u64	fe_length;   /* length in bytes for the extent */
-	    __u64	fe_reserved64[2];
-	    __u32	fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
-	    __u32	fe_reserved[3];
+            /*
+             * logical offset in bytes for the start of
+             * the extent from the beginning of the file
+             */
+            __u64 fe_logical;
+            /*
+             * physical offset in bytes for the start
+             * of the extent from the beginning of the disk
+             */
+            __u64 fe_physical;
+            /* logical length in bytes for this extent */
+            __u64 fe_logical_length;
+            /* physical length in bytes for this extent */
+            __u64 fe_physical_length;
+            __u64 fe_reserved64[1];
+            /* FIEMAP_EXTENT_* flags for this extent */
+            __u32 fe_flags;
+            __u32 fe_reserved[3];
     };
 
 All offsets and lengths are in bytes and mirror those on disk.  It is valid
@@ -175,6 +185,8 @@ FIEMAP_EXTENT_MERGED
   userspace would be highly inefficient, the kernel will try to merge most
   adjacent blocks into 'extents'.
 
+FIEMAP_EXTENT_HAS_PHYS_LEN
+  This will be set if the file system populated the physical length field.
 
 VFS -> File System Implementation
 ---------------------------------
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 1d5abfdf0f22..850ad46fd923 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -138,7 +138,8 @@ int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
 	memset(&extent, 0, sizeof(extent));
 	extent.fe_logical = logical;
 	extent.fe_physical = phys;
-	extent.fe_length = len;
+	extent.fe_logical_length = len;
+	extent.fe_physical_length = len;
 	extent.fe_flags = flags;
 
 	dest += fieinfo->fi_extents_mapped;
diff --git a/include/uapi/linux/fiemap.h b/include/uapi/linux/fiemap.h
index 24ca0c00cae3..3079159b8e94 100644
--- a/include/uapi/linux/fiemap.h
+++ b/include/uapi/linux/fiemap.h
@@ -14,14 +14,30 @@
 
 #include <linux/types.h>
 
+/*
+ * For backward compatibility, where the member of the struct was called
+ * fe_length instead of fe_logical_length.
+ */
+#define fe_length fe_logical_length
+
 struct fiemap_extent {
-	__u64 fe_logical;  /* logical offset in bytes for the start of
-			    * the extent from the beginning of the file */
-	__u64 fe_physical; /* physical offset in bytes for the start
-			    * of the extent from the beginning of the disk */
-	__u64 fe_length;   /* length in bytes for this extent */
-	__u64 fe_reserved64[2];
-	__u32 fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
+	/*
+	 * logical offset in bytes for the start of
+	 * the extent from the beginning of the file
+	 */
+	__u64 fe_logical;
+	/*
+	 * physical offset in bytes for the start
+	 * of the extent from the beginning of the disk
+	 */
+	__u64 fe_physical;
+	/* logical length in bytes for this extent */
+	__u64 fe_logical_length;
+	/* physical length in bytes for this extent */
+	__u64 fe_physical_length;
+	__u64 fe_reserved64[1];
+	/* FIEMAP_EXTENT_* flags for this extent */
+	__u32 fe_flags;
 	__u32 fe_reserved[3];
 };
 
@@ -66,5 +82,7 @@ struct fiemap {
 						    * merged for efficiency. */
 #define FIEMAP_EXTENT_SHARED		0x00002000 /* Space shared with other
 						    * files. */
+#define FIEMAP_EXTENT_HAS_PHYS_LEN	0x00004000 /* Physical length is valid
+						    * and set by FS. */
 
 #endif /* _UAPI_LINUX_FIEMAP_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature
  2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
@ 2024-03-28  1:22 ` Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

Update the signature of fiemap_fill_next_extent() to allow passing a
physical length. Update all callers to pass a 0 physical length -- since
none set the EXTENT_HAS_PHYS_LEN flag, this value doesn't matter.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 Documentation/filesystems/fiemap.rst | 3 ++-
 fs/bcachefs/fs.c                     | 7 ++++---
 fs/btrfs/extent_io.c                 | 4 ++--
 fs/ext4/extents.c                    | 1 +
 fs/f2fs/data.c                       | 8 +++++---
 fs/f2fs/inline.c                     | 3 ++-
 fs/ioctl.c                           | 9 +++++----
 fs/iomap/fiemap.c                    | 2 +-
 fs/nilfs2/inode.c                    | 6 +++---
 fs/ntfs3/frecord.c                   | 7 ++++---
 fs/ocfs2/extent_map.c                | 4 ++--
 fs/smb/client/smb2ops.c              | 1 +
 include/linux/fiemap.h               | 2 +-
 13 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/Documentation/filesystems/fiemap.rst b/Documentation/filesystems/fiemap.rst
index c2bfa107c8d7..c060bb83f5d8 100644
--- a/Documentation/filesystems/fiemap.rst
+++ b/Documentation/filesystems/fiemap.rst
@@ -236,7 +236,8 @@ For each extent in the request range, the file system should call
 the helper function, fiemap_fill_next_extent()::
 
   int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
-			      u64 phys, u64 len, u32 flags, u32 dev);
+			      u64 phys, u64 log_len, u64 phys_len, u32 flags,
+                              u32 dev);
 
 fiemap_fill_next_extent() will use the passed values to populate the
 next free extent in the fm_extents array. 'General' extent flags will
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 99a0abeadbe2..05f488d06099 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -931,7 +931,8 @@ static int bch2_fill_extent(struct bch_fs *c,
 			ret = fiemap_fill_next_extent(info,
 						bkey_start_offset(k.k) << 9,
 						offset << 9,
-						k.k->size << 9, flags|flags2);
+						k.k->size << 9, 0,
+						flags|flags2);
 			if (ret)
 				return ret;
 		}
@@ -940,13 +941,13 @@ static int bch2_fill_extent(struct bch_fs *c,
 	} else if (bkey_extent_is_inline_data(k.k)) {
 		return fiemap_fill_next_extent(info,
 					       bkey_start_offset(k.k) << 9,
-					       0, k.k->size << 9,
+					       0, k.k->size << 9, 0,
 					       flags|
 					       FIEMAP_EXTENT_DATA_INLINE);
 	} else if (k.k->type == KEY_TYPE_reservation) {
 		return fiemap_fill_next_extent(info,
 					       bkey_start_offset(k.k) << 9,
-					       0, k.k->size << 9,
+					       0, k.k->size << 9, 0,
 					       flags|
 					       FIEMAP_EXTENT_DELALLOC|
 					       FIEMAP_EXTENT_UNWRITTEN);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7441245b1ceb..8503ee8ef897 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2527,7 +2527,7 @@ static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo,
 		int ret;
 
 		ret = fiemap_fill_next_extent(fieinfo, entry->offset,
-					      entry->phys, entry->len,
+					      entry->phys, entry->len, 0,
 					      entry->flags);
 		/*
 		 * Ignore 1 (reached max entries) because we keep track of that
@@ -2743,7 +2743,7 @@ static int emit_last_fiemap_cache(struct fiemap_extent_info *fieinfo,
 		return 0;
 
 	ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys,
-				      cache->len, cache->flags);
+				      cache->len, 0, cache->flags);
 	cache->cached = false;
 	if (ret > 0)
 		ret = 0;
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e57054bdc5fd..2adade3c202a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2215,6 +2215,7 @@ static int ext4_fill_es_cache_info(struct inode *inode,
 				(__u64)es.es_lblk << blksize_bits,
 				(__u64)es.es_pblk << blksize_bits,
 				(__u64)es.es_len << blksize_bits,
+				0,
 				flags);
 		if (next == 0)
 			break;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9494b5fc7c1..87f8d828e038 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1834,7 +1834,8 @@ static int f2fs_xattr_fiemap(struct inode *inode,
 		if (!xnid)
 			flags |= FIEMAP_EXTENT_LAST;
 
-		err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags);
+		err = fiemap_fill_next_extent(
+				fieinfo, 0, phys, len, 0, flags);
 		trace_f2fs_fiemap(inode, 0, phys, len, flags, err);
 		if (err)
 			return err;
@@ -1860,7 +1861,8 @@ static int f2fs_xattr_fiemap(struct inode *inode,
 	}
 
 	if (phys) {
-		err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags);
+		err = fiemap_fill_next_extent(
+				fieinfo, 0, phys, len, 0, flags);
 		trace_f2fs_fiemap(inode, 0, phys, len, flags, err);
 	}
 
@@ -1979,7 +1981,7 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			flags |= FIEMAP_EXTENT_DATA_ENCRYPTED;
 
 		ret = fiemap_fill_next_extent(fieinfo, logical,
-				phys, size, flags);
+				phys, size, 0, flags);
 		trace_f2fs_fiemap(inode, logical, phys, size, flags, ret);
 		if (ret)
 			goto out;
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index ac00423f117b..49d2f87fe048 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -806,7 +806,8 @@ int f2fs_inline_data_fiemap(struct inode *inode,
 	byteaddr = (__u64)ni.blk_addr << inode->i_sb->s_blocksize_bits;
 	byteaddr += (char *)inline_data_addr(inode, ipage) -
 					(char *)F2FS_INODE(ipage);
-	err = fiemap_fill_next_extent(fieinfo, start, byteaddr, ilen, flags);
+	err = fiemap_fill_next_extent(
+			fieinfo, start, byteaddr, ilen, 0, flags);
 	trace_f2fs_fiemap(inode, start, byteaddr, ilen, flags, err);
 out:
 	f2fs_put_page(ipage, 1);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 850ad46fd923..1ecd46608ded 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -99,7 +99,8 @@ static int ioctl_fibmap(struct file *filp, int __user *p)
  * @fieinfo:	Fiemap context passed into ->fiemap
  * @logical:	Extent logical start offset, in bytes
  * @phys:	Extent physical start offset, in bytes
- * @len:	Extent length, in bytes
+ * @log_len:	Extent logical length, in bytes
+ * @phys_len:	Extent physical length, in bytes (optional)
  * @flags:	FIEMAP_EXTENT flags that describe this extent
  *
  * Called from file system ->fiemap callback. Will populate extent
@@ -110,7 +111,7 @@ static int ioctl_fibmap(struct file *filp, int __user *p)
  * extent that will fit in user array.
  */
 int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
-			    u64 phys, u64 len, u32 flags)
+			    u64 phys, u64 log_len, u64 phys_len, u32 flags)
 {
 	struct fiemap_extent extent;
 	struct fiemap_extent __user *dest = fieinfo->fi_extents_start;
@@ -138,8 +139,8 @@ int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
 	memset(&extent, 0, sizeof(extent));
 	extent.fe_logical = logical;
 	extent.fe_physical = phys;
-	extent.fe_logical_length = len;
-	extent.fe_physical_length = len;
+	extent.fe_logical_length = log_len;
+	extent.fe_physical_length = phys_len;
 	extent.fe_flags = flags;
 
 	dest += fieinfo->fi_extents_mapped;
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index 610ca6f1ec9b..013e843c8d10 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -36,7 +36,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
 			iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
-			iomap->length, flags);
+			iomap->length, 0, flags);
 }
 
 static loff_t iomap_fiemap_iter(const struct iomap_iter *iter,
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 7340a01d80e1..4d3c347c982b 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -1190,7 +1190,7 @@ int nilfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			if (size) {
 				/* End of the current extent */
 				ret = fiemap_fill_next_extent(
-					fieinfo, logical, phys, size, flags);
+					fieinfo, logical, phys, size, 0, flags);
 				if (ret)
 					break;
 			}
@@ -1240,7 +1240,7 @@ int nilfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 					flags |= FIEMAP_EXTENT_LAST;
 
 				ret = fiemap_fill_next_extent(
-					fieinfo, logical, phys, size, flags);
+					fieinfo, logical, phys, size, 0, flags);
 				if (ret)
 					break;
 				size = 0;
@@ -1256,7 +1256,7 @@ int nilfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 					/* Terminate the current extent */
 					ret = fiemap_fill_next_extent(
 						fieinfo, logical, phys, size,
-						flags);
+						0, flags);
 					if (ret || blkoff > end_blkoff)
 						break;
 
diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c
index 7f27382e0ce2..ef0ed913428b 100644
--- a/fs/ntfs3/frecord.c
+++ b/fs/ntfs3/frecord.c
@@ -1947,7 +1947,7 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 	if (!attr || !attr->non_res) {
 		err = fiemap_fill_next_extent(
 			fieinfo, 0, 0,
-			attr ? le32_to_cpu(attr->res.data_size) : 0,
+			attr ? le32_to_cpu(attr->res.data_size) : 0, 0,
 			FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_LAST |
 				FIEMAP_EXTENT_MERGED);
 		goto out;
@@ -2042,7 +2042,7 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 				flags |= FIEMAP_EXTENT_LAST;
 
 			err = fiemap_fill_next_extent(fieinfo, vbo, lbo, dlen,
-						      flags);
+						      0, flags);
 			if (err < 0)
 				break;
 			if (err == 1) {
@@ -2062,7 +2062,8 @@ int ni_fiemap(struct ntfs_inode *ni, struct fiemap_extent_info *fieinfo,
 		if (vbo + bytes >= end)
 			flags |= FIEMAP_EXTENT_LAST;
 
-		err = fiemap_fill_next_extent(fieinfo, vbo, lbo, bytes, flags);
+		err = fiemap_fill_next_extent(fieinfo, vbo, lbo, bytes, 0,
+					      flags);
 		if (err < 0)
 			break;
 		if (err == 1) {
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 70a768b623cf..eabdf97cd685 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -723,7 +723,7 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
 					 id2.i_data.id_data);
 
 		ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
-					      flags);
+					      0, flags);
 		if (ret < 0)
 			return ret;
 	}
@@ -794,7 +794,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
 
 		ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
-					      len_bytes, fe_flags);
+					      len_bytes, 0, fe_flags);
 		if (ret)
 			break;
 
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index 2ed456948f34..c7404a78fe03 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -3778,6 +3778,7 @@ static int smb3_fiemap(struct cifs_tcon *tcon,
 				le64_to_cpu(out_data[i].file_offset),
 				le64_to_cpu(out_data[i].file_offset),
 				le64_to_cpu(out_data[i].length),
+				0,
 				flags);
 		if (rc < 0)
 			goto out;
diff --git a/include/linux/fiemap.h b/include/linux/fiemap.h
index c50882f19235..17a6c32cdf3f 100644
--- a/include/linux/fiemap.h
+++ b/include/linux/fiemap.h
@@ -16,6 +16,6 @@ struct fiemap_extent_info {
 int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 *len, u32 supported_flags);
 int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
-			    u64 phys, u64 len, u32 flags);
+			    u64 phys, u64 log_len, u64 phys_len, u32 flags);
 
 #endif /* _LINUX_FIEMAP_H 1 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state
  2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
@ 2024-03-28  1:22 ` Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
  4 siblings, 0 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

This goes closely with the new physical length field in struct
fiemap_extent, as when physical length is not equal to logical length
the reason is frequently compression.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 Documentation/filesystems/fiemap.rst | 4 ++++
 fs/ioctl.c                           | 3 ++-
 include/uapi/linux/fiemap.h          | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/fiemap.rst b/Documentation/filesystems/fiemap.rst
index c060bb83f5d8..16bd7faba5e0 100644
--- a/Documentation/filesystems/fiemap.rst
+++ b/Documentation/filesystems/fiemap.rst
@@ -162,6 +162,10 @@ FIEMAP_EXTENT_DATA_ENCRYPTED
   This will also set FIEMAP_EXTENT_ENCODED
   The data in this extent has been encrypted by the file system.
 
+FIEMAP_EXTENT_DATA_COMPRESSED
+  This will also set FIEMAP_EXTENT_ENCODED
+  The data in this extent is compressed by the file system.
+
 FIEMAP_EXTENT_NOT_ALIGNED
   Extent offsets and length are not guaranteed to be block aligned.
 
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 1ecd46608ded..5d5a8fedc953 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -126,7 +126,8 @@ int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo, u64 logical,
 		return 1;
 
 #define SET_UNKNOWN_FLAGS	(FIEMAP_EXTENT_DELALLOC)
-#define SET_NO_UNMOUNTED_IO_FLAGS	(FIEMAP_EXTENT_DATA_ENCRYPTED)
+#define SET_NO_UNMOUNTED_IO_FLAGS	(FIEMAP_EXTENT_DATA_ENCRYPTED|\
+					 FIEMAP_EXTENT_DATA_COMPRESSED)
 #define SET_NOT_ALIGNED_FLAGS	(FIEMAP_EXTENT_DATA_TAIL|FIEMAP_EXTENT_DATA_INLINE)
 
 	if (flags & SET_UNKNOWN_FLAGS)
diff --git a/include/uapi/linux/fiemap.h b/include/uapi/linux/fiemap.h
index 3079159b8e94..ea97e33ddbb3 100644
--- a/include/uapi/linux/fiemap.h
+++ b/include/uapi/linux/fiemap.h
@@ -67,6 +67,8 @@ struct fiemap {
 						    * Sets EXTENT_UNKNOWN. */
 #define FIEMAP_EXTENT_ENCODED		0x00000008 /* Data can not be read
 						    * while fs is unmounted */
+#define FIEMAP_EXTENT_DATA_COMPRESSED	0x00000040 /* Data is compressed by fs.
+						    * Sets EXTENT_ENCODED. */
 #define FIEMAP_EXTENT_DATA_ENCRYPTED	0x00000080 /* Data is encrypted by fs.
 						    * Sets EXTENT_NO_BYPASS. */
 #define FIEMAP_EXTENT_NOT_ALIGNED	0x00000100 /* Extent offsets may not be
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state.
  2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
                   ` (2 preceding siblings ...)
  2024-03-28  1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
@ 2024-03-28  1:22 ` Sweet Tea Dorminy
  2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
  4 siblings, 0 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

Now that fiemap has a compressed state flag, emit it on compressed
extents.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8503ee8ef897..30fcbb9393fe 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2656,7 +2656,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 			if (range_end <= cache_end)
 				return 0;
 
-			if (!(flags & (FIEMAP_EXTENT_ENCODED | FIEMAP_EXTENT_DELALLOC)))
+			if (!(flags & (FIEMAP_EXTENT_DATA_COMPRESSED | FIEMAP_EXTENT_DELALLOC)))
 				phys += cache_end - offset;
 
 			offset = cache_end;
@@ -3186,7 +3186,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		}
 
 		if (compression != BTRFS_COMPRESS_NONE)
-			flags |= FIEMAP_EXTENT_ENCODED;
+			flags |= FIEMAP_EXTENT_DATA_COMPRESSED;
 
 		if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
 			flags |= FIEMAP_EXTENT_DATA_INLINE;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
                   ` (3 preceding siblings ...)
  2024-03-28  1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
@ 2024-03-28  1:22 ` Sweet Tea Dorminy
  2024-03-31  9:03   ` Qu Wenruo
  4 siblings, 1 reply; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-28  1:22 UTC (permalink / raw)
  To: Jonathan Corbet, Chris Mason, Josef Bacik, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Sweet Tea Dorminy,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

Now that fiemap allows returning extent physical size, make btrfs return
the appropriate extent's actual disk size.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 fs/btrfs/extent_io.c | 70 ++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 30fcbb9393fe..9921dc1567d6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2456,7 +2456,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
 struct btrfs_fiemap_entry {
 	u64 offset;
 	u64 phys;
-	u64 len;
+	u64 log_len;
+	u64 phys_len;
 	u32 flags;
 };
 
@@ -2514,7 +2515,8 @@ struct fiemap_cache {
 	/* Fields for the cached extent (unsubmitted, not ready, extent). */
 	u64 offset;
 	u64 phys;
-	u64 len;
+	u64 log_len;
+	u64 phys_len;
 	u32 flags;
 	bool cached;
 };
@@ -2527,8 +2529,8 @@ static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo,
 		int ret;
 
 		ret = fiemap_fill_next_extent(fieinfo, entry->offset,
-					      entry->phys, entry->len, 0,
-					      entry->flags);
+					      entry->phys, entry->log_len,
+					      entry->phys_len, entry->flags);
 		/*
 		 * Ignore 1 (reached max entries) because we keep track of that
 		 * ourselves in emit_fiemap_extent().
@@ -2553,7 +2555,8 @@ static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo,
  */
 static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 				struct fiemap_cache *cache,
-				u64 offset, u64 phys, u64 len, u32 flags)
+				u64 offset, u64 phys, u64 log_len,
+				u64 phys_len, u32 flags)
 {
 	struct btrfs_fiemap_entry *entry;
 	u64 cache_end;
@@ -2561,6 +2564,9 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 	/* Set at the end of extent_fiemap(). */
 	ASSERT((flags & FIEMAP_EXTENT_LAST) == 0);
 
+	/* We always set the correct physical length. */
+	flags |= FIEMAP_EXTENT_HAS_PHYS_LEN;
+
 	if (!cache->cached)
 		goto assign;
 
@@ -2596,7 +2602,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 	 * or equals to what we have in cache->offset. We deal with this as
 	 * described below.
 	 */
-	cache_end = cache->offset + cache->len;
+	cache_end = cache->offset + cache->log_len;
 	if (cache_end > offset) {
 		if (offset == cache->offset) {
 			/*
@@ -2620,10 +2626,10 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 			 * where a previously found file extent item was split
 			 * due to an ordered extent completing.
 			 */
-			cache->len = offset - cache->offset;
+			cache->log_len = offset - cache->offset;
 			goto emit;
 		} else {
-			const u64 range_end = offset + len;
+			const u64 range_end = offset + log_len;
 
 			/*
 			 * The offset of the file extent item we have just found
@@ -2656,11 +2662,13 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 			if (range_end <= cache_end)
 				return 0;
 
-			if (!(flags & (FIEMAP_EXTENT_DATA_COMPRESSED | FIEMAP_EXTENT_DELALLOC)))
+			if (!(flags & (FIEMAP_EXTENT_DATA_COMPRESSED | FIEMAP_EXTENT_DELALLOC))) {
 				phys += cache_end - offset;
+				phys_len -= cache_end - offset;
+			}
 
 			offset = cache_end;
-			len = range_end - cache_end;
+			log_len = range_end - cache_end;
 			goto emit;
 		}
 	}
@@ -2670,15 +2678,17 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 	 * 1) Their logical addresses are continuous
 	 *
 	 * 2) Their physical addresses are continuous
-	 *    So truly compressed (physical size smaller than logical size)
-	 *    extents won't get merged with each other
 	 *
 	 * 3) Share same flags
+	 *
+	 * 4) Not compressed
 	 */
-	if (cache->offset + cache->len  == offset &&
-	    cache->phys + cache->len == phys  &&
-	    cache->flags == flags) {
-		cache->len += len;
+	if (cache->offset + cache->log_len  == offset &&
+	    cache->phys + cache->log_len == phys  &&
+	    cache->flags == flags &&
+	    !(flags & FIEMAP_EXTENT_DATA_COMPRESSED)) {
+		cache->log_len += log_len;
+		cache->phys_len += phys_len;
 		return 0;
 	}
 
@@ -2695,7 +2705,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 		 * to miss it.
 		 */
 		entry = &cache->entries[cache->entries_size - 1];
-		cache->next_search_offset = entry->offset + entry->len;
+		cache->next_search_offset = entry->offset + entry->log_len;
 		cache->cached = false;
 
 		return BTRFS_FIEMAP_FLUSH_CACHE;
@@ -2704,7 +2714,8 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 	entry = &cache->entries[cache->entries_pos];
 	entry->offset = cache->offset;
 	entry->phys = cache->phys;
-	entry->len = cache->len;
+	entry->log_len = cache->log_len;
+	entry->phys_len = cache->phys_len;
 	entry->flags = cache->flags;
 	cache->entries_pos++;
 	cache->extents_mapped++;
@@ -2717,7 +2728,8 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
 	cache->cached = true;
 	cache->offset = offset;
 	cache->phys = phys;
-	cache->len = len;
+	cache->log_len = log_len;
+	cache->phys_len = phys_len;
 	cache->flags = flags;
 
 	return 0;
@@ -2743,7 +2755,8 @@ static int emit_last_fiemap_cache(struct fiemap_extent_info *fieinfo,
 		return 0;
 
 	ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys,
-				      cache->len, 0, cache->flags);
+				      cache->log_len, cache->phys_len,
+				      cache->flags);
 	cache->cached = false;
 	if (ret > 0)
 		ret = 0;
@@ -2937,13 +2950,15 @@ static int fiemap_process_hole(struct btrfs_inode *inode,
 			}
 			ret = emit_fiemap_extent(fieinfo, cache, prealloc_start,
 						 disk_bytenr + extent_offset,
-						 prealloc_len, prealloc_flags);
+						 prealloc_len, prealloc_len,
+						 prealloc_flags);
 			if (ret)
 				return ret;
 			extent_offset += prealloc_len;
 		}
 
 		ret = emit_fiemap_extent(fieinfo, cache, delalloc_start, 0,
+					 delalloc_end + 1 - delalloc_start,
 					 delalloc_end + 1 - delalloc_start,
 					 FIEMAP_EXTENT_DELALLOC |
 					 FIEMAP_EXTENT_UNKNOWN);
@@ -2984,7 +2999,8 @@ static int fiemap_process_hole(struct btrfs_inode *inode,
 		}
 		ret = emit_fiemap_extent(fieinfo, cache, prealloc_start,
 					 disk_bytenr + extent_offset,
-					 prealloc_len, prealloc_flags);
+					 prealloc_len, prealloc_len,
+					 prealloc_flags);
 		if (ret)
 			return ret;
 	}
@@ -3130,6 +3146,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 extent_offset = 0;
 		u64 extent_gen;
 		u64 disk_bytenr = 0;
+		u64 disk_size = 0;
 		u64 flags = 0;
 		int extent_type;
 		u8 compression;
@@ -3192,7 +3209,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 			flags |= FIEMAP_EXTENT_DATA_INLINE;
 			flags |= FIEMAP_EXTENT_NOT_ALIGNED;
 			ret = emit_fiemap_extent(fieinfo, &cache, key.offset, 0,
-						 extent_len, flags);
+						 extent_len, extent_len, flags);
 		} else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
 			ret = fiemap_process_hole(inode, fieinfo, &cache,
 						  &delalloc_cached_state,
@@ -3207,6 +3224,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 						  backref_ctx, 0, 0, 0,
 						  key.offset, extent_end - 1);
 		} else {
+			disk_size = btrfs_file_extent_disk_num_bytes(leaf, ei);
 			/* We have a regular extent. */
 			if (fieinfo->fi_extents_max) {
 				ret = btrfs_is_data_extent_shared(inode,
@@ -3221,7 +3239,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 
 			ret = emit_fiemap_extent(fieinfo, &cache, key.offset,
 						 disk_bytenr + extent_offset,
-						 extent_len, flags);
+						 extent_len,
+						 disk_size - extent_offset,
+						 flags);
 		}
 
 		if (ret < 0) {
@@ -3259,7 +3279,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		prev_extent_end = range_end;
 	}
 
-	if (cache.cached && cache.offset + cache.len >= last_extent_end) {
+	if (cache.cached && cache.offset + cache.log_len >= last_extent_end) {
 		const u64 i_size = i_size_read(&inode->vfs_inode);
 
 		if (prev_extent_end < i_size) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
@ 2024-03-31  9:03   ` Qu Wenruo
  2024-04-03  5:32     ` Sweet Tea Dorminy
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-03-31  9:03 UTC (permalink / raw)
  To: Sweet Tea Dorminy, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

在 2024/3/28 11:52, Sweet Tea Dorminy 写道:
> Now that fiemap allows returning extent physical size, make btrfs return
> the appropriate extent's actual disk size.
> 
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
[...]
> @@ -3221,7 +3239,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   
>   			ret = emit_fiemap_extent(fieinfo, &cache, key.offset,
>   						 disk_bytenr + extent_offset,
> -						 extent_len, flags);
> +						 extent_len,
> +						 disk_size - extent_offset,

This means, we will emit a entry that uses the end to the physical 
extent end.

Considering a file layout like this:

	item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 65536
		extent data offset 0 nr 4096 ram 65536
		extent compression 0 (none)
	item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
		generation 8 type 1 (regular)
		extent data disk byte 13697024 nr 4096
		extent data offset 0 nr 4096 ram 4096
		extent compression 0 (none)
	item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
		generation 7 type 1 (regular)
		extent data disk byte 13631488 nr 65536
		extent data offset 8192 nr 57344 ram 65536
		extent compression 0 (none)

For fiemap, we would got something like this:

fileoff 0, logical len 4k, phy 13631488, phy len 64K
fileoff 4k, logical len 4k, phy 13697024, phy len 4k
fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k

[HOW TO CALCULATE WASTED SPACE IN USER SPACE]
My concern is on the first entry. It indicates that we have wasted 60K 
(phy len is 64K, while logical len is only 4K)

But that information is not correct, as in reality we only wasted 4K, 
the remaining 56K is still referred by file range [8K, 64K).

Do you mean that user space program should maintain a mapping of each 
utilized physical range, and when handling the reported file range [8K, 
64K), the user space program should find that the physical range covers 
with one existing extent, and do calculation correctly?

[COMPRESSION REPRESENTATION]
The biggest problem other than the complexity in user space is the 
handling of compressed extents.

Should we return the physical bytenr (disk_bytenr of file extent item) 
directly or with the extent offset added?
Either way it doesn't look consistent to me, compared to non-compressed 
extents.

[ALTERNATIVE FORMAT]
The other alternative would be following the btrfs ondisk format, 
providing a unique physical bytenr for any file extent, then the 
offset/referred length inside the uncompressed extent.

That would handle compressed and regular extents more consistent, and a 
little easier for user space tool to handle (really just a tiny bit 
easier, no range overlap check needed), but more complex to represent, 
and I'm not sure if any other filesystem would be happy to accept the 
extra members they don't care.

Thanks,
Qu

> +						 flags);
>   		}
>   
>   		if (ret < 0) {
> @@ -3259,7 +3279,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   		prev_extent_end = range_end;
>   	}
>   
> -	if (cache.cached && cache.offset + cache.len >= last_extent_end) {
> +	if (cache.cached && cache.offset + cache.log_len >= last_extent_end) {
>   		const u64 i_size = i_size_read(&inode->vfs_inode);
>   
>   		if (prev_extent_end < i_size) {

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-03-31  9:03   ` Qu Wenruo
@ 2024-04-03  5:32     ` Sweet Tea Dorminy
  2024-04-03  5:52       ` Qu Wenruo
  2024-04-03  6:02       ` Sweet Tea Dorminy
  0 siblings, 2 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-04-03  5:32 UTC (permalink / raw)
  To: Qu Wenruo, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

On 3/31/24 05:03, Qu Wenruo wrote:
> 
> 
> 在 2024/3/28 11:52, Sweet Tea Dorminy 写道:
>> Now that fiemap allows returning extent physical size, make btrfs return
>> the appropriate extent's actual disk size.
>>
>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> [...]
>> @@ -3221,7 +3239,9 @@ int extent_fiemap(struct btrfs_inode *inode, 
>> struct fiemap_extent_info *fieinfo,
>>               ret = emit_fiemap_extent(fieinfo, &cache, key.offset,
>>                            disk_bytenr + extent_offset,
>> -                         extent_len, flags);
>> +                         extent_len,
>> +                         disk_size - extent_offset,
> 
> This means, we will emit a entry that uses the end to the physical 
> extent end.
> 
> Considering a file layout like this:
> 
>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>          generation 7 type 1 (regular)
>          extent data disk byte 13631488 nr 65536
>          extent data offset 0 nr 4096 ram 65536
>          extent compression 0 (none)
>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>          generation 8 type 1 (regular)
>          extent data disk byte 13697024 nr 4096
>          extent data offset 0 nr 4096 ram 4096
>          extent compression 0 (none)
>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>          generation 7 type 1 (regular)
>          extent data disk byte 13631488 nr 65536
>          extent data offset 8192 nr 57344 ram 65536
>          extent compression 0 (none)
> 
> For fiemap, we would got something like this:
> 
> fileoff 0, logical len 4k, phy 13631488, phy len 64K
> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
> 
> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
> My concern is on the first entry. It indicates that we have wasted 60K 
> (phy len is 64K, while logical len is only 4K)
> 
> But that information is not correct, as in reality we only wasted 4K, 
> the remaining 56K is still referred by file range [8K, 64K).
> 
> Do you mean that user space program should maintain a mapping of each 
> utilized physical range, and when handling the reported file range [8K, 
> 64K), the user space program should find that the physical range covers 
> with one existing extent, and do calculation correctly?

My goal is to give an unprivileged interface for tools like compsize to 
figure out how much space is used by a particular set of files. They 
report the total disk space referenced by the provided list of files, 
currently by doing a tree search (CAP_SYS_ADMIN) for all the extents 
pertaining to the requested files and deduplicating extents based on 
disk_bytenr.

It seems simplest to me for userspace for the kernel to emit the entire 
extent for each part of it referenced in a file, and let userspace deal 
with deduplicating extents. This is also most similar to the existing 
tree-search based interface. Reporting whole extents gives more 
flexibility for userspace to figure out how to report bookend extents, 
or shared extents, or ...

It does seem a little weird where if you request with fiemap only e.g. 
4k-16k range in that example file you'll get reported all 68k involved, 
but I can't figure out a way to fix that without having the kernel keep 
track of used parts of the extents as part of reporting, which sounds 
expensive.

You're right that I'm being inconsistent, taking off extent_offset from 
the reported disk size when that isn't what I should be doing, so I 
fixed that in v3.

> 
> [COMPRESSION REPRESENTATION]
> The biggest problem other than the complexity in user space is the 
> handling of compressed extents.
> 
> Should we return the physical bytenr (disk_bytenr of file extent item) 
> directly or with the extent offset added?
> Either way it doesn't look consistent to me, compared to non-compressed 
> extents.
> 

As I understand it, the goal of reporting physical bytenr is to provide 
a number which we could theoretically then resolve into a disk location 
or few if we cared, but which doesn't necessarily have any physical 
meaning. To quote the fiemap documentation page: "It is always undefined 
to try to update the data in-place by writing to the indicated location 
without the assistance of the filesystem". So I think I'd prefer to 
always report the entire size of the entire extent being referenced.

> [ALTERNATIVE FORMAT]
> The other alternative would be following the btrfs ondisk format, 
> providing a unique physical bytenr for any file extent, then the 
> offset/referred length inside the uncompressed extent.
> 
> That would handle compressed and regular extents more consistent, and a 
> little easier for user space tool to handle (really just a tiny bit 
> easier, no range overlap check needed), but more complex to represent, 
> and I'm not sure if any other filesystem would be happy to accept the 
> extra members they don't care.

I really want to make sure that this interface reports the unused space 
in e.g bookend extents well -- compsize has been an important tool for 
me in this respect, e.g. a time when a 10g file was taking up 110g of 
actual disk space. If we report the entire length of the entire extent, 
then when used on whole files one can establish the space referenced by 
that file but not used; similarly on multiple files. So while I like the 
simplicity of just reporting the used length, I don't think there's a 
way to make compsize unprivileged with that approach.

Thank you!!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-03  5:32     ` Sweet Tea Dorminy
@ 2024-04-03  5:52       ` Qu Wenruo
  2024-04-03  7:18         ` Sweet Tea Dorminy
  2024-04-03  6:02       ` Sweet Tea Dorminy
  1 sibling, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-04-03  5:52 UTC (permalink / raw)
  To: Sweet Tea Dorminy, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team



在 2024/4/3 16:02, Sweet Tea Dorminy 写道:
[...]
>>
>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>
>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>> My concern is on the first entry. It indicates that we have wasted 60K 
>> (phy len is 64K, while logical len is only 4K)
>>
>> But that information is not correct, as in reality we only wasted 4K, 
>> the remaining 56K is still referred by file range [8K, 64K).
>>
>> Do you mean that user space program should maintain a mapping of each 
>> utilized physical range, and when handling the reported file range 
>> [8K, 64K), the user space program should find that the physical range 
>> covers with one existing extent, and do calculation correctly?
> 
> My goal is to give an unprivileged interface for tools like compsize to 
> figure out how much space is used by a particular set of files. They 
> report the total disk space referenced by the provided list of files, 
> currently by doing a tree search (CAP_SYS_ADMIN) for all the extents 
> pertaining to the requested files and deduplicating extents based on 
> disk_bytenr.
> 
> It seems simplest to me for userspace for the kernel to emit the entire 
> extent for each part of it referenced in a file, and let userspace deal 
> with deduplicating extents. This is also most similar to the existing 
> tree-search based interface. Reporting whole extents gives more 
> flexibility for userspace to figure out how to report bookend extents, 
> or shared extents, or ...

That's totally fine, no matter what solution you go, (reporting exactly 
as the on-disk file extent, or with offset into consideration), user 
space always need to maintain some type of mapping to calculate the 
wasted space by bookend extents.

> 
> It does seem a little weird where if you request with fiemap only e.g. 
> 4k-16k range in that example file you'll get reported all 68k involved, 
> but I can't figure out a way to fix that without having the kernel keep 
> track of used parts of the extents as part of reporting, which sounds 
> expensive.

I do not think mapping 4k-16K is a common scenario either, but since you 
mentioned, at least we need a consistent way to emit a filemap entry.

The tracking part can be done in the user space.

> 
> You're right that I'm being inconsistent, taking off extent_offset from 
> the reported disk size when that isn't what I should be doing, so I 
> fixed that in v3.
> 
>>
>> [COMPRESSION REPRESENTATION]
>> The biggest problem other than the complexity in user space is the 
>> handling of compressed extents.
>>
>> Should we return the physical bytenr (disk_bytenr of file extent item) 
>> directly or with the extent offset added?
>> Either way it doesn't look consistent to me, compared to 
>> non-compressed extents.
>>
> 
> As I understand it, the goal of reporting physical bytenr is to provide 
> a number which we could theoretically then resolve into a disk location 
> or few if we cared, but which doesn't necessarily have any physical 
> meaning. To quote the fiemap documentation page: "It is always undefined 
> to try to update the data in-place by writing to the indicated location 
> without the assistance of the filesystem". So I think I'd prefer to 
> always report the entire size of the entire extent being referenced.

The concern is, if we have a compressed file extent, reflinked to 
different part of the file.

Then the fiemap returns all different physical bytenr (since offset is 
added), user space tool have no idea they are the same extent on-disk.
Furthermore, if we emit the physical + offset directly to user space 
(which can be beyond the compressed extent), then we also have another 
uncompressed extent at previous physical + offset.

Would that lead to bad calculation in user space to determine how many 
bytes are really used?

> 
>> [ALTERNATIVE FORMAT]
>> The other alternative would be following the btrfs ondisk format, 
>> providing a unique physical bytenr for any file extent, then the 
>> offset/referred length inside the uncompressed extent.
>>
>> That would handle compressed and regular extents more consistent, and 
>> a little easier for user space tool to handle (really just a tiny bit 
>> easier, no range overlap check needed), but more complex to represent, 
>> and I'm not sure if any other filesystem would be happy to accept the 
>> extra members they don't care.
> 
> I really want to make sure that this interface reports the unused space 
> in e.g bookend extents well -- compsize has been an important tool for 
> me in this respect, e.g. a time when a 10g file was taking up 110g of 
> actual disk space. If we report the entire length of the entire extent, 
> then when used on whole files one can establish the space referenced by 
> that file but not used; similarly on multiple files. So while I like the 
> simplicity of just reporting the used length, I don't think there's a 
> way to make compsize unprivileged with that approach.

Why not? In user space we just need to maintain a mapping of each 
referred range.

Then we get the real actual disk space, meanwhile the fiemap report is 
no different than "btrfs ins dump-tree" for file extents (we have all 
the things we need, filepos, length (num_bytes), disk_bytenr, 
disk_num_bytes, offset, and ram_bytes.

For unused space, since we have the mapping, we can iterate through the 
mapping, finding out all the sectors not referred by any file extents.

It should really just be a fiemap based (and unprivilleged) compsize.
Or did I miss some important things?

Thanks,
Qu

> 
> Thank you!!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-03  5:32     ` Sweet Tea Dorminy
  2024-04-03  5:52       ` Qu Wenruo
@ 2024-04-03  6:02       ` Sweet Tea Dorminy
  2024-04-03  7:19         ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-04-03  6:02 UTC (permalink / raw)
  To: Qu Wenruo, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team

>> This means, we will emit a entry that uses the end to the physical 
>> extent end.
>>
>> Considering a file layout like this:
>>
>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>          generation 7 type 1 (regular)
>>          extent data disk byte 13631488 nr 65536
>>          extent data offset 0 nr 4096 ram 65536
>>          extent compression 0 (none)
>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>          generation 8 type 1 (regular)
>>          extent data disk byte 13697024 nr 4096
>>          extent data offset 0 nr 4096 ram 4096
>>          extent compression 0 (none)
>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>          generation 7 type 1 (regular)
>>          extent data disk byte 13631488 nr 65536
>>          extent data offset 8192 nr 57344 ram 65536
>>          extent compression 0 (none)
>>
>> For fiemap, we would got something like this:
>>
>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>
>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>> My concern is on the first entry. It indicates that we have wasted 60K 
>> (phy len is 64K, while logical len is only 4K)
>>
>> But that information is not correct, as in reality we only wasted 4K, 
>> the remaining 56K is still referred by file range [8K, 64K).
>>
>> Do you mean that user space program should maintain a mapping of each 
>> utilized physical range, and when handling the reported file range 
>> [8K, 64K), the user space program should find that the physical range 
>> covers with one existing extent, and do calculation correctly?
> 
> My goal is to give an unprivileged interface for tools like compsize to 
> figure out how much space is used by a particular set of files. They 
> report the total disk space referenced by the provided list of files, 
> currently by doing a tree search (CAP_SYS_ADMIN) for all the extents 
> pertaining to the requested files and deduplicating extents based on 
> disk_bytenr.
> 
> It seems simplest to me for userspace for the kernel to emit the entire 
> extent for each part of it referenced in a file, and let userspace deal 
> with deduplicating extents. This is also most similar to the existing 
> tree-search based interface. Reporting whole extents gives more 
> flexibility for userspace to figure out how to report bookend extents, 
> or shared extents, or ...
> 
> It does seem a little weird where if you request with fiemap only e.g. 
> 4k-16k range in that example file you'll get reported all 68k involved, 
> but I can't figure out a way to fix that without having the kernel keep 
> track of used parts of the extents as part of reporting, which sounds 
> expensive.
> 
> You're right that I'm being inconsistent, taking off extent_offset from 
> the reported disk size when that isn't what I should be doing, so I 
> fixed that in v3.

Ah, I think I grasp a point I'd missed before.
- Without setting disk_bytenr to the actual start of the data on disk, 
there's no way to find the location of the actual data on disk within 
the extent from fiemap alone
- But reporting disk_bytenr + offset, to get actual start of data on 
disk, means we need to report a physical size to figure out the end of 
the extent and we can't know the beginning.

We can't convey both actual location, start, and end of the extent in 
just two pieces of information.

On the other hand, if someone really needs to know the actual location 
on disk of their data, they could use the tree_search ioctl as root to 
do so?

So I still think we should be reporting entire extents but am less 
confident that it doesn't break existing users. I am not sure how common 
it is to take fiemap output on btrfs and use it to try to get to 
physical data on disk - do you know of a tool that does so?

Thank you!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-03  5:52       ` Qu Wenruo
@ 2024-04-03  7:18         ` Sweet Tea Dorminy
  0 siblings, 0 replies; 14+ messages in thread
From: Sweet Tea Dorminy @ 2024-04-03  7:18 UTC (permalink / raw)
  To: Qu Wenruo, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	linux-doc, linux-btrfs, linux-fsdevel, kernel-team



On 4/3/24 01:52, Qu Wenruo wrote:
> 
> 
> 在 2024/4/3 16:02, Sweet Tea Dorminy 写道:
> [...]
>>>
>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>
>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>> My concern is on the first entry. It indicates that we have wasted 
>>> 60K (phy len is 64K, while logical len is only 4K)
>>>
>>> But that information is not correct, as in reality we only wasted 4K, 
>>> the remaining 56K is still referred by file range [8K, 64K).
>>>
>>> Do you mean that user space program should maintain a mapping of each 
>>> utilized physical range, and when handling the reported file range 
>>> [8K, 64K), the user space program should find that the physical range 
>>> covers with one existing extent, and do calculation correctly?
>>
>> My goal is to give an unprivileged interface for tools like compsize 
>> to figure out how much space is used by a particular set of files. 
>> They report the total disk space referenced by the provided list of 
>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the 
>> extents pertaining to the requested files and deduplicating extents 
>> based on disk_bytenr.
>>
>> It seems simplest to me for userspace for the kernel to emit the 
>> entire extent for each part of it referenced in a file, and let 
>> userspace deal with deduplicating extents. This is also most similar 
>> to the existing tree-search based interface. Reporting whole extents 
>> gives more flexibility for userspace to figure out how to report 
>> bookend extents, or shared extents, or ...
> 
> That's totally fine, no matter what solution you go, (reporting exactly 
> as the on-disk file extent, or with offset into consideration), user 
> space always need to maintain some type of mapping to calculate the 
> wasted space by bookend extents.
> 
>>
>> It does seem a little weird where if you request with fiemap only e.g. 
>> 4k-16k range in that example file you'll get reported all 68k 
>> involved, but I can't figure out a way to fix that without having the 
>> kernel keep track of used parts of the extents as part of reporting, 
>> which sounds expensive.
> 
> I do not think mapping 4k-16K is a common scenario either, but since you 
> mentioned, at least we need a consistent way to emit a filemap entry.
> 
> The tracking part can be done in the user space.
> 
>>
>> You're right that I'm being inconsistent, taking off extent_offset 
>> from the reported disk size when that isn't what I should be doing, so 
>> I fixed that in v3.
>>
>>>
>>> [COMPRESSION REPRESENTATION]
>>> The biggest problem other than the complexity in user space is the 
>>> handling of compressed extents.
>>>
>>> Should we return the physical bytenr (disk_bytenr of file extent 
>>> item) directly or with the extent offset added?
>>> Either way it doesn't look consistent to me, compared to 
>>> non-compressed extents.
>>>
>>
>> As I understand it, the goal of reporting physical bytenr is to 
>> provide a number which we could theoretically then resolve into a disk 
>> location or few if we cared, but which doesn't necessarily have any 
>> physical meaning. To quote the fiemap documentation page: "It is 
>> always undefined to try to update the data in-place by writing to the 
>> indicated location without the assistance of the filesystem". So I 
>> think I'd prefer to always report the entire size of the entire extent 
>> being referenced.
> 
> The concern is, if we have a compressed file extent, reflinked to 
> different part of the file.
> 
> Then the fiemap returns all different physical bytenr (since offset is 
> added), user space tool have no idea they are the same extent on-disk.
> Furthermore, if we emit the physical + offset directly to user space 
> (which can be beyond the compressed extent), then we also have another 
> uncompressed extent at previous physical + offset.
> 
> Would that lead to bad calculation in user space to determine how many 
> bytes are really used?
> 
>>
>>> [ALTERNATIVE FORMAT]
>>> The other alternative would be following the btrfs ondisk format, 
>>> providing a unique physical bytenr for any file extent, then the 
>>> offset/referred length inside the uncompressed extent.
>>>
>>> That would handle compressed and regular extents more consistent, and 
>>> a little easier for user space tool to handle (really just a tiny bit 
>>> easier, no range overlap check needed), but more complex to 
>>> represent, and I'm not sure if any other filesystem would be happy to 
>>> accept the extra members they don't care.
>>
>> I really want to make sure that this interface reports the unused 
>> space in e.g bookend extents well -- compsize has been an important 
>> tool for me in this respect, e.g. a time when a 10g file was taking up 
>> 110g of actual disk space. If we report the entire length of the 
>> entire extent, then when used on whole files one can establish the 
>> space referenced by that file but not used; similarly on multiple 
>> files. So while I like the simplicity of just reporting the used 
>> length, I don't think there's a way to make compsize unprivileged with 
>> that approach.
> 
> Why not? In user space we just need to maintain a mapping of each 
> referred range.
> 
> Then we get the real actual disk space, meanwhile the fiemap report is 
> no different than "btrfs ins dump-tree" for file extents (we have all 
> the things we need, filepos, length (num_bytes), disk_bytenr, 
> disk_num_bytes, offset, and ram_bytes

The fiemap output (in this changeset) has equivalents of filepos, 
length; disk_bytenr + offset, disk_num_bytes - offset -- we don't get 
ram_bytes and we get two computed values from the three dump-tree outputs.
If it were reporting whole extents, it'd be disk_bytenr, disk_num_bytes, 
and we'd be missing offset.
I think we'd need a third piece of information about physical space in 
order to convey all three equivalents of disk_bytenr, disk_num_bytes, 
offset. And without that third piece of information, we can't both match 
up disk extents and also know exactly what disk bytenr data is stored 
at, I think? But maybe you're proposing exactly that, having a third number?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-03  6:02       ` Sweet Tea Dorminy
@ 2024-04-03  7:19         ` Qu Wenruo
  2024-04-09 18:52           ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-04-03  7:19 UTC (permalink / raw)
  To: Sweet Tea Dorminy, Qu Wenruo, Jonathan Corbet, Chris Mason,
	Josef Bacik, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, linux-doc, linux-btrfs, linux-fsdevel, kernel-team



在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
>>> This means, we will emit a entry that uses the end to the physical
>>> extent end.
>>>
>>> Considering a file layout like this:
>>>
>>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>          generation 7 type 1 (regular)
>>>          extent data disk byte 13631488 nr 65536
>>>          extent data offset 0 nr 4096 ram 65536
>>>          extent compression 0 (none)
>>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>          generation 8 type 1 (regular)
>>>          extent data disk byte 13697024 nr 4096
>>>          extent data offset 0 nr 4096 ram 4096
>>>          extent compression 0 (none)
>>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>          generation 7 type 1 (regular)
>>>          extent data disk byte 13631488 nr 65536
>>>          extent data offset 8192 nr 57344 ram 65536
>>>          extent compression 0 (none)
>>>
>>> For fiemap, we would got something like this:
>>>
>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>
>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>> My concern is on the first entry. It indicates that we have wasted
>>> 60K (phy len is 64K, while logical len is only 4K)
>>>
>>> But that information is not correct, as in reality we only wasted 4K,
>>> the remaining 56K is still referred by file range [8K, 64K).
>>>
>>> Do you mean that user space program should maintain a mapping of each
>>> utilized physical range, and when handling the reported file range
>>> [8K, 64K), the user space program should find that the physical range
>>> covers with one existing extent, and do calculation correctly?
>>
>> My goal is to give an unprivileged interface for tools like compsize
>> to figure out how much space is used by a particular set of files.
>> They report the total disk space referenced by the provided list of
>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>> extents pertaining to the requested files and deduplicating extents
>> based on disk_bytenr.
>>
>> It seems simplest to me for userspace for the kernel to emit the
>> entire extent for each part of it referenced in a file, and let
>> userspace deal with deduplicating extents. This is also most similar
>> to the existing tree-search based interface. Reporting whole extents
>> gives more flexibility for userspace to figure out how to report
>> bookend extents, or shared extents, or ...
>>
>> It does seem a little weird where if you request with fiemap only e.g.
>> 4k-16k range in that example file you'll get reported all 68k
>> involved, but I can't figure out a way to fix that without having the
>> kernel keep track of used parts of the extents as part of reporting,
>> which sounds expensive.
>>
>> You're right that I'm being inconsistent, taking off extent_offset
>> from the reported disk size when that isn't what I should be doing, so
>> I fixed that in v3.
>
> Ah, I think I grasp a point I'd missed before.
> - Without setting disk_bytenr to the actual start of the data on disk,
> there's no way to find the location of the actual data on disk within
> the extent from fiemap alone

Yes, that's my point.

> - But reporting disk_bytenr + offset, to get actual start of data on
> disk, means we need to report a physical size to figure out the end of
> the extent and we can't know the beginning.

disk_bytenr + offset + disk_num_bytes, and with the existing things like
length (aka, num_bytes), filepos (aka, key.offset) flags
(compression/hole/preallocated etc), we have everything we need to know
for regular extents.

For compressed extents, we also need ram_bytes.

If you ask me, I'd say put all the extra members into fiemap entry if we
have the space...

It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
that case it would be u64 * 2 + u32 * 2.

But I'm also 100% sure, the extra members would not be welcomed by other
filesystems either.

Thanks,
Qu

>
> We can't convey both actual location, start, and end of the extent in
> just two pieces of information.
>
> On the other hand, if someone really needs to know the actual location
> on disk of their data, they could use the tree_search ioctl as root to
> do so?
>
> So I still think we should be reporting entire extents but am less
> confident that it doesn't break existing users. I am not sure how common
> it is to take fiemap output on btrfs and use it to try to get to
> physical data on disk - do you know of a tool that does so?
>
> Thank you!
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-03  7:19         ` Qu Wenruo
@ 2024-04-09 18:52           ` David Sterba
  2024-04-09 19:31             ` Andreas Dilger
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2024-04-09 18:52 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Sweet Tea Dorminy, Qu Wenruo, Jonathan Corbet, Chris Mason,
	Josef Bacik, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, linux-doc, linux-btrfs, linux-fsdevel, kernel-team

On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
> 
> 
> 在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
> >>> This means, we will emit a entry that uses the end to the physical
> >>> extent end.
> >>>
> >>> Considering a file layout like this:
> >>>
> >>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
> >>>          generation 7 type 1 (regular)
> >>>          extent data disk byte 13631488 nr 65536
> >>>          extent data offset 0 nr 4096 ram 65536
> >>>          extent compression 0 (none)
> >>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
> >>>          generation 8 type 1 (regular)
> >>>          extent data disk byte 13697024 nr 4096
> >>>          extent data offset 0 nr 4096 ram 4096
> >>>          extent compression 0 (none)
> >>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
> >>>          generation 7 type 1 (regular)
> >>>          extent data disk byte 13631488 nr 65536
> >>>          extent data offset 8192 nr 57344 ram 65536
> >>>          extent compression 0 (none)
> >>>
> >>> For fiemap, we would got something like this:
> >>>
> >>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
> >>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
> >>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
> >>>
> >>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
> >>> My concern is on the first entry. It indicates that we have wasted
> >>> 60K (phy len is 64K, while logical len is only 4K)
> >>>
> >>> But that information is not correct, as in reality we only wasted 4K,
> >>> the remaining 56K is still referred by file range [8K, 64K).
> >>>
> >>> Do you mean that user space program should maintain a mapping of each
> >>> utilized physical range, and when handling the reported file range
> >>> [8K, 64K), the user space program should find that the physical range
> >>> covers with one existing extent, and do calculation correctly?
> >>
> >> My goal is to give an unprivileged interface for tools like compsize
> >> to figure out how much space is used by a particular set of files.
> >> They report the total disk space referenced by the provided list of
> >> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
> >> extents pertaining to the requested files and deduplicating extents
> >> based on disk_bytenr.
> >>
> >> It seems simplest to me for userspace for the kernel to emit the
> >> entire extent for each part of it referenced in a file, and let
> >> userspace deal with deduplicating extents. This is also most similar
> >> to the existing tree-search based interface. Reporting whole extents
> >> gives more flexibility for userspace to figure out how to report
> >> bookend extents, or shared extents, or ...
> >>
> >> It does seem a little weird where if you request with fiemap only e.g.
> >> 4k-16k range in that example file you'll get reported all 68k
> >> involved, but I can't figure out a way to fix that without having the
> >> kernel keep track of used parts of the extents as part of reporting,
> >> which sounds expensive.
> >>
> >> You're right that I'm being inconsistent, taking off extent_offset
> >> from the reported disk size when that isn't what I should be doing, so
> >> I fixed that in v3.
> >
> > Ah, I think I grasp a point I'd missed before.
> > - Without setting disk_bytenr to the actual start of the data on disk,
> > there's no way to find the location of the actual data on disk within
> > the extent from fiemap alone
> 
> Yes, that's my point.
> 
> > - But reporting disk_bytenr + offset, to get actual start of data on
> > disk, means we need to report a physical size to figure out the end of
> > the extent and we can't know the beginning.
> 
> disk_bytenr + offset + disk_num_bytes, and with the existing things like
> length (aka, num_bytes), filepos (aka, key.offset) flags
> (compression/hole/preallocated etc), we have everything we need to know
> for regular extents.
> 
> For compressed extents, we also need ram_bytes.
> 
> If you ask me, I'd say put all the extra members into fiemap entry if we
> have the space...
> 
> It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
> may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
> that case it would be u64 * 2 + u32 * 2.
> 
> But I'm also 100% sure, the extra members would not be welcomed by other
> filesystems either.

That's probably right, too many btrfs-specific information in the
generic FIEMAP, but we may also do our own enhanced fiemap ioctl that
would provide all the information you suggest and we'd be free to put
the compression information there too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
  2024-04-09 18:52           ` David Sterba
@ 2024-04-09 19:31             ` Andreas Dilger
  0 siblings, 0 replies; 14+ messages in thread
From: Andreas Dilger @ 2024-04-09 19:31 UTC (permalink / raw)
  To: David Sterba
  Cc: Qu Wenruo, Sweet Tea Dorminy, Qu Wenruo, Jonathan Corbet,
	Chris Mason, Josef Bacik, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, linux-doc, linux-btrfs,
	linux-fsdevel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 5431 bytes --]

On Apr 9, 2024, at 12:52 PM, David Sterba <dsterba@suse.cz> wrote:
> 
> On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
>> 
>> 
>> 在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
>>>>> This means, we will emit a entry that uses the end to the physical
>>>>> extent end.
>>>>> 
>>>>> Considering a file layout like this:
>>>>> 
>>>>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 0 nr 4096 ram 65536
>>>>>          extent compression 0 (none)
>>>>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>>>          generation 8 type 1 (regular)
>>>>>          extent data disk byte 13697024 nr 4096
>>>>>          extent data offset 0 nr 4096 ram 4096
>>>>>          extent compression 0 (none)
>>>>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 8192 nr 57344 ram 65536
>>>>>          extent compression 0 (none)
>>>>> 
>>>>> For fiemap, we would got something like this:
>>>>> 
>>>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>>> 
>>>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>>>> My concern is on the first entry. It indicates that we have wasted
>>>>> 60K (phy len is 64K, while logical len is only 4K)
>>>>> 
>>>>> But that information is not correct, as in reality we only wasted 4K,
>>>>> the remaining 56K is still referred by file range [8K, 64K).
>>>>> 
>>>>> Do you mean that user space program should maintain a mapping of each
>>>>> utilized physical range, and when handling the reported file range
>>>>> [8K, 64K), the user space program should find that the physical range
>>>>> covers with one existing extent, and do calculation correctly?
>>>> 
>>>> My goal is to give an unprivileged interface for tools like compsize
>>>> to figure out how much space is used by a particular set of files.
>>>> They report the total disk space referenced by the provided list of
>>>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>>>> extents pertaining to the requested files and deduplicating extents
>>>> based on disk_bytenr.
>>>> 
>>>> It seems simplest to me for userspace for the kernel to emit the
>>>> entire extent for each part of it referenced in a file, and let
>>>> userspace deal with deduplicating extents. This is also most similar
>>>> to the existing tree-search based interface. Reporting whole extents
>>>> gives more flexibility for userspace to figure out how to report
>>>> bookend extents, or shared extents, or ...
>>>> 
>>>> It does seem a little weird where if you request with fiemap only e.g.
>>>> 4k-16k range in that example file you'll get reported all 68k
>>>> involved, but I can't figure out a way to fix that without having the
>>>> kernel keep track of used parts of the extents as part of reporting,
>>>> which sounds expensive.
>>>> 
>>>> You're right that I'm being inconsistent, taking off extent_offset
>>>> from the reported disk size when that isn't what I should be doing, so
>>>> I fixed that in v3.
>>> 
>>> Ah, I think I grasp a point I'd missed before.
>>> - Without setting disk_bytenr to the actual start of the data on disk,
>>> there's no way to find the location of the actual data on disk within
>>> the extent from fiemap alone
>> 
>> Yes, that's my point.
>> 
>>> - But reporting disk_bytenr + offset, to get actual start of data on
>>> disk, means we need to report a physical size to figure out the end of
>>> the extent and we can't know the beginning.
>> 
>> disk_bytenr + offset + disk_num_bytes, and with the existing things like
>> length (aka, num_bytes), filepos (aka, key.offset) flags
>> (compression/hole/preallocated etc), we have everything we need to know
>> for regular extents.
>> 
>> For compressed extents, we also need ram_bytes.
>> 
>> If you ask me, I'd say put all the extra members into fiemap entry if we
>> have the space...
>> 
>> It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
>> may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
>> that case it would be u64 * 2 + u32 * 2.
>> 
>> But I'm also 100% sure, the extra members would not be welcomed by other
>> filesystems either.
> 
> That's probably right, too many btrfs-specific information in the
> generic FIEMAP, but we may also do our own enhanced fiemap ioctl that
> would provide all the information you suggest and we'd be free to put
> the compression information there too.

I read this thread when it was first posted, but I don't understand what
these extra fields actually mean?  Definitely adding the logical/physical
length makes sense for compressed extents, but I didn't see any clear
explanation of what these other fields actually mean?

I'm extrapolating something like btrfs has aggregated compressed chunks
that have multiple independent/disjoint blocks within a chunk, and you
are trying to get the exact offset within the compression byte stream
for the start of each block in the chunk?

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-04-09 19:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
2024-03-31  9:03   ` Qu Wenruo
2024-04-03  5:32     ` Sweet Tea Dorminy
2024-04-03  5:52       ` Qu Wenruo
2024-04-03  7:18         ` Sweet Tea Dorminy
2024-04-03  6:02       ` Sweet Tea Dorminy
2024-04-03  7:19         ` Qu Wenruo
2024-04-09 18:52           ` David Sterba
2024-04-09 19:31             ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).