* [PATCH for-next v3 0/2] add ioctl to query metadata and protection info capabilities
[not found] <CGME20250610132307epcas5p4c6c107e84642a1367600afe9167655b8@epcas5p4.samsung.com>
@ 2025-06-10 13:22 ` Anuj Gupta
[not found] ` <CGME20250610132312epcas5p20cdd1a3119df8ffc68770f06745e8481@epcas5p2.samsung.com>
[not found] ` <CGME20250610132317epcas5p442ce20c039224fb691ab0ba03fcb21e7@epcas5p4.samsung.com>
0 siblings, 2 replies; 10+ messages in thread
From: Anuj Gupta @ 2025-06-10 13:22 UTC (permalink / raw)
To: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger
Cc: linux-block, linux-fsdevel, joshi.k, Anuj Gupta
Hi all,
This patch series adds a new ioctl to query metadata and integrity
capability. Patch 1 adds a pi_size field in blk_integrity struct which
is later used to export this value to the user as well.
Patch 2 introduces a new ioctl to query integrity capability.
v2->v3
better naming for uapi struct fields (Martin)
validate integrity fields in blk-settings.c (Christoph)
v1 -> v2
introduce metadata_size, storage_tag_size and ref_tag_size field in the
uapi struct (Martin)
uapi struct fields comment improvements (Martin)
add csum_type definitions to the uapi file (Martin)
add fpc_* prefix to uapi struct fields (Andreas)
bump the size of rsvd and hence the uapi struct to 32 bytes (Andreas)
use correct value for ioctl (Andreas)
use clearer names for CRC (Eric)
Anuj Gupta (2):
block: introduce pi_size field in blk_integrity
fs: add ioctl to query metadata and protection info capabilities
block/blk-integrity.c | 53 +++++++++++++++++++++++++++++++++++
block/blk-settings.c | 37 ++++++++++++++++++++++++
block/ioctl.c | 3 ++
drivers/nvme/host/core.c | 1 +
drivers/scsi/sd_dif.c | 1 +
include/linux/blk-integrity.h | 7 +++++
include/linux/blkdev.h | 1 +
include/uapi/linux/fs.h | 43 ++++++++++++++++++++++++++++
8 files changed, 146 insertions(+)
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity
[not found] ` <CGME20250610132312epcas5p20cdd1a3119df8ffc68770f06745e8481@epcas5p2.samsung.com>
@ 2025-06-10 13:22 ` Anuj Gupta
2025-06-11 3:44 ` Christoph Hellwig
2025-06-13 1:51 ` Martin K. Petersen
0 siblings, 2 replies; 10+ messages in thread
From: Anuj Gupta @ 2025-06-10 13:22 UTC (permalink / raw)
To: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger
Cc: linux-block, linux-fsdevel, joshi.k, Anuj Gupta,
Christoph Hellwig
Introduce a new pi_size field in struct blk_integrity to explicitly
represent the size (in bytes) of the protection information (PI) tuple.
This is a prep patch.
Add validation in blk_validate_integrity_limits() to ensure that
pi_size matches the expected size for know checksum types and never
exceeds the tuple_size.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
---
block/blk-settings.c | 37 +++++++++++++++++++++++++++++++++++++
drivers/nvme/host/core.c | 1 +
drivers/scsi/sd_dif.c | 1 +
include/linux/blkdev.h | 1 +
4 files changed, 40 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..125b23acb5b7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -14,6 +14,8 @@
#include <linux/jiffies.h>
#include <linux/gfp.h>
#include <linux/dma-mapping.h>
+#include <linux/t10-pi.h>
+#include <linux/crc64.h>
#include "blk.h"
#include "blk-rq-qos.h"
@@ -135,6 +137,41 @@ static int blk_validate_integrity_limits(struct queue_limits *lim)
return -EINVAL;
}
+ if (bi->pi_size > bi->tuple_size) {
+ pr_warn("pi_size (%u) exceeds tuple_size (%u)\n",
+ bi->pi_size, bi->tuple_size);
+ return -EINVAL;
+ }
+
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_NONE:
+ if (!bi->pi_size) {
+ pr_warn("pi_size must be 0 when checksum type \
+ is none\n");
+ return -EINVAL;
+ }
+ break;
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ if (bi->pi_size != sizeof(struct t10_pi_tuple)) {
+ pr_warn("pi_size mismatch for T10 PI: expected \
+ %zu, got %u\n",
+ sizeof(struct t10_pi_tuple),
+ bi->pi_size);
+ return -EINVAL;
+ }
+ break;
+ case BLK_INTEGRITY_CSUM_CRC64:
+ if (bi->pi_size != sizeof(struct crc64_pi_tuple)) {
+ pr_warn("pi_size mismatch for CRC64 PI: \
+ expected %zu, got %u\n",
+ sizeof(struct crc64_pi_tuple),
+ bi->pi_size);
+ return -EINVAL;
+ }
+ break;
+ }
+
if (!bi->interval_exp)
bi->interval_exp = ilog2(lim->logical_block_size);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f69a232a000a..a9a2a0ca9797 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1868,6 +1868,7 @@ static bool nvme_init_integrity(struct nvme_ns_head *head,
}
bi->tuple_size = head->ms;
+ bi->pi_size = head->pi_size;
bi->pi_offset = info->pi_offset;
return true;
}
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index ae6ce6f5d622..9c39a82298da 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -53,6 +53,7 @@ void sd_dif_config_host(struct scsi_disk *sdkp, struct queue_limits *lim)
bi->flags |= BLK_INTEGRITY_REF_TAG;
bi->tuple_size = sizeof(struct t10_pi_tuple);
+ bi->pi_size = bi->tuple_size;
if (dif && type) {
bi->flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 332b56f323d9..1ed604b70e0f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -120,6 +120,7 @@ struct blk_integrity {
unsigned char pi_offset;
unsigned char interval_exp;
unsigned char tag_size;
+ unsigned char pi_size;
};
typedef unsigned int __bitwise blk_mode_t;
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
[not found] ` <CGME20250610132317epcas5p442ce20c039224fb691ab0ba03fcb21e7@epcas5p4.samsung.com>
@ 2025-06-10 13:22 ` Anuj Gupta
2025-06-11 3:53 ` Christoph Hellwig
2025-06-11 10:23 ` Christian Brauner
0 siblings, 2 replies; 10+ messages in thread
From: Anuj Gupta @ 2025-06-10 13:22 UTC (permalink / raw)
To: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger
Cc: linux-block, linux-fsdevel, joshi.k, Anuj Gupta
Add a new ioctl, FS_IOC_GETLBMD_CAP, to query metadata and protection
info (PI) capabilities. This ioctl returns information about the files
integrity profile. This is useful for userspace applications to
understand a files end-to-end data protection support and configure the
I/O accordingly.
For now this interface is only supported by block devices. However the
design and placement of this ioctl in generic FS ioctl space allows us
to extend it to work over files as well. This maybe useful when
filesystems start supporting PI-aware layouts.
A new structure struct logical_block_metadata_cap is introduced, which
contains the following fields:
1. lbmd_flags: bitmask of logical block metadata capability flags
2. lbmd_interval: the amount of data described by each unit of logical
block metadata
3. lbmd_size: size in bytes of the logical block metadata associated
with each interval
4. lbmd_opaque_size: size in bytes of the opaque block tag associated
with each interval
5. lbmd_opaque_offset: offset in bytes of the opaque block tag within
the logical block metadata
6. lbmd_pi_size: size in bytes of the T10 PI tuple associated with each
interval
7. lbmd_pi_offset: offset in bytes of T10 PI tuple within the logical
block metadata
8. lbmd_pi_guard_tag_type: T10 PI guard tag type
9. lbmd_pi_app_tag_size: size in bytes of the T10 PI application tag
10. lbmd_pi_ref_tag_size: size in bytes of the T10 PI reference tag
11. lbmd_pi_storage_tag_size: size in bytes of the T10 PI storage tag
12. lbmd_rsvd: reserved for future use
The internal logic to fetch the capability is encapsulated in a helper
function blk_get_meta_cap(), which uses the blk_integrity profile
associated with the device. The ioctl returns -EOPNOTSUPP, if
CONFIG_BLK_DEV_INTEGRITY is not enabled.
Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
block/blk-integrity.c | 53 +++++++++++++++++++++++++++++++++++
block/ioctl.c | 3 ++
include/linux/blk-integrity.h | 7 +++++
include/uapi/linux/fs.h | 43 ++++++++++++++++++++++++++++
4 files changed, 106 insertions(+)
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index e4e2567061f9..f9ad5bdb84f5 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -13,6 +13,7 @@
#include <linux/scatterlist.h>
#include <linux/export.h>
#include <linux/slab.h>
+#include <linux/t10-pi.h>
#include "blk.h"
@@ -54,6 +55,58 @@ int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
return segments;
}
+int blk_get_meta_cap(struct block_device *bdev,
+ struct logical_block_metadata_cap __user *argp)
+{
+ struct blk_integrity *bi = blk_get_integrity(bdev->bd_disk);
+ struct logical_block_metadata_cap meta_cap = {};
+
+ if (!bi)
+ goto out;
+
+ if (bi->flags & BLK_INTEGRITY_DEVICE_CAPABLE)
+ meta_cap.lbmd_flags |= LBMD_PI_CAP_INTEGRITY;
+ if (bi->flags & BLK_INTEGRITY_REF_TAG)
+ meta_cap.lbmd_flags |= LBMD_PI_CAP_REFTAG;
+ meta_cap.lbmd_interval = 1 << bi->interval_exp;
+ meta_cap.lbmd_size = bi->tuple_size;
+ if (bi->csum_type == BLK_INTEGRITY_CSUM_NONE) {
+ /* treat entire tuple as opaque block tag */
+ meta_cap.lbmd_opaque_size = bi->tuple_size;
+ goto out;
+ }
+ meta_cap.lbmd_pi_size = bi->pi_size;
+ meta_cap.lbmd_pi_offset = bi->pi_offset;
+ meta_cap.lbmd_opaque_size = bi->tuple_size - bi->pi_size;
+ if (meta_cap.lbmd_opaque_size && !bi->pi_offset)
+ meta_cap.lbmd_opaque_offset = bi->pi_size;
+
+ meta_cap.lbmd_guard_tag_type = bi->csum_type;
+ meta_cap.lbmd_app_tag_size = 2;
+
+ if (bi->flags & BLK_INTEGRITY_REF_TAG) {
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ meta_cap.lbmd_ref_tag_size =
+ sizeof_field(struct crc64_pi_tuple, ref_tag);
+ break;
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ meta_cap.lbmd_ref_tag_size =
+ sizeof_field(struct t10_pi_tuple, ref_tag);
+ break;
+ default:
+ break;
+ }
+ }
+
+out:
+ if (copy_to_user(argp, &meta_cap,
+ sizeof(struct logical_block_metadata_cap)))
+ return -EFAULT;
+ return 0;
+}
+
/**
* blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
* @rq: request to map
diff --git a/block/ioctl.c b/block/ioctl.c
index e472cc1030c6..19782f7b5ff1 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -13,6 +13,7 @@
#include <linux/uaccess.h>
#include <linux/pagemap.h>
#include <linux/io_uring/cmd.h>
+#include <linux/blk-integrity.h>
#include <uapi/linux/blkdev.h>
#include "blk.h"
#include "blk-crypto-internal.h"
@@ -643,6 +644,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
return blkdev_pr_preempt(bdev, mode, argp, true);
case IOC_PR_CLEAR:
return blkdev_pr_clear(bdev, mode, argp);
+ case FS_IOC_GETLBMD_CAP:
+ return blk_get_meta_cap(bdev, argp);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index c7eae0bfb013..b4aff4dff843 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -29,6 +29,8 @@ int blk_rq_map_integrity_sg(struct request *, struct scatterlist *);
int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf,
ssize_t bytes);
+int blk_get_meta_cap(struct block_device *bdev,
+ struct logical_block_metadata_cap __user *argp);
static inline bool
blk_integrity_queue_supports_integrity(struct request_queue *q)
@@ -92,6 +94,11 @@ static inline struct bio_vec rq_integrity_vec(struct request *rq)
rq->bio->bi_integrity->bip_iter);
}
#else /* CONFIG_BLK_DEV_INTEGRITY */
+static inline int blk_get_meta_cap(struct block_device *bdev,
+ struct logical_block_metadata_cap __user *argp)
+{
+ return -EOPNOTSUPP;
+}
static inline int blk_rq_count_integrity_sg(struct request_queue *q,
struct bio *b)
{
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 0098b0ce8ccb..70350d5a4cd6 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -91,6 +91,47 @@ struct fs_sysfs_path {
__u8 name[128];
};
+/* Protection info capability flags */
+#define LBMD_PI_CAP_INTEGRITY (1 << 0)
+#define LBMD_PI_CAP_REFTAG (1 << 1)
+
+/* Checksum types for Protection Information */
+#define LBMD_PI_CSUM_NONE 0
+#define LBMD_PI_CSUM_IP 1
+#define LBMD_PI_CSUM_CRC16_T10DIF 2
+#define LBMD_PI_CSUM_CRC64_NVME 4
+
+/*
+ * struct logical_block_metadata_cap - Logical block metadata
+ * @lbmd_flags: Bitmask of logical block metadata capability flags
+ * @lbmd_interval: The amount of data described by each unit of logical block metadata
+ * @lbmd_size: Size in bytes of the logical block metadata associated with each interval
+ * @lbmd_opaque_size: Size in bytes of the opaque block tag associated with each interval
+ * @lbmd_opaque_offset: Offset in bytes of the opaque block tag within the logical block metadata
+ * @lbmd_pi_size: Size in bytes of the T10 PI tuple associated with each interval
+ * @lbmd_pi_offset: Offset in bytes of T10 PI tuple within the logical block metadata
+ * @lbmd_pi_guard_tag_type: T10 PI guard tag type
+ * @lbmd_pi_app_tag_size: Size in bytes of the T10 PI application tag
+ * @lbmd_pi_ref_tag_size: Size in bytes of the T10 PI reference tag
+ * @lbmd_pi_storage_tag_size: Size in bytes of the T10 PI storage tag
+ * @lbmd_rsvd: Reserved for future use
+ */
+
+struct logical_block_metadata_cap {
+ __u32 lbmd_flags;
+ __u16 lbmd_interval;
+ __u8 lbmd_size;
+ __u8 lbmd_opaque_size;
+ __u8 lbmd_opaque_offset;
+ __u8 lbmd_pi_size;
+ __u8 lbmd_pi_offset;
+ __u8 lbmd_guard_tag_type;
+ __u8 lbmd_app_tag_size;
+ __u8 lbmd_ref_tag_size;
+ __u8 lbmd_storage_tag_size;
+ __u8 lbmd_rsvd[17];
+};
+
/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
#define FILE_DEDUPE_RANGE_SAME 0
#define FILE_DEDUPE_RANGE_DIFFERS 1
@@ -247,6 +288,8 @@ struct fsxattr {
* also /sys/kernel/debug/ for filesystems with debugfs exports
*/
#define FS_IOC_GETFSSYSFSPATH _IOR(0x15, 1, struct fs_sysfs_path)
+/* Get logical block metadata capability details */
+#define FS_IOC_GETLBMD_CAP _IOR(0x15, 2, struct logical_block_metadata_cap)
/*
* Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
--
2.25.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity
2025-06-10 13:22 ` [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity Anuj Gupta
@ 2025-06-11 3:44 ` Christoph Hellwig
2025-06-13 1:51 ` Martin K. Petersen
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-06-11 3:44 UTC (permalink / raw)
To: Anuj Gupta
Cc: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger, linux-block, linux-fsdevel,
joshi.k, Christoph Hellwig
On Tue, Jun 10, 2025 at 06:52:53PM +0530, Anuj Gupta wrote:
> + switch (bi->csum_type) {
> + case BLK_INTEGRITY_CSUM_NONE:
> + if (!bi->pi_size) {
> + pr_warn("pi_size must be 0 when checksum type \
> + is none\n");
Based on the message and how the code should work I think the
test above is inverted.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
2025-06-10 13:22 ` [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities Anuj Gupta
@ 2025-06-11 3:53 ` Christoph Hellwig
2025-06-11 10:23 ` Christian Brauner
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-06-11 3:53 UTC (permalink / raw)
To: Anuj Gupta
Cc: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger, linux-block, linux-fsdevel,
joshi.k
On Tue, Jun 10, 2025 at 06:52:54PM +0530, Anuj Gupta wrote:
> +int blk_get_meta_cap(struct block_device *bdev,
> + struct logical_block_metadata_cap __user *argp)
> +{
> + struct blk_integrity *bi = blk_get_integrity(bdev->bd_disk);
> + struct logical_block_metadata_cap meta_cap = {};
> +
> + if (!bi)
> + goto out;
So is returning an all zeroed structure really the expected interface?
It feels kinda unusual, but if we want to go with that for extensibility
it should probably frow a comment and some language in the man page to
explain what fields to check for support.
> + if (bi->csum_type == BLK_INTEGRITY_CSUM_NONE) {
> + /* treat entire tuple as opaque block tag */
> + meta_cap.lbmd_opaque_size = bi->tuple_size;
> + goto out;
Purely cosmetic, but the mix of this and the later switch looks a
bit odd. Instead I'd..
> + }
> + meta_cap.lbmd_pi_size = bi->pi_size;
> + meta_cap.lbmd_pi_offset = bi->pi_offset;
> + meta_cap.lbmd_opaque_size = bi->tuple_size - bi->pi_size;
> + if (meta_cap.lbmd_opaque_size && !bi->pi_offset)
> + meta_cap.lbmd_opaque_offset = bi->pi_size;
> +
> + meta_cap.lbmd_guard_tag_type = bi->csum_type;
... keep this in the common branch. All the assignment should do
the right thing even for the non-PI metadata case even if they
do a little extra work.
> + meta_cap.lbmd_app_tag_size = 2;
And then just guard this with:
if (bi->csum_type != BLK_INTEGRITY_CSUM_NONE)
meta_cap.lbmd_app_tag_size = 2;
> +
> +/*
> + * struct logical_block_metadata_cap - Logical block metadata
> + * @lbmd_flags: Bitmask of logical block metadata capability flags
> + * @lbmd_interval: The amount of data described by each unit of logical block metadata
> + * @lbmd_size: Size in bytes of the logical block metadata associated with each interval
> + * @lbmd_opaque_size: Size in bytes of the opaque block tag associated with each interval
> + * @lbmd_opaque_offset: Offset in bytes of the opaque block tag within the logical block metadata
> + * @lbmd_pi_size: Size in bytes of the T10 PI tuple associated with each interval
> + * @lbmd_pi_offset: Offset in bytes of T10 PI tuple within the logical block metadata
> + * @lbmd_pi_guard_tag_type: T10 PI guard tag type
> + * @lbmd_pi_app_tag_size: Size in bytes of the T10 PI application tag
> + * @lbmd_pi_ref_tag_size: Size in bytes of the T10 PI reference tag
> + * @lbmd_pi_storage_tag_size: Size in bytes of the T10 PI storage tag
> + * @lbmd_rsvd: Reserved for future use
I find this style of comment really hard to read due to the long
lines vs just comments next to the fields. But it's become
fashionable lately, so..
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
2025-06-10 13:22 ` [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities Anuj Gupta
2025-06-11 3:53 ` Christoph Hellwig
@ 2025-06-11 10:23 ` Christian Brauner
2025-06-12 5:15 ` Christoph Hellwig
2025-06-18 5:56 ` Anuj gupta
1 sibling, 2 replies; 10+ messages in thread
From: Christian Brauner @ 2025-06-11 10:23 UTC (permalink / raw)
To: Anuj Gupta
Cc: vincent.fu, jack, anuj1072538, axboe, viro, hch, martin.petersen,
ebiggers, adilger, linux-block, linux-fsdevel, joshi.k
On Tue, Jun 10, 2025 at 06:52:54PM +0530, Anuj Gupta wrote:
> Add a new ioctl, FS_IOC_GETLBMD_CAP, to query metadata and protection
> info (PI) capabilities. This ioctl returns information about the files
> integrity profile. This is useful for userspace applications to
> understand a files end-to-end data protection support and configure the
> I/O accordingly.
>
> For now this interface is only supported by block devices. However the
> design and placement of this ioctl in generic FS ioctl space allows us
> to extend it to work over files as well. This maybe useful when
> filesystems start supporting PI-aware layouts.
>
> A new structure struct logical_block_metadata_cap is introduced, which
> contains the following fields:
>
> 1. lbmd_flags: bitmask of logical block metadata capability flags
> 2. lbmd_interval: the amount of data described by each unit of logical
> block metadata
> 3. lbmd_size: size in bytes of the logical block metadata associated
> with each interval
> 4. lbmd_opaque_size: size in bytes of the opaque block tag associated
> with each interval
> 5. lbmd_opaque_offset: offset in bytes of the opaque block tag within
> the logical block metadata
> 6. lbmd_pi_size: size in bytes of the T10 PI tuple associated with each
> interval
> 7. lbmd_pi_offset: offset in bytes of T10 PI tuple within the logical
> block metadata
> 8. lbmd_pi_guard_tag_type: T10 PI guard tag type
> 9. lbmd_pi_app_tag_size: size in bytes of the T10 PI application tag
> 10. lbmd_pi_ref_tag_size: size in bytes of the T10 PI reference tag
> 11. lbmd_pi_storage_tag_size: size in bytes of the T10 PI storage tag
> 12. lbmd_rsvd: reserved for future use
>
> The internal logic to fetch the capability is encapsulated in a helper
> function blk_get_meta_cap(), which uses the blk_integrity profile
> associated with the device. The ioctl returns -EOPNOTSUPP, if
> CONFIG_BLK_DEV_INTEGRITY is not enabled.
>
> Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
> block/blk-integrity.c | 53 +++++++++++++++++++++++++++++++++++
> block/ioctl.c | 3 ++
> include/linux/blk-integrity.h | 7 +++++
> include/uapi/linux/fs.h | 43 ++++++++++++++++++++++++++++
> 4 files changed, 106 insertions(+)
>
> diff --git a/block/blk-integrity.c b/block/blk-integrity.c
> index e4e2567061f9..f9ad5bdb84f5 100644
> --- a/block/blk-integrity.c
> +++ b/block/blk-integrity.c
> @@ -13,6 +13,7 @@
> #include <linux/scatterlist.h>
> #include <linux/export.h>
> #include <linux/slab.h>
> +#include <linux/t10-pi.h>
>
> #include "blk.h"
>
> @@ -54,6 +55,58 @@ int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
> return segments;
> }
>
> +int blk_get_meta_cap(struct block_device *bdev,
> + struct logical_block_metadata_cap __user *argp)
> +{
> + struct blk_integrity *bi = blk_get_integrity(bdev->bd_disk);
> + struct logical_block_metadata_cap meta_cap = {};
> +
> + if (!bi)
> + goto out;
> +
> + if (bi->flags & BLK_INTEGRITY_DEVICE_CAPABLE)
> + meta_cap.lbmd_flags |= LBMD_PI_CAP_INTEGRITY;
> + if (bi->flags & BLK_INTEGRITY_REF_TAG)
> + meta_cap.lbmd_flags |= LBMD_PI_CAP_REFTAG;
> + meta_cap.lbmd_interval = 1 << bi->interval_exp;
> + meta_cap.lbmd_size = bi->tuple_size;
> + if (bi->csum_type == BLK_INTEGRITY_CSUM_NONE) {
> + /* treat entire tuple as opaque block tag */
> + meta_cap.lbmd_opaque_size = bi->tuple_size;
> + goto out;
> + }
> + meta_cap.lbmd_pi_size = bi->pi_size;
> + meta_cap.lbmd_pi_offset = bi->pi_offset;
> + meta_cap.lbmd_opaque_size = bi->tuple_size - bi->pi_size;
> + if (meta_cap.lbmd_opaque_size && !bi->pi_offset)
> + meta_cap.lbmd_opaque_offset = bi->pi_size;
> +
> + meta_cap.lbmd_guard_tag_type = bi->csum_type;
> + meta_cap.lbmd_app_tag_size = 2;
> +
> + if (bi->flags & BLK_INTEGRITY_REF_TAG) {
> + switch (bi->csum_type) {
> + case BLK_INTEGRITY_CSUM_CRC64:
> + meta_cap.lbmd_ref_tag_size =
> + sizeof_field(struct crc64_pi_tuple, ref_tag);
> + break;
> + case BLK_INTEGRITY_CSUM_CRC:
> + case BLK_INTEGRITY_CSUM_IP:
> + meta_cap.lbmd_ref_tag_size =
> + sizeof_field(struct t10_pi_tuple, ref_tag);
> + break;
> + default:
> + break;
> + }
> + }
> +
> +out:
> + if (copy_to_user(argp, &meta_cap,
> + sizeof(struct logical_block_metadata_cap)))
> + return -EFAULT;
> + return 0;
> +}
> +
> /**
> * blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
> * @rq: request to map
> diff --git a/block/ioctl.c b/block/ioctl.c
> index e472cc1030c6..19782f7b5ff1 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -13,6 +13,7 @@
> #include <linux/uaccess.h>
> #include <linux/pagemap.h>
> #include <linux/io_uring/cmd.h>
> +#include <linux/blk-integrity.h>
> #include <uapi/linux/blkdev.h>
> #include "blk.h"
> #include "blk-crypto-internal.h"
> @@ -643,6 +644,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, blk_mode_t mode,
> return blkdev_pr_preempt(bdev, mode, argp, true);
> case IOC_PR_CLEAR:
> return blkdev_pr_clear(bdev, mode, argp);
> + case FS_IOC_GETLBMD_CAP:
> + return blk_get_meta_cap(bdev, argp);
> default:
> return -ENOIOCTLCMD;
> }
> diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
> index c7eae0bfb013..b4aff4dff843 100644
> --- a/include/linux/blk-integrity.h
> +++ b/include/linux/blk-integrity.h
> @@ -29,6 +29,8 @@ int blk_rq_map_integrity_sg(struct request *, struct scatterlist *);
> int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
> int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf,
> ssize_t bytes);
> +int blk_get_meta_cap(struct block_device *bdev,
> + struct logical_block_metadata_cap __user *argp);
>
> static inline bool
> blk_integrity_queue_supports_integrity(struct request_queue *q)
> @@ -92,6 +94,11 @@ static inline struct bio_vec rq_integrity_vec(struct request *rq)
> rq->bio->bi_integrity->bip_iter);
> }
> #else /* CONFIG_BLK_DEV_INTEGRITY */
> +static inline int blk_get_meta_cap(struct block_device *bdev,
> + struct logical_block_metadata_cap __user *argp)
> +{
> + return -EOPNOTSUPP;
> +}
> static inline int blk_rq_count_integrity_sg(struct request_queue *q,
> struct bio *b)
> {
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 0098b0ce8ccb..70350d5a4cd6 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -91,6 +91,47 @@ struct fs_sysfs_path {
> __u8 name[128];
> };
>
> +/* Protection info capability flags */
> +#define LBMD_PI_CAP_INTEGRITY (1 << 0)
> +#define LBMD_PI_CAP_REFTAG (1 << 1)
> +
> +/* Checksum types for Protection Information */
> +#define LBMD_PI_CSUM_NONE 0
> +#define LBMD_PI_CSUM_IP 1
> +#define LBMD_PI_CSUM_CRC16_T10DIF 2
> +#define LBMD_PI_CSUM_CRC64_NVME 4
> +
> +/*
> + * struct logical_block_metadata_cap - Logical block metadata
> + * @lbmd_flags: Bitmask of logical block metadata capability flags
> + * @lbmd_interval: The amount of data described by each unit of logical block metadata
> + * @lbmd_size: Size in bytes of the logical block metadata associated with each interval
> + * @lbmd_opaque_size: Size in bytes of the opaque block tag associated with each interval
> + * @lbmd_opaque_offset: Offset in bytes of the opaque block tag within the logical block metadata
> + * @lbmd_pi_size: Size in bytes of the T10 PI tuple associated with each interval
> + * @lbmd_pi_offset: Offset in bytes of T10 PI tuple within the logical block metadata
> + * @lbmd_pi_guard_tag_type: T10 PI guard tag type
> + * @lbmd_pi_app_tag_size: Size in bytes of the T10 PI application tag
> + * @lbmd_pi_ref_tag_size: Size in bytes of the T10 PI reference tag
> + * @lbmd_pi_storage_tag_size: Size in bytes of the T10 PI storage tag
> + * @lbmd_rsvd: Reserved for future use
> + */
> +
> +struct logical_block_metadata_cap {
> + __u32 lbmd_flags;
> + __u16 lbmd_interval;
> + __u8 lbmd_size;
> + __u8 lbmd_opaque_size;
> + __u8 lbmd_opaque_offset;
> + __u8 lbmd_pi_size;
> + __u8 lbmd_pi_offset;
> + __u8 lbmd_guard_tag_type;
> + __u8 lbmd_app_tag_size;
> + __u8 lbmd_ref_tag_size;
> + __u8 lbmd_storage_tag_size;
> + __u8 lbmd_rsvd[17];
Don't do this hard-coded form of extensiblity. ioctl()s are inherently
extensible because they encode the size. Instead of switching on the
full ioctl, switch on the ioctl number. See for example fs/pidfs:
/* Extensible IOCTL. */
if (_IOC_NR(cmd) == _IOC_NR(PIDFD_GET_INFO))
return pidfd_info(file, cmd, arg);
static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
{
struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
<snip>
size_t usize = _IOC_SIZE(cmd);
struct pidfd_info kinfo = {};
if (!uinfo)
return -EINVAL;
if (usize < PIDFD_INFO_SIZE_VER0)
return -EINVAL; /* First version, no smaller struct possible */
pidfs uses a mask field to allow request-response modification:
if (copy_from_user(&mask, &uinfo->mask, sizeof(mask)))
return -EFAULT;
Fill in kinfo struct with the info you know about or that userspace
requested:
kinfo.ppid = task_ppid_nr_ns(task, NULL);
kinfo.tgid = task_tgid_vnr(task);
kinfo.pid = task_pid_vnr(task);
kinfo.mask |= PIDFD_INFO_PID;
Then copy the portion out that userspace knows about. We have a
dedicated helper for that:
/*
* If userspace and the kernel have the same struct size it can just
* be copied. If userspace provides an older struct, only the bits that
* userspace knows about will be copied. If userspace provides a new
* struct, only the bits that the kernel knows about will be copied.
*/
return copy_struct_to_user(uinfo, usize, &kinfo, sizeof(kinfo), NULL);
}
(Only requirement is that a zero value means "no info", i.e., can't be a
valid value. If you want zero to be a valid value then a mask member
might be helpful where the info that was available is raised.)
This whole approach is well-tested and works. You can grow the struct as
needed. If userspace doesn't care about any additional info even if the
struct grows it can just always request the minimal info and nothing
extra will ever be copied.
> +};
> +
> /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
> #define FILE_DEDUPE_RANGE_SAME 0
> #define FILE_DEDUPE_RANGE_DIFFERS 1
> @@ -247,6 +288,8 @@ struct fsxattr {
> * also /sys/kernel/debug/ for filesystems with debugfs exports
> */
> #define FS_IOC_GETFSSYSFSPATH _IOR(0x15, 1, struct fs_sysfs_path)
> +/* Get logical block metadata capability details */
> +#define FS_IOC_GETLBMD_CAP _IOR(0x15, 2, struct logical_block_metadata_cap)
>
> /*
> * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
2025-06-11 10:23 ` Christian Brauner
@ 2025-06-12 5:15 ` Christoph Hellwig
2025-06-12 12:24 ` Christian Brauner
2025-06-18 5:56 ` Anuj gupta
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-06-12 5:15 UTC (permalink / raw)
To: Christian Brauner
Cc: Anuj Gupta, vincent.fu, jack, anuj1072538, axboe, viro, hch,
martin.petersen, ebiggers, adilger, linux-block, linux-fsdevel,
joshi.k
On Wed, Jun 11, 2025 at 12:23:00PM +0200, Christian Brauner wrote:
> > +struct logical_block_metadata_cap {
> > + __u32 lbmd_flags;
> > + __u16 lbmd_interval;
> > + __u8 lbmd_size;
> > + __u8 lbmd_opaque_size;
> > + __u8 lbmd_opaque_offset;
> > + __u8 lbmd_pi_size;
> > + __u8 lbmd_pi_offset;
> > + __u8 lbmd_guard_tag_type;
> > + __u8 lbmd_app_tag_size;
> > + __u8 lbmd_ref_tag_size;
> > + __u8 lbmd_storage_tag_size;
> > + __u8 lbmd_rsvd[17];
>
> Don't do this hard-coded form of extensiblity. ioctl()s are inherently
> extensible because they encode the size. Instead of switching on the
> full ioctl, switch on the ioctl number. See for example fs/pidfs:
Umm, yes and no. The size encoding in the ioctl is great. But having
a few fields in a structure that already has flags allows for much
easier extensions for small amounts of data. Without the reserved
fields, this structure is 15 bytes long. So we'll need at least 1
do pad to a natural alignment. I think adding another 16 (aka
two u64s) seems pretty reasonable for painless extensions.
---end quoted text---
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
2025-06-12 5:15 ` Christoph Hellwig
@ 2025-06-12 12:24 ` Christian Brauner
0 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-06-12 12:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Anuj Gupta, vincent.fu, jack, anuj1072538, axboe, viro,
martin.petersen, ebiggers, adilger, linux-block, linux-fsdevel,
joshi.k
On Wed, Jun 11, 2025 at 10:15:37PM -0700, Christoph Hellwig wrote:
> On Wed, Jun 11, 2025 at 12:23:00PM +0200, Christian Brauner wrote:
> > > +struct logical_block_metadata_cap {
> > > + __u32 lbmd_flags;
> > > + __u16 lbmd_interval;
> > > + __u8 lbmd_size;
> > > + __u8 lbmd_opaque_size;
> > > + __u8 lbmd_opaque_offset;
> > > + __u8 lbmd_pi_size;
> > > + __u8 lbmd_pi_offset;
> > > + __u8 lbmd_guard_tag_type;
> > > + __u8 lbmd_app_tag_size;
> > > + __u8 lbmd_ref_tag_size;
> > > + __u8 lbmd_storage_tag_size;
> > > + __u8 lbmd_rsvd[17];
> >
> > Don't do this hard-coded form of extensiblity. ioctl()s are inherently
> > extensible because they encode the size. Instead of switching on the
> > full ioctl, switch on the ioctl number. See for example fs/pidfs:
>
> Umm, yes and no. The size encoding in the ioctl is great. But having
> a few fields in a structure that already has flags allows for much
> easier extensions for small amounts of data. Without the reserved
> fields, this structure is 15 bytes long. So we'll need at least 1
> do pad to a natural alignment. I think adding another 16 (aka
> two u64s) seems pretty reasonable for painless extensions.
I'm really against having structures that can't simply grow as needed in
2025. That has bitten us so often I don't see the point in perpetuating
this fixed-size stuff especially since userspace is very very used to
this extensibility by now. And also because you don't have to
pointlessly copy data you don't need in and out of the kernel on
principle alone.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity
2025-06-10 13:22 ` [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity Anuj Gupta
2025-06-11 3:44 ` Christoph Hellwig
@ 2025-06-13 1:51 ` Martin K. Petersen
1 sibling, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2025-06-13 1:51 UTC (permalink / raw)
To: Anuj Gupta
Cc: vincent.fu, jack, anuj1072538, axboe, viro, brauner, hch,
martin.petersen, ebiggers, adilger, linux-block, linux-fsdevel,
joshi.k, Christoph Hellwig
Hi Anuj!
> Introduce a new pi_size field in struct blk_integrity to explicitly
> represent the size (in bytes) of the protection information (PI)
> tuple. This is a prep patch.
Instead of introducing a new thing which means what the old thing was
supposed to mean, I'd prefer to change the existing tuple_size to
metadata_size. There aren't that many occurrences in the code so it
would be a largely mechanical change.
This patch would then become the second in the series introducing a
pi_tuple_size member to explicitly define the size of the PI tuple (if
any).
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities
2025-06-11 10:23 ` Christian Brauner
2025-06-12 5:15 ` Christoph Hellwig
@ 2025-06-18 5:56 ` Anuj gupta
1 sibling, 0 replies; 10+ messages in thread
From: Anuj gupta @ 2025-06-18 5:56 UTC (permalink / raw)
To: Christian Brauner
Cc: Anuj Gupta, vincent.fu, jack, axboe, viro, hch, martin.petersen,
ebiggers, adilger, linux-block, linux-fsdevel, joshi.k
> Don't do this hard-coded form of extensiblity. ioctl()s are inherently
> extensible because they encode the size. Instead of switching on the
> full ioctl, switch on the ioctl number. See for example fs/pidfs:
>
> /* Extensible IOCTL. */
> if (_IOC_NR(cmd) == _IOC_NR(PIDFD_GET_INFO))
> return pidfd_info(file, cmd, arg);
>
> static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
> {
> struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
> <snip>
> size_t usize = _IOC_SIZE(cmd);
> struct pidfd_info kinfo = {};
>
> if (!uinfo)
> return -EINVAL;
> if (usize < PIDFD_INFO_SIZE_VER0)
> return -EINVAL; /* First version, no smaller struct possible */
>
> pidfs uses a mask field to allow request-response modification:
Thanks for the detailed feedback — very helpful.
For now, I'll keep it simple and skip adding a mask field since all
fields in the struct are always returned.
> (Only requirement is that a zero value means "no info", i.e., can't be a
> valid value. If you want zero to be a valid value then a mask member
> might be helpful where the info that was available is raised.)
To clarify on the zero values: the fields in this struct are
capability fields, where a zero value indicates that the hardware
doesn’t support the corresponding feature. None of the fields have zero
as a valid value when the feature is supported, so a mask isn’t
necessary.
I sent another version [1] with your feedback applied. Please see if it
aligns with what you had in mind.
[1] https://lore.kernel.org/linux-block/20250618055153.48823-1-anuj20.g@samsung.com/
--
Anuj Gupta
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-06-18 5:57 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20250610132307epcas5p4c6c107e84642a1367600afe9167655b8@epcas5p4.samsung.com>
2025-06-10 13:22 ` [PATCH for-next v3 0/2] add ioctl to query metadata and protection info capabilities Anuj Gupta
[not found] ` <CGME20250610132312epcas5p20cdd1a3119df8ffc68770f06745e8481@epcas5p2.samsung.com>
2025-06-10 13:22 ` [PATCH for-next v3 1/2] block: introduce pi_size field in blk_integrity Anuj Gupta
2025-06-11 3:44 ` Christoph Hellwig
2025-06-13 1:51 ` Martin K. Petersen
[not found] ` <CGME20250610132317epcas5p442ce20c039224fb691ab0ba03fcb21e7@epcas5p4.samsung.com>
2025-06-10 13:22 ` [PATCH for-next v3 2/2] fs: add ioctl to query metadata and protection info capabilities Anuj Gupta
2025-06-11 3:53 ` Christoph Hellwig
2025-06-11 10:23 ` Christian Brauner
2025-06-12 5:15 ` Christoph Hellwig
2025-06-12 12:24 ` Christian Brauner
2025-06-18 5:56 ` Anuj gupta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).