fio.vger.kernel.org archive mirror
* [PATCH 0/2] io_uring r/w with metadata
@ 2025-07-23 17:04 Vincent Fu
  2025-07-23 17:04 ` [PATCH 1/2] engines/io_uring: support " Vincent Fu
  2025-07-23 17:04 ` [PATCH 2/2] t/io_uring_pi: test script for io_uring PI Vincent Fu
  0 siblings, 2 replies; 7+ messages in thread
From: Vincent Fu @ 2025-07-23 17:04 UTC (permalink / raw)
  To: anuj1072538, axboe, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

These two patches add support for io_uring read and write operations for
block devices with metadata. The first patch adds this support to the
io_uring ioengine and the second patch adds a script for this feature
using NVMe devices.
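
As an illustration, a job exercising the new metadata path could look like the sketch below. Option names follow the test script in patch 2; the device path, block size, and metadata size are placeholders that must be adapted to the namespace's LBA format (note patch 1 requires apptag_mask=0xffff).

```ini
; Hedged sketch only -- /dev/nvme1n1, bs, and md_per_io_size are
; placeholders, not values from the patches.
[pi-write]
ioengine=io_uring
direct=1
filename=/dev/nvme1n1
rw=write
bs=4096
number_ios=1024
md_per_io_size=64
pi_act=0
pi_chk=GUARD,REFTAG,APPTAG
apptag=0x8888
apptag_mask=0xffff
```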

This code has been tested on the kernel below:
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.17.integrity

with the patch below applied

https://lore.kernel.org/linux-block/20250722120755.87501-1-anuj20.g@samsung.com/

This code has passed testing under the following settings:

QEMU NVMe device 512B data size; 8B, 16B, 64B metadata size; 16b Guard PI
QEMU NVMe device 4K data size; 16B, 64B metadata size; 64b Guard PI

I have not been able to run the following tests due to platform
issues:

QEMU NVMe device 4K data size; any metadata size; 16b Guard PI
   nvme format fails
scsi-debug device
   module is loaded with dif=1, 2, or 3 but PI support is not detected

I would appreciate any help with setting up the missing test platforms.

Vincent Fu (2):
  engines/io_uring: support r/w with metadata
  t/io_uring_pi: test script for io_uring PI

 engines/io_uring.c  | 241 ++++++++++++++++++++++++--
 engines/nvme.c      |  35 ++--
 engines/nvme.h      |  18 ++
 io_u.h              |   1 +
 os/linux/io_uring.h |  15 ++
 t/io_uring_pi.py    | 408 ++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 681 insertions(+), 37 deletions(-)
 create mode 100644 t/io_uring_pi.py

-- 
2.47.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] engines/io_uring: support r/w with metadata
  2025-07-23 17:04 [PATCH 0/2] io_uring r/w with metadata Vincent Fu
@ 2025-07-23 17:04 ` Vincent Fu
  2025-07-23 17:23   ` Jens Axboe
  2025-07-23 17:04 ` [PATCH 2/2] t/io_uring_pi: test script for io_uring PI Vincent Fu
  1 sibling, 1 reply; 7+ messages in thread
From: Vincent Fu @ 2025-07-23 17:04 UTC (permalink / raw)
  To: anuj1072538, axboe, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

Use the new IOCTL to query block devices for metadata support. Then use
the new io_uring capability to send read and write commands with
metadata.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
---
 engines/io_uring.c  | 241 +++++++++++++++++++++++++++++++++++++++++---
 engines/nvme.c      |  35 +++----
 engines/nvme.h      |  18 ++++
 io_u.h              |   1 +
 os/linux/io_uring.h |  15 +++
 5 files changed, 273 insertions(+), 37 deletions(-)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 87018f84..180db5b5 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -30,6 +30,71 @@
 
 #include <sys/stat.h>
 
+#ifndef IO_INTEGRITY_CHK_GUARD
+/* flags for integrity meta */
+#define IO_INTEGRITY_CHK_GUARD		(1U << 0) /* enforce guard check */
+#define IO_INTEGRITY_CHK_REFTAG		(1U << 1) /* enforce ref check */
+#define IO_INTEGRITY_CHK_APPTAG		(1U << 2) /* enforce app check */
+#endif /* IO_INTEGRITY_CHK_GUARD */
+
+#ifndef FS_IOC_GETLBMD_CAP
+/* Protection info capability flags */
+#define	LBMD_PI_CAP_INTEGRITY		(1 << 0)
+#define	LBMD_PI_CAP_REFTAG		(1 << 1)
+
+/* Checksum types for Protection Information */
+#define LBMD_PI_CSUM_NONE		0
+#define LBMD_PI_CSUM_IP			1
+#define LBMD_PI_CSUM_CRC16_T10DIF	2
+#define LBMD_PI_CSUM_CRC64_NVME		4
+
+/*
+ * Logical block metadata capability descriptor
+ * If the device does not support metadata, all the fields will be zero.
+ * Applications must check lbmd_flags to determine whether metadata is
+ * supported or not.
+ */
+struct logical_block_metadata_cap {
+	/* Bitmask of logical block metadata capability flags */
+	__u32	lbmd_flags;
+	/*
+	 * The amount of data described by each unit of logical block
+	 * metadata
+	 */
+	__u16	lbmd_interval;
+	/*
+	 * Size in bytes of the logical block metadata associated with each
+	 * interval
+	 */
+	__u8	lbmd_size;
+	/*
+	 * Size in bytes of the opaque block tag associated with each
+	 * interval
+	 */
+	__u8	lbmd_opaque_size;
+	/*
+	 * Offset in bytes of the opaque block tag within the logical block
+	 * metadata
+	 */
+	__u8	lbmd_opaque_offset;
+	/* Size in bytes of the T10 PI tuple associated with each interval */
+	__u8	lbmd_pi_size;
+	/* Offset in bytes of T10 PI tuple within the logical block metadata */
+	__u8	lbmd_pi_offset;
+	/* T10 PI guard tag type */
+	__u8	lbmd_guard_tag_type;
+	/* Size in bytes of the T10 PI application tag */
+	__u8	lbmd_app_tag_size;
+	/* Size in bytes of the T10 PI reference tag */
+	__u8	lbmd_ref_tag_size;
+	/* Size in bytes of the T10 PI storage tag */
+	__u8	lbmd_storage_tag_size;
+	__u8	pad;
+};
+
+#define FS_IOC_GETLBMD_CAP			_IOWR(0x15, 2, struct logical_block_metadata_cap)
+#endif /* FS_IOC_GETLBMD_CAP */
+
 enum uring_cmd_type {
 	FIO_URING_CMD_NVME = 1,
 };
@@ -73,6 +138,7 @@ struct ioring_data {
 
 	struct io_u **io_u_index;
 	char *md_buf;
+	char *pi_attr;
 
 	int *fds;
 
@@ -436,6 +502,27 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 				sqe->len = 1;
 			}
 		}
+		if (o->md_per_io_size) {
+			struct io_uring_attr_pi *pi_attr = io_u->pi_attr;
+			struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+
+			sqe->attr_type_mask = IORING_RW_ATTR_FLAG_PI;
+			sqe->attr_ptr = (__u64)(uintptr_t)pi_attr;
+			pi_attr->addr = (__u64)(uintptr_t)io_u->mmap_data;
+			pi_attr->len = o->md_per_io_size;
+			pi_attr->app_tag = o->apptag;
+			pi_attr->flags = 0;
+			if (strstr(o->pi_chk, "GUARD") != NULL)
+				pi_attr->flags |= IO_INTEGRITY_CHK_GUARD;
+			if (strstr(o->pi_chk, "REFTAG") != NULL) {
+				__u64 slba = get_slba(data, io_u->offset);
+
+				pi_attr->flags |= IO_INTEGRITY_CHK_REFTAG;
+				pi_attr->seed = (__u32)slba;
+			}
+			if (strstr(o->pi_chk, "APPTAG") != NULL)
+				pi_attr->flags |= IO_INTEGRITY_CHK_APPTAG;
+		}
 		sqe->rw_flags = 0;
 		if (!td->o.odirect && o->uncached)
 			sqe->rw_flags |= RWF_DONTCACHE;
@@ -565,6 +652,7 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 {
 	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	struct io_uring_cqe *cqe;
 	struct io_u *io_u;
 	unsigned index;
@@ -590,6 +678,19 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 			io_u->error = -cqe->res;
 		else
 			io_u->resid = io_u->xfer_buflen - cqe->res;
+		return io_u;
+	}
+
+	if (o->md_per_io_size) {
+		struct nvme_data *data;
+		int ret;
+
+		data = FILE_ENG_DATA(io_u->file);
+		if (data->pi_type && (io_u->ddir == DDIR_READ) && !o->pi_act) {
+			ret = fio_nvme_pi_verify(data, io_u);
+			if (ret)
+				io_u->error = ret;
+		}
 	}
 
 	return io_u;
@@ -802,6 +903,21 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 		o->cmd_type == FIO_URING_CMD_NVME)
 		fio_ioring_cmd_nvme_pi(td, io_u);
 
+	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
+		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+		struct nvme_cmd_ext_io_opts ext_opts = {0};
+
+		if (data->pi_type) {
+			if (o->pi_act)
+				ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
+
+			ext_opts.io_flags |= o->prchk;
+			ext_opts.apptag = o->apptag;
+			ext_opts.apptag_mask = o->apptag_mask;
+		}
+		fio_nvme_generate_guard(io_u, &ext_opts);
+	}
+
 	tail = *ring->tail;
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	atomic_store_release(ring->tail, tail + 1);
@@ -920,6 +1036,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->io_u_index);
 		free(ld->md_buf);
+		free(ld->pi_attr);
 		free(ld->iovecs);
 		free(ld->fds);
 		free(ld->dsm);
@@ -1347,12 +1464,20 @@ static int fio_ioring_init(struct thread_data *td)
 	/* io_u index */
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
 
+	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
+		if (o->apptag_mask != 0xffff) {
+			log_err("fio: io_uring with metadata requires an apptag_mask of 0xffff\n");
+			return 1;
+		}
+	}
+
 	/*
 	 * metadata buffer for nvme command.
 	 * We are only supporting iomem=malloc / mem=malloc as of now.
 	 */
-	if (!strcmp(td->io_ops->name, "io_uring_cmd") &&
-	    (o->cmd_type == FIO_URING_CMD_NVME) && o->md_per_io_size) {
+	if ((!strcmp(td->io_ops->name, "io_uring_cmd") &&
+	    (o->cmd_type == FIO_URING_CMD_NVME) && o->md_per_io_size) ||
+	    (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size)) {
 		md_size = (unsigned long long) o->md_per_io_size
 				* (unsigned long long) td->o.iodepth;
 		md_size += page_mask + td->o.mem_align;
@@ -1363,6 +1488,12 @@ static int fio_ioring_init(struct thread_data *td)
 			free(ld);
 			return 1;
 		}
+		ld->pi_attr = calloc(ld->iodepth, sizeof(struct io_uring_attr_pi));
+		if (!ld->pi_attr) {
+			free(ld->md_buf);
+			free(ld);
+			return 1;
+		}
 	}
 	parse_prchk_flags(o);
 
@@ -1429,22 +1560,23 @@ static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 	struct nvme_pi_data *pi_data;
-	char *p;
+	char *p, *q;
 
 	ld->io_u_index[io_u->index] = io_u;
 
-	if (!strcmp(td->io_ops->name, "io_uring_cmd")) {
-		p = PTR_ALIGN(ld->md_buf, page_mask) + td->o.mem_align;
-		p += o->md_per_io_size * io_u->index;
-		io_u->mmap_data = p;
-
-		if (!o->pi_act) {
-			pi_data = calloc(1, sizeof(*pi_data));
-			pi_data->io_flags |= o->prchk;
-			pi_data->apptag_mask = o->apptag_mask;
-			pi_data->apptag = o->apptag;
-			io_u->engine_data = pi_data;
-		}
+	p = PTR_ALIGN(ld->md_buf, page_mask) + td->o.mem_align;
+	p += o->md_per_io_size * io_u->index;
+	io_u->mmap_data = p;
+	q = ld->pi_attr;
+	q += (sizeof(struct io_uring_attr_pi) * io_u->index);
+	io_u->pi_attr = q;
+
+	if (!o->pi_act) {
+		pi_data = calloc(1, sizeof(*pi_data));
+		pi_data->io_flags |= o->prchk;
+		pi_data->apptag_mask = o->apptag_mask;
+		pi_data->apptag = o->apptag;
+		io_u->engine_data = pi_data;
 	}
 
 	return 0;
@@ -1463,11 +1595,90 @@ static void fio_ioring_io_u_free(struct thread_data *td, struct io_u *io_u)
 	}
 }
 
+static int fio_get_pi_info(struct fio_file *f, __u64 *nlba, __u32 pi_act,
+			     struct nvme_data *data)
+{
+	struct logical_block_metadata_cap md_cap;
+	int ret;
+	int fd, err = 0;
+
+	fd = open(f->file_name, O_RDONLY);
+	if (fd < 0)
+		return -errno;
+
+	ret = ioctl(fd, FS_IOC_GETLBMD_CAP, &md_cap);
+	if (ret < 0) {
+		err = -errno;
+		log_err("%s: failed to query protection information capabilities; error %d\n", f->file_name, errno);
+		goto out;
+	}
+
+	if (!(md_cap.lbmd_flags & LBMD_PI_CAP_INTEGRITY)) {
+		log_err("%s: Protection information not supported\n", f->file_name);
+		err = -ENOTSUP;
+		goto out;
+	}
+
+	/* Currently we don't support storage tags */
+	if (md_cap.lbmd_storage_tag_size) {
+		log_err("%s: Storage tag not supported\n", f->file_name);
+		err = -ENOTSUP;
+		goto out;
+	}
+
+	data->lba_size = md_cap.lbmd_interval;
+	data->lba_shift = ilog2(data->lba_size);
+	data->ms = md_cap.lbmd_size;
+	data->pi_size = md_cap.lbmd_pi_size;
+	data->pi_loc = !(md_cap.lbmd_pi_offset);
+
+	/* Assume Type 1 PI if reference tags supported */
+	if (md_cap.lbmd_flags & LBMD_PI_CAP_REFTAG)
+		data->pi_type = NVME_NS_DPS_PI_TYPE1;
+	else
+		data->pi_type = NVME_NS_DPS_PI_TYPE3;
+
+	switch (md_cap.lbmd_guard_tag_type) {
+	case LBMD_PI_CSUM_CRC16_T10DIF:
+		data->guard_type = NVME_NVM_NS_16B_GUARD;
+		break;
+	case LBMD_PI_CSUM_CRC64_NVME:
+		data->guard_type = NVME_NVM_NS_64B_GUARD;
+		break;
+	default:
+		log_err("%s: unsupported checksum type %d\n", f->file_name, md_cap.lbmd_guard_tag_type);
+		err = -ENOTSUP;
+		goto out;
+	}
+
+out:
+	close(fd);
+	return err;
+}
+
 static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 
+	if (o->md_per_io_size) {
+		struct nvme_data *data = NULL;
+		__u64 nlba = 0;
+		int ret;
+
+		data = FILE_ENG_DATA(f);
+		if (data == NULL) {
+			data = calloc(1, sizeof(struct nvme_data));
+			ret = fio_get_pi_info(f, &nlba, o->pi_act, data);
+			if (ret) {
+				free(data);
+				return ret;
+			}
+
+			FILE_SET_ENG_DATA(f, data);
+		}
+	}
+
 	if (!ld || !o->registerfiles)
 		return generic_open_file(td, f);
 
diff --git a/engines/nvme.c b/engines/nvme.c
index 37a31e2f..4b3d3860 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -8,22 +8,6 @@
 #include "../crc/crc-t10dif.h"
 #include "../crc/crc64.h"
 
-static inline __u64 get_slba(struct nvme_data *data, __u64 offset)
-{
-	if (data->lba_ext)
-		return offset / data->lba_ext;
-
-	return offset >> data->lba_shift;
-}
-
-static inline __u32 get_nlb(struct nvme_data *data, __u64 len)
-{
-	if (data->lba_ext)
-		return len / data->lba_ext - 1;
-
-	return (len >> data->lba_shift) - 1;
-}
-
 static void fio_nvme_generate_pi_16b_guard(struct nvme_data *data,
 					   struct io_u *io_u,
 					   struct nvme_cmd_ext_io_opts *opts)
@@ -421,14 +405,9 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	return 0;
 }
 
-void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-		      struct nvme_cmd_ext_io_opts *opts)
+void fio_nvme_generate_guard(struct io_u *io_u, struct nvme_cmd_ext_io_opts *opts)
 {
 	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
-	__u64 slba;
-
-	slba = get_slba(data, io_u->offset);
-	cmd->cdw12 |= opts->io_flags;
 
 	if (data->pi_type && !(opts->io_flags & NVME_IO_PRINFO_PRACT)) {
 		if (data->guard_type == NVME_NVM_NS_16B_GUARD)
@@ -436,6 +415,18 @@ void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 		else if (data->guard_type == NVME_NVM_NS_64B_GUARD)
 			fio_nvme_generate_pi_64b_guard(data, io_u, opts);
 	}
+}
+
+void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+		      struct nvme_cmd_ext_io_opts *opts)
+{
+	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+	__u64 slba;
+
+	slba = get_slba(data, io_u->offset);
+	cmd->cdw12 |= opts->io_flags;
+
+	fio_nvme_generate_guard(io_u, opts);
 
 	switch (data->pi_type) {
 	case NVME_NS_DPS_PI_TYPE1:
diff --git a/engines/nvme.h b/engines/nvme.h
index 60b38d7f..4371eb5b 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -438,6 +438,8 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 		      struct nvme_cmd_ext_io_opts *opts);
 
+void fio_nvme_generate_guard(struct io_u *io_u, struct nvme_cmd_ext_io_opts *opts);
+
 int fio_nvme_pi_verify(struct nvme_data *data, struct io_u *io_u);
 
 int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
@@ -476,4 +478,20 @@ static inline bool fio_nvme_pi_ref_escape(__u8 *reftag)
 	return memcmp(reftag, ref_esc, sizeof(ref_esc)) == 0;
 }
 
+static inline __u64 get_slba(struct nvme_data *data, __u64 offset)
+{
+	if (data->lba_ext)
+		return offset / data->lba_ext;
+
+	return offset >> data->lba_shift;
+}
+
+static inline __u32 get_nlb(struct nvme_data *data, __u64 len)
+{
+	if (data->lba_ext)
+		return len / data->lba_ext - 1;
+
+	return (len >> data->lba_shift) - 1;
+}
+
 #endif
diff --git a/io_u.h b/io_u.h
index 178c1229..2d20a2b2 100644
--- a/io_u.h
+++ b/io_u.h
@@ -145,6 +145,7 @@ struct io_u {
 #endif
 		void *mmap_data;
 	};
+	void *pi_attr;
 };
 
 /*
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index b3876381..7b099902 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -70,6 +70,10 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		struct {
+			__u64	attr_ptr; /* pointer to attribute information */
+			__u64	attr_type_mask; /* bit mask of attributes */
+		};
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
 		 * this field is used for 80 bytes of arbitrary command data
@@ -78,6 +82,17 @@ struct io_uring_sqe {
 	};
 };
 
+/* sqe->attr_type_mask flags */
+#define IORING_RW_ATTR_FLAG_PI	(1U << 0)
+/* PI attribute information */
+struct io_uring_attr_pi {
+		__u16	flags;
+		__u16	app_tag;
+		__u32	len;
+		__u64	addr;
+		__u64	seed;
+		__u64	rsvd;
+};
 enum {
 	IOSQE_FIXED_FILE_BIT,
 	IOSQE_IO_DRAIN_BIT,
-- 
2.47.2



* [PATCH 2/2] t/io_uring_pi: test script for io_uring PI
  2025-07-23 17:04 [PATCH 0/2] io_uring r/w with metadata Vincent Fu
  2025-07-23 17:04 ` [PATCH 1/2] engines/io_uring: support " Vincent Fu
@ 2025-07-23 17:04 ` Vincent Fu
  1 sibling, 0 replies; 7+ messages in thread
From: Vincent Fu @ 2025-07-23 17:04 UTC (permalink / raw)
  To: anuj1072538, axboe, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

Test io_uring metadata support on NVMe devices.
Limited to direct=1 and metadata in a separate buffer.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
---
 t/io_uring_pi.py | 408 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 408 insertions(+)
 create mode 100644 t/io_uring_pi.py

diff --git a/t/io_uring_pi.py b/t/io_uring_pi.py
new file mode 100644
index 00000000..bd92edfd
--- /dev/null
+++ b/t/io_uring_pi.py
@@ -0,0 +1,408 @@
+#!/usr/bin/env python3
+
+"""
+# io_uring_pi.py
+#
+# Test metadata support using the io_uring ioengine.
+#
+# USAGE
+# See python3 io_uring_pi.py --help
+#
+# EXAMPLES (THIS IS A DESTRUCTIVE TEST!!)
+# python3 t/io_uring_pi.py --dut /dev/nvme1n1 -f ./fio
+#
+# REQUIREMENTS
+# Python 3.6
+#
+"""
+
+import os
+import sys
+import json
+import time
+import locale
+import logging
+import argparse
+import itertools
+import subprocess
+from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
+from fiotestcommon import SUCCESS_NONZERO
+
+
+NUMBER_IOS = 8192
+BS_LOW = 1
+BS_HIGH = 16
+
+class DifDixTest(FioJobCmdTest):
+    """
+    NVMe DIF/DIX test class.
+    """
+
+    def setup(self, parameters):
+        """Setup a test."""
+
+        fio_args = [
+            "--name=io_uring_pi",
+            "--ioengine=io_uring",
+            "--direct=1",
+            f"--filename={self.fio_opts['filename']}",
+            f"--rw={self.fio_opts['rw']}",
+            f"--bsrange={self.fio_opts['bsrange']}",
+            f"--output={os.path.basename(self.filenames['output'])}",
+            f"--md_per_io_size={self.fio_opts['md_per_io_size']}",
+            "--pi_act=0",
+            f"--pi_chk={self.fio_opts['pi_chk']}",
+            f"--apptag={self.fio_opts['apptag']}",
+            f"--apptag_mask={self.fio_opts['apptag_mask']}",
+        ]
+        for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
+                    'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
+                    'time_based', 'runtime', 'verify', 'io_size', 'offset', 'number_ios',
+                    'output-format']:
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
+
+        super().setup(fio_args)
+
+
+TEST_LIST = [
+#
+# Write data with pi_act=0 and then read the data back
+#
+    {
+        # Write workload with variable IO sizes
+        # pi_act=0
+        "test_id": 101,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "GUARD,REFTAG,APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with fixed small IO size
+        # pi_act=0
+        "test_id": 102,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "GUARD,REFTAG,APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_LOW,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO size
+        # pi_act=0
+        "test_id": 103,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "GUARD,REFTAG,APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO size
+        # trigger apptag mismatch error
+        # pi_act=0
+        "test_id": 104,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0xA888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "GUARD,REFTAG,APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO size
+        # fails because apptag mask must be 0xFFFF
+        # pi_act=0
+        "test_id": 105,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0xF888",
+            "apptag_mask": "0x0FFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "GUARD,REFTAG,APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+]
+
+
+def get_lbafs(args):
+    """
+    Determine which LBA formats to use. Use either the ones specified on the
+    command line or if none are specified query the device and use all lba
+    formats with metadata.
+    """
+    lbaf_list = []
+    id_ns_cmd = f"sudo nvme id-ns --output-format=json {args.dut}".split(' ')
+    id_ns_output = subprocess.check_output(id_ns_cmd)
+    lbafs = json.loads(id_ns_output)['lbafs']
+    if args.lbaf:
+        for lbaf in args.lbaf:
+            lbaf_list.append({'lbaf': lbaf, 'ds': 2 ** lbafs[lbaf]['ds'],
+                              'ms': lbafs[lbaf]['ms'], })
+            if lbafs[lbaf]['ms'] == 0:
+                print(f'Error: lbaf {lbaf} has metadata size zero')
+                sys.exit(1)
+    else:
+        for lbaf_num, lbaf in enumerate(lbafs):
+            if lbaf['ms'] != 0:
+                lbaf_list.append({'lbaf': lbaf_num, 'ds': 2 ** lbaf['ds'],
+                                  'ms': lbaf['ms'], })
+
+    return lbaf_list
+
+
+def get_guard_pi(lbaf_list, args):
+    """
+    Find out how many bits of guard protection information are associated with
+    each lbaf to be used. If this is not available assume 16-bit guard pi.
+    Also record the bytes of protection information associated with the number
+    of guard PI bits.
+    """
+    nvm_id_ns_cmd = f"sudo nvme nvm-id-ns --output-format=json {args.dut}".split(' ')
+    try:
+        nvm_id_ns_output = subprocess.check_output(nvm_id_ns_cmd)
+    except subprocess.CalledProcessError:
+        print(f"Non-zero return code from {' '.join(nvm_id_ns_cmd)}; " \
+                "assuming all lbafs use 16b Guard Protection Information")
+        for lbaf in lbaf_list:
+            lbaf['guard_pi_bits'] = 16
+    else:
+        elbafs = json.loads(nvm_id_ns_output)['elbafs']
+        for elbaf_num, elbaf in enumerate(elbafs):
+            for lbaf in lbaf_list:
+                if lbaf['lbaf'] == elbaf_num:
+                    lbaf['guard_pi_bits'] = 16 << elbaf['pif']
+
+    # For 16b Guard Protection Information, the PI requires 8 bytes
+    # For 32b and 64b Guard PI, the PI requires 16 bytes
+    for lbaf in lbaf_list:
+        if lbaf['guard_pi_bits'] == 16:
+            lbaf['pi_bytes'] = 8
+        else:
+            lbaf['pi_bytes'] = 16
+
+
+def get_capabilities(args):
+    """
+    Determine what end-to-end data protection features the device supports.
+    """
+    caps = { 'pil': [], 'pitype': [], 'elba': [] }
+    id_ns_cmd = f"sudo nvme id-ns --output-format=json {args.dut}".split(' ')
+    id_ns_output = subprocess.check_output(id_ns_cmd)
+    id_ns_json = json.loads(id_ns_output)
+
+    mc = id_ns_json['mc']
+    if mc & 1:
+        caps['elba'].append(1)
+    if mc & 2:
+        caps['elba'].append(0)
+
+    dpc = id_ns_json['dpc']
+    if dpc & 1:
+        caps['pitype'].append(1)
+    if dpc & 2:
+        caps['pitype'].append(2)
+    if dpc & 4:
+        caps['pitype'].append(3)
+    if dpc & 8:
+        caps['pil'].append(1)
+    if dpc & 16:
+        caps['pil'].append(0)
+
+    for _, value in caps.items():
+        if len(value) == 0:
+            logging.error("One or more end-to-end data protection features unsupported: %s", caps)
+            sys.exit(-1)
+
+    return caps
+
+
+def format_device(args, lbaf, pitype, pil, elba):
+    """
+    Format device using specified lba format with specified pitype, pil, and
+    elba values.
+    """
+
+    format_cmd = f"sudo nvme format {args.dut} --lbaf={lbaf['lbaf']} " \
+                 f"--pi={pitype} --pil={pil} --ms={elba} --force"
+    logging.debug("Format command: %s", format_cmd)
+    format_cmd = format_cmd.split(' ')
+    format_cmd_result = subprocess.run(format_cmd, capture_output=True, check=False,
+                                       encoding=locale.getpreferredencoding())
+
+    # Sometimes nvme-cli may format the device successfully but fail to
+    # rescan the namespaces after the format. Continue if this happens but
+    # abort if some other error occurs.
+    if format_cmd_result.returncode != 0:
+        if 'failed to rescan namespaces' not in format_cmd_result.stderr \
+                or 'Success formatting namespace' not in format_cmd_result.stdout:
+            logging.error(format_cmd_result.stdout)
+            logging.error(format_cmd_result.stderr)
+            print("Unable to format device; skipping this configuration")
+            return False
+
+    logging.debug(format_cmd_result.stdout)
+    return True
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-d', '--debug', help='Enable debug messages', action='store_true')
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut', help='target device to test '
+                        '(e.g., /dev/nvme1n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
+    parser.add_argument('-l', '--lbaf', nargs='+', type=int,
+                        help='list of lba formats to test')
+    args = parser.parse_args()
+
+    return args
+
+
+def difdix_test(test_env, args, lbaf, pitype):
+    """
+    Adjust test arguments based on the values of lbaf and pitype. Then run
+    the tests.
+    """
+    for test in TEST_LIST:
+        test['force_skip'] = False
+
+        blocksize = lbaf['ds']
+        # Set fio blocksize parameter at runtime
+        test['fio_opts']['md_per_io_size'] = lbaf['ms'] * test['bs_high']
+
+        test['fio_opts']['bsrange'] = f"{blocksize * test['bs_low']}-{blocksize * test['bs_high']}"
+
+        # Set fio pi_chk parameter at runtime. If the device is formatted
+        # with Type 3 protection information, this means that the reference
+        # tag is not checked and I/O commands may throw an error if they
+        # are submitted with the REFTAG bit set in pi_chk. Make sure fio
+        # does not set pi_chk's REFTAG bit if the device is formatted with
+        # Type 3 PI.
+        if 'pi_chk' in test:
+            if pitype == 3 and 'REFTAG' in test['pi_chk']:
+                test['fio_opts']['pi_chk'] = test['pi_chk'].replace('REFTAG','')
+                logging.debug("Type 3 PI: dropping REFTAG bit")
+            else:
+                test['fio_opts']['pi_chk'] = test['pi_chk']
+
+        logging.debug("Test %d: pi_act=%d, bsrange=%s, md_per_io_size=%d", test['test_id'],
+                      test['fio_opts']['pi_act'], test['fio_opts']['bsrange'],
+                      test['fio_opts']['md_per_io_size'])
+
+    return run_fio_tests(TEST_LIST, test_env, args)
+
+
+def main():
+    args = parse_args()
+
+    if args.debug:
+        logging.basicConfig(level=logging.DEBUG)
+    else:
+        logging.basicConfig(level=logging.INFO)
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"io_uring_pi-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio_path = str(Path(args.fio).absolute())
+    else:
+        fio_path = os.path.join(os.path.dirname(__file__), '../fio')
+    print(f"fio path is {fio_path}")
+
+    lbaf_list = get_lbafs(args)
+    get_guard_pi(lbaf_list, args)
+    caps = get_capabilities(args)
+    print("Device capabilities:", caps)
+
+    for test in TEST_LIST:
+        test['fio_opts']['filename'] = args.dut
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'io_uring_pi',
+              }
+
+    total = { 'passed':  0, 'failed': 0, 'skipped': 0 }
+
+    try:
+        for lbaf, pil, pitype in itertools.product(lbaf_list, caps['pil'], caps['pitype']):
+            if lbaf['ms'] == 0:
+                continue
+
+            print("\n")
+            print("-" * 120)
+            print(f"lbaf: {lbaf}, pil: {pil}, pitype: {pitype}")
+            print("-" * 120)
+
+            if not format_device(args, lbaf, pitype, pil, 0):
+                print("Formatting failed")
+                continue
+
+            test_env['artifact_root'] = \
+                os.path.join(artifact_root, f"lbaf{lbaf['lbaf']}pil{pil}pitype{pitype}")
+            os.mkdir(test_env['artifact_root'])
+
+            passed, failed, skipped = difdix_test(test_env, args, lbaf, pitype)
+
+            total['passed'] += passed
+            total['failed'] += failed
+            total['skipped'] += skipped
+
+    except KeyboardInterrupt:
+        pass
+
+    print(f"\n\n{total['passed']} test(s) passed, {total['failed']} failed, " \
+            f"{total['skipped']} skipped")
+    sys.exit(total['failed'])
+
+
+if __name__ == '__main__':
+    main()
-- 
2.47.2



* Re: [PATCH 1/2] engines/io_uring: support r/w with metadata
  2025-07-23 17:04 ` [PATCH 1/2] engines/io_uring: support " Vincent Fu
@ 2025-07-23 17:23   ` Jens Axboe
  2025-07-23 17:37     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2025-07-23 17:23 UTC (permalink / raw)
  To: Vincent Fu, anuj1072538, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

On 7/23/25 11:04 AM, Vincent Fu wrote:
> @@ -436,6 +502,27 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
>  				sqe->len = 1;
>  			}
>  		}
> +		if (o->md_per_io_size) {
> +			struct io_uring_attr_pi *pi_attr = io_u->pi_attr;
> +			struct nvme_data *data = FILE_ENG_DATA(io_u->file);
> +
> +			sqe->attr_type_mask = IORING_RW_ATTR_FLAG_PI;
> +			sqe->attr_ptr = (__u64)(uintptr_t)pi_attr;
> +			pi_attr->addr = (__u64)(uintptr_t)io_u->mmap_data;
> +			pi_attr->len = o->md_per_io_size;
> +			pi_attr->app_tag = o->apptag;
> +			pi_attr->flags = 0;
> +			if (strstr(o->pi_chk, "GUARD") != NULL)
> +				pi_attr->flags |= IO_INTEGRITY_CHK_GUARD;
> +			if (strstr(o->pi_chk, "REFTAG") != NULL) {
> +				__u64 slba = get_slba(data, io_u->offset);
> +
> +				pi_attr->flags |= IO_INTEGRITY_CHK_REFTAG;
> +				pi_attr->seed = (__u32)slba;
> +			}
> +			if (strstr(o->pi_chk, "APPTAG") != NULL)
> +				pi_attr->flags |= IO_INTEGRITY_CHK_APPTAG;
> +		}

I'd move these out-of-line whenever possible. That both improves
readability and conveys that this isn't necessarily the normal/hot
path.

> @@ -590,6 +678,19 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
>  			io_u->error = -cqe->res;
>  		else
>  			io_u->resid = io_u->xfer_buflen - cqe->res;
> +		return io_u;
> +	}
> +
> +	if (o->md_per_io_size) {
> +		struct nvme_data *data;
> +		int ret;
> +
> +		data = FILE_ENG_DATA(io_u->file);
> +		if (data->pi_type && (io_u->ddir == DDIR_READ) && !o->pi_act) {
> +			ret = fio_nvme_pi_verify(data, io_u);
> +			if (ret)
> +				io_u->error = ret;
> +		}
>  	}

Ditto

> +	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
> +		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
> +		struct nvme_cmd_ext_io_opts ext_opts = {0};
> +
> +		if (data->pi_type) {
> +			if (o->pi_act)
> +				ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
> +
> +			ext_opts.io_flags |= o->prchk;
> +			ext_opts.apptag = o->apptag;
> +			ext_opts.apptag_mask = o->apptag_mask;
> +		}
> +		fio_nvme_generate_guard(io_u, &ext_opts);
> +	}

Ehh a strcmp() in the hot path?! First of all, that's a big no-no.
Secondly, if this really was required, you'd add something to put that
strcmp() in the slow path and flag it. Lastly, thankfully this should be
much better as:

	if (td->io_ops == &ioengine_uring ...)

instead.

> @@ -1347,12 +1464,20 @@ static int fio_ioring_init(struct thread_data *td)
>  	/* io_u index */
>  	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
>  
> +	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
> +		if (o->apptag_mask != 0xffff) {
> +			log_err("fio: io_uring with metadata requires an apptag_mask of 0xffff\n");
> +			return 1;
> +		}
> +	}

Same, though this is at least slow path. But people copy-paste wonky
code like that all the time...

>  	 * metadata buffer for nvme command.
>  	 * We are only supporting iomem=malloc / mem=malloc as of now.
>  	 */
> -	if (!strcmp(td->io_ops->name, "io_uring_cmd") &&
> -	    (o->cmd_type == FIO_URING_CMD_NVME) && o->md_per_io_size) {
> +	if ((!strcmp(td->io_ops->name, "io_uring_cmd") &&
> +	    (o->cmd_type == FIO_URING_CMD_NVME) && o->md_per_io_size) ||
> +	    (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size)) {
>  		md_size = (unsigned long long) o->md_per_io_size
>  				* (unsigned long long) td->o.iodepth;
>  		md_size += page_mask + td->o.mem_align;

And I'm starting to see where this is coming from. If you are so
inclined, please do a cleanup patch first getting rid of these strcmp()
calls.

>  static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
>  {
>  	struct ioring_data *ld = td->io_ops_data;
>  	struct ioring_options *o = td->eo;
>  
> +	if (o->md_per_io_size) {
> +		struct nvme_data *data = NULL;
> +		__u64 nlba = 0;
> +		int ret;
> +
> +		data = FILE_ENG_DATA(f);
> +		if (data == NULL) {
> +			data = calloc(1, sizeof(struct nvme_data));
> +			ret = fio_get_pi_info(f, &nlba, o->pi_act, data);
> +			if (ret) {
> +				free(data);
> +				return ret;
> +			}
> +
> +			FILE_SET_ENG_DATA(f, data);
> +		}
> +	}

Out of line please...

> diff --git a/engines/nvme.c b/engines/nvme.c
> index 37a31e2f..4b3d3860 100644
> --- a/engines/nvme.c
> +++ b/engines/nvme.c
> @@ -8,22 +8,6 @@
>  #include "../crc/crc-t10dif.h"
>  #include "../crc/crc64.h"
>  
> -static inline __u64 get_slba(struct nvme_data *data, __u64 offset)
> -{
> -	if (data->lba_ext)
> -		return offset / data->lba_ext;
> -
> -	return offset >> data->lba_shift;
> -}
> -
> -static inline __u32 get_nlb(struct nvme_data *data, __u64 len)
> -{
> -	if (data->lba_ext)
> -		return len / data->lba_ext - 1;
> -
> -	return (len >> data->lba_shift) - 1;
> -}
> -

Various bits like this, please do a prep patch first moving things
around.

-- 
Jens Axboe


* Re: [PATCH 1/2] engines/io_uring: support r/w with metadata
  2025-07-23 17:23   ` Jens Axboe
@ 2025-07-23 17:37     ` Jens Axboe
  2025-07-23 18:28       ` Vincent Fu
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2025-07-23 17:37 UTC (permalink / raw)
  To: Vincent Fu, anuj1072538, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

>> +	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
>> +		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
>> +		struct nvme_cmd_ext_io_opts ext_opts = {0};
>> +
>> +		if (data->pi_type) {
>> +			if (o->pi_act)
>> +				ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
>> +
>> +			ext_opts.io_flags |= o->prchk;
>> +			ext_opts.apptag = o->apptag;
>> +			ext_opts.apptag_mask = o->apptag_mask;
>> +		}
>> +		fio_nvme_generate_guard(io_u, &ext_opts);
>> +	}
> 
> Ehh a strcmp() in the hot path?! First of all, that's a big no-no.
> Secondly, if this really was required, you'd add something to put that
> strcmp() in the slow path and flag it. Lastly, thankfully this should be
> much better as:
> 
> 	if (td->io_ops == &ioengine_uring ...)
> 
> instead.

Eh I guess dynamically loaded engines would need special treatment. I'll
take a look. In any case, strcmp() is just too ugly to live, actually
quite a few in there and the io_uring engine is the only one that thinks
this is necessary.

-- 
Jens Axboe


* Re: [PATCH 1/2] engines/io_uring: support r/w with metadata
  2025-07-23 17:37     ` Jens Axboe
@ 2025-07-23 18:28       ` Vincent Fu
  2025-07-23 19:23         ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Vincent Fu @ 2025-07-23 18:28 UTC (permalink / raw)
  To: Jens Axboe, anuj1072538, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

On 7/23/25 1:37 PM, Jens Axboe wrote:
>>> +	if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
>>> +		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
>>> +		struct nvme_cmd_ext_io_opts ext_opts = {0};
>>> +
>>> +		if (data->pi_type) {
>>> +			if (o->pi_act)
>>> +				ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
>>> +
>>> +			ext_opts.io_flags |= o->prchk;
>>> +			ext_opts.apptag = o->apptag;
>>> +			ext_opts.apptag_mask = o->apptag_mask;
>>> +		}
>>> +		fio_nvme_generate_guard(io_u, &ext_opts);
>>> +	}
>>
>> Ehh a strcmp() in the hot path?! First of all, that's a big no-no.
>> Secondly, if this really was required, you'd add something to put that
>> strcmp() in the slow path and flag it. Lastly, thankfully this should be
>> much better as:
>>
>> 	if (td->io_ops == &ioengine_uring ...)
>>
>> instead.
> 
> Eh I guess dynamically loaded engines would need special treatment. I'll
> take a look. In any case, strcmp() is just too ugly to live, actually
> quite a few in there and the io_uring engine is the only one that thinks
> this is necessary.
> 

Thanks for the feedback. How about doing a strcmp at ioengine init time 
and setting a flag that we can check in the hot path?

Vincent


* Re: [PATCH 1/2] engines/io_uring: support r/w with metadata
  2025-07-23 18:28       ` Vincent Fu
@ 2025-07-23 19:23         ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2025-07-23 19:23 UTC (permalink / raw)
  To: Vincent Fu, anuj1072538, hch, joshi.k, anuj20.g, fio; +Cc: Vincent Fu

On 7/23/25 12:28 PM, Vincent Fu wrote:
> On 7/23/25 1:37 PM, Jens Axboe wrote:
>>>> +    if (!strcmp(td->io_ops->name, "io_uring") && o->md_per_io_size) {
>>>> +        struct nvme_data *data = FILE_ENG_DATA(io_u->file);
>>>> +        struct nvme_cmd_ext_io_opts ext_opts = {0};
>>>> +
>>>> +        if (data->pi_type) {
>>>> +            if (o->pi_act)
>>>> +                ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
>>>> +
>>>> +            ext_opts.io_flags |= o->prchk;
>>>> +            ext_opts.apptag = o->apptag;
>>>> +            ext_opts.apptag_mask = o->apptag_mask;
>>>> +        }
>>>> +        fio_nvme_generate_guard(io_u, &ext_opts);
>>>> +    }
>>>
>>> Ehh a strcmp() in the hot path?! First of all, that's a big no-no.
>>> Secondly, if this really was required, you'd add something to put that
>>> strcmp() in the slow path and flag it. Lastly, thankfully this should be
>>> much better as:
>>>
>>>     if (td->io_ops == &ioengine_uring ...)
>>>
>>> instead.
>>
>> Eh I guess dynamically loaded engines would need special treatment. I'll
>> take a look. In any case, strcmp() is just too ugly to live, actually
>> quite a few in there and the io_uring engine is the only one that thinks
>> this is necessary.
>>
> 
> Thanks for the feedback. How about doing a strcmp at ioengine init
> time and setting a flag that we can check in the hot path?

You can do that if you want, that'd be fine too. I did an io_ops->prep
compare, which is a bit annoying in that it's a double deref to get
there. But at least it's not a strcmp! If you check the repo now, some
cleanups have been done too; the last couple of feature additions to the
io_uring engine have not really been up to snuff imho.

-- 
Jens Axboe


