* [PATCH] NVMe: Metadata and PI format support
@ 2015-01-31 2:11 Keith Busch
2015-02-16 18:53 ` Sam Bradshaw (sbradshaw)
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2015-01-31 2:11 UTC (permalink / raw)
Adds support for NVMe metadata formats and exposes block devices for all
namespaces regardless of their format. Namespace formats that are unusable
will have disk capacity set to 0, but a handle to the block device is
created to simplify device management. A namespace is not usable when
format requires host interleave block and metadata in single buffer,
has no provisioned storage, or failed to register with blk integrity.
The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarilly set to 0 until it can be revalidated and registered with
the integrity extenstions after add_disk completes.
The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this be a single physically contiguous region,
so only one segment is allowed per command.
If the metadata is used for T10 PI, the driver provides mappings to
save and restore the reftag physical block translation. If metadata is
not used for PI, the driver provides no-op functions for generate and
verify. This way the setup is always provided by the block layer.
If a received command does not supply a required metadata buffer, the
command is failed with bad address. This could only happen if a user
manually disables verify/generate on such a disk. The only exception to
where this is okay is if the controller is capable of stripping/generating
the metadata, which is possible on some types of formats.
The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IOD's, but we don't do that
anymore, so the field was unused.
Signed-off-by: Keith Busch <keith.busch at intel.com>
---
This makes this patch obsolete:
http://lists.infradead.org/pipermail/linux-nvme/2015-January/001483.html
And borrows a lot of logic and comments from scsi/sd.c and scsi/sd_dif.c.
I lack hardware capable of all the formats this patch intends to
support. I tested by hacking firmware to report certain capabilities,
but the controller never actually made use of the metadata at all,
so PI checks always failed.
I did verify the mapped integrity scatter gather list was correct in
length, and automatically updated as I changed the physical block sizes
from 512 to 4k, and metadata from 8, 16, and 64 bytes.
This removes all the driver logic used to decide if a disk should be
created or not, and almost removes my need to allow arbitrary NSIDs on
the passthrough commands (almost; I still have a need).
drivers/block/nvme-core.c | 295 +++++++++++++++++++++++++++++++++++----------
include/linux/nvme.h | 3 +-
include/uapi/linux/nvme.h | 16 +++
3 files changed, 247 insertions(+), 67 deletions(-)
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index d826bf3..36912d1 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -37,6 +37,7 @@
#include <linux/ptrace.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/t10-pi.h>
#include <linux/types.h>
#include <scsi/sg.h>
#include <asm-generic/io-64-nonatomic-lo-hi.h>
@@ -420,6 +421,61 @@ static int nvme_error_status(u16 status)
}
}
+static void nvme_dif_prep(u32 p, u32 v, struct t10_pi_tuple *pi)
+{
+ if (be32_to_cpu(pi->ref_tag) == v)
+ pi->ref_tag = cpu_to_be32(p);
+}
+
+static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
+{
+ if (be32_to_cpu(pi->ref_tag) == p)
+ pi->ref_tag = cpu_to_be32(v);
+}
+
+/**
+ * nvme_dif_remap - remaps ref tags to bip seed and physical lba
+ *
+ * The virtual start sector is the one that was originally submitted by the
+ * block layer. Due to partitioning, MD/DM cloning, etc. the actual physical
+ * start sector may be different. Remap protection information to match the
+ * physical LBA on writes, and back to the original seed on reads.
+ *
+ * Type 0 and 3 do not have a ref tag, so no remapping required.
+ */
+static void nvme_dif_remap(struct request *req,
+ void (*dif_swap)(u32 p, u32 v, struct t10_pi_tuple *pi))
+{
+ struct nvme_ns *ns = req->rq_disk->private_data;
+ struct bio_integrity_payload *bip;
+ struct t10_pi_tuple *pi;
+ void *p;
+ u32 i, nlb, ts, phys, virt;
+
+ if (!ns->pi_type || ns->pi_type == NVME_NS_DPS_PI_TYPE3)
+ return;
+
+ bip = bio_integrity(req->bio);
+ if (!bip)
+ return;
+
+ p = kmap_atomic(bip->bip_vec->bv_page);
+ if (!p)
+ return;
+
+ virt = bip_get_seed(bip);
+ phys = nvme_block_nr(ns, blk_rq_pos(req));
+ nlb = (blk_rq_bytes(req) >> ns->lba_shift);
+ ts = ns->disk->integrity->tuple_size;
+
+ for (i = 0; i < nlb; i++, virt++, phys++) {
+ pi = (struct t10_pi_tuple *)p;
+ dif_swap(phys, virt, pi);
+ p += ts;
+ }
+ kunmap_atomic(pi);
+}
+
static void req_completion(struct nvme_queue *nvmeq, void *ctx,
struct nvme_completion *cqe)
{
@@ -450,9 +506,16 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
"completing aborted command with status:%04x\n",
status);
- if (iod->nents)
+ if (iod->nents) {
dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->sg, iod->nents,
rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+ if (blk_integrity_rq(req)) {
+ if (!rq_data_dir(req))
+ nvme_dif_remap(req, nvme_dif_complete);
+ dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->meta_sg, 1,
+ rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+ }
+ }
nvme_free_iod(nvmeq->dev, iod);
blk_mq_complete_request(req);
@@ -608,6 +671,24 @@ static int nvme_submit_iod(struct nvme_queue *nvmeq, struct nvme_iod *iod,
cmnd->rw.prp2 = cpu_to_le64(iod->first_dma);
cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+
+ if (blk_integrity_rq(req)) {
+ cmnd->rw.metadata = cpu_to_le64(sg_dma_address(iod->meta_sg));
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE3:
+ control |= NVME_RW_PRINFO_PRCHK_GUARD;
+ break;
+ case NVME_NS_DPS_PI_TYPE1:
+ case NVME_NS_DPS_PI_TYPE2:
+ control |= NVME_RW_PRINFO_PRCHK_GUARD |
+ NVME_RW_PRINFO_PRCHK_REF;
+ cmnd->rw.reftag = cpu_to_le32(
+ nvme_block_nr(ns, blk_rq_pos(req)));
+ break;
+ }
+ } else if (ns->ms)
+ control |= NVME_RW_PRINFO_PRACT;
+
cmnd->rw.control = cpu_to_le16(control);
cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
@@ -631,6 +712,19 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
unsigned size = !(req->cmd_flags & REQ_DISCARD) ? blk_rq_bytes(req) :
sizeof(struct nvme_dsm_range);
+ /*
+ * If formated with metadata, require the block layer provide a buffer
+ * unless this namespace is formated such that the metadata can be
+ * stripped/generated by the controller with PRACT=1.
+ */
+ if (ns->ms && !blk_integrity_rq(req)) {
+ if (!(ns->pi_type && ns->ms == 8)) {
+ req->errors = -EFAULT;
+ blk_mq_complete_request(req);
+ return BLK_MQ_RQ_QUEUE_OK;
+ }
+ }
+
iod = nvme_alloc_iod(psegs, size, ns->dev, GFP_ATOMIC);
if (!iod)
return BLK_MQ_RQ_QUEUE_BUSY;
@@ -668,6 +762,21 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
iod->nents, dma_dir);
goto retry_cmd;
}
+ if (blk_integrity_rq(req)) {
+ if (blk_rq_count_integrity_sg(req->q, req->bio) != 1)
+ goto error_cmd;
+
+ sg_init_table(iod->meta_sg, 1);
+ if (blk_rq_map_integrity_sg(
+ req->q, req->bio, iod->meta_sg) != 1)
+ goto error_cmd;
+
+ if (rq_data_dir(req))
+ nvme_dif_remap(req, nvme_dif_prep);
+
+ if (!dma_map_sg(nvmeq->q_dmadev, iod->meta_sg, 1, dma_dir))
+ goto error_cmd;
+ }
}
nvme_set_info(cmd, iod, req_completion);
@@ -1825,13 +1934,61 @@ static int nvme_getgeo(struct block_device *bd, struct hd_geometry *geo)
return 0;
}
+static void nvme_config_discard(struct nvme_ns *ns)
+{
+ u32 logical_block_size = queue_logical_block_size(ns->queue);
+ ns->queue->limits.discard_zeroes_data = 0;
+ ns->queue->limits.discard_alignment = logical_block_size;
+ ns->queue->limits.discard_granularity = logical_block_size;
+ ns->queue->limits.max_discard_sectors = 0xffffffff;
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+}
+
+static int nvme_noop_verify(struct blk_integrity_iter *iter)
+{
+ return 0;
+}
+
+static int nvme_noop_generate(struct blk_integrity_iter *iter)
+{
+ return 0;
+}
+
+struct blk_integrity nvme_meta_noop = {
+ .name = "NVME_META_NOOP",
+ .generate_fn = nvme_noop_generate,
+ .verify_fn = nvme_noop_verify,
+};
+
+static void nvme_init_integrity(struct nvme_ns *ns)
+{
+ struct blk_integrity integrity;
+
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE3:
+ integrity = t10_pi_type3_crc;
+ break;
+ case NVME_NS_DPS_PI_TYPE1:
+ case NVME_NS_DPS_PI_TYPE2:
+ integrity = t10_pi_type1_crc;
+ break;
+ default:
+ integrity = nvme_meta_noop;
+ break;
+ }
+ integrity.tuple_size = ns->ms;
+ blk_integrity_register(ns->disk, &integrity);
+ blk_queue_max_integrity_segments(ns->queue, 1);
+}
+
static int nvme_revalidate_disk(struct gendisk *disk)
{
struct nvme_ns *ns = disk->private_data;
struct nvme_dev *dev = ns->dev;
struct nvme_id_ns *id;
dma_addr_t dma_addr;
- int lbaf;
+ int lbaf, pi_type, old_ms;
+ unsigned short bs;
id = dma_alloc_coherent(&dev->pci_dev->dev, 4096, &dma_addr,
GFP_KERNEL);
@@ -1840,16 +1997,50 @@ static int nvme_revalidate_disk(struct gendisk *disk)
__func__);
return 0;
}
+ if (nvme_identify(dev, ns->ns_id, 0, dma_addr)) {
+ dev_warn(&dev->pci_dev->dev,
+ "identify failed ns:%d, setting capacity to 0\n",
+ ns->ns_id);
+ memset(id, 0, sizeof(*id));
+ }
- if (nvme_identify(dev, ns->ns_id, 0, dma_addr))
- goto free;
-
- lbaf = id->flbas & 0xf;
+ old_ms = ns->ms;
+ lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
ns->lba_shift = id->lbaf[lbaf].ds;
+ ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+
+ /*
+ * If identify namespace failed, use default 512 byte block size so
+ * block layer can use before failing read/write for 0 capacity.
+ */
+ if (ns->lba_shift == 0)
+ ns->lba_shift = 9;
+ bs = 1 << ns->lba_shift;
+
+ /* XXX: PI implementation requires metadata equal t10 pi tuple size */
+ pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
+ id->dps & NVME_NS_DPS_PI_MASK : 0;
+
+ if (disk->integrity && (ns->pi_type != pi_type || ns->ms != old_ms ||
+ bs != queue_logical_block_size(disk->queue) ||
+ (ns->ms && id->flbas & NVME_NS_FLBAS_META_EXT)))
+ blk_integrity_unregister(disk);
+
+ ns->pi_type = pi_type;
+ blk_queue_logical_block_size(ns->queue, bs);
+
+ if (ns->ms && !disk->integrity && (disk->flags & GENHD_FL_UP) &&
+ !(id->flbas & NVME_NS_FLBAS_META_EXT))
+ nvme_init_integrity(ns);
+
+ if (id->ncap == 0 || (ns->ms && !disk->integrity))
+ set_capacity(disk, 0);
+ else
+ set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+
+ if (dev->oncs & NVME_CTRL_ONCS_DSM)
+ nvme_config_discard(ns);
- blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
- set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
- free:
dma_free_coherent(&dev->pci_dev->dev, 4096, id, dma_addr);
return 0;
}
@@ -1906,30 +2097,16 @@ static int nvme_kthread(void *data)
return 0;
}
-static void nvme_config_discard(struct nvme_ns *ns)
-{
- u32 logical_block_size = queue_logical_block_size(ns->queue);
- ns->queue->limits.discard_zeroes_data = 0;
- ns->queue->limits.discard_alignment = logical_block_size;
- ns->queue->limits.discard_granularity = logical_block_size;
- ns->queue->limits.max_discard_sectors = 0xffffffff;
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
-}
-
-static struct nvme_ns *nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid,
- struct nvme_id_ns *id, struct nvme_lba_range_type *rt)
+static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
{
struct nvme_ns *ns;
struct gendisk *disk;
int node = dev_to_node(&dev->pci_dev->dev);
- int lbaf;
-
- if (rt->attributes & NVME_LBART_ATTRIB_HIDE)
- return NULL;
ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
if (!ns)
- return NULL;
+ return;
+
ns->queue = blk_mq_init_queue(&dev->tagset);
if (IS_ERR(ns->queue))
goto out_free_ns;
@@ -1945,9 +2122,9 @@ static struct nvme_ns *nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid,
ns->ns_id = nsid;
ns->disk = disk;
- lbaf = id->flbas & 0xf;
- ns->lba_shift = id->lbaf[lbaf].ds;
- ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+ ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
+ list_add_tail(&ns->list, &dev->namespaces);
+
blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
if (dev->max_hw_sectors)
blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
@@ -1964,18 +2141,23 @@ static struct nvme_ns *nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid,
disk->driverfs_dev = &dev->pci_dev->dev;
disk->flags = GENHD_FL_EXT_DEVT;
sprintf(disk->disk_name, "nvme%dn%d", dev->instance, nsid);
- set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
- if (dev->oncs & NVME_CTRL_ONCS_DSM)
- nvme_config_discard(ns);
-
- return ns;
+ /*
+ * Initialize capacity to 0 until we establish the namespace format and
+ * setup integrity extentions if necessary. The revalidate_disk after
+ * add_disk allows the driver to register with integrity if the format
+ * requires it.
+ */
+ set_capacity(disk, 0);
+ nvme_revalidate_disk(ns->disk);
+ add_disk(ns->disk);
+ revalidate_disk(ns->disk);
+ return;
out_free_queue:
blk_cleanup_queue(ns->queue);
out_free_ns:
kfree(ns);
- return NULL;
}
static void nvme_create_io_queues(struct nvme_dev *dev)
@@ -2100,22 +2282,20 @@ static int nvme_dev_add(struct nvme_dev *dev)
struct pci_dev *pdev = dev->pci_dev;
int res;
unsigned nn, i;
- struct nvme_ns *ns;
struct nvme_id_ctrl *ctrl;
- struct nvme_id_ns *id_ns;
void *mem;
dma_addr_t dma_addr;
int shift = NVME_CAP_MPSMIN(readq(&dev->bar->cap)) + 12;
- mem = dma_alloc_coherent(&pdev->dev, 8192, &dma_addr, GFP_KERNEL);
+ mem = dma_alloc_coherent(&pdev->dev, 4096, &dma_addr, GFP_KERNEL);
if (!mem)
return -ENOMEM;
res = nvme_identify(dev, 0, 1, dma_addr);
if (res) {
dev_err(&pdev->dev, "Identify Controller failed (%d)\n", res);
- res = -EIO;
- goto out;
+ dma_free_coherent(&dev->pci_dev->dev, 4096, mem, dma_addr);
+ return -EIO;
}
ctrl = mem;
@@ -2141,6 +2321,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
} else
dev->max_hw_sectors = max_hw_sectors;
}
+ dma_free_coherent(&dev->pci_dev->dev, 4096, mem, dma_addr);
dev->tagset.ops = &nvme_mq_ops;
dev->tagset.nr_hw_queues = dev->online_queues - 1;
@@ -2153,33 +2334,12 @@ static int nvme_dev_add(struct nvme_dev *dev)
dev->tagset.driver_data = dev;
if (blk_mq_alloc_tag_set(&dev->tagset))
- goto out;
-
- id_ns = mem;
- for (i = 1; i <= nn; i++) {
- res = nvme_identify(dev, i, 0, dma_addr);
- if (res)
- continue;
-
- if (id_ns->ncap == 0)
- continue;
-
- res = nvme_get_features(dev, NVME_FEAT_LBA_RANGE, i,
- dma_addr + 4096, NULL);
- if (res)
- memset(mem + 4096, 0, 4096);
+ return 0;
- ns = nvme_alloc_ns(dev, i, mem, mem + 4096);
- if (ns)
- list_add_tail(&ns->list, &dev->namespaces);
- }
- list_for_each_entry(ns, &dev->namespaces, list)
- add_disk(ns->disk);
- res = 0;
+ for (i = 1; i <= nn; i++)
+ nvme_alloc_ns(dev, i);
- out:
- dma_free_coherent(&dev->pci_dev->dev, 8192, mem, dma_addr);
- return res;
+ return 0;
}
static int nvme_dev_map(struct nvme_dev *dev)
@@ -2478,8 +2638,11 @@ static void nvme_dev_remove(struct nvme_dev *dev)
struct nvme_ns *ns;
list_for_each_entry(ns, &dev->namespaces, list) {
- if (ns->disk->flags & GENHD_FL_UP)
+ if (ns->disk->flags & GENHD_FL_UP) {
+ if (ns->disk->integrity)
+ blk_integrity_unregister(ns->disk);
del_gendisk(ns->disk);
+ }
if (!blk_queue_dying(ns->queue)) {
blk_mq_abort_requeue_list(ns->queue);
blk_cleanup_queue(ns->queue);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 258945f..e1fcea0 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -121,6 +121,7 @@ struct nvme_ns {
unsigned ns_id;
int lba_shift;
int ms;
+ int pi_type;
u64 mode_select_num_blocks;
u32 mode_select_block_len;
};
@@ -138,7 +139,7 @@ struct nvme_iod {
int nents; /* Used in scatterlist */
int length; /* Of data, in bytes */
dma_addr_t first_dma;
- struct list_head node;
+ struct scatterlist meta_sg[1]; /* metadata requires single contiguous buffer */
struct scatterlist sg[0];
};
diff --git a/include/uapi/linux/nvme.h b/include/uapi/linux/nvme.h
index 26386cf..406bfc9 100644
--- a/include/uapi/linux/nvme.h
+++ b/include/uapi/linux/nvme.h
@@ -124,10 +124,22 @@ struct nvme_id_ns {
enum {
NVME_NS_FEAT_THIN = 1 << 0,
+ NVME_NS_FLBAS_LBA_MASK = 0xf,
+ NVME_NS_FLBAS_META_EXT = 0x10,
NVME_LBAF_RP_BEST = 0,
NVME_LBAF_RP_BETTER = 1,
NVME_LBAF_RP_GOOD = 2,
NVME_LBAF_RP_DEGRADED = 3,
+ NVME_NS_DPC_PI_LAST = 1 << 4,
+ NVME_NS_DPC_PI_FIRST = 1 << 3,
+ NVME_NS_DPC_PI_TYPE3 = 1 << 2,
+ NVME_NS_DPC_PI_TYPE2 = 1 << 1,
+ NVME_NS_DPC_PI_TYPE1 = 1 << 0,
+ NVME_NS_DPS_PI_FIRST = 1 << 3,
+ NVME_NS_DPS_PI_MASK = 0x7,
+ NVME_NS_DPS_PI_TYPE1 = 1,
+ NVME_NS_DPS_PI_TYPE2 = 2,
+ NVME_NS_DPS_PI_TYPE3 = 3,
};
struct nvme_smart_log {
@@ -261,6 +273,10 @@ enum {
NVME_RW_DSM_LATENCY_LOW = 3 << 4,
NVME_RW_DSM_SEQ_REQ = 1 << 6,
NVME_RW_DSM_COMPRESSED = 1 << 7,
+ NVME_RW_PRINFO_PRCHK_REF = 1 << 10,
+ NVME_RW_PRINFO_PRCHK_APP = 1 << 11,
+ NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12,
+ NVME_RW_PRINFO_PRACT = 1 << 13,
};
struct nvme_dsm_cmd {
--
1.7.10.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] NVMe: Metadata and PI format support
2015-01-31 2:11 [PATCH] NVMe: Metadata and PI format support Keith Busch
@ 2015-02-16 18:53 ` Sam Bradshaw (sbradshaw)
2015-02-17 16:05 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Sam Bradshaw (sbradshaw) @ 2015-02-16 18:53 UTC (permalink / raw)
> +/**
> + * nvme_dif_remap - remaps ref tags to bip seed and physical lba
> + *
> + * The virtual start sector is the one that was originally submitted
> by
> +the
> + * block layer. Due to partitioning, MD/DM cloning, etc. the actual
> +physical
> + * start sector may be different. Remap protection information to
> match
> +the
> + * physical LBA on writes, and back to the original seed on reads.
> + *
> + * Type 0 and 3 do not have a ref tag, so no remapping required.
> + */
> +static void nvme_dif_remap(struct request *req,
> + void (*dif_swap)(u32 p, u32 v, struct t10_pi_tuple
> *pi)) {
> + struct nvme_ns *ns = req->rq_disk->private_data;
> + struct bio_integrity_payload *bip;
> + struct t10_pi_tuple *pi;
> + void *p;
> + u32 i, nlb, ts, phys, virt;
> +
> + if (!ns->pi_type || ns->pi_type == NVME_NS_DPS_PI_TYPE3)
> + return;
> +
> + bip = bio_integrity(req->bio);
> + if (!bip)
> + return;
> +
> + p = kmap_atomic(bip->bip_vec->bv_page);
I think you want:
p = kmap_atomic(bip->bip_vec->bv_page) + bip->bip_vec->bv_offset;
> @@ -1840,16 +1997,50 @@ static int nvme_revalidate_disk(struct gendisk
> *disk)
> __func__);
> return 0;
> }
> + if (nvme_identify(dev, ns->ns_id, 0, dma_addr)) {
> + dev_warn(&dev->pci_dev->dev,
> + "identify failed ns:%d, setting capacity to 0\n",
> + ns->ns_id);
> + memset(id, 0, sizeof(*id));
> + }
>
> - if (nvme_identify(dev, ns->ns_id, 0, dma_addr))
> - goto free;
> -
> - lbaf = id->flbas & 0xf;
> + old_ms = ns->ms;
> + lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
> ns->lba_shift = id->lbaf[lbaf].ds;
> + ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
> +
> + /*
> + * If identify namespace failed, use default 512 byte block size
> so
> + * block layer can use before failing read/write for 0 capacity.
> + */
> + if (ns->lba_shift == 0)
> + ns->lba_shift = 9;
> + bs = 1 << ns->lba_shift;
> +
> + /* XXX: PI implementation requires metadata equal t10 pi tuple
> size */
> + pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
> + id->dps & NVME_NS_DPS_PI_MASK : 0;
If there is interest in incorporating this support, we can provide
a patch on top of this that enables 16b/32b/64b metadata with PI and
supports PIL={0,1}
-Sam
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] NVMe: Metadata and PI format support
2015-02-16 18:53 ` Sam Bradshaw (sbradshaw)
@ 2015-02-17 16:05 ` Keith Busch
2015-02-17 17:03 ` Sam Bradshaw
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2015-02-17 16:05 UTC (permalink / raw)
On Mon, 16 Feb 2015, Sam Bradshaw (sbradshaw) wrote:
>> + p = kmap_atomic(bip->bip_vec->bv_page);
>
> I think you want:
> p = kmap_atomic(bip->bip_vec->bv_page) + bip->bip_vec->bv_offset;
Yep, nice catch.
>> + /* XXX: PI implementation requires metadata equal t10 pi tuple size */
>> + pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
>> + id->dps & NVME_NS_DPS_PI_MASK : 0;
>
> If there is interest in incorporating this support, we can provide
> a patch on top of this that enables 16b/32b/64b metadata with PI and
> supports PIL={0,1}
I would also like that to work. I was hoping to reuse the
t10_pi_generate/verify functions for this. Those can work if the
iter->prot_buf is incremented by the blk_intergity's tuple_size rather
than just the 8-byte PI field.
That'd work with PIL=0, not sure what to do with PIL=1. Maybe we should
just define our own functions for nvme even though they'd mostly be the
same as the ones scsi uses.
> -Sam
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] NVMe: Metadata and PI format support
2015-02-17 16:05 ` Keith Busch
@ 2015-02-17 17:03 ` Sam Bradshaw
2015-02-19 17:05 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Sam Bradshaw @ 2015-02-17 17:03 UTC (permalink / raw)
>> If there is interest in incorporating this support, we can provide
>> a patch on top of this that enables 16b/32b/64b metadata with PI and
>> supports PIL={0,1}
>
> I would also like that to work. I was hoping to reuse the
> t10_pi_generate/verify functions for this. Those can work if the
> iter->prot_buf is incremented by the blk_intergity's tuple_size rather
> than just the 8-byte PI field.
>
> That'd work with PIL=0, not sure what to do with PIL=1. Maybe we should
> just define our own functions for nvme even though they'd mostly be the
> same as the ones scsi uses.
We were unable to come up with an elegant solution that reused the
t10_pi_generate/verify functions while also handling the PIL=1 case.
So we defined our own functions.
---
drivers/block/nvme-core.c | 319 +++++++++++++++++++++++++++++++++++++++++++--
include/linux/nvme.h | 22 +++
2 files changed, 328 insertions(+), 13 deletions(-)
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 23df65d..692eb31 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -38,6 +38,7 @@
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/t10-pi.h>
+#include <linux/crc-t10dif.h>
#include <linux/types.h>
#include <scsi/sg.h>
#include <asm-generic/io-64-nonatomic-lo-hi.h>
@@ -449,6 +450,7 @@ static void nvme_dif_remap(struct request *req,
struct nvme_ns *ns = req->rq_disk->private_data;
struct bio_integrity_payload *bip;
struct t10_pi_tuple *pi;
+ int pi_sz = sizeof(struct t10_pi_tuple);
void *p;
u32 i, nlb, ts, phys, virt;
@@ -468,6 +470,11 @@ static void nvme_dif_remap(struct request *req,
nlb = (blk_rq_bytes(req) >> ns->lba_shift);
ts = ns->disk->integrity->tuple_size;
+ /* Adjust if metadata > 8b and PI is in last 8b of metadata */
+ if ((ns->ms > pi_sz) && !ns->pi_first)
+ p = (struct t10_pi_tuple *) ((char *) p + ns->ms
+ - pi_sz);
+
for (i = 0; i < nlb; i++, virt++, phys++) {
pi = (struct t10_pi_tuple *)p;
dif_swap(phys, virt, pi);
@@ -1944,6 +1951,183 @@ static void nvme_config_discard(struct nvme_ns *ns)
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
}
+/*
+ * Calculates CRC value for the data buffer of given len, and returns
+ * it in big-endian format
+ */
+static __be16 nvme_crc_fn(void *data, unsigned int len)
+{
+ return cpu_to_be16(crc_t10dif(data, len));
+}
+
+/*
+ * Calculates unified CRC value for the inputted crc and data buffer of given
+ * len, and returns it in big-endian format
+ */
+static __be16 nvme_crc_generic_fn(__u16 crc, void *data, unsigned int len)
+{
+ return cpu_to_be16(crc_t10dif_generic(crc, data, len));
+}
+
+static int nvme_generate(struct blk_integrity_iter *bix, csum_fn *fn, u8 ms,
+ u8 type, u8 first)
+{
+ struct t10_pi_tuple *pi = bix->prot_buf;
+ int i, pi_sz = sizeof(struct t10_pi_tuple);
+
+ /* Adjust if metadata > 8b and PI is in last 8b of metadata */
+ if ((ms > pi_sz) && (!first))
+ pi = (struct t10_pi_tuple *) ((char *) pi + ms - pi_sz);
+
+ for (i = 0; i < bix->data_size; i += bix->interval) {
+
+ pi->guard_tag = fn(bix->data_buf, bix->interval);
+
+ /*
+ * Include metadata for CRC calc if PI is in last 8b
+ * of metadata
+ */
+ if ((ms > pi_sz) && (!first)) {
+ u8 *p = ((u8 *) pi - (ms - pi_sz));
+
+ memset(p, 0, ms - pi_sz);
+ pi->guard_tag = nvme_crc_generic_fn(
+ be16_to_cpu(pi->guard_tag),
+ p,
+ ms - pi_sz);
+ }
+
+ /*
+ * 'seed' is a virtual start sector that we received, and
+ * this func populates ref_tag based on it. The ref_tag will
+ * be remapped with actual physical sector in nvme_dif_remap()
+ */
+ if (type != NVME_NS_DPS_PI_TYPE3)
+ pi->ref_tag = cpu_to_be32(lower_32_bits(bix->seed));
+ else
+ pi->ref_tag = 0;
+
+ pi->app_tag = 0;
+ bix->data_buf += bix->interval;
+ bix->prot_buf += ms;
+ bix->seed++;
+ pi = (struct t10_pi_tuple *) ((char *) pi + ms);
+ }
+ return 0;
+}
+
+/* DEFINE_GENERATOR(ms, pi_type, pi_first) */
+DEFINE_GENERATOR(8, 1, 0);
+DEFINE_GENERATOR(8, 2, 0);
+DEFINE_GENERATOR(8, 3, 0);
+DEFINE_GENERATOR(16, 1, 0);
+DEFINE_GENERATOR(16, 1, 1);
+DEFINE_GENERATOR(16, 2, 0);
+DEFINE_GENERATOR(16, 2, 1);
+DEFINE_GENERATOR(16, 3, 0);
+DEFINE_GENERATOR(16, 3, 1);
+DEFINE_GENERATOR(32, 1, 0);
+DEFINE_GENERATOR(32, 1, 1);
+DEFINE_GENERATOR(32, 2, 0);
+DEFINE_GENERATOR(32, 2, 1);
+DEFINE_GENERATOR(32, 3, 0);
+DEFINE_GENERATOR(32, 3, 1);
+DEFINE_GENERATOR(64, 1, 0);
+DEFINE_GENERATOR(64, 1, 1);
+DEFINE_GENERATOR(64, 2, 0);
+DEFINE_GENERATOR(64, 2, 1);
+DEFINE_GENERATOR(64, 3, 0);
+DEFINE_GENERATOR(64, 3, 1);
+
+static int nvme_verify(struct blk_integrity_iter *bix, csum_fn *fn, u8 ms,
+ u8 type, u8 first)
+{
+ struct t10_pi_tuple *pi = bix->prot_buf;
+ int i, pi_sz = sizeof(struct t10_pi_tuple);
+ u16 csum;
+
+ /* If metadata > 8 bytes and PI is in last 8 bytes of metadata */
+ if ((ms > pi_sz) && (!first))
+ pi = (struct t10_pi_tuple *) ((char *) pi + ms - pi_sz);
+
+ for (i = 0; i < bix->data_size; i += bix->interval) {
+ /*
+ * For type1 & type2, apptag == 0xffff means skip verification
+ * For type3, apptag == 0xffff and reftag == 0xffffffff means
+ * skip verification
+ */
+ if (pi->app_tag == 0xffff) {
+ switch (type) {
+ case NVME_NS_DPS_PI_TYPE3:
+ if (pi->ref_tag != 0xffffffff)
+ break;
+ case NVME_NS_DPS_PI_TYPE1:
+ case NVME_NS_DPS_PI_TYPE2:
+ goto next;
+ }
+ }
+
+ csum = fn(bix->data_buf, bix->interval);
+
+ /* Update CRC if PI is in last 8 bytes of metadata */
+ if ((ms > pi_sz) && (!first)) {
+ csum = nvme_crc_generic_fn(
+ be16_to_cpu(csum),
+ ((u8 *) pi - (ms - pi_sz)),
+ ms - pi_sz);
+ }
+
+ if (pi->guard_tag != csum) {
+ pr_err("%s: guard tag error at sector %llu "
+ "(rcvd %04x, want %04x)\n", bix->disk_name,
+ (unsigned long long)bix->seed,
+ be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
+ return -EILSEQ;
+ }
+ /*
+ * ref_tag would have been remapped with virtual sector values
+ * by nvme_dif_remap(). So match it directly with 'seed'.
+ */
+ if (be32_to_cpu(pi->ref_tag) !=
+ lower_32_bits(bix->seed)) {
+ pr_err("%s: ref tag error at location %llu "
+ "(rcvd %u)\n", bix->disk_name,
+ (unsigned long long)
+ bix->seed, be32_to_cpu(pi->ref_tag));
+ return -EILSEQ;
+ }
+ next:
+ bix->data_buf += bix->interval;
+ bix->prot_buf += ms;
+ bix->seed++;
+ pi = (struct t10_pi_tuple *) ((char *) pi + ms);
+ }
+ return 0;
+}
+
+/* DEFINE_VERIFIER(ms, pi_type, pi_first) */
+DEFINE_VERIFIER(8, 1, 0);
+DEFINE_VERIFIER(8, 2, 0);
+DEFINE_VERIFIER(8, 3, 0);
+DEFINE_VERIFIER(16, 1, 0);
+DEFINE_VERIFIER(16, 1, 1);
+DEFINE_VERIFIER(16, 2, 0);
+DEFINE_VERIFIER(16, 2, 1);
+DEFINE_VERIFIER(16, 3, 0);
+DEFINE_VERIFIER(16, 3, 1);
+DEFINE_VERIFIER(32, 1, 0);
+DEFINE_VERIFIER(32, 1, 1);
+DEFINE_VERIFIER(32, 2, 0);
+DEFINE_VERIFIER(32, 2, 1);
+DEFINE_VERIFIER(32, 3, 0);
+DEFINE_VERIFIER(32, 3, 1);
+DEFINE_VERIFIER(64, 1, 0);
+DEFINE_VERIFIER(64, 1, 1);
+DEFINE_VERIFIER(64, 2, 0);
+DEFINE_VERIFIER(64, 2, 1);
+DEFINE_VERIFIER(64, 3, 0);
+DEFINE_VERIFIER(64, 3, 1);
+
static int nvme_noop_verify(struct blk_integrity_iter *iter)
{
return 0;
@@ -1960,22 +2144,130 @@ struct blk_integrity nvme_meta_noop = {
.verify_fn = nvme_noop_verify,
};
+/*
+ * Supports PI within metadata of size 8/16/32/64b formats.
+ * For rest of metadata formats, 'noop' functions are registered.
+ */
static void nvme_init_integrity(struct nvme_ns *ns)
{
- struct blk_integrity integrity;
+ struct blk_integrity integrity = nvme_meta_noop;
- switch (ns->pi_type) {
- case NVME_NS_DPS_PI_TYPE3:
- integrity = t10_pi_type3_crc;
+ switch (ns->ms) {
+ case NVME_8B_META:
+ integrity.name = "NVME_META_8B";
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE1:
+ integrity.generate_fn = nvme_gen_8_1_0;
+ integrity.verify_fn = nvme_ver_8_1_0;
+ break;
+ case NVME_NS_DPS_PI_TYPE2:
+ integrity.generate_fn = nvme_gen_8_2_0;
+ integrity.verify_fn = nvme_ver_8_2_0;
+ break;
+ case NVME_NS_DPS_PI_TYPE3:
+ integrity.generate_fn = nvme_gen_8_3_0;
+ integrity.verify_fn = nvme_ver_8_3_0;
+ break;
+ }
break;
- case NVME_NS_DPS_PI_TYPE1:
- case NVME_NS_DPS_PI_TYPE2:
- integrity = t10_pi_type1_crc;
+ case NVME_16B_META:
+ integrity.name = "NVME_META_16B";
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE1:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_16_1_1;
+ integrity.verify_fn = nvme_ver_16_1_1;
+ } else {
+ integrity.generate_fn = nvme_gen_16_1_0;
+ integrity.verify_fn = nvme_ver_16_1_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE2:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_16_2_1;
+ integrity.verify_fn = nvme_ver_16_2_1;
+ } else {
+ integrity.generate_fn = nvme_gen_16_2_0;
+ integrity.verify_fn = nvme_ver_16_2_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE3:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_16_3_1;
+ integrity.verify_fn = nvme_ver_16_3_1;
+ } else {
+ integrity.generate_fn = nvme_gen_16_3_0;
+ integrity.verify_fn = nvme_ver_16_3_0;
+ }
+ break;
+ }
break;
- default:
- integrity = nvme_meta_noop;
+ case NVME_32B_META:
+ integrity.name = "NVME_META_32B";
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE1:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_32_1_1;
+ integrity.verify_fn = nvme_ver_32_1_1;
+ } else {
+ integrity.generate_fn = nvme_gen_32_1_0;
+ integrity.verify_fn = nvme_ver_32_1_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE2:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_32_2_1;
+ integrity.verify_fn = nvme_ver_32_2_1;
+ } else {
+ integrity.generate_fn = nvme_gen_32_2_0;
+ integrity.verify_fn = nvme_ver_32_2_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE3:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_32_3_1;
+ integrity.verify_fn = nvme_ver_32_3_1;
+ } else {
+ integrity.generate_fn = nvme_gen_32_3_0;
+ integrity.verify_fn = nvme_ver_32_3_0;
+ }
+ break;
+ }
+ break;
+ case NVME_64B_META:
+ integrity.name = "NVME_META_64B";
+ switch (ns->pi_type) {
+ case NVME_NS_DPS_PI_TYPE1:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_64_1_1;
+ integrity.verify_fn = nvme_ver_64_1_1;
+ } else {
+ integrity.generate_fn = nvme_gen_64_1_0;
+ integrity.verify_fn = nvme_ver_64_1_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE2:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_64_2_1;
+ integrity.verify_fn = nvme_ver_64_2_1;
+ } else {
+ integrity.generate_fn = nvme_gen_64_2_0;
+ integrity.verify_fn = nvme_ver_64_2_0;
+ }
+ break;
+ case NVME_NS_DPS_PI_TYPE3:
+ if (ns->pi_first) {
+ integrity.generate_fn = nvme_gen_64_3_1;
+ integrity.verify_fn = nvme_ver_64_3_1;
+ } else {
+ integrity.generate_fn = nvme_gen_64_3_0;
+ integrity.verify_fn = nvme_ver_64_3_0;
+ }
+ break;
+ }
break;
}
+
integrity.tuple_size = ns->ms;
blk_integrity_register(ns->disk, &integrity);
blk_queue_max_integrity_segments(ns->queue, 1);
@@ -1987,7 +2279,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
struct nvme_dev *dev = ns->dev;
struct nvme_id_ns *id;
dma_addr_t dma_addr;
- int lbaf, pi_type, old_ms;
+ int lbaf, pi_type, old_ms, pi_first;
unsigned short bs;
id = dma_alloc_coherent(&dev->pci_dev->dev, 4096, &dma_addr,
@@ -2017,16 +2309,17 @@ static int nvme_revalidate_disk(struct gendisk *disk)
ns->lba_shift = 9;
bs = 1 << ns->lba_shift;
- /* XXX: PI implementation requires metadata equal t10 pi tuple size */
- pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
- id->dps & NVME_NS_DPS_PI_MASK : 0;
+ pi_type = id->dps & NVME_NS_DPS_PI_MASK;
+ pi_first = (id->dps & NVME_NS_DPS_PI_FIRST) >> 3;
if (disk->integrity && (ns->pi_type != pi_type || ns->ms != old_ms ||
bs != queue_logical_block_size(disk->queue) ||
+ ns->pi_first != pi_first ||
(ns->ms && id->flbas & NVME_NS_FLBAS_META_EXT)))
blk_integrity_unregister(disk);
ns->pi_type = pi_type;
+ ns->pi_first = pi_first;
blk_queue_logical_block_size(ns->queue, bs);
if (ns->ms && !disk->integrity && (disk->flags & GENHD_FL_UP) &&
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index e1fcea0..05cd302 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -122,6 +122,7 @@ struct nvme_ns {
int lba_shift;
int ms;
int pi_type;
+ int pi_first; /* PI location in metadata. Whether first or last 8b*/
u64 mode_select_num_blocks;
u32 mode_select_block_len;
};
@@ -148,6 +149,27 @@ static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
return (sector >> (ns->lba_shift - 9));
}
+#define NVME_8B_META 8
+#define NVME_16B_META 16
+#define NVME_32B_META 32
+#define NVME_64B_META 64
+
+#define DEFINE_GENERATOR(m, t, f) GENERATOR(m, t, f, _)
+#define DEFINE_VERIFIER(m, t, f) VERIFIER(m, t, f, _)
+
+#define GENERATOR(m, t, f, u) \
+ static int nvme_gen_##m##u##t##u##f(struct blk_integrity_iter *bix) \
+ { \
+ return nvme_generate(bix, nvme_crc_fn, m, t, f); \
+ }
+#define VERIFIER(m, t, f, u) \
+ static int nvme_ver_##m##u##t##u##f(struct blk_integrity_iter *bix) \
+ { \
+ return nvme_verify(bix, nvme_crc_fn, m, t, f); \
+ }
+
+typedef __u16 (csum_fn) (void*, unsigned int);
+
/**
* nvme_free_iod - frees an nvme_iod
* @dev: The device that the I/O was submitted to
--
1.7.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] NVMe: Metadata and PI format support
2015-02-17 17:03 ` Sam Bradshaw
@ 2015-02-19 17:05 ` Keith Busch
2015-02-19 17:45 ` Sam Bradshaw (sbradshaw)
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2015-02-19 17:05 UTC (permalink / raw)
On Tue, 17 Feb 2015, Sam Bradshaw wrote:
>>> If there is interest in incorporating this support, we can provide
>>> a patch on top of this that enables 16b/32b/64b metadata with PI and
>>> supports PIL={0,1}
>>
>> I would also like that to work. I was hoping to reuse the
>> t10_pi_generate/verify functions for this. Those can work if the
>> iter->prot_buf is incremented by the blk_intergity's tuple_size rather
>> than just the 8-byte PI field.
>>
>> That'd work with PIL=0, not sure what to do with PIL=1. Maybe we should
>> just define our own functions for nvme even though they'd mostly be the
>> same as the ones scsi uses.
>
> We were unable to come up with an elegant solution that reused the
> t10_pi_generate/verify functions while also handling the PIL=1 case.
> So we defined our own functions.
I see you support 16/32/64 exclusively. As unlikely it may be someone
would implement something other than those, the specification allows 64k
different metadata formats with as many more combinations for PIL. Could
the solution be generalized to accomodate arbitrary formats in a single
function rather than require a definition for each possible combination?
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] NVMe: Metadata and PI format support
2015-02-19 17:05 ` Keith Busch
@ 2015-02-19 17:45 ` Sam Bradshaw (sbradshaw)
0 siblings, 0 replies; 6+ messages in thread
From: Sam Bradshaw (sbradshaw) @ 2015-02-19 17:45 UTC (permalink / raw)
> -----Original Message-----
> From: Keith Busch [mailto:keith.busch at intel.com]
> Sent: Thursday, February 19, 2015 9:06 AM
> To: Sam Bradshaw (sbradshaw)
> Cc: Keith Busch; linux-nvme at lists.infradead.org
> Subject: Re: [PATCH] NVMe: Metadata and PI format support
>
> On Tue, 17 Feb 2015, Sam Bradshaw wrote:
> >>> If there is interest in incorporating this support, we can provide
> a
> >>> patch on top of this that enables 16b/32b/64b metadata with PI and
> >>> supports PIL={0,1}
> >>
> >> I would also like that to work. I was hoping to reuse the
> >> t10_pi_generate/verify functions for this. Those can work if the
> >> iter->prot_buf is incremented by the blk_intergity's tuple_size
> >> iter->rather
> >> than just the 8-byte PI field.
> >>
> >> That'd work with PIL=0, not sure what to do with PIL=1. Maybe we
> >> should just define our own functions for nvme even though they'd
> >> mostly be the same as the ones scsi uses.
> >
> > We were unable to come up with an elegant solution that reused the
> > t10_pi_generate/verify functions while also handling the PIL=1 case.
> > So we defined our own functions.
>
> I see you support 16/32/64 exclusively. As unlikely it may be someone
> would implement something other than those, the specification allows
> 64k different metadata formats with as many more combinations for PIL.
> Could the solution be generalized to accomodate arbitrary formats in a
> single function rather than require a definition for each possible
> combination?
Probably. I see a few different solutions but perhaps the simplest would be to
attach an opaque pointer to struct blk_integrity to encapsulate the namespace
specific metadata details and have the iterator pass it back to the generator/verifier
function.
It might make sense to make these changes in combination with enhancements to the
iterator to handle CRC calculation/verification that cross multiple bvecs.
-Sam
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-02-19 17:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-31 2:11 [PATCH] NVMe: Metadata and PI format support Keith Busch
2015-02-16 18:53 ` Sam Bradshaw (sbradshaw)
2015-02-17 16:05 ` Keith Busch
2015-02-17 17:03 ` Sam Bradshaw
2015-02-19 17:05 ` Keith Busch
2015-02-19 17:45 ` Sam Bradshaw (sbradshaw)
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.