* [PATCH 1/6] block: Replace bi_integrity with bi_special
2014-05-29 3:52 Copy offload Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-06-02 20:35 ` Nicholas A. Bellinger
2014-05-29 3:52 ` [PATCH 2/6] block: Implement support for copy offload operations Martin K. Petersen
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
For commands like REQ_COPY we need a way to pass extra information along
with each bio. Like integrity metadata, this information must be
available at the bottom of the stack, so bi_private does not suffice.
Rename the existing bi_integrity field to bi_special and make it a union
so we can have different bio extensions for each class of command.
We previously used bi_integrity != NULL as a way to identify whether a
bio had integrity metadata or not. Introduce a REQ_INTEGRITY flag to act
as the indicator now that bi_special can contain different things.
In addition, bio_integrity(bio) will now return a pointer to the
integrity payload (when applicable).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
Documentation/block/data-integrity.txt | 10 +++++-----
block/bio-integrity.c | 23 ++++++++++++-----------
drivers/scsi/sd_dif.c | 8 ++++----
include/linux/bio.h | 10 +++++++---
include/linux/blk_types.h | 8 ++++++--
include/linux/blkdev.h | 7 ++-----
6 files changed, 36 insertions(+), 30 deletions(-)
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
index 2d735b0ae383..5a0efc9ee5d5 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -129,11 +129,11 @@ interface for this is being worked on.
4.1 BIO
The data integrity patches add a new field to struct bio when
-CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
-to a struct bip which contains the bio integrity payload. Essentially
-a bip is a trimmed down struct bio which holds a bio_vec containing
-the integrity metadata and the required housekeeping information (bvec
-pool, vector count, etc.)
+CONFIG_BLK_DEV_INTEGRITY is enabled. bio_integrity(bio) returns a
+pointer to a struct bip which contains the bio integrity payload.
+Essentially a bip is a trimmed down struct bio which holds a bio_vec
+containing the integrity metadata and the required housekeeping
+information (bvec pool, vector count, etc.)
A kernel subsystem can enable data integrity protection on a bio by
calling bio_integrity_alloc(bio). This will allocate and attach the
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 9e241063a616..40c6a0e50301 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -76,7 +76,8 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
bip->bip_slab = idx;
bip->bip_bio = bio;
- bio->bi_integrity = bip;
+ bio->bi_special.integrity = bip;
+ bio->bi_rw |= REQ_INTEGRITY;
return bip;
err:
@@ -94,7 +95,7 @@ EXPORT_SYMBOL(bio_integrity_alloc);
*/
void bio_integrity_free(struct bio *bio)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_set *bs = bio->bi_pool;
if (bip->bip_owns_buf)
@@ -110,7 +111,7 @@ void bio_integrity_free(struct bio *bio)
kfree(bip);
}
- bio->bi_integrity = NULL;
+ bio->bi_special.integrity = NULL;
}
EXPORT_SYMBOL(bio_integrity_free);
@@ -134,7 +135,7 @@ static inline unsigned int bip_integrity_vecs(struct bio_integrity_payload *bip)
int bio_integrity_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_vec *iv;
if (bip->bip_vcnt >= bip_integrity_vecs(bip)) {
@@ -240,7 +241,7 @@ EXPORT_SYMBOL(bio_integrity_tag_size);
static int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len,
int set)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
unsigned int nr_sectors;
@@ -315,12 +316,12 @@ static int bio_integrity_generate_verify(struct bio *bio, int operate)
struct bio_vec *bv;
sector_t sector;
unsigned int sectors, ret = 0, i;
- void *prot_buf = bio->bi_integrity->bip_buf;
+ void *prot_buf = bio_integrity(bio)->bip_buf;
if (operate)
sector = bio->bi_iter.bi_sector;
else
- sector = bio->bi_integrity->bip_iter.bi_sector;
+ sector = bio_integrity(bio)->bip_iter.bi_sector;
bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
bix.sector_size = bi->sector_size;
@@ -516,7 +517,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
*/
void bio_integrity_endio(struct bio *bio, int error)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
BUG_ON(bip->bip_bio != bio);
@@ -547,7 +548,7 @@ EXPORT_SYMBOL(bio_integrity_endio);
*/
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
@@ -569,7 +570,7 @@ EXPORT_SYMBOL(bio_integrity_advance);
void bio_integrity_trim(struct bio *bio, unsigned int offset,
unsigned int sectors)
{
- struct bio_integrity_payload *bip = bio->bi_integrity;
+ struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
bio_integrity_advance(bio, offset << 9);
@@ -588,7 +589,7 @@ EXPORT_SYMBOL(bio_integrity_trim);
int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
gfp_t gfp_mask)
{
- struct bio_integrity_payload *bip_src = bio_src->bi_integrity;
+ struct bio_integrity_payload *bip_src = bio_integrity(bio_src);
struct bio_integrity_payload *bip;
BUG_ON(bip_src == NULL);
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index a7a691d0af7d..29f0477a8708 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -383,9 +383,9 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
if (bio_flagged(bio, BIO_MAPPED_INTEGRITY))
break;
- virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;
+ virt = bio_integrity(bio)->bip_iter.bi_sector & 0xffffffff;
- bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ bip_for_each_vec(iv, bio_integrity(bio), iter) {
sdt = kmap_atomic(iv.bv_page)
+ iv.bv_offset;
@@ -434,9 +434,9 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
struct bio_vec iv;
struct bvec_iter iter;
- virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;
+ virt = bio_integrity(bio)->bip_iter.bi_sector & 0xffffffff;
- bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ bip_for_each_vec(iv, bio_integrity(bio), iter) {
sdt = kmap_atomic(iv.bv_page)
+ iv.bv_offset;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a645769f020..9fb4b0d75b11 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -644,7 +644,13 @@ struct biovec_slab {
#if defined(CONFIG_BLK_DEV_INTEGRITY)
+static inline struct bio_integrity_payload * bio_integrity(struct bio *bio)
+{
+ if (bio->bi_rw & REQ_INTEGRITY)
+ return bio->bi_special.integrity;
+ return NULL;
+}
#define bip_vec_idx(bip, idx) (&(bip->bip_vec[(idx)]))
@@ -653,9 +659,7 @@ struct biovec_slab {
#define bio_for_each_integrity_vec(_bvl, _bio, _iter) \
for_each_bio(_bio) \
- bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
-
-#define bio_integrity(bio) (bio->bi_integrity != NULL)
+ bip_for_each_vec(_bvl, _bio->bi_special.integrity, _iter)
extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
extern void bio_integrity_free(struct bio *);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d8e4cea23a25..9cce1fcd6793 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -78,9 +78,11 @@ struct bio {
struct io_context *bi_ioc;
struct cgroup_subsys_state *bi_css;
#endif
+ union {
#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload *bi_integrity; /* data integrity */
+ struct bio_integrity_payload *integrity; /* data integrity */
#endif
+ } bi_special;
unsigned short bi_vcnt; /* how many bio_vec's */
@@ -162,6 +164,7 @@ enum rq_flag_bits {
__REQ_WRITE_SAME, /* write same block many times */
__REQ_NOIDLE, /* don't anticipate more IO after this one */
+ __REQ_INTEGRITY, /* I/O includes block integrity payload */
__REQ_FUA, /* forced unit access */
__REQ_FLUSH, /* request for cache flush */
@@ -204,13 +207,14 @@ enum rq_flag_bits {
#define REQ_DISCARD (1ULL << __REQ_DISCARD)
#define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME)
#define REQ_NOIDLE (1ULL << __REQ_NOIDLE)
+#define REQ_INTEGRITY (1ULL << __REQ_INTEGRITY)
#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
#define REQ_COMMON_MASK \
(REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
- REQ_SECURE)
+ REQ_SECURE | REQ_INTEGRITY)
#define REQ_CLONE_MASK REQ_COMMON_MASK
#define BIO_NO_ADVANCE_ITER_MASK (REQ_DISCARD|REQ_WRITE_SAME)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6bc011a09e82..5d0067766ff2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1484,12 +1484,9 @@ static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
return disk->integrity;
}
-static inline int blk_integrity_rq(struct request *rq)
+static inline bool blk_integrity_rq(struct request *rq)
{
- if (rq->bio == NULL)
- return 0;
-
- return bio_integrity(rq->bio);
+ return rq->cmd_flags & REQ_INTEGRITY;
}
static inline void blk_queue_max_integrity_segments(struct request_queue *q,
--
1.9.0
* Re: [PATCH 1/6] block: Replace bi_integrity with bi_special
2014-05-29 3:52 ` [PATCH 1/6] block: Replace bi_integrity with bi_special Martin K. Petersen
@ 2014-06-02 20:35 ` Nicholas A. Bellinger
0 siblings, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:35 UTC
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> For commands like REQ_COPY we need a way to pass extra information along
> with each bio. Like integrity metadata, this information must be
> available at the bottom of the stack, so bi_private does not suffice.
>
> Rename the existing bi_integrity field to bi_special and make it a union
> so we can have different bio extensions for each class of command.
>
> We previously used bi_integrity != NULL as a way to identify whether a
> bio had integrity metadata or not. Introduce a REQ_INTEGRITY flag to act
> as the indicator now that bi_special can contain different things.
>
> In addition, bio_integrity(bio) will now return a pointer to the
> integrity payload (when applicable).
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> Documentation/block/data-integrity.txt | 10 +++++-----
> block/bio-integrity.c | 23 ++++++++++++-----------
> drivers/scsi/sd_dif.c | 8 ++++----
> include/linux/bio.h | 10 +++++++---
> include/linux/blk_types.h | 8 ++++++--
> include/linux/blkdev.h | 7 ++-----
> 6 files changed, 36 insertions(+), 30 deletions(-)
>
Looks fine.
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
* [PATCH 2/6] block: Implement support for copy offload operations
2014-05-29 3:52 Copy offload Martin K. Petersen
2014-05-29 3:52 ` [PATCH 1/6] block: Replace bi_integrity with bi_special Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-06-02 20:38 ` Nicholas A. Bellinger
2014-05-29 3:52 ` [PATCH 3/6] block: Introduce copy offload library function Martin K. Petersen
` (3 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
Many modern SCSI devices support copy offloading operations in which one
can copy a block range from one LUN to another without the data having
to be sent to the host and back. This is particularly useful for things
like cloning LUNs or virtual machine images.
Implement support for REQ_COPY commands in the block layer:
- Add max_copy_sectors queue limits and handle stacking
- Expose this parameter in sysfs in bytes (copy_max_bytes)
- Add special casing for REQ_COPY in merging and mapping functions
- Introduce a bio_copy descriptor hanging off of bio->bi_special. This
descriptor contains the source bdev and source sector for the copy
operation. Target bdev/sector/size are described by the bio proper.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
Documentation/ABI/testing/sysfs-block | 9 +++++++++
block/blk-core.c | 5 +++++
block/blk-merge.c | 7 ++-----
block/blk-settings.c | 15 +++++++++++++++
block/blk-sysfs.c | 10 ++++++++++
include/linux/bio.h | 15 +++++++++++++--
include/linux/blk_types.h | 15 ++++++++++++---
include/linux/blkdev.h | 13 +++++++++++++
8 files changed, 79 insertions(+), 10 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 279da08f7541..d1304cc305f7 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -220,3 +220,12 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
+
+What: /sys/block/<disk>/queue/copy_max_bytes
+Date: January 2014
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support copy offloading will set this value
+ to indicate the maximum buffer size in bytes that can be
+ copied in one operation. If the copy_max_bytes is 0 the
+ device does not support copy offload.
diff --git a/block/blk-core.c b/block/blk-core.c
index 5b6f768a7c01..3a91044ee19b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1810,6 +1810,11 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
+ if (bio->bi_rw & REQ_COPY && !bdev_copy_offload(bio->bi_bdev)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+
/*
* Various block parts want %current->io_context and lazy ioc
* allocation ends up trading a lot of pain for a small amount of
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6c583f9c5b65..0e1027e2e32b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -25,10 +25,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
* This should probably be returning 0, but blk_add_request_payload()
* (Christoph!!!!)
*/
- if (bio->bi_rw & REQ_DISCARD)
- return 1;
-
- if (bio->bi_rw & REQ_WRITE_SAME)
+ if (bio->bi_rw & (REQ_DISCARD | REQ_WRITE_SAME | REQ_COPY))
return 1;
fbio = bio;
@@ -182,7 +179,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
nsegs = 0;
cluster = blk_queue_cluster(q);
- if (bio->bi_rw & REQ_DISCARD) {
+ if (bio->bi_rw & (REQ_DISCARD | REQ_COPY)) {
/*
* This is a hack - drivers should be neither modifying the
* biovec, nor relying on bi_vcnt - but because of
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5d21239bc859..98801bcc02b0 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -114,6 +114,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
lim->max_write_same_sectors = 0;
+ lim->max_copy_sectors = 0;
lim->max_discard_sectors = 0;
lim->discard_granularity = 0;
lim->discard_alignment = 0;
@@ -147,6 +148,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_segment_size = UINT_MAX;
lim->max_sectors = UINT_MAX;
lim->max_write_same_sectors = UINT_MAX;
+ lim->max_copy_sectors = UINT_MAX;
}
EXPORT_SYMBOL(blk_set_stacking_limits);
@@ -301,6 +303,18 @@ void blk_queue_max_write_same_sectors(struct request_queue *q,
EXPORT_SYMBOL(blk_queue_max_write_same_sectors);
/**
+ * blk_queue_max_copy_sectors - set max sectors for a single copy operation
+ * @q: the request queue for the device
+ * @max_copy_sectors: maximum number of sectors per copy operation
+ **/
+void blk_queue_max_copy_sectors(struct request_queue *q,
+ unsigned int max_copy_sectors)
+{
+ q->limits.max_copy_sectors = max_copy_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_copy_sectors);
+
+/**
* blk_queue_max_segments - set max hw segments for a request for this queue
* @q: the request queue for the device
* @max_segments: max number of segments
@@ -527,6 +541,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
t->max_write_same_sectors = min(t->max_write_same_sectors,
b->max_write_same_sectors);
+ t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4d6811ac13fd..8d9077dc5bae 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -161,6 +161,11 @@ static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
(unsigned long long)q->limits.max_write_same_sectors << 9);
}
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+ return sprintf(page, "%llu\n",
+ (unsigned long long)q->limits.max_copy_sectors << 9);
+}
static ssize_t
queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
@@ -374,6 +379,10 @@ static struct queue_sysfs_entry queue_write_same_max_entry = {
.show = queue_write_same_max_show,
};
+static struct queue_sysfs_entry queue_copy_max_entry = {
+ .attr = {.name = "copy_max_bytes", .mode = S_IRUGO },
+ .show = queue_copy_max_show,
+};
static struct queue_sysfs_entry queue_nonrot_entry = {
.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
.show = queue_show_nonrot,
@@ -422,6 +431,7 @@ static struct attribute *default_attrs[] = {
&queue_discard_max_entry.attr,
&queue_discard_zeroes_data_entry.attr,
&queue_write_same_max_entry.attr,
+ &queue_copy_max_entry.attr,
&queue_nonrot_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 9fb4b0d75b11..b85fa9ac5779 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -251,8 +251,8 @@ static inline unsigned bio_segments(struct bio *bio)
struct bvec_iter iter;
/*
- * We special case discard/write same, because they interpret bi_size
- * differently:
+ * We special case discard/write same/copy, because they
+ * interpret bi_size differently:
*/
if (bio->bi_rw & REQ_DISCARD)
@@ -261,12 +261,23 @@ static inline unsigned bio_segments(struct bio *bio)
if (bio->bi_rw & REQ_WRITE_SAME)
return 1;
+ if (bio->bi_rw & REQ_COPY)
+ return 1;
+
bio_for_each_segment(bv, bio, iter)
segs++;
return segs;
}
+static inline struct bio_copy *bio_copy(struct bio *bio)
+{
+ if (bio->bi_rw & REQ_COPY)
+ return bio->bi_special.copy;
+
+ return NULL;
+}
+
/*
* get a reference to a bio, so it won't disappear. the intended use is
* something like:
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 9cce1fcd6793..7ba2798dd579 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -39,6 +39,11 @@ struct bvec_iter {
current bvec */
};
+struct bio_copy {
+ struct block_device *bic_bdev;
+ sector_t bic_sector;
+};
+
/*
* main unit of I/O for the block layer and lower layers (ie drivers and
* stacking drivers)
@@ -81,6 +86,7 @@ struct bio {
union {
#if defined(CONFIG_BLK_DEV_INTEGRITY)
struct bio_integrity_payload *integrity; /* data integrity */
+ struct bio_copy *copy; /* copy offload */
#endif
} bi_special;
@@ -162,6 +168,7 @@ enum rq_flag_bits {
__REQ_DISCARD, /* request to discard sectors */
__REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */
__REQ_WRITE_SAME, /* write same block many times */
+ __REQ_COPY, /* copy block range */
__REQ_NOIDLE, /* don't anticipate more IO after this one */
__REQ_INTEGRITY, /* I/O includes block integrity payload */
@@ -206,6 +213,7 @@ enum rq_flag_bits {
#define REQ_PRIO (1ULL << __REQ_PRIO)
#define REQ_DISCARD (1ULL << __REQ_DISCARD)
#define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME)
+#define REQ_COPY (1ULL << __REQ_COPY)
#define REQ_NOIDLE (1ULL << __REQ_NOIDLE)
#define REQ_INTEGRITY (1ULL << __REQ_INTEGRITY)
@@ -214,14 +222,15 @@ enum rq_flag_bits {
#define REQ_COMMON_MASK \
(REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
- REQ_SECURE | REQ_INTEGRITY)
+ REQ_SECURE | REQ_INTEGRITY | REQ_COPY)
#define REQ_CLONE_MASK REQ_COMMON_MASK
-#define BIO_NO_ADVANCE_ITER_MASK (REQ_DISCARD|REQ_WRITE_SAME)
+#define BIO_NO_ADVANCE_ITER_MASK (REQ_DISCARD|REQ_WRITE_SAME|REQ_COPY)
/* This mask is used for both bio and request merge checking */
#define REQ_NOMERGE_FLAGS \
- (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
+ (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | \
+ REQ_COPY)
#define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
#define REQ_THROTTLED (1ULL << __REQ_THROTTLED)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5d0067766ff2..0d80e09251e6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -289,6 +289,7 @@ struct queue_limits {
unsigned int io_opt;
unsigned int max_discard_sectors;
unsigned int max_write_same_sectors;
+ unsigned int max_copy_sectors;
unsigned int discard_granularity;
unsigned int discard_alignment;
@@ -976,6 +977,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
unsigned int max_discard_sectors);
extern void blk_queue_max_write_same_sectors(struct request_queue *q,
unsigned int max_write_same_sectors);
+extern void blk_queue_max_copy_sectors(struct request_queue *q,
+ unsigned int max_copy_sectors);
extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
extern void blk_queue_alignment_offset(struct request_queue *q,
@@ -1332,6 +1335,16 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
return 0;
}
+static inline unsigned int bdev_copy_offload(struct block_device *bdev)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (q)
+ return q->limits.max_copy_sectors;
+
+ return 0;
+}
+
static inline int queue_dma_alignment(struct request_queue *q)
{
return q ? q->dma_alignment : 511;
--
1.9.0
* Re: [PATCH 2/6] block: Implement support for copy offload operations
2014-05-29 3:52 ` [PATCH 2/6] block: Implement support for copy offload operations Martin K. Petersen
@ 2014-06-02 20:38 ` Nicholas A. Bellinger
0 siblings, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:38 UTC
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> Many modern SCSI devices support copy offloading operations in which one
> can copy a block range from one LUN to another without the data having
> to be sent to the host and back. This is particularly useful for things
> like cloning LUNs or virtual machine images.
>
> Implement support for REQ_COPY commands in the block layer:
>
> - Add max_copy_sectors queue limits and handle stacking
>
> - Expose this parameter in sysfs in bytes (copy_max_bytes)
>
> - Add special casing for REQ_COPY in merging and mapping functions
>
> - Introduce a bio_copy descriptor hanging off of bio->bi_special. This
> descriptor contains the source bdev and source sector for the copy
> operation. Target bdev/sector/size are described by the bio proper.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> Documentation/ABI/testing/sysfs-block | 9 +++++++++
> block/blk-core.c | 5 +++++
> block/blk-merge.c | 7 ++-----
> block/blk-settings.c | 15 +++++++++++++++
> block/blk-sysfs.c | 10 ++++++++++
> include/linux/bio.h | 15 +++++++++++++--
> include/linux/blk_types.h | 15 ++++++++++++---
> include/linux/blkdev.h | 13 +++++++++++++
> 8 files changed, 79 insertions(+), 10 deletions(-)
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
* [PATCH 3/6] block: Introduce copy offload library function
2014-05-29 3:52 Copy offload Martin K. Petersen
2014-05-29 3:52 ` [PATCH 1/6] block: Replace bi_integrity with bi_special Martin K. Petersen
2014-05-29 3:52 ` [PATCH 2/6] block: Implement support for copy offload operations Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-06-02 20:40 ` Nicholas A. Bellinger
2014-05-29 3:52 ` [PATCH 4/6] block: Copy offload ioctl Martin K. Petersen
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
blkdev_issue_copy() is a library function that filesystems can use to
clone block ranges between devices that support copy offloading. Both
source and target device must have max_copy_sectors > 0 in the queue
limits.
blkdev_issue_copy() will iterate over the blocks in the source range and
issue copy offload requests using the granularity preferred by source
and target.
There is no guarantee that a copy offload operation will be successful
even if both devices are offload-capable. Filesystems must be prepared
to manually copy or punt to userland if the operation fails.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-lib.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/blkdev.h | 2 ++
2 files changed, 87 insertions(+)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 97a733cf3d5f..5a0afc6e933e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -305,3 +305,88 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
}
EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_copy - queue a copy operation
+ * @src_bdev: source blockdev
+ * @src_sector: source sector
+ * @dst_bdev: destination blockdev
+ * @dst_sector: destination sector
+ * @nr_sects: number of sectors to copy
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ * Copy a block range from source device to target device.
+ */
+int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector,
+ struct block_device *dst_bdev, sector_t dst_sector,
+ unsigned int nr_sects, gfp_t gfp_mask)
+{
+ DECLARE_COMPLETION_ONSTACK(wait);
+ struct request_queue *sq = bdev_get_queue(src_bdev);
+ struct request_queue *dq = bdev_get_queue(dst_bdev);
+ unsigned int max_copy_sectors;
+ struct bio_batch bb;
+ int ret = 0;
+
+ if (!sq || !dq)
+ return -ENXIO;
+
+ max_copy_sectors = min(sq->limits.max_copy_sectors,
+ dq->limits.max_copy_sectors);
+
+ if (max_copy_sectors == 0)
+ return -EOPNOTSUPP;
+
+ atomic_set(&bb.done, 1);
+ bb.flags = 1 << BIO_UPTODATE;
+ bb.wait = &wait;
+
+ while (nr_sects) {
+ struct bio *bio;
+ struct bio_copy *bc;
+ unsigned int chunk;
+
+ bc = kmalloc(sizeof(struct bio_copy), gfp_mask);
+ if (!bc) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ bio = bio_alloc(gfp_mask, 1);
+ if (!bio) {
+ kfree(bc);
+ ret = -ENOMEM;
+ break;
+ }
+
+ chunk = min(nr_sects, max_copy_sectors);
+
+ bio->bi_iter.bi_sector = dst_sector;
+ bio->bi_iter.bi_size = chunk << 9;
+ bio->bi_end_io = bio_batch_end_io;
+ bio->bi_bdev = dst_bdev;
+ bio->bi_private = &bb;
+ bio->bi_special.copy = bc;
+
+ bc->bic_bdev = src_bdev;
+ bc->bic_sector = src_sector;
+
+ atomic_inc(&bb.done);
+ submit_bio(REQ_WRITE | REQ_COPY, bio);
+
+ src_sector += chunk;
+ dst_sector += chunk;
+ nr_sects -= chunk;
+ }
+
+ /* Wait for bios in-flight */
+ if (!atomic_dec_and_test(&bb.done))
+ wait_for_completion_io(&wait);
+
+ if (!test_bit(BIO_UPTODATE, &bb.flags))
+ ret = -ENOTSUPP;
+
+ return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_copy);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0d80e09251e6..d2fe99e6b3b8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1136,6 +1136,8 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct page *page);
+extern int blkdev_issue_copy(struct block_device *, sector_t,
+ struct block_device *, sector_t, unsigned int, gfp_t);
extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask);
static inline int sb_issue_discard(struct super_block *sb, sector_t block,
--
1.9.0
* Re: [PATCH 3/6] block: Introduce copy offload library function
2014-05-29 3:52 ` [PATCH 3/6] block: Introduce copy offload library function Martin K. Petersen
@ 2014-06-02 20:40 ` Nicholas A. Bellinger
0 siblings, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:40 UTC
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> blkdev_issue_copy() is a library function that filesystems can use to
> clone block ranges between devices that support copy offloading. Both
> source and target device must have max_copy_sectors > 0 in the queue
> limits.
>
> blkdev_issue_copy() will iterate over the blocks in the source range and
> issue copy offload requests using the granularity preferred by source
> and target.
>
> There is no guarantee that a copy offload operation will be successful
> even if both devices are offload-capable. Filesystems must be prepared
> to manually copy or punt to userland if the operation fails.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> block/blk-lib.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/blkdev.h | 2 ++
> 2 files changed, 87 insertions(+)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 97a733cf3d5f..5a0afc6e933e 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -305,3 +305,88 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> }
> EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * blkdev_issue_copy - queue a copy operation
> + * @src_bdev: source blockdev
> + * @src_sector: source sector
> + * @dst_bdev: destination blockdev
> + * @dst_sector: destination sector
> + * @nr_sects: number of sectors to copy
> + * @gfp_mask: memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + * Copy a block range from source device to target device.
> + */
> +int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector,
> + struct block_device *dst_bdev, sector_t dst_sector,
> + unsigned int nr_sects, gfp_t gfp_mask)
> +{
> + DECLARE_COMPLETION_ONSTACK(wait);
> + struct request_queue *sq = bdev_get_queue(src_bdev);
> + struct request_queue *dq = bdev_get_queue(dst_bdev);
> + unsigned int max_copy_sectors;
> + struct bio_batch bb;
> + int ret = 0;
> +
> + if (!sq || !dq)
> + return -ENXIO;
> +
> + max_copy_sectors = min(sq->limits.max_copy_sectors,
> + dq->limits.max_copy_sectors);
> +
> + if (max_copy_sectors == 0)
> + return -EOPNOTSUPP;
> +
> + atomic_set(&bb.done, 1);
> + bb.flags = 1 << BIO_UPTODATE;
> + bb.wait = &wait;
> +
> + while (nr_sects) {
> + struct bio *bio;
> + struct bio_copy *bc;
> + unsigned int chunk;
> +
> + bc = kmalloc(sizeof(struct bio_copy), gfp_mask);
> + if (!bc) {
> + ret = -ENOMEM;
> + break;
> + }
> +
> + bio = bio_alloc(gfp_mask, 1);
> + if (!bio) {
> + kfree(bc);
> + ret = -ENOMEM;
> + break;
> + }
> +
> + chunk = min(nr_sects, max_copy_sectors);
> +
> + bio->bi_iter.bi_sector = dst_sector;
> + bio->bi_iter.bi_size = chunk << 9;
> + bio->bi_end_io = bio_batch_end_io;
> + bio->bi_bdev = dst_bdev;
> + bio->bi_private = &bb;
> + bio->bi_special.copy = bc;
> +
> + bc->bic_bdev = src_bdev;
> + bc->bic_sector = src_sector;
> +
> + atomic_inc(&bb.done);
> + submit_bio(REQ_WRITE | REQ_COPY, bio);
> +
> + src_sector += chunk;
> + dst_sector += chunk;
> + nr_sects -= chunk;
> + }
> +
> + /* Wait for bios in-flight */
> + if (!atomic_dec_and_test(&bb.done))
> + wait_for_completion_io(&wait);
> +
> + if (!test_bit(BIO_UPTODATE, &bb.flags))
> + ret = -ENOTSUPP;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(blkdev_issue_copy);
Mmmm, where does *bc memory get released in the normal bio completion
path..?
--nab
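[Editor's note: nab's question points at a real gap. blkdev_issue_copy() kmallocs a struct bio_copy per bio, but nothing in the posted patch frees it on completion. A minimal userspace model of the ownership pattern a fix would need (all names here are hypothetical stand-ins for the kernel objects; the real fix would free bic in the bio end_io path):]

```c
#include <stdlib.h>

/* Userspace model of the ownership question above: the submitter
 * allocates a per-bio context (struct bio_copy in the patch) and the
 * completion callback, not the submitter, must free it, because the
 * submitter may return long before the bio completes. */
struct copy_ctx { char payload[32]; };

struct fake_bio {
	struct copy_ctx *ctx;
	void (*end_io)(struct fake_bio *bio);
};

static int completed;	/* how many contexts the end_io path freed */

static void copy_end_io(struct fake_bio *bio)
{
	free(bio->ctx);	/* the leak fix: release ctx on completion */
	bio->ctx = NULL;
	completed++;
}

/* Stand-in for submit_bio(): immediately "completes" the bio. */
static void fake_submit(struct fake_bio *bio)
{
	bio->end_io(bio);
}

static int issue_copy(int nr_chunks)
{
	for (int i = 0; i < nr_chunks; i++) {
		struct fake_bio bio = { .end_io = copy_end_io };

		bio.ctx = malloc(sizeof(*bio.ctx));
		if (!bio.ctx)
			return -1;
		fake_submit(&bio);	/* ownership passes to end_io here */
	}
	return 0;
}
```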
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/6] block: Copy offload ioctl
2014-05-29 3:52 Copy offload Martin K. Petersen
` (2 preceding siblings ...)
2014-05-29 3:52 ` [PATCH 3/6] block: Introduce copy offload library function Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-06-02 20:42 ` Nicholas A. Bellinger
2014-05-29 3:52 ` [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present Martin K. Petersen
2014-05-29 3:52 ` [PATCH 6/6] [SCSI] sd: Implement copy offload support Martin K. Petersen
5 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC (permalink / raw)
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
Add an ioctl which can be used to clone a block range within a single
block device. This is useful for testing the copy offload code.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/ioctl.c | 35 +++++++++++++++++++++++++++++++++++
include/uapi/linux/fs.h | 1 +
2 files changed, 36 insertions(+)
diff --git a/block/ioctl.c b/block/ioctl.c
index 7d5c3b20af45..5efb6e666f18 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -201,6 +201,29 @@ static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
}
+static int blk_ioctl_copy(struct block_device *bdev, uint64_t src_offset,
+ uint64_t dst_offset, uint64_t len)
+{
+ if (src_offset & 511)
+ return -EINVAL;
+ if (dst_offset & 511)
+ return -EINVAL;
+ if (len & 511)
+ return -EINVAL;
+ src_offset >>= 9;
+ dst_offset >>= 9;
+ len >>= 9;
+
+ if (src_offset + len > (i_size_read(bdev->bd_inode) >> 9))
+ return -EINVAL;
+
+ if (dst_offset + len > (i_size_read(bdev->bd_inode) >> 9))
+ return -EINVAL;
+
+ return blkdev_issue_copy(bdev, src_offset, bdev, dst_offset, len,
+ GFP_KERNEL);
+}
+
static int put_ushort(unsigned long arg, unsigned short val)
{
return put_user(val, (unsigned short __user *)arg);
@@ -328,6 +351,18 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
return blk_ioctl_zeroout(bdev, range[0], range[1]);
}
+ case BLKCOPY: {
+ uint64_t range[3];
+
+ if (!(mode & FMODE_WRITE))
+ return -EBADF;
+
+ if (copy_from_user(range, (void __user *)arg, sizeof(range)))
+ return -EFAULT;
+
+ return blk_ioctl_copy(bdev, range[0], range[1], range[2]);
+ }
+
case HDIO_GETGEO: {
struct hd_geometry geo;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index ca1a11bb4443..195c2c4cbacc 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -149,6 +149,7 @@ struct inodes_stat_t {
#define BLKSECDISCARD _IO(0x12,125)
#define BLKROTATIONAL _IO(0x12,126)
#define BLKZEROOUT _IO(0x12,127)
+#define BLKCOPY _IO(0x12,128)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
1.9.0
* Re: [PATCH 4/6] block: Copy offload ioctl
2014-05-29 3:52 ` [PATCH 4/6] block: Copy offload ioctl Martin K. Petersen
@ 2014-06-02 20:42 ` Nicholas A. Bellinger
0 siblings, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:42 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> Add an ioctl which can be used to clone a block range within a single
> block device. This is useful for testing the copy offload code.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> block/ioctl.c | 35 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/fs.h | 1 +
> 2 files changed, 36 insertions(+)
>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
* [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-05-29 3:52 Copy offload Martin K. Petersen
` (3 preceding siblings ...)
2014-05-29 3:52 ` [PATCH 4/6] block: Copy offload ioctl Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-06-02 20:43 ` Nicholas A. Bellinger
` (2 more replies)
2014-05-29 3:52 ` [PATCH 6/6] [SCSI] sd: Implement copy offload support Martin K. Petersen
5 siblings, 3 replies; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC (permalink / raw)
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
Copy offloading requires us to know the NAA descriptor for both the
source and target device. This descriptor is mandatory in the Device Identification
VPD page. Locate this descriptor in the returned VPD data so we don't
have to do lookups for every copy command.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/scsi/scsi.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_device.h | 2 ++
2 files changed, 59 insertions(+)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe6bf98..7faea9987abf 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1024,6 +1024,62 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
/**
+ * scsi_lookup_naa - Lookup NAA descriptor in VPD page 0x83
+ * @sdev: The device to ask
+ *
+ * Copy offloading requires us to know the NAA descriptor for both
+ * source and target device. This descriptor is mandatory in the Device
+ * Identification VPD page. Locate this descriptor in the returned VPD
+ * data so we don't have to do lookups for every copy command.
+ */
+static void scsi_lookup_naa(struct scsi_device *sdev)
+{
+ unsigned char *buf = sdev->vpd_pg83;
+ unsigned int len = sdev->vpd_pg83_len;
+
+ if (buf[1] != 0x83 || get_unaligned_be16(&buf[2]) == 0) {
+ sdev_printk(KERN_ERR, sdev,
+ "%s: VPD page 0x83 contains no descriptors\n",
+ __func__);
+ return;
+ }
+
+ buf += 4;
+ len -= 4;
+
+ do {
+ unsigned int desig_len = buf[3] + 4;
+
+ /* Binary code set */
+ if ((buf[0] & 0xf) != 1)
+ goto skip;
+
+ /* Target association */
+ if ((buf[1] >> 4) & 0x3)
+ goto skip;
+
+ /* NAA designator */
+ if ((buf[1] & 0xf) != 0x3)
+ goto skip;
+
+ sdev->naa = buf;
+ sdev->naa_len = desig_len;
+
+ return;
+
+skip:
+ buf += desig_len;
+ len -= desig_len;
+
+ } while (len > 0);
+
+ sdev_printk(KERN_ERR, sdev,
+ "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
+
+ return;
+}
+
+/**
* scsi_attach_vpd - Attach Vital Product Data to a SCSI device structure
* @sdev: The device to ask
*
@@ -1107,6 +1163,7 @@ retry_pg83:
}
sdev->vpd_pg83_len = result;
sdev->vpd_pg83 = vpd_buf;
+ scsi_lookup_naa(sdev);
}
}
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5853c913d2b0..67bb70012802 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -119,6 +119,8 @@ struct scsi_device {
unsigned char *vpd_pg83;
int vpd_pg80_len;
unsigned char *vpd_pg80;
+ unsigned char naa_len;
+ unsigned char *naa;
unsigned char current_tag; /* current tag */
struct scsi_target *sdev_target; /* used only for single_lun */
--
1.9.0
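[Editor's note: a self-contained model of the designator walk in scsi_lookup_naa() above, scanning a raw Device Identification VPD page (0x83) buffer for a binary, LU-associated NAA designator. This is a hypothetical userspace re-implementation for illustration, not kernel code:]

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the first NAA designator header in a VPD page
 * 0x83 buffer, or NULL if none is found.  Field offsets follow the
 * SPC designation descriptor layout used by scsi_lookup_naa(). */
static const uint8_t *find_naa_designator(const uint8_t *buf, size_t len)
{
	size_t page_len;
	const uint8_t *p, *end;

	if (len < 4 || buf[1] != 0x83)
		return NULL;

	page_len = ((size_t)buf[2] << 8) | buf[3];	/* PAGE LENGTH */
	if (page_len == 0 || page_len + 4 > len)
		return NULL;

	p = buf + 4;
	end = buf + 4 + page_len;

	while (p + 4 <= end) {
		size_t desig_len = p[3] + 4;

		if ((p[0] & 0xf) == 1 &&	/* CODE SET: binary */
		    ((p[1] >> 4) & 0x3) == 0 &&	/* ASSOCIATION: LU */
		    (p[1] & 0xf) == 0x3)	/* DESIGNATOR TYPE: NAA */
			return p;

		p += desig_len;
	}
	return NULL;
}
```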
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-05-29 3:52 ` [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present Martin K. Petersen
@ 2014-06-02 20:43 ` Nicholas A. Bellinger
2014-06-02 20:59 ` Paolo Bonzini
2014-07-17 11:48 ` Bart Van Assche
2 siblings, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:43 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> Copy offloading requires us to know the NAA descriptor for both source
> target device. This descriptor is mandatory in the Device Identification
> VPD page. Locate this descriptor in the returned VPD data so we don't
> have to do lookups for every copy command.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> drivers/scsi/scsi.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
> include/scsi/scsi_device.h | 2 ++
> 2 files changed, 59 insertions(+)
>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-05-29 3:52 ` [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present Martin K. Petersen
2014-06-02 20:43 ` Nicholas A. Bellinger
@ 2014-06-02 20:59 ` Paolo Bonzini
2014-06-03 1:00 ` Martin K. Petersen
2014-07-17 11:48 ` Bart Van Assche
2 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-06-02 20:59 UTC (permalink / raw)
To: Martin K. Petersen, axboe, nab, linux-scsi
On 29/05/2014 05:52, Martin K. Petersen wrote:
> + sdev_printk(KERN_ERR, sdev,
> + "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
> +
> + return;
I suspect this error will be relatively common.
libata for example has
if (ata_id_has_wwn(args->id)) {
/* SAT defined lu world wide name */
/* piv=0, assoc=lu, code_set=binary, designator=NAA */
rbuf[num + 0] = 1;
rbuf[num + 1] = 3;
rbuf[num + 3] = ATA_ID_WWN_LEN;
num += 4;
ata_id_string(args->id, (unsigned char *) rbuf + num,
ATA_ID_WWN, ATA_ID_WWN_LEN);
num += ATA_ID_WWN_LEN;
}
rbuf[3] = num - 4; /* page len (assume less than 256 bytes) */
and most of the time IDE disks in a virtual machine are configured without
a WWN.
Paolo
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-06-02 20:59 ` Paolo Bonzini
@ 2014-06-03 1:00 ` Martin K. Petersen
2014-06-03 9:13 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-06-03 1:00 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Martin K. Petersen, axboe, nab, linux-scsi
>>>>> "Paolo" == Paolo Bonzini <pbonzini@redhat.com> writes:
>> + sdev_printk(KERN_ERR, sdev,
>> + "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
>> +
>> + return;
Paolo> I suspect this error will be relatively common.
You're right. But we would like to see it for devices that actually
implement copy offload. So I suggest the following tweak...
[SCSI] Look up and store NAA if VPD page 0x83 is present
Copy offloading requires us to know the NAA descriptor for both the
source and target device. This descriptor is mandatory in the Device Identification
VPD page. Locate this descriptor in the returned VPD data so we don't
have to do lookups for every copy command.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe6bf98..190dca4a8494 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1024,6 +1024,62 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
/**
+ * scsi_lookup_naa - Lookup NAA descriptor in VPD page 0x83
+ * @sdev: The device to ask
+ *
+ * Copy offloading requires us to know the NAA descriptor for both
+ * source and target device. This descriptor is mandatory in the Device
+ * Identification VPD page. Locate this descriptor in the returned VPD
+ * data so we don't have to do lookups for every copy command.
+ */
+static void scsi_lookup_naa(struct scsi_device *sdev)
+{
+ unsigned char *buf = sdev->vpd_pg83;
+ unsigned int len = sdev->vpd_pg83_len;
+
+ if (buf[1] != 0x83 || get_unaligned_be16(&buf[2]) == 0) {
+ sdev_printk(KERN_ERR, sdev,
+ "%s: VPD page 0x83 contains no descriptors\n",
+ __func__);
+ return;
+ }
+
+ buf += 4;
+ len -= 4;
+
+ do {
+ unsigned int desig_len = buf[3] + 4;
+
+ /* Binary code set */
+ if ((buf[0] & 0xf) != 1)
+ goto skip;
+
+ /* Target association */
+ if ((buf[1] >> 4) & 0x3)
+ goto skip;
+
+ /* NAA designator */
+ if ((buf[1] & 0xf) != 0x3)
+ goto skip;
+
+ sdev->naa = buf;
+ sdev->naa_len = desig_len;
+
+ return;
+
+skip:
+ buf += desig_len;
+ len -= desig_len;
+
+ } while (len > 0);
+
+ sdev_printk(KERN_ERR, sdev,
+ "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
+
+ return;
+}
+
+/**
* scsi_attach_vpd - Attach Vital Product Data to a SCSI device structure
* @sdev: The device to ask
*
@@ -1107,6 +1163,10 @@ retry_pg83:
}
sdev->vpd_pg83_len = result;
sdev->vpd_pg83 = vpd_buf;
+
+ /* Lookup NAA if 3PC set in INQUIRY response */
+ if (sdev->inquiry_len >= 6 && sdev->inquiry[5] & (1 << 3))
+ scsi_lookup_naa(sdev);
}
}
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5853c913d2b0..67bb70012802 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -119,6 +119,8 @@ struct scsi_device {
unsigned char *vpd_pg83;
int vpd_pg80_len;
unsigned char *vpd_pg80;
+ unsigned char naa_len;
+ unsigned char *naa;
unsigned char current_tag; /* current tag */
struct scsi_target *sdev_target; /* used only for single_lun */
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-06-03 1:00 ` Martin K. Petersen
@ 2014-06-03 9:13 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2014-06-03 9:13 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On 03/06/2014 03:00, Martin K. Petersen wrote:
>>>>>> "Paolo" == Paolo Bonzini <pbonzini@redhat.com> writes:
>
>>> + sdev_printk(KERN_ERR, sdev,
>>> + "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
>>> +
>>> + return;
>
> Paolo> I suspect this error will be relatively common.
>
> You're right. But we would like to see it for devices that actually
> implement copy offload. So I suggest the following tweak...
>
>
> [SCSI] Look up and store NAA if VPD page 0x83 is present
>
> Copy offloading requires us to know the NAA descriptor for both source
> target device. This descriptor is mandatory in the Device Identification
> VPD page. Locate this descriptor in the returned VPD data so we don't
> have to do lookups for every copy command.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 88d46fe6bf98..190dca4a8494 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -1024,6 +1024,62 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
> EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
>
> /**
> + * scsi_lookup_naa - Lookup NAA descriptor in VPD page 0x83
> + * @sdev: The device to ask
> + *
> + * Copy offloading requires us to know the NAA descriptor for both
> + * source and target device. This descriptor is mandatory in the Device
> + * Identification VPD page. Locate this descriptor in the returned VPD
> + * data so we don't have to do lookups for every copy command.
> + */
> +static void scsi_lookup_naa(struct scsi_device *sdev)
> +{
> + unsigned char *buf = sdev->vpd_pg83;
> + unsigned int len = sdev->vpd_pg83_len;
> +
> + if (buf[1] != 0x83 || get_unaligned_be16(&buf[2]) == 0) {
> + sdev_printk(KERN_ERR, sdev,
> + "%s: VPD page 0x83 contains no descriptors\n",
> + __func__);
> + return;
> + }
> +
> + buf += 4;
> + len -= 4;
> +
> + do {
> + unsigned int desig_len = buf[3] + 4;
> +
> + /* Binary code set */
> + if ((buf[0] & 0xf) != 1)
> + goto skip;
> +
> + /* Target association */
> + if ((buf[1] >> 4) & 0x3)
> + goto skip;
> +
> + /* NAA designator */
> + if ((buf[1] & 0xf) != 0x3)
> + goto skip;
> +
> + sdev->naa = buf;
> + sdev->naa_len = desig_len;
> +
> + return;
> +
> +skip:
> + buf += desig_len;
> + len -= desig_len;
> +
> + } while (len > 0);
> +
> + sdev_printk(KERN_ERR, sdev,
> + "%s: VPD page 0x83 NAA descriptor not found\n", __func__);
> +
> + return;
> +}
> +
> +/**
> * scsi_attach_vpd - Attach Vital Product Data to a SCSI device structure
> * @sdev: The device to ask
> *
> @@ -1107,6 +1163,10 @@ retry_pg83:
> }
> sdev->vpd_pg83_len = result;
> sdev->vpd_pg83 = vpd_buf;
> +
> + /* Lookup NAA if 3PC set in INQUIRY response */
> + if (sdev->inquiry_len >= 6 && sdev->inquiry[5] & (1 << 3))
> + scsi_lookup_naa(sdev);
> }
> }
>
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 5853c913d2b0..67bb70012802 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -119,6 +119,8 @@ struct scsi_device {
> unsigned char *vpd_pg83;
> int vpd_pg80_len;
> unsigned char *vpd_pg80;
> + unsigned char naa_len;
> + unsigned char *naa;
> unsigned char current_tag; /* current tag */
> struct scsi_target *sdev_target; /* used only for single_lun */
>
>
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-05-29 3:52 ` [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present Martin K. Petersen
2014-06-02 20:43 ` Nicholas A. Bellinger
2014-06-02 20:59 ` Paolo Bonzini
@ 2014-07-17 11:48 ` Bart Van Assche
2014-07-17 15:43 ` Martin K. Petersen
2 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2014-07-17 11:48 UTC (permalink / raw)
To: Martin K. Petersen, axboe, nab, linux-scsi
On 05/29/14 05:52, Martin K. Petersen wrote:
> Copy offloading requires us to know the NAA descriptor for both source
> target device. This descriptor is mandatory in the Device Identification
> VPD page. Locate this descriptor in the returned VPD data so we don't
> have to do lookups for every copy command.
Hello Martin,
Sorry for the late reply but it's only now that I noticed this patch.
Are you sure that the presence of a NAA descriptor is mandatory? This is
what I found in SPC-4 r37 paragraph 7.8.6.2.1:
<quote>
At least one designation descriptor should have the DESIGNATOR TYPE
field set to:
a) 2h (i.e., EUI-64-based);
b) 3h (i.e., NAA); or
c) 8h (i.e., SCSI name string).
</quote>
I think this means that presence of one of these three types of
descriptors is sufficient in order to be compliant with SPC-4.
Thanks,
Bart.
* Re: [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present
2014-07-17 11:48 ` Bart Van Assche
@ 2014-07-17 15:43 ` Martin K. Petersen
0 siblings, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2014-07-17 15:43 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Martin K. Petersen, axboe, nab, linux-scsi
>>>>> "Bart" == Bart Van Assche <bvanassche@acm.org> writes:
Bart> Sorry for the late reply but it's only now that I noticed this
Bart> patch. Are you sure that presence of a NAA descriptor is
Bart> mandatory ? This is what I found in SPC-4 r37 paragraph 7.8.6.2.1:
Bart> <quote> At least one designation descriptor should have the
Bart> DESIGNATOR TYPE field set to:
Bart> a) 2h (i.e., EUI-64-based);
Bart> b) 3h (i.e., NAA); or
Bart> c) 8h (i.e., SCSI name string).
Bart> </quote>
Bart> I think this means that presence of one of these three types of
Bart> descriptors is sufficient in order to be compliant with SPC-4.
That's correct and bad wording on my part. It's not mandatory in the SPC
sense. But the subset of the copy offload spec that vendors have
generally agreed on makes NAA a requirement.
You have a good point, however, and I'll tweak the code to make sure we
support 2h and 8h as well.
--
Martin K. Petersen Oracle Linux Engineering
* [PATCH 6/6] [SCSI] sd: Implement copy offload support
2014-05-29 3:52 Copy offload Martin K. Petersen
` (4 preceding siblings ...)
2014-05-29 3:52 ` [PATCH 5/6] [SCSI] Look up and store NAA if VPD page 0x83 is present Martin K. Petersen
@ 2014-05-29 3:52 ` Martin K. Petersen
2014-05-29 14:48 ` Douglas Gilbert
2014-06-02 20:46 ` Nicholas A. Bellinger
5 siblings, 2 replies; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-29 3:52 UTC (permalink / raw)
To: axboe, nab, linux-scsi; +Cc: Martin K. Petersen
Implement support for hardware copy offload. This initial implementation
only supports EXTENDED COPY(LID1). If need be we can add support for
LID4 or token copy at a later date.
If a device has the 3PC flag set in the standard INQUIRY response we'll
issue a RECEIVE COPY OPERATION PARAMETERS command. We require the device
to support two copy source/copy destination descriptors and one block to
block (0x02) segment descriptor. The device must support the NAA
identification descriptor (0xE4). If the device is capable we'll set the
queue limits to indicate that the device supports copy offload.
The copy block range limit can be overridden in scsi_disk's
max_copy_blocks sysfs attribute.
sd_setup_copy_cmnd() is used to prepare any REQ_COPY requests. The
relevant descriptors are placed in a payload page akin to REQ_DISCARD.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/scsi/sd.c | 254 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/scsi/sd.h | 4 +
include/scsi/scsi_device.h | 1 +
3 files changed, 257 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 96af195224f2..071225f34d63 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -100,6 +100,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
static void sd_config_discard(struct scsi_disk *, unsigned int);
static void sd_config_write_same(struct scsi_disk *);
+static void sd_config_copy(struct scsi_disk *);
static int sd_revalidate_disk(struct gendisk *);
static void sd_unlock_native_capacity(struct gendisk *disk);
static int sd_probe(struct device *);
@@ -461,6 +462,48 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RW(max_write_same_blocks);
+static ssize_t
+max_copy_blocks_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+ return snprintf(buf, 20, "%u\n", sdkp->max_copy_blocks);
+}
+
+static ssize_t
+max_copy_blocks_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct scsi_disk *sdkp = to_scsi_disk(dev);
+ struct scsi_device *sdp = sdkp->device;
+ unsigned long max;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ if (sdp->type != TYPE_DISK)
+ return -EINVAL;
+
+ err = kstrtoul(buf, 10, &max);
+
+ if (err)
+ return err;
+
+ if (max == 0)
+ sdp->no_copy = 1;
+ else if (max <= SD_MAX_COPY_BLOCKS) {
+ sdp->no_copy = 0;
+ sdkp->max_copy_blocks = max;
+ }
+
+ sd_config_copy(sdkp);
+
+ return count;
+}
+static DEVICE_ATTR_RW(max_copy_blocks);
+
static struct attribute *sd_disk_attrs[] = {
&dev_attr_cache_type.attr,
&dev_attr_FUA.attr,
@@ -472,6 +515,7 @@ static struct attribute *sd_disk_attrs[] = {
&dev_attr_thin_provisioning.attr,
&dev_attr_provisioning_mode.attr,
&dev_attr_max_write_same_blocks.attr,
+ &dev_attr_max_copy_blocks.attr,
&dev_attr_max_medium_access_timeouts.attr,
NULL,
};
@@ -826,6 +870,100 @@ static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
return ret;
}
+static void sd_config_copy(struct scsi_disk *sdkp)
+{
+ struct request_queue *q = sdkp->disk->queue;
+ unsigned int logical_block_size = sdkp->device->sector_size;
+
+ if (sdkp->device->no_copy)
+ sdkp->max_copy_blocks = 0;
+
+ /* Segment descriptor 0x02 has a 64k block limit */
+ sdkp->max_copy_blocks = min(sdkp->max_copy_blocks,
+ (u32)SD_MAX_CSD2_BLOCKS);
+
+ blk_queue_max_copy_sectors(q, sdkp->max_copy_blocks *
+ (logical_block_size >> 9));
+}
+
+static int sd_setup_copy_cmnd(struct scsi_device *sdp, struct request *rq)
+{
+ struct scsi_device *src_sdp, *dst_sdp;
+ sector_t src_lba, dst_lba;
+ unsigned int nr_blocks, buf_len, nr_bytes = blk_rq_bytes(rq);
+ int ret;
+ struct bio *bio = rq->bio;
+ struct bio_copy *bic = bio_copy(bio);
+ struct page *page;
+ unsigned char *buf;
+
+ if (!bic)
+ return BLKPREP_KILL;
+
+ dst_sdp = scsi_disk(rq->rq_disk)->device;
+ src_sdp = scsi_disk(bic->bic_bdev->bd_disk)->device;
+
+ if (src_sdp->no_copy || dst_sdp->no_copy)
+ return BLKPREP_KILL;
+
+ if (src_sdp->sector_size != dst_sdp->sector_size)
+ return BLKPREP_KILL;
+
+ dst_lba = blk_rq_pos(rq) >> (ilog2(dst_sdp->sector_size) - 9);
+ src_lba = bic->bic_sector >> (ilog2(src_sdp->sector_size) - 9);
+ nr_blocks = blk_rq_sectors(rq) >> (ilog2(dst_sdp->sector_size) - 9);
+
+ page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+ if (!page)
+ return BLKPREP_DEFER;
+
+ buf = page_address(page);
+
+ /* Extended Copy (LID1) Parameter List (16 bytes) */
+ buf[0] = 0; /* LID */
+ buf[1] = 3 << 3; /* LID usage 11b */
+ put_unaligned_be16(32 + 32, &buf[2]); /* 32 bytes per E4 desc. */
+ put_unaligned_be32(28, &buf[8]); /* 28 bytes per B2B desc. */
+ buf += 16;
+
+ /* Source CSCD (32 bytes) */
+ buf[0] = 0xe4; /* Identification desc. */
+ memcpy(&buf[4], src_sdp->naa, src_sdp->naa_len);
+ buf += 32;
+
+ /* Destination CSCD (32 bytes) */
+ buf[0] = 0xe4; /* Identification desc. */
+ memcpy(&buf[4], dst_sdp->naa, dst_sdp->naa_len);
+ buf += 32;
+
+ /* Segment descriptor (28 bytes) */
+ buf[0] = 0x02; /* Block to block desc. */
+ put_unaligned_be16(0x18, &buf[2]); /* Descriptor length */
+ put_unaligned_be16(0, &buf[4]); /* Source is desc. 0 */
+ put_unaligned_be16(1, &buf[6]); /* Dest. is desc. 1 */
+ put_unaligned_be16(nr_blocks, &buf[10]);
+ put_unaligned_be64(src_lba, &buf[12]);
+ put_unaligned_be64(dst_lba, &buf[20]);
+
+ /* CDB */
+ memset(rq->cmd, 0, rq->cmd_len);
+ rq->cmd[0] = EXTENDED_COPY;
+ rq->cmd[1] = 0; /* LID1 */
+ buf_len = 16 + 32 + 32 + 28;
+ put_unaligned_be32(buf_len, &rq->cmd[10]);
+ rq->timeout = SD_COPY_TIMEOUT;
+
+ rq->completion_data = page;
+ blk_add_request_payload(rq, page, buf_len);
+ ret = scsi_setup_blk_pc_cmnd(sdp, rq);
+ rq->__data_len = nr_bytes;
+
+ if (ret != BLKPREP_OK)
+ __free_page(page);
+
+ return ret;
+}
+
static int scsi_setup_flush_cmnd(struct scsi_device *sdp, struct request *rq)
{
rq->timeout *= SD_FLUSH_TIMEOUT_MULTIPLIER;
@@ -840,7 +978,7 @@ static void sd_unprep_fn(struct request_queue *q, struct request *rq)
{
struct scsi_cmnd *SCpnt = rq->special;
- if (rq->cmd_flags & REQ_DISCARD)
+ if (rq->cmd_flags & (REQ_DISCARD | REQ_COPY))
__free_page(rq->completion_data);
if (SCpnt->cmnd != rq->cmd) {
@@ -880,6 +1018,9 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
} else if (rq->cmd_flags & REQ_WRITE_SAME) {
ret = sd_setup_write_same_cmnd(sdp, rq);
goto out;
+ } else if (rq->cmd_flags & REQ_COPY) {
+ ret = sd_setup_copy_cmnd(sdp, rq);
+ goto out;
} else if (rq->cmd_flags & REQ_FLUSH) {
ret = scsi_setup_flush_cmnd(sdp, rq);
goto out;
@@ -1660,7 +1801,8 @@ static int sd_done(struct scsi_cmnd *SCpnt)
unsigned char op = SCpnt->cmnd[0];
unsigned char unmap = SCpnt->cmnd[1] & 8;
- if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME) {
+ if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME ||
+ req->cmd_flags & REQ_COPY) {
if (!result) {
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
@@ -1719,6 +1861,14 @@ static int sd_done(struct scsi_cmnd *SCpnt)
/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
switch (op) {
+ case EXTENDED_COPY:
+ sdkp->device->no_copy = 1;
+ sd_config_copy(sdkp);
+
+ good_bytes = 0;
+ req->__data_len = blk_rq_bytes(req);
+ req->cmd_flags |= REQ_QUIET;
+ break;
case UNMAP:
sd_config_discard(sdkp, SD_LBP_DISABLE);
break;
@@ -2687,6 +2837,105 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
sdkp->ws10 = 1;
}
+static void sd_read_copy_operations(struct scsi_disk *sdkp,
+ unsigned char *buffer)
+{
+ struct scsi_device *sdev = sdkp->device;
+ struct scsi_sense_hdr sshdr;
+ unsigned char cdb[16];
+ unsigned int result, len, i;
+ bool b2b_desc = false, id_desc = false;
+
+ if (sdev->naa_len == 0)
+ return;
+
+ /* Verify that the device has 3PC set in INQUIRY response */
+ if (sdev->inquiry_len < 6 || (sdev->inquiry[5] & (1 << 3)) == 0)
+ return;
+
+ /* Receive Copy Operation Parameters */
+ memset(cdb, 0, 16);
+ cdb[0] = RECEIVE_COPY_RESULTS;
+ cdb[1] = 0x3;
+ put_unaligned_be32(SD_BUF_SIZE, &cdb[10]);
+
+ memset(buffer, 0, SD_BUF_SIZE);
+ result = scsi_execute_req(sdev, cdb, DMA_FROM_DEVICE,
+ buffer, SD_BUF_SIZE, &sshdr,
+ SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+
+ if (!scsi_status_is_good(result)) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: Receive Copy Operation Parameters failed\n",
+ __func__);
+ return;
+ }
+
+ /* The RCOP response is a minimum of 44 bytes long. First 4
+ * bytes contain the length of the remaining buffer, i.e. 40+
+ * bytes. Trailing the defined fields is a list of supported
+ * descriptors. We need at least 2 descriptors to drive the
+ * target, hence 42.
+ */
+ len = get_unaligned_be32(&buffer[0]);
+ if (len < 42) {
+ sd_printk(KERN_ERR, sdkp, "%s: result too short (%u)\n",
+ __func__, len);
+ return;
+ }
+
+ if ((buffer[4] & 1) == 0) {
+ sd_printk(KERN_ERR, sdkp, "%s: does not support SNLID\n",
+ __func__);
+ return;
+ }
+
+ if (get_unaligned_be16(&buffer[8]) < 2) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: Need 2 or more CSCD descriptors\n", __func__);
+ return;
+ }
+
+ if (get_unaligned_be16(&buffer[10]) < 1) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: Need 1 or more segment descriptor\n", __func__);
+ return;
+ }
+
+ if (len - 40 != buffer[43]) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: Buffer len and descriptor count mismatch " \
+ "(%u vs. %u)\n", __func__, len - 40, buffer[43]);
+ return;
+ }
+
+ for (i = 44 ; i < len + 4 ; i++) {
+ if (buffer[i] == 0x02)
+ b2b_desc = true;
+
+ if (buffer[i] == 0xe4)
+ id_desc = true;
+ }
+
+ if (!b2b_desc) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: No block 2 block descriptor (0x02)\n",
+ __func__);
+ return;
+ }
+
+ if (!id_desc) {
+ sd_printk(KERN_ERR, sdkp,
+ "%s: No identification descriptor (0xE4)\n",
+ __func__);
+ return;
+ }
+
+ sdkp->max_copy_blocks = get_unaligned_be32(&buffer[16])
+ >> ilog2(sdev->sector_size);
+ sd_config_copy(sdkp);
+}
+
static int sd_try_extended_inquiry(struct scsi_device *sdp)
{
/*
@@ -2747,6 +2996,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_cache_type(sdkp, buffer);
sd_read_app_tag_own(sdkp, buffer);
sd_read_write_same(sdkp, buffer);
+ sd_read_copy_operations(sdkp, buffer);
}
sdkp->first_scan = 0;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 620871efbf0a..e7345c552197 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -19,6 +19,7 @@
*/
#define SD_FLUSH_TIMEOUT_MULTIPLIER 2
#define SD_WRITE_SAME_TIMEOUT (120 * HZ)
+#define SD_COPY_TIMEOUT (120 * HZ)
/*
* Number of allowed retries
@@ -46,6 +47,8 @@ enum {
enum {
SD_MAX_WS10_BLOCKS = 0xffff,
SD_MAX_WS16_BLOCKS = 0x7fffff,
+ SD_MAX_CSD2_BLOCKS = 0xffff,
+ SD_MAX_COPY_BLOCKS = 0x7fffff,
};
enum {
@@ -66,6 +69,7 @@ struct scsi_disk {
sector_t capacity; /* size in 512-byte sectors */
u32 max_ws_blocks;
u32 max_unmap_blocks;
+ u32 max_copy_blocks;
u32 unmap_granularity;
u32 unmap_alignment;
u32 index;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 67bb70012802..f0a3a3e861e8 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -153,6 +153,7 @@ struct scsi_device {
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
unsigned no_write_same:1; /* no WRITE SAME command */
+ unsigned no_copy:1; /* no copy offload */
unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08 */
unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */
--
1.9.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 6/6] [SCSI] sd: Implement copy offload support
2014-05-29 3:52 ` [PATCH 6/6] [SCSI] sd: Implement copy offload support Martin K. Petersen
@ 2014-05-29 14:48 ` Douglas Gilbert
2014-05-30 0:05 ` Martin K. Petersen
2014-06-02 20:46 ` Nicholas A. Bellinger
1 sibling, 1 reply; 20+ messages in thread
From: Douglas Gilbert @ 2014-05-29 14:48 UTC (permalink / raw)
To: Martin K. Petersen, axboe, nab, linux-scsi
On 14-05-28 11:52 PM, Martin K. Petersen wrote:
> Implement support for hardware copy offload. This initial implementation
> only supports EXTENDED COPY(LID1). If need be we can add support for
> LID4 or token copy at a later date.
>
> If a device has the 3PC flag set in the standard INQUIRY response we'll
> issue a RECEIVE COPY OPERATION PARAMETERS command. We require the device
> to support two copy source/copy destination descriptors and one block to
> block (0x02) segment descriptor. The device must support the NAA
> identification descriptor (0xE4). If the device is capable we'll set the
> queue limits to indicate that the device supports copy offload.
SPC-4 has downgraded the RECEIVE COPY OPERATION PARAMETERS command
in favour of the new Third Party Copy (TPC) VPD page [0x8f]. If the
latter is present, it should be used. Most real world implementations
of XCOPY(LID1) comply (loosely) with SPC-2 and SPC-3, since VMware's
VAAI is the effective "standard".
Further, I noticed that some ODX ** implementations which do use the
TPC VPD page do not implement its (SPC-4 defined) mandatory
descriptors (0x1 and 0x8001) needed to support XCOPY(LID1). Perhaps
there will be some convergence once SPC-4 becomes a standard.
In any case, this patch set looks good.
May I also point out that NAB's target subsystem has a block to
block XCOPY(LID1) implementation, so it can be used to test
this patch set. The standards require copies between LUs in
the same target to be supported (if xcopy is supported);
support between LUs in different targets is optional. Hence
most implementations restrict the LUs to being in the same
target.
Doug Gilbert
** ODX is MS's name for the subset of XCOPY(LID4) defined in
SBC-3 using the POPULATE TOKEN and WRITE USING TOKEN
commands. So this is a "token copy" referred to above.
* Re: [PATCH 6/6] [SCSI] sd: Implement copy offload support
2014-05-29 14:48 ` Douglas Gilbert
@ 2014-05-30 0:05 ` Martin K. Petersen
0 siblings, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2014-05-30 0:05 UTC (permalink / raw)
To: Douglas Gilbert; +Cc: Martin K. Petersen, axboe, nab, linux-scsi
>>>>> "Doug" == Douglas Gilbert <dgilbert@interlog.com> writes:
Doug,
Doug> SPC-4 has downgraded the RECEIVE COPY OPERATION PARAMETERS command
Doug> in favour of the new Third Party Copy (TPC) VPD page [0x8f]. If
Doug> the latter is present, it should be used. Most real world
Doug> implementations of XCOPY(LID1) comply (loosely) with SPC-2 and
Doug> SPC-3 since VMWare's VAAI is the effective "standard".
Indeed. None of my test targets supported the TPC VPD page (or they
supported only parts of it), so that's stuck on my todo list for now.
But my intent is to check the VPD first and, if it's not present (and
3PC is set), fall back to RCOP.
I'd also like to add support for multiple segment descriptors. Another
item that's sparsely supported by the devices out there.
Doug> May I also point out the NAB's target subsystem has a block to
Doug> block XCOPY(LID1) implementation. So it can be used to test this
Doug> patch set.
Yep. That's what I used for development.
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH 6/6] [SCSI] sd: Implement copy offload support
2014-05-29 3:52 ` [PATCH 6/6] [SCSI] sd: Implement copy offload support Martin K. Petersen
2014-05-29 14:48 ` Douglas Gilbert
@ 2014-06-02 20:46 ` Nicholas A. Bellinger
1 sibling, 0 replies; 20+ messages in thread
From: Nicholas A. Bellinger @ 2014-06-02 20:46 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: axboe, nab, linux-scsi
On Wed, 2014-05-28 at 23:52 -0400, Martin K. Petersen wrote:
> Implement support for hardware copy offload. This initial implementation
> only supports EXTENDED COPY(LID1). If need be we can add support for
> LID4 or token copy at a later date.
>
> If a device has the 3PC flag set in the standard INQUIRY response we'll
> issue a RECEIVE COPY OPERATION PARAMETERS command. We require the device
> to support two copy source/copy destination descriptors and one block to
> block (0x02) segment descriptor. The device must support the NAA
> identification descriptor (0xE4). If the device is capable we'll set the
> queue limits to indicate that the device supports copy offload.
>
> The copy block range limit can be overridden in scsi_disk's
> max_copy_block sysfs attribute.
>
> sd_setup_copy_command() is used to prepare any REQ_COPY requests. The
> relevant descriptors are placed in a payload page akin to REQ_DISCARD.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> drivers/scsi/sd.c | 254 ++++++++++++++++++++++++++++++++++++++++++++-
> drivers/scsi/sd.h | 4 +
> include/scsi/scsi_device.h | 1 +
> 3 files changed, 257 insertions(+), 2 deletions(-)
>
Looks good. Nice work. ;)
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>