* [PATCH 1/8] block: Clean up special command handling logic
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 16:19 ` [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges Martin K. Petersen
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
Remove special-casing of non-rw fs style requests (discard). The nomerge
flags are consolidated in blk_types.h, and rq_mergeable() and
bio_mergeable() have been modified to use them.
bio_is_rw() is used in place of bio_has_data() in a few places. This is
done to distinguish true reads and writes from other fs type requests
that carry a payload (e.g. write same).
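As a rough illustration (not part of the patch; the helper name below is
made up), the consolidated predicates let a caller express the old
discard/secure-discard special cases as a single test:
  #include <linux/bio.h>
  #include <linux/blkdev.h>
  /* Hypothetical sketch: may this bio be merged into this request? */
  static bool example_can_merge(struct request *rq, struct bio *bio)
  {
          /* REQ_NOMERGE_FLAGS plus the REQ_TYPE_FS check replace the
           * old per-flag (REQ_DISCARD/REQ_SECURE) comparisons. */
          return rq_mergeable(rq) && bio_mergeable(bio);
  }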
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 13 ++++++-------
block/blk-merge.c | 22 +---------------------
block/blk.h | 5 ++---
block/elevator.c | 6 ++----
include/linux/bio.h | 23 +++++++++++++++++++++--
include/linux/blk_types.h | 4 ++++
include/linux/blkdev.h | 22 ++++++++++------------
7 files changed, 46 insertions(+), 49 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 2d739ca..5cc2929 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1657,8 +1657,8 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
- if (unlikely(!(bio->bi_rw & REQ_DISCARD) &&
- nr_sectors > queue_max_hw_sectors(q))) {
+ if (likely(bio_is_rw(bio) &&
+ nr_sectors > queue_max_hw_sectors(q))) {
printk(KERN_ERR "bio too big device %s (%u > %u)\n",
bdevname(bio->bi_bdev, b),
bio_sectors(bio),
@@ -1699,8 +1699,7 @@ generic_make_request_checks(struct bio *bio)
if ((bio->bi_rw & REQ_DISCARD) &&
(!blk_queue_discard(q) ||
- ((bio->bi_rw & REQ_SECURE) &&
- !blk_queue_secdiscard(q)))) {
+ ((bio->bi_rw & REQ_SECURE) && !blk_queue_secdiscard(q)))) {
err = -EOPNOTSUPP;
goto end_io;
}
@@ -1818,7 +1817,7 @@ void submit_bio(int rw, struct bio *bio)
* If it's a regular read/write or a barrier with data attached,
* go through the normal accounting stuff before submission.
*/
- if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
+ if (bio_has_data(bio)) {
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
@@ -1864,7 +1863,7 @@ EXPORT_SYMBOL(submit_bio);
*/
int blk_rq_check_limits(struct request_queue *q, struct request *rq)
{
- if (rq->cmd_flags & REQ_DISCARD)
+ if (!rq_mergeable(rq))
return 0;
if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
@@ -2338,7 +2337,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
req->buffer = bio_data(req->bio);
/* update sector only for requests with clear definition of sector */
- if (req->cmd_type == REQ_TYPE_FS || (req->cmd_flags & REQ_DISCARD))
+ if (req->cmd_type == REQ_TYPE_FS)
req->__sector += total_bytes >> 9;
/* mixed attributes always follow the first bio */
diff --git a/block/blk-merge.c b/block/blk-merge.c
index e76279e..86710ca 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -418,18 +418,6 @@ static int attempt_merge(struct request_queue *q, struct request *req,
return 0;
/*
- * Don't merge file system requests and discard requests
- */
- if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
- return 0;
-
- /*
- * Don't merge discard requests and secure discard requests
- */
- if ((req->cmd_flags & REQ_SECURE) != (next->cmd_flags & REQ_SECURE))
- return 0;
-
- /*
* not contiguous
*/
if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next))
@@ -521,15 +509,7 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
{
- if (!rq_mergeable(rq))
- return false;
-
- /* don't merge file system requests and discard requests */
- if ((bio->bi_rw & REQ_DISCARD) != (rq->bio->bi_rw & REQ_DISCARD))
- return false;
-
- /* don't merge discard requests and secure discard requests */
- if ((bio->bi_rw & REQ_SECURE) != (rq->bio->bi_rw & REQ_SECURE))
+ if (!rq_mergeable(rq) || !bio_mergeable(bio))
return false;
/* different data direction or already started, don't merge */
diff --git a/block/blk.h b/block/blk.h
index 2a0ea32..ca51543 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -171,14 +171,13 @@ static inline int queue_congestion_off_threshold(struct request_queue *q)
*
* a) it's attached to a gendisk, and
* b) the queue had IO stats enabled when this request was started, and
- * c) it's a file system request or a discard request
+ * c) it's a file system request
*/
static inline int blk_do_io_stat(struct request *rq)
{
return rq->rq_disk &&
(rq->cmd_flags & REQ_IO_STAT) &&
- (rq->cmd_type == REQ_TYPE_FS ||
- (rq->cmd_flags & REQ_DISCARD));
+ (rq->cmd_type == REQ_TYPE_FS);
}
/*
diff --git a/block/elevator.c b/block/elevator.c
index 6a55d41..9b1d42b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -562,8 +562,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
if (rq->cmd_flags & REQ_SOFTBARRIER) {
/* barriers are scheduling boundary, update end_sector */
- if (rq->cmd_type == REQ_TYPE_FS ||
- (rq->cmd_flags & REQ_DISCARD)) {
+ if (rq->cmd_type == REQ_TYPE_FS) {
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
@@ -605,8 +604,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
if (elv_attempt_insert_merge(q, rq))
break;
case ELEVATOR_INSERT_SORT:
- BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
- !(rq->cmd_flags & REQ_DISCARD));
+ BUG_ON(rq->cmd_type != REQ_TYPE_FS);
rq->cmd_flags |= REQ_SORTED;
q->nr_sorted++;
if (rq_mergeable(rq)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 52b9cbc..e54305c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -386,9 +386,28 @@ static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
/*
* Check whether this bio carries any data or not. A NULL bio is allowed.
*/
-static inline int bio_has_data(struct bio *bio)
+static inline bool bio_has_data(struct bio *bio)
{
- return bio && bio->bi_io_vec != NULL;
+ if (bio && bio->bi_vcnt)
+ return true;
+
+ return false;
+}
+
+static inline bool bio_is_rw(struct bio *bio)
+{
+ if (!bio_has_data(bio))
+ return false;
+
+ return true;
+}
+
+static inline bool bio_mergeable(struct bio *bio)
+{
+ if (bio->bi_rw & REQ_NOMERGE_FLAGS)
+ return false;
+
+ return true;
}
/*
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 3eefbb2..1b22966 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -194,6 +194,10 @@ enum rq_flag_bits {
REQ_DISCARD | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | REQ_SECURE)
#define REQ_CLONE_MASK REQ_COMMON_MASK
+/* This mask is used for both bio and request merge checking */
+#define REQ_NOMERGE_FLAGS \
+ (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
+
#define REQ_RAHEAD (1 << __REQ_RAHEAD)
#define REQ_THROTTLED (1 << __REQ_THROTTLED)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4a2ab7c..3a6fea7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -540,8 +540,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_account_rq(rq) \
(((rq)->cmd_flags & REQ_STARTED) && \
- ((rq)->cmd_type == REQ_TYPE_FS || \
- ((rq)->cmd_flags & REQ_DISCARD)))
+ ((rq)->cmd_type == REQ_TYPE_FS))
#define blk_pm_request(rq) \
((rq)->cmd_type == REQ_TYPE_PM_SUSPEND || \
@@ -595,17 +594,16 @@ static inline void blk_clear_rl_full(struct request_list *rl, bool sync)
rl->flags &= ~flag;
}
+static inline bool rq_mergeable(struct request *rq)
+{
+ if (rq->cmd_type != REQ_TYPE_FS)
+ return false;
-/*
- * mergeable request must not have _NOMERGE or _BARRIER bit set, nor may
- * it already be started by driver.
- */
-#define RQ_NOMERGE_FLAGS \
- (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | REQ_DISCARD)
-#define rq_mergeable(rq) \
- (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
- (((rq)->cmd_flags & REQ_DISCARD) || \
- (rq)->cmd_type == REQ_TYPE_FS))
+ if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
+ return false;
+
+ return true;
+}
/*
* q->prep_rq_fn return values
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
2012-09-18 16:19 ` [PATCH 1/8] block: Clean up special command handling logic Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 17:59 ` Mike Snitzer
2012-09-18 16:19 ` [PATCH 3/8] block: Implement support for WRITE SAME Martin K. Petersen
` (6 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
- blk_check_merge_flags() verifies that cmd_flags / bi_rw are
compatible. This function is called for both req-req and req-bio
merging.
- blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
to query the maximum sector count for a given request or queue. The
calls will return the right value from the queue limits given the
type of command (RW, discard, write same, etc.).
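For illustration only (mirroring the call sites in the diff below; the
helper name is hypothetical), a merge attempt now reduces to a
flag-compatibility check plus a size check against the command-specific
limit:
  #include <linux/bio.h>
  #include <linux/blkdev.h>
  /* Hypothetical sketch: would req still fit after appending bio? */
  static bool example_merge_fits(struct request *req, struct bio *bio)
  {
          /* Discards only merge with discards, secure with secure. */
          if (!blk_check_merge_flags(req->cmd_flags, bio->bi_rw))
                  return false;
          /* blk_rq_get_max_sectors() returns max_discard_sectors,
           * max_hw_sectors (BLOCK_PC) or max_sectors as appropriate. */
          return blk_rq_sectors(req) + bio_sectors(bio) <=
                  blk_rq_get_max_sectors(req);
  }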
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-core.c | 3 +--
block/blk-merge.c | 30 ++++++++++++------------------
include/linux/blkdev.h | 31 +++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 20 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 5cc2929..33eded0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq)
if (!rq_mergeable(rq))
return 0;
- if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
- blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
+ if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
printk(KERN_ERR "%s: over max size limit.\n", __func__);
return -EIO;
}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 86710ca..642b862 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -275,14 +275,8 @@ no_merge:
int ll_back_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio)
{
- unsigned short max_sectors;
-
- if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
- max_sectors = queue_max_hw_sectors(q);
- else
- max_sectors = queue_max_sectors(q);
-
- if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
+ if (blk_rq_sectors(req) + bio_sectors(bio) >
+ blk_rq_get_max_sectors(req)) {
req->cmd_flags |= REQ_NOMERGE;
if (req == q->last_merge)
q->last_merge = NULL;
@@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
int ll_front_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio)
{
- unsigned short max_sectors;
-
- if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
- max_sectors = queue_max_hw_sectors(q);
- else
- max_sectors = queue_max_sectors(q);
-
-
- if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
+ if (blk_rq_sectors(req) + bio_sectors(bio) >
+ blk_rq_get_max_sectors(req)) {
req->cmd_flags |= REQ_NOMERGE;
if (req == q->last_merge)
q->last_merge = NULL;
@@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
/*
* Will it become too large?
*/
- if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > queue_max_sectors(q))
+ if ((blk_rq_sectors(req) + blk_rq_sectors(next)) >
+ blk_rq_get_max_sectors(req))
return 0;
total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
@@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct request *req,
if (!rq_mergeable(req) || !rq_mergeable(next))
return 0;
+ if (!blk_check_merge_flags(req->cmd_flags, next->cmd_flags))
+ return 0;
+
/*
* not contiguous
*/
@@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (!rq_mergeable(rq) || !bio_mergeable(bio))
return false;
+ if (!blk_check_merge_flags(rq->cmd_flags, bio->bi_rw))
+ return false;
+
/* different data direction or already started, don't merge */
if (bio_data_dir(bio) != rq_data_dir(rq))
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3a6fea7..90f7abe 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq)
return true;
}
+static inline bool blk_check_merge_flags(unsigned int flags1,
+ unsigned int flags2)
+{
+ if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
+ return false;
+
+ if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
+ return false;
+
+ return true;
+}
+
/*
* q->prep_rq_fn return values
*/
@@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
return blk_rq_cur_bytes(rq) >> 9;
}
+static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
+ unsigned int cmd_flags)
+{
+ if (unlikely(cmd_flags & REQ_DISCARD))
+ return q->limits.max_discard_sectors;
+
+ return q->limits.max_sectors;
+}
+
+static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
+{
+ struct request_queue *q = rq->q;
+
+ if (unlikely(rq->cmd_type == REQ_TYPE_BLOCK_PC))
+ return q->limits.max_hw_sectors;
+
+ return blk_queue_get_max_sectors(q, rq->cmd_flags);
+}
+
/*
* Request issue related functions.
*/
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges
2012-09-18 16:19 ` [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges Martin K. Petersen
@ 2012-09-18 17:59 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2012-09-18 17:59 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: linux-scsi, axboe, shli
On Tue, Sep 18 2012 at 12:19pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> From: "Martin K. Petersen" <martin.petersen@oracle.com>
>
> - blk_check_merge_flags() verifies that cmd_flags / bi_rw are
> compatible. This function is called for both req-req and req-bio
> merging.
>
> - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
> to query the maximum sector count for a given request or queue. The
> calls will return the right value from the queue limits given the
> type of command (RW, discard, write same, etc.)
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> block/blk-core.c | 3 +--
> block/blk-merge.c | 30 ++++++++++++------------------
> include/linux/blkdev.h | 31 +++++++++++++++++++++++++++++++
> 3 files changed, 44 insertions(+), 20 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5cc2929..33eded0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> if (!rq_mergeable(rq))
> return 0;
>
> - if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
> - blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
> + if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
> printk(KERN_ERR "%s: over max size limit.\n", __func__);
> return -EIO;
> }
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 86710ca..642b862 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -275,14 +275,8 @@ no_merge:
> int ll_back_merge_fn(struct request_queue *q, struct request *req,
> struct bio *bio)
> {
> - unsigned short max_sectors;
> -
> - if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
> - max_sectors = queue_max_hw_sectors(q);
> - else
> - max_sectors = queue_max_sectors(q);
> -
> - if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
> + if (blk_rq_sectors(req) + bio_sectors(bio) >
> + blk_rq_get_max_sectors(req)) {
> req->cmd_flags |= REQ_NOMERGE;
> if (req == q->last_merge)
> q->last_merge = NULL;
> @@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
> int ll_front_merge_fn(struct request_queue *q, struct request *req,
> struct bio *bio)
> {
> - unsigned short max_sectors;
> -
> - if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
> - max_sectors = queue_max_hw_sectors(q);
> - else
> - max_sectors = queue_max_sectors(q);
> -
> -
> - if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
> + if (blk_rq_sectors(req) + bio_sectors(bio) >
> + blk_rq_get_max_sectors(req)) {
> req->cmd_flags |= REQ_NOMERGE;
> if (req == q->last_merge)
> q->last_merge = NULL;
> @@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
> /*
> * Will it become too large?
> */
> - if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > queue_max_sectors(q))
> + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) >
> + blk_rq_get_max_sectors(req))
> return 0;
>
> total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> @@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct request *req,
> if (!rq_mergeable(req) || !rq_mergeable(next))
> return 0;
>
> + if (!blk_check_merge_flags(req->cmd_flags, next->cmd_flags))
> + return 0;
> +
> /*
> * not contiguous
> */
> @@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
> if (!rq_mergeable(rq) || !bio_mergeable(bio))
> return false;
>
> + if (!blk_check_merge_flags(rq->cmd_flags, bio->bi_rw))
> + return false;
> +
> /* different data direction or already started, don't merge */
> if (bio_data_dir(bio) != rq_data_dir(rq))
> return false;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3a6fea7..90f7abe 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq)
> return true;
> }
>
> +static inline bool blk_check_merge_flags(unsigned int flags1,
> + unsigned int flags2)
> +{
> + if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
> + return false;
> +
> + if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
> + return false;
> +
> + return true;
> +}
> +
> /*
> * q->prep_rq_fn return values
> */
> @@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
> return blk_rq_cur_bytes(rq) >> 9;
> }
>
> +static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
> + unsigned int cmd_flags)
> +{
> + if (unlikely(cmd_flags & REQ_DISCARD))
> + return q->limits.max_discard_sectors;
> +
> + return q->limits.max_sectors;
Any reason not to use queue_max_sectors() here?
> +}
> +
> +static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
> +{
> + struct request_queue *q = rq->q;
> +
> + if (unlikely(rq->cmd_type == REQ_TYPE_BLOCK_PC))
> + return q->limits.max_hw_sectors;
Or queue_max_hw_sectors() here?
> + return blk_queue_get_max_sectors(q, rq->cmd_flags);
> +}
> +
But I don't feel that strongly on these nits. So regardless:
Acked-by: Mike Snitzer <snitzer@redhat.com>
(thanks for getting this patchset together Martin)
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/8] block: Implement support for WRITE SAME
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
2012-09-18 16:19 ` [PATCH 1/8] block: Clean up special command handling logic Martin K. Petersen
2012-09-18 16:19 ` [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 16:19 ` [PATCH 4/8] block: Make blkdev_issue_zeroout use " Martin K. Petersen
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
The WRITE SAME command supported on some SCSI devices allows the same
block to be efficiently replicated throughout a block range. Only a
single logical block is transferred from the host and the storage device
writes the same data to all blocks described by the I/O.
This patch implements support for WRITE SAME in the block layer. The
blkdev_issue_write_same() function can be used by filesystems and block
drivers to replicate a buffer across a block range. This can be used to
efficiently initialize software RAID devices, etc.
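As a usage sketch (hypothetical caller, not part of the patch), a driver
or filesystem that has filled a page with one logical block of data
could replicate it across a range like this:
  #include <linux/blkdev.h>
  #include <linux/gfp.h>
  /* Hypothetical example: replicate the pattern in @page over 1 MiB. */
  static int example_replicate_pattern(struct block_device *bdev,
                                       struct page *page)
  {
          sector_t nr_sects = 2048;       /* 1 MiB in 512-byte sectors */
          /* Fails with -EOPNOTSUPP if max_write_same_sectors is 0. */
          return blkdev_issue_write_same(bdev, 0, nr_sects,
                                         GFP_KERNEL, page);
  }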
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
Documentation/ABI/testing/sysfs-block | 14 ++++++
block/blk-core.c | 14 +++++-
block/blk-lib.c | 74 +++++++++++++++++++++++++++++++++
block/blk-merge.c | 9 ++++
block/blk-settings.c | 16 +++++++
block/blk-sysfs.c | 13 ++++++
drivers/md/raid0.c | 1 +
fs/bio.c | 9 +++-
include/linux/bio.h | 3 +
include/linux/blk_types.h | 5 ++-
include/linux/blkdev.h | 29 +++++++++++++
11 files changed, 181 insertions(+), 6 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c1eb41c..279da08 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,17 @@ Description:
when a discarded area is read the discard_zeroes_data
parameter will be set to one. Otherwise it will be 0 and
the result of reading a discarded area is undefined.
+
+What: /sys/block/<disk>/queue/write_same_max_bytes
+Date: January 2012
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Some devices support a write same operation in which a
+ single data block can be written to a range of several
+ contiguous blocks on storage. This can be used to wipe
+ areas on disk or to initialize drives in a RAID
+ configuration. write_same_max_bytes indicates how many
+ bytes can be written in a single write same command. If
+ write_same_max_bytes is 0, write same is not supported
+ by the device.
+
diff --git a/block/blk-core.c b/block/blk-core.c
index 33eded0..3b08054 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1704,6 +1704,11 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
+ if (bio->bi_rw & REQ_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+
/*
* Various block parts want %current->io_context and lazy ioc
* allocation ends up trading a lot of pain for a small amount of
@@ -1809,8 +1814,6 @@ EXPORT_SYMBOL(generic_make_request);
*/
void submit_bio(int rw, struct bio *bio)
{
- int count = bio_sectors(bio);
-
bio->bi_rw |= rw;
/*
@@ -1818,6 +1821,13 @@ void submit_bio(int rw, struct bio *bio)
* go through the normal accounting stuff before submission.
*/
if (bio_has_data(bio)) {
+ unsigned int count;
+
+ if (unlikely(rw & REQ_WRITE_SAME))
+ count = bdev_logical_block_size(bio->bi_bdev) >> 9;
+ else
+ count = bio_sectors(bio);
+
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 19cc761..a062543 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -130,6 +130,80 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
EXPORT_SYMBOL(blkdev_issue_discard);
/**
+ * blkdev_issue_write_same - queue a write same operation
+ * @bdev: target blockdev
+ * @sector: start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ * @page: page containing data to write
+ *
+ * Description:
+ * Issue a write same request for the sectors in question.
+ */
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask,
+ struct page *page)
+{
+ DECLARE_COMPLETION_ONSTACK(wait);
+ struct request_queue *q = bdev_get_queue(bdev);
+ unsigned int max_write_same_sectors;
+ struct bio_batch bb;
+ struct bio *bio;
+ int ret = 0;
+
+ if (!q)
+ return -ENXIO;
+
+ max_write_same_sectors = q->limits.max_write_same_sectors;
+
+ if (max_write_same_sectors == 0)
+ return -EOPNOTSUPP;
+
+ atomic_set(&bb.done, 1);
+ bb.flags = 1 << BIO_UPTODATE;
+ bb.wait = &wait;
+
+ while (nr_sects) {
+ bio = bio_alloc(gfp_mask, 1);
+ if (!bio) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ bio->bi_sector = sector;
+ bio->bi_end_io = bio_batch_end_io;
+ bio->bi_bdev = bdev;
+ bio->bi_private = &bb;
+ bio->bi_vcnt = 1;
+ bio->bi_io_vec->bv_page = page;
+ bio->bi_io_vec->bv_offset = 0;
+ bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
+
+ if (nr_sects > max_write_same_sectors) {
+ bio->bi_size = max_write_same_sectors << 9;
+ nr_sects -= max_write_same_sectors;
+ sector += max_write_same_sectors;
+ } else {
+ bio->bi_size = nr_sects << 9;
+ nr_sects = 0;
+ }
+
+ atomic_inc(&bb.done);
+ submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+ }
+
+ /* Wait for bios in-flight */
+ if (!atomic_dec_and_test(&bb.done))
+ wait_for_completion(&wait);
+
+ if (!test_bit(BIO_UPTODATE, &bb.flags))
+ ret = -ENOTSUPP;
+
+ return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_write_same);
+
+/**
* blkdev_issue_zeroout - generate number of zero filed write bios
* @bdev: blockdev to issue
* @sector: start sector
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 642b862..936a110 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -419,6 +419,10 @@ static int attempt_merge(struct request_queue *q, struct request *req,
|| next->special)
return 0;
+ if (req->cmd_flags & REQ_WRITE_SAME &&
+ !blk_write_same_mergeable(req->bio, next->bio))
+ return 0;
+
/*
* If we are allowed to merge, then append bio list
* from next to rq and release next. merge_requests_fn
@@ -518,6 +522,11 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (bio_integrity(bio) != blk_integrity_rq(rq))
return false;
+ /* must be using the same buffer */
+ if (rq->cmd_flags & REQ_WRITE_SAME &&
+ !blk_write_same_mergeable(rq->bio, bio))
+ return false;
+
return true;
}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 565a678..779bb76 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -113,6 +113,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
+ lim->max_write_same_sectors = 0;
lim->max_discard_sectors = 0;
lim->discard_granularity = 0;
lim->discard_alignment = 0;
@@ -144,6 +145,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_segments = USHRT_MAX;
lim->max_hw_sectors = UINT_MAX;
lim->max_sectors = UINT_MAX;
+ lim->max_write_same_sectors = UINT_MAX;
}
EXPORT_SYMBOL(blk_set_stacking_limits);
@@ -286,6 +288,18 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
EXPORT_SYMBOL(blk_queue_max_discard_sectors);
/**
+ * blk_queue_max_write_same_sectors - set max sectors for a single write same
+ * @q: the request queue for the device
+ * @max_write_same_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_write_same_sectors(struct request_queue *q,
+ unsigned int max_write_same_sectors)
+{
+ q->limits.max_write_same_sectors = max_write_same_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_write_same_sectors);
+
+/**
* blk_queue_max_segments - set max hw segments for a request for this queue
* @q: the request queue for the device
* @max_segments: max number of segments
@@ -510,6 +524,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
+ t->max_write_same_sectors = min(t->max_write_same_sectors,
+ b->max_write_same_sectors);
t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ea51d82..247dbfd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -180,6 +180,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
return queue_var_show(queue_discard_zeroes_data(q), page);
}
+static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
+{
+ return sprintf(page, "%llu\n",
+ (unsigned long long)q->limits.max_write_same_sectors << 9);
+}
+
+
static ssize_t
queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
{
@@ -385,6 +392,11 @@ static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
.show = queue_discard_zeroes_data_show,
};
+static struct queue_sysfs_entry queue_write_same_max_entry = {
+ .attr = {.name = "write_same_max_bytes", .mode = S_IRUGO },
+ .show = queue_write_same_max_show,
+};
+
static struct queue_sysfs_entry queue_nonrot_entry = {
.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
.show = queue_show_nonrot,
@@ -432,6 +444,7 @@ static struct attribute *default_attrs[] = {
&queue_discard_granularity_entry.attr,
&queue_discard_max_entry.attr,
&queue_discard_zeroes_data_entry.attr,
+ &queue_write_same_max_entry.attr,
&queue_nonrot_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index de63a1f..a9e4fa9 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -422,6 +422,7 @@ static int raid0_run(struct mddev *mddev)
if (md_check_no_bitmap(mddev))
return -EINVAL;
blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+ blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
/* if private is not null, we are here after takeover */
if (mddev->private == NULL) {
diff --git a/fs/bio.c b/fs/bio.c
index 13e9567..f855e0e 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1487,9 +1487,12 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
bp->bv1 = bi->bi_io_vec[0];
bp->bv2 = bi->bi_io_vec[0];
- bp->bv2.bv_offset += first_sectors << 9;
- bp->bv2.bv_len -= first_sectors << 9;
- bp->bv1.bv_len = first_sectors << 9;
+
+ if (bio_is_rw(bi)) {
+ bp->bv2.bv_offset += first_sectors << 9;
+ bp->bv2.bv_len -= first_sectors << 9;
+ bp->bv1.bv_len = first_sectors << 9;
+ }
bp->bio1.bi_io_vec = &bp->bv1;
bp->bio2.bi_io_vec = &bp->bv2;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index e54305c..820e7aa 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -399,6 +399,9 @@ static inline bool bio_is_rw(struct bio *bio)
if (!bio_has_data(bio))
return false;
+ if (bio->bi_rw & REQ_WRITE_SAME)
+ return false;
+
return true;
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1b22966..cdf1119 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -147,6 +147,7 @@ enum rq_flag_bits {
__REQ_PRIO, /* boost priority in cfq */
__REQ_DISCARD, /* request to discard sectors */
__REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */
+ __REQ_WRITE_SAME, /* write same block many times */
__REQ_NOIDLE, /* don't anticipate more IO after this one */
__REQ_FUA, /* forced unit access */
@@ -185,13 +186,15 @@ enum rq_flag_bits {
#define REQ_META (1 << __REQ_META)
#define REQ_PRIO (1 << __REQ_PRIO)
#define REQ_DISCARD (1 << __REQ_DISCARD)
+#define REQ_WRITE_SAME (1 << __REQ_WRITE_SAME)
#define REQ_NOIDLE (1 << __REQ_NOIDLE)
#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
#define REQ_COMMON_MASK \
(REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
- REQ_DISCARD | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | REQ_SECURE)
+ REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
+ REQ_SECURE)
#define REQ_CLONE_MASK REQ_COMMON_MASK
/* This mask is used for both bio and request merge checking */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 90f7abe..1756001 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -270,6 +270,7 @@ struct queue_limits {
unsigned int io_min;
unsigned int io_opt;
unsigned int max_discard_sectors;
+ unsigned int max_write_same_sectors;
unsigned int discard_granularity;
unsigned int discard_alignment;
@@ -614,9 +615,20 @@ static inline bool blk_check_merge_flags(unsigned int flags1,
if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
return false;
+ if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME))
+ return false;
+
return true;
}
+static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
+{
+ if (bio_data(a) == bio_data(b))
+ return true;
+
+ return false;
+}
+
/*
* q->prep_rq_fn return values
*/
@@ -818,6 +830,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
if (unlikely(cmd_flags & REQ_DISCARD))
return q->limits.max_discard_sectors;
+ if (unlikely(cmd_flags & REQ_WRITE_SAME))
+ return q->limits.max_write_same_sectors;
+
return q->limits.max_sectors;
}
@@ -886,6 +901,8 @@ extern void blk_queue_max_segments(struct request_queue *, unsigned short);
extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
extern void blk_queue_max_discard_sectors(struct request_queue *q,
unsigned int max_discard_sectors);
+extern void blk_queue_max_write_same_sectors(struct request_queue *q,
+ unsigned int max_write_same_sectors);
extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
extern void blk_queue_alignment_offset(struct request_queue *q,
@@ -1016,6 +1033,8 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
+extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask, struct page *page);
extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask);
static inline int sb_issue_discard(struct super_block *sb, sector_t block,
@@ -1193,6 +1212,16 @@ static inline unsigned int bdev_discard_zeroes_data(struct block_device *bdev)
return queue_discard_zeroes_data(bdev_get_queue(bdev));
}
+static inline unsigned int bdev_write_same(struct block_device *bdev)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (q)
+ return q->limits.max_write_same_sectors;
+
+ return 0;
+}
+
static inline int queue_dma_alignment(struct request_queue *q)
{
return q ? q->dma_alignment : 511;
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (2 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 3/8] block: Implement support for WRITE SAME Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 16:19 ` [PATCH 5/8] block: ioctl to zero block ranges Martin K. Petersen
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
If the device supports WRITE SAME, use that to optimize zeroing of
blocks. If the device does not support WRITE SAME or if the operation
fails, fall back to writing zeroes the old-fashioned way.
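As a usage sketch (hypothetical caller), zeroing a range no longer
requires the caller to know whether the device supports WRITE SAME:
  #include <linux/blkdev.h>
  /* Hypothetical example: zero the first 16 MiB of a device. */
  static int example_zero_start(struct block_device *bdev)
  {
          sector_t nr_sects = 16 * 2048;  /* 16 MiB in 512-byte sectors */
          /* Tries WRITE SAME with ZERO_PAGE(0) first and falls back to
           * zero-filled bios if that fails or is unsupported. */
          return blkdev_issue_zeroout(bdev, 0, nr_sects, GFP_KERNEL);
  }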
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-lib.c | 30 +++++++++++++++++++++++++++++-
1 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index a062543..9373b58 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
* Generate and issue number of bios with zerofiled pages.
*/
-int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask)
{
int ret;
@@ -264,4 +264,32 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return ret;
}
+
+/**
+ * blkdev_issue_zeroout - zero-fill a block range
+ * @bdev: blockdev to write
+ * @sector: start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ * Generate and issue number of bios with zerofiled pages.
+ */
+
+int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask)
+{
+ if (bdev_write_same(bdev)) {
+ unsigned char bdn[BDEVNAME_SIZE];
+
+ if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+ ZERO_PAGE(0)))
+ return 0;
+
+ bdevname(bdev, bdn);
+ pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
+ }
+
+ return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+}
EXPORT_SYMBOL(blkdev_issue_zeroout);
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/8] block: ioctl to zero block ranges
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (3 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 4/8] block: Make blkdev_issue_zeroout use " Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 16:19 ` [PATCH 6/8] scsi: Add a report opcode helper Martin K. Petersen
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
Introduce a BLKZEROOUT ioctl which can be used to clear block ranges by
way of blkdev_issue_zeroout().
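A minimal userspace sketch (hypothetical test program, not part of the
patch): the ioctl takes a uint64_t[2] holding byte offset and length,
both 512-byte aligned, and the device must be opened for writing:
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/fs.h>           /* BLKZEROOUT */
  int main(int argc, char **argv)
  {
          uint64_t range[2] = { 0, 1048576 }; /* start, length in bytes */
          int fd;
          if (argc < 2 || (fd = open(argv[1], O_WRONLY)) < 0) {
                  perror("open");
                  return 1;
          }
          if (ioctl(fd, BLKZEROOUT, range) < 0) {
                  perror("BLKZEROOUT");
                  return 1;
          }
          close(fd);
          return 0;
  }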
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
block/ioctl.c | 27 +++++++++++++++++++++++++++
include/linux/fs.h | 1 +
2 files changed, 28 insertions(+), 0 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index 4476e0e8..769d296 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -185,6 +185,22 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags);
}
+static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
+ uint64_t len)
+{
+ if (start & 511)
+ return -EINVAL;
+ if (len & 511)
+ return -EINVAL;
+ start >>= 9;
+ len >>= 9;
+
+ if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ return -EINVAL;
+
+ return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
+}
+
static int put_ushort(unsigned long arg, unsigned short val)
{
return put_user(val, (unsigned short __user *)arg);
@@ -300,6 +316,17 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
return blk_ioctl_discard(bdev, range[0], range[1],
cmd == BLKSECDISCARD);
}
+ case BLKZEROOUT: {
+ uint64_t range[2];
+
+ if (!(mode & FMODE_WRITE))
+ return -EBADF;
+
+ if (copy_from_user(range, (void __user *)arg, sizeof(range)))
+ return -EFAULT;
+
+ return blk_ioctl_zeroout(bdev, range[0], range[1]);
+ }
case HDIO_GETGEO: {
struct hd_geometry geo;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..bd6f6e7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -335,6 +335,7 @@ struct inodes_stat_t {
#define BLKDISCARDZEROES _IO(0x12,124)
#define BLKSECDISCARD _IO(0x12,125)
#define BLKROTATIONAL _IO(0x12,126)
+#define BLKZEROOUT _IO(0x12,127)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 6/8] scsi: Add a report opcode helper
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (4 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 5/8] block: ioctl to zero block ranges Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-18 16:19 ` [PATCH 7/8] sd: Permit merged discard requests Martin K. Petersen
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
The REPORT SUPPORTED OPERATION CODES command can be used to query
whether a given opcode is supported by a device. Add a helper function
that allows us to look up commands.
We only issue RSOC if the device reports compliance with SPC-3 or
later. But to err on the side of caution we disable the command for ATA,
FireWire and USB.
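As a sketch of the intended use (patch 8 wires this up for WRITE
SAME(16); the function name below is hypothetical and the scratch
buffer must be at least 20 bytes long):
  #include <scsi/scsi.h>
  #include <scsi/scsi_device.h>
  /* Hypothetical probe: does the device claim WRITE SAME(16) support? */
  static bool example_supports_ws16(struct scsi_device *sdev,
                                    unsigned char *buf, unsigned int len)
  {
          /* Returns 1 only if RSOC succeeds and marks the opcode as
           * supported; 0 on failure or for exempted devices
           * (no_report_opcodes, pre-SPC-3). */
          return scsi_report_opcode(sdev, buf, len, WRITE_SAME_16) == 1;
  }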
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/ata/libata-scsi.c | 1 +
drivers/firewire/sbp2.c | 1 +
drivers/scsi/scsi.c | 45 ++++++++++++++++++++++++++++++++++++++++
drivers/usb/storage/scsiglue.c | 3 ++
include/scsi/scsi_device.h | 3 ++
5 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 8ec81ca..86d43b2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1052,6 +1052,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
{
sdev->use_10_for_rw = 1;
sdev->use_10_for_ms = 1;
+ sdev->no_report_opcodes = 1;
/* Schedule policy is determined by ->qc_defer() callback and
* it needs to see every deferred qc. Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 1162d6b..f82e9d4 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1546,6 +1546,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
struct sbp2_logical_unit *lu = sdev->hostdata;
sdev->use_10_for_rw = 1;
+ sdev->no_report_opcodes = 1;
if (sbp2_param_exclusive_login)
sdev->manage_start_stop = 1;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2936b44..2c0d0ec 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -55,6 +55,7 @@
#include <linux/cpu.h>
#include <linux/mutex.h>
#include <linux/async.h>
+#include <asm/unaligned.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -1062,6 +1063,50 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
/**
+ * scsi_report_opcode - Find out if a given command opcode is supported
+ * @sdev: scsi device to query
+ * @buffer: scratch buffer (must be at least 20 bytes long)
+ * @len: length of buffer
+ * @opcode: opcode for command to look up
+ *
+ * Uses the REPORT SUPPORTED OPERATION CODES to look up the given
+ * opcode. Returns 0 if RSOC fails or if the command opcode is
+ * unsupported. Returns 1 if the device claims to support the command.
+ */
+int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
+ unsigned int len, unsigned char opcode)
+{
+ unsigned char cmd[16];
+ struct scsi_sense_hdr sshdr;
+ int result;
+
+ if (sdev->no_report_opcodes || sdev->scsi_level < SCSI_SPC_3)
+ return 0;
+
+ memset(cmd, 0, 16);
+ cmd[0] = MAINTENANCE_IN;
+ cmd[1] = MI_REPORT_SUPPORTED_OPERATION_CODES;
+ cmd[2] = 1; /* One command format */
+ cmd[3] = opcode;
+ put_unaligned_be32(len, &cmd[6]);
+ memset(buffer, 0, len);
+
+ result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len,
+ &sshdr, 30 * HZ, 3, NULL);
+
+ if (result && scsi_sense_valid(&sshdr) &&
+ sshdr.sense_key == ILLEGAL_REQUEST &&
+ (sshdr.asc == 0x20 || sshdr.asc == 0x24) && sshdr.ascq == 0x00)
+ return 0;
+
+ if ((buffer[1] & 3) == 3) /* Command supported */
+ return 1;
+
+ return 0;
+}
+EXPORT_SYMBOL(scsi_report_opcode);
+
+/**
* scsi_device_get - get an additional reference to a scsi_device
* @sdev: device to get a reference to
*
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index a3d5436..6ab376a 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -186,6 +186,9 @@ static int slave_configure(struct scsi_device *sdev)
/* Some devices don't handle VPD pages correctly */
sdev->skip_vpd_pages = 1;
+ /* Do not attempt to use REPORT SUPPORTED OPERATION CODES */
+ sdev->no_report_opcodes = 1;
+
/* Some disks return the total number of blocks in response
* to READ CAPACITY rather than the highest block number.
* If this device makes that mistake, tell the sd driver. */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9895f69..7d0b4f6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -135,6 +135,7 @@ struct scsi_device {
* because we did a bus reset. */
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
+ unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08 */
unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */
unsigned skip_vpd_pages:1; /* do not read VPD pages */
@@ -361,6 +362,8 @@ extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
int retries, struct scsi_sense_hdr *sshdr);
extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
int buf_len);
+extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
+ unsigned int len, unsigned char opcode);
extern int scsi_device_set_state(struct scsi_device *sdev,
enum scsi_device_state state);
extern struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type,
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 7/8] sd: Permit merged discard requests
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (5 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 6/8] scsi: Add a report opcode helper Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-20 0:35 ` Shaohua Li
2012-09-18 16:19 ` [PATCH 8/8] sd: Implement support for WRITE SAME Martin K. Petersen
2012-09-20 12:33 ` Discard merging / Write same support v5 Jens Axboe
8 siblings, 1 reply; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
Support requests with more than one bio payload for discards. The total
number of bytes to be discarded is stored in req->__data_len and used in
sd_done() to complete the I/O.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/scsi/sd.c | 32 +++++++++++++++++++-------------
1 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4df73e5..e100b5c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -560,29 +560,26 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
}
/**
- * scsi_setup_discard_cmnd - unmap blocks on thinly provisioned device
+ * sd_setup_discard_cmnd - unmap blocks on thinly provisioned device
* @sdp: scsi device to operate one
* @rq: Request to prepare
*
* Will issue either UNMAP or WRITE SAME(16) depending on preference
* indicated by target device.
**/
-static int scsi_setup_discard_cmnd(struct scsi_device *sdp, struct request *rq)
+static int sd_setup_discard_cmnd(struct scsi_device *sdp, struct request *rq)
{
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
- struct bio *bio = rq->bio;
- sector_t sector = bio->bi_sector;
- unsigned int nr_sectors = bio_sectors(bio);
+ sector_t sector = blk_rq_pos(rq);
+ unsigned int nr_sectors = blk_rq_sectors(rq);
+ unsigned int nr_bytes = blk_rq_bytes(rq);
unsigned int len;
int ret;
char *buf;
struct page *page;
- if (sdkp->device->sector_size == 4096) {
- sector >>= 3;
- nr_sectors >>= 3;
- }
-
+ sector >>= ilog2(sdp->sector_size) - 9;
+ nr_sectors >>= ilog2(sdp->sector_size) - 9;
rq->timeout = SD_TIMEOUT;
memset(rq->cmd, 0, rq->cmd_len);
@@ -637,6 +634,7 @@ static int scsi_setup_discard_cmnd(struct scsi_device *sdp, struct request *rq)
blk_add_request_payload(rq, page, len);
ret = scsi_setup_blk_pc_cmnd(sdp, rq);
rq->buffer = page_address(page);
+ rq->__data_len = nr_bytes;
out:
if (ret != BLKPREP_OK) {
@@ -689,7 +687,7 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
* block PC requests to make life easier.
*/
if (rq->cmd_flags & REQ_DISCARD) {
- ret = scsi_setup_discard_cmnd(sdp, rq);
+ ret = sd_setup_discard_cmnd(sdp, rq);
goto out;
} else if (rq->cmd_flags & REQ_FLUSH) {
ret = scsi_setup_flush_cmnd(sdp, rq);
@@ -1460,12 +1458,20 @@ static int sd_done(struct scsi_cmnd *SCpnt)
unsigned int good_bytes = result ? 0 : scsi_bufflen(SCpnt);
struct scsi_sense_hdr sshdr;
struct scsi_disk *sdkp = scsi_disk(SCpnt->request->rq_disk);
+ struct request *req = SCpnt->request;
int sense_valid = 0;
int sense_deferred = 0;
unsigned char op = SCpnt->cmnd[0];
- if ((SCpnt->request->cmd_flags & REQ_DISCARD) && !result)
- scsi_set_resid(SCpnt, 0);
+ if (req->cmd_flags & REQ_DISCARD) {
+ if (!result) {
+ good_bytes = blk_rq_bytes(req);
+ scsi_set_resid(SCpnt, 0);
+ } else {
+ good_bytes = 0;
+ scsi_set_resid(SCpnt, blk_rq_bytes(req));
+ }
+ }
if (result) {
sense_valid = scsi_command_normalize_sense(SCpnt, &sshdr);
--
1.7.7.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 8/8] sd: Implement support for WRITE SAME
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (6 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 7/8] sd: Permit merged discard requests Martin K. Petersen
@ 2012-09-18 16:19 ` Martin K. Petersen
2012-09-20 12:33 ` Discard merging / Write same support v5 Jens Axboe
8 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2012-09-18 16:19 UTC (permalink / raw)
To: linux-scsi; +Cc: axboe, shli, snitzer, Martin K. Petersen
From: "Martin K. Petersen" <martin.petersen@oracle.com>
Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.
- We set the default maximum to 0xFFFF because there are several
devices out there that only support two-byte block counts even with
WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
LIMITS VPD.
- max_write_same_blocks can be overridden on a per-device basis in sysfs.
- The UNMAP discovery heuristics remain unchanged but the discard
limits are tweaked to match the "real" WRITE SAME commands.
- In the error handling logic we now distinguish between WRITE SAME
with and without UNMAP set.
The discovery process heuristics are:
- If the device reports a SCSI level of SPC-3 or greater we'll issue
READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
supported. If that's the case we will use it.
- If the device supports the block limits VPD and reports a MAXIMUM
WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).
- Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
0xFFFFFFFF or the block count exceeds 0xFFFF.
- no_write_same is set for ATA, FireWire and USB.
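For clarity, the resulting command selection can be condensed into a
single predicate; this is only a sketch of the sd_setup_write_same_cmnd()
logic in the diff below, using the ws16 flag this patch adds to struct
scsi_disk:
  #include "sd.h"         /* struct scsi_disk, incl. the ws16 flag */
  /* Sketch: use WRITE SAME(16) when RSOC reported support, or when the
   * LBA or block count no longer fits the 10-byte CDB fields. */
  static bool example_use_ws16(struct scsi_disk *sdkp, sector_t lba,
                               unsigned int nr_blocks)
  {
          return sdkp->ws16 || lba > 0xffffffff || nr_blocks > 0xffff;
  }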
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
---
drivers/ata/libata-scsi.c | 1 +
drivers/firewire/sbp2.c | 1 +
drivers/scsi/scsi_lib.c | 22 ++++-
drivers/scsi/sd.c | 172 +++++++++++++++++++++++++++++++++++++---
drivers/scsi/sd.h | 7 ++
drivers/usb/storage/scsiglue.c | 3 +
include/scsi/scsi_device.h | 1 +
7 files changed, 191 insertions(+), 16 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 86d43b2..002f1b7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1053,6 +1053,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
sdev->use_10_for_rw = 1;
sdev->use_10_for_ms = 1;
sdev->no_report_opcodes = 1;
+ sdev->no_write_same = 1;
/* Schedule policy is determined by ->qc_defer() callback and
* it needs to see every deferred qc. Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index f82e9d4..bb1b392 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1547,6 +1547,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
sdev->use_10_for_rw = 1;
sdev->no_report_opcodes = 1;
+ sdev->no_write_same = 1;
if (sbp2_param_exclusive_login)
sdev->manage_start_stop = 1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffd7773..874d41b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -897,11 +897,23 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
action = ACTION_FAIL;
error = -EILSEQ;
/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
- } else if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
- (cmd->cmnd[0] == UNMAP ||
- cmd->cmnd[0] == WRITE_SAME_16 ||
- cmd->cmnd[0] == WRITE_SAME)) {
- description = "Discard failure";
+ } else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+ switch (cmd->cmnd[0]) {
+ case UNMAP:
+ description = "Discard failure";
+ break;
+ case WRITE_SAME:
+ case WRITE_SAME_16:
+ if (cmd->cmnd[1] & 0x8)
+ description = "Discard failure";
+ else
+ description =
+ "Write same failure";
+ break;
+ default:
+ description = "Invalid command failure";
+ break;
+ }
action = ACTION_FAIL;
error = -EREMOTEIO;
} else
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e100b5c..1cf2d5d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -99,6 +99,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
#endif
static void sd_config_discard(struct scsi_disk *, unsigned int);
+static void sd_config_write_same(struct scsi_disk *);
static int sd_revalidate_disk(struct gendisk *);
static void sd_unlock_native_capacity(struct gendisk *disk);
static int sd_probe(struct device *);
@@ -373,6 +374,45 @@ sd_store_max_medium_access_timeouts(struct device *dev,
return err ? err : count;
}
+static ssize_t
+sd_show_write_same_blocks(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+ return snprintf(buf, 20, "%u\n", sdkp->max_ws_blocks);
+}
+
+static ssize_t
+sd_store_write_same_blocks(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct scsi_disk *sdkp = to_scsi_disk(dev);
+ struct scsi_device *sdp = sdkp->device;
+ unsigned long max;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ if (sdp->type != TYPE_DISK)
+ return -EINVAL;
+
+ err = kstrtoul(buf, 10, &max);
+
+ if (err)
+ return err;
+
+ if (max == 0)
+ sdp->no_write_same = 1;
+ else if (max <= SD_MAX_WS16_BLOCKS)
+ sdkp->max_ws_blocks = max;
+
+ sd_config_write_same(sdkp);
+
+ return count;
+}
+
static struct device_attribute sd_disk_attrs[] = {
__ATTR(cache_type, S_IRUGO|S_IWUSR, sd_show_cache_type,
sd_store_cache_type),
@@ -387,6 +427,8 @@ static struct device_attribute sd_disk_attrs[] = {
__ATTR(thin_provisioning, S_IRUGO, sd_show_thin_provisioning, NULL),
__ATTR(provisioning_mode, S_IRUGO|S_IWUSR, sd_show_provisioning_mode,
sd_store_provisioning_mode),
+ __ATTR(max_write_same_blocks, S_IRUGO|S_IWUSR,
+ sd_show_write_same_blocks, sd_store_write_same_blocks),
__ATTR(max_medium_access_timeouts, S_IRUGO|S_IWUSR,
sd_show_max_medium_access_timeouts,
sd_store_max_medium_access_timeouts),
@@ -538,19 +580,23 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
return;
case SD_LBP_UNMAP:
- max_blocks = min_not_zero(sdkp->max_unmap_blocks, 0xffffffff);
+ max_blocks = min_not_zero(sdkp->max_unmap_blocks,
+ (u32)SD_MAX_WS16_BLOCKS);
break;
case SD_LBP_WS16:
- max_blocks = min_not_zero(sdkp->max_ws_blocks, 0xffffffff);
+ max_blocks = min_not_zero(sdkp->max_ws_blocks,
+ (u32)SD_MAX_WS16_BLOCKS);
break;
case SD_LBP_WS10:
- max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)0xffff);
+ max_blocks = min_not_zero(sdkp->max_ws_blocks,
+ (u32)SD_MAX_WS10_BLOCKS);
break;
case SD_LBP_ZERO:
- max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)0xffff);
+ max_blocks = min_not_zero(sdkp->max_ws_blocks,
+ (u32)SD_MAX_WS10_BLOCKS);
q->limits.discard_zeroes_data = 1;
break;
}
@@ -644,6 +690,83 @@ out:
return ret;
}
+static void sd_config_write_same(struct scsi_disk *sdkp)
+{
+ struct request_queue *q = sdkp->disk->queue;
+ unsigned int logical_block_size = sdkp->device->sector_size;
+ unsigned int blocks = 0;
+
+ if (sdkp->device->no_write_same) {
+ sdkp->max_ws_blocks = 0;
+ goto out;
+ }
+
+ /* Some devices can not handle block counts above 0xffff despite
+ * supporting WRITE SAME(16). Consequently we default to 64k
+ * blocks per I/O unless the device explicitly advertises a
+ * bigger limit.
+ */
+ if (sdkp->max_ws_blocks == 0)
+ sdkp->max_ws_blocks = SD_MAX_WS10_BLOCKS;
+
+ if (sdkp->ws16 || sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS)
+ blocks = min_not_zero(sdkp->max_ws_blocks,
+ (u32)SD_MAX_WS16_BLOCKS);
+ else
+ blocks = min_not_zero(sdkp->max_ws_blocks,
+ (u32)SD_MAX_WS10_BLOCKS);
+
+out:
+ blk_queue_max_write_same_sectors(q, blocks * (logical_block_size >> 9));
+}
+
+/**
+ * sd_setup_write_same_cmnd - write the same data to multiple blocks
+ * @sdp: scsi device to operate one
+ * @rq: Request to prepare
+ *
+ * Will issue either WRITE SAME(10) or WRITE SAME(16) depending on
+ * the preference indicated by the target device.
+ **/
+static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
+{
+ struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+ struct bio *bio = rq->bio;
+ sector_t sector = blk_rq_pos(rq);
+ unsigned int nr_sectors = blk_rq_sectors(rq);
+ unsigned int nr_bytes = blk_rq_bytes(rq);
+ int ret;
+
+ if (sdkp->device->no_write_same)
+ return BLKPREP_KILL;
+
+ BUG_ON(bio_offset(bio) || bio_iovec(bio)->bv_len != sdp->sector_size);
+
+ sector >>= ilog2(sdp->sector_size) - 9;
+ nr_sectors >>= ilog2(sdp->sector_size) - 9;
+
+ rq->__data_len = sdp->sector_size;
+ rq->timeout = SD_WRITE_SAME_TIMEOUT;
+ memset(rq->cmd, 0, rq->cmd_len);
+
+ if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff) {
+ rq->cmd_len = 16;
+ rq->cmd[0] = WRITE_SAME_16;
+ put_unaligned_be64(sector, &rq->cmd[2]);
+ put_unaligned_be32(nr_sectors, &rq->cmd[10]);
+ } else {
+ rq->cmd_len = 10;
+ rq->cmd[0] = WRITE_SAME;
+ put_unaligned_be32(sector, &rq->cmd[2]);
+ put_unaligned_be16(nr_sectors, &rq->cmd[7]);
+ }
+
+ ret = scsi_setup_blk_pc_cmnd(sdp, rq);
+ rq->__data_len = nr_bytes;
+
+ return ret;
+}
+
static int scsi_setup_flush_cmnd(struct scsi_device *sdp, struct request *rq)
{
rq->timeout = SD_FLUSH_TIMEOUT;
@@ -689,6 +812,9 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
if (rq->cmd_flags & REQ_DISCARD) {
ret = sd_setup_discard_cmnd(sdp, rq);
goto out;
+ } else if (rq->cmd_flags & REQ_WRITE_SAME) {
+ ret = sd_setup_write_same_cmnd(sdp, rq);
+ goto out;
} else if (rq->cmd_flags & REQ_FLUSH) {
ret = scsi_setup_flush_cmnd(sdp, rq);
goto out;
@@ -1462,8 +1588,9 @@ static int sd_done(struct scsi_cmnd *SCpnt)
int sense_valid = 0;
int sense_deferred = 0;
unsigned char op = SCpnt->cmnd[0];
+ unsigned char unmap = SCpnt->cmnd[1] & 8;
- if (req->cmd_flags & REQ_DISCARD) {
+ if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME) {
if (!result) {
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
@@ -1520,9 +1647,25 @@ static int sd_done(struct scsi_cmnd *SCpnt)
if (sshdr.asc == 0x10) /* DIX: Host detected corruption */
good_bytes = sd_completed_bytes(SCpnt);
/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
- if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
- (op == UNMAP || op == WRITE_SAME_16 || op == WRITE_SAME))
- sd_config_discard(sdkp, SD_LBP_DISABLE);
+ if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+ switch (op) {
+ case UNMAP:
+ sd_config_discard(sdkp, SD_LBP_DISABLE);
+ break;
+ case WRITE_SAME_16:
+ case WRITE_SAME:
+ if (unmap)
+ sd_config_discard(sdkp, SD_LBP_DISABLE);
+ else {
+ sdkp->device->no_write_same = 1;
+ sd_config_write_same(sdkp);
+
+ good_bytes = 0;
+ req->__data_len = blk_rq_bytes(req);
+ req->cmd_flags |= REQ_QUIET;
+ }
+ }
+ }
break;
default:
break;
@@ -2347,9 +2490,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
if (buffer[3] == 0x3c) {
unsigned int lba_count, desc_count;
- sdkp->max_ws_blocks =
- (u32) min_not_zero(get_unaligned_be64(&buffer[36]),
- (u64)0xffffffff);
+ sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
if (!sdkp->lbpme)
goto out;
@@ -2442,6 +2583,13 @@ static void sd_read_block_provisioning(struct scsi_disk *sdkp)
kfree(buffer);
}
+static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+ if (scsi_report_opcode(sdkp->device, buffer, SD_BUF_SIZE,
+ WRITE_SAME_16))
+ sdkp->ws16 = 1;
+}
+
static int sd_try_extended_inquiry(struct scsi_device *sdp)
{
/*
@@ -2501,6 +2649,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_write_protect_flag(sdkp, buffer);
sd_read_cache_type(sdkp, buffer);
sd_read_app_tag_own(sdkp, buffer);
+ sd_read_write_same(sdkp, buffer);
}
sdkp->first_scan = 0;
@@ -2518,6 +2667,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
blk_queue_flush(sdkp->disk->queue, flush);
set_capacity(disk, sdkp->capacity);
+ sd_config_write_same(sdkp);
kfree(buffer);
out:
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index f703f48..cfcf8fd 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -14,6 +14,7 @@
#define SD_TIMEOUT (30 * HZ)
#define SD_MOD_TIMEOUT (75 * HZ)
#define SD_FLUSH_TIMEOUT (60 * HZ)
+#define SD_WRITE_SAME_TIMEOUT (120 * HZ)
/*
* Number of allowed retries
@@ -39,6 +40,11 @@ enum {
};
enum {
+ SD_MAX_WS10_BLOCKS = 0xffff,
+ SD_MAX_WS16_BLOCKS = 0x7fffff,
+};
+
+enum {
SD_LBP_FULL = 0, /* Full logical block provisioning */
SD_LBP_UNMAP, /* Use UNMAP command */
SD_LBP_WS16, /* Use WRITE SAME(16) with UNMAP bit */
@@ -77,6 +83,7 @@ struct scsi_disk {
unsigned lbpws : 1;
unsigned lbpws10 : 1;
unsigned lbpvpd : 1;
+ unsigned ws16 : 1;
};
#define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 6ab376a..92f35ab 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -189,6 +189,9 @@ static int slave_configure(struct scsi_device *sdev)
/* Do not attempt to use REPORT SUPPORTED OPERATION CODES */
sdev->no_report_opcodes = 1;
+ /* Do not attempt to use WRITE SAME */
+ sdev->no_write_same = 1;
+
/* Some disks return the total number of blocks in response
* to READ CAPACITY rather than the highest block number.
* If this device makes that mistake, tell the sd driver. */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7d0b4f6..3d6e315 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -136,6 +136,7 @@ struct scsi_device {
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
+ unsigned no_write_same:1; /* no WRITE SAME command */
unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08 */
unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */
unsigned skip_vpd_pages:1; /* do not read VPD pages */
--
1.7.7.6
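For context beyond the patch itself: once the series is applied, the WRITE SAME
path above can be exercised from user space through the block-range zeroing
ioctl added in patch 5/8, which blkdev_issue_zeroout() maps onto WRITE SAME
when the queue advertises support. A minimal sketch follows (not part of this
patch), assuming the ioctl is named BLKZEROOUT and takes a {start, length}
pair in bytes as introduced there:

/* Hypothetical user-space test: zero the first 1 MiB of a block device.
 * BLKZEROOUT and its {start, length}-in-bytes argument are assumed from
 * patch 5/8; both values must be aligned to the logical block size.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>		/* BLKZEROOUT */

int main(int argc, char **argv)
{
	uint64_t range[2] = { 0, 1024 * 1024 };	/* start, length in bytes */
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ioctl(fd, BLKZEROOUT, range) < 0)
		perror("BLKZEROOUT");

	close(fd);
	return 0;
}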
* Re: Discard merging / Write same support v5
2012-09-18 16:19 Discard merging / Write same support v5 Martin K. Petersen
` (7 preceding siblings ...)
2012-09-18 16:19 ` [PATCH 8/8] sd: Implement support for WRITE SAME Martin K. Petersen
@ 2012-09-20 12:33 ` Jens Axboe
2012-09-20 12:43 ` James Bottomley
8 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2012-09-20 12:33 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: linux-scsi, shli, snitzer
On 09/18/2012 06:19 PM, Martin K. Petersen wrote:
> I've been beating on this patch kit for the last few weeks with a bunch
> of different devices. I have also tested extensively with Shaohua Li's
> MD RAID0 discard support.
>
> The only functional change is support for discard and write same
> bio/request merging. Other than that there were a few tweaks for later
> kernels by both Mike and me.
>
> I put a writesame5 branch on:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git
>
> It's against block/for-3.7/core.
>
> [PATCH 1/8] block: Clean up special command handling logic
> [PATCH 2/8] block: Consolidate command flag and queue limit checks
> [PATCH 3/8] block: Implement support for WRITE SAME
> [PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME
> [PATCH 5/8] block: ioctl to zero block ranges
> [PATCH 6/8] scsi: Add a report opcode helper
> [PATCH 7/8] sd: Permit merged discard requests
> [PATCH 8/8] sd: Implement support for WRITE SAME
Thanks Martin, I queued up 1..5 in for-3.7/core. I'm assuming James will
pick up the SCSI bits?
--
Jens Axboe
* Re: Discard merging / Write same support v5
2012-09-20 12:33 ` Discard merging / Write same support v5 Jens Axboe
@ 2012-09-20 12:43 ` James Bottomley
2012-09-20 12:47 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2012-09-20 12:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, linux-scsi, shli, snitzer
On Thu, 2012-09-20 at 14:33 +0200, Jens Axboe wrote:
> On 09/18/2012 06:19 PM, Martin K. Petersen wrote:
> > I've been beating on this patch kit for the last few weeks with a bunch
> > of different devices. I have also tested extensively with Shaohua Li's
> > MD RAID0 discard support.
> >
> > The only functional change is support for discard and write same
> > bio/request merging. Other than that there were a few tweaks for later
> > kernels by both Mike and me.
> >
> > I put a writesame5 branch on:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git
> >
> > It's against block/for-3.7/core.
> >
> > [PATCH 1/8] block: Clean up special command handling logic
> > [PATCH 2/8] block: Consolidate command flag and queue limit checks
> > [PATCH 3/8] block: Implement support for WRITE SAME
> > [PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME
> > [PATCH 5/8] block: ioctl to zero block ranges
> > [PATCH 6/8] scsi: Add a report opcode helper
> > [PATCH 7/8] sd: Permit merged discard requests
> > [PATCH 8/8] sd: Implement support for WRITE SAME
>
> Thanks Martin, I queued up 1..5 in for-3.7/core. I'm assuming James will
> pick up the SCSI bits?
Yes, but Linus gave me a rocket about post-merge trees. Can you do this
on a separate branch and I'll build on top of that?
Thanks,
James
* Re: Discard merging / Write same support v5
2012-09-20 12:43 ` James Bottomley
@ 2012-09-20 12:47 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2012-09-20 12:47 UTC (permalink / raw)
To: James Bottomley; +Cc: Martin K. Petersen, linux-scsi, shli, snitzer
On 09/20/2012 02:43 PM, James Bottomley wrote:
> On Thu, 2012-09-20 at 14:33 +0200, Jens Axboe wrote:
>> On 09/18/2012 06:19 PM, Martin K. Petersen wrote:
>>> I've been beating on this patch kit for the last few weeks with a bunch
>>> of different devices. I have also tested extensively with Shaohua Li's
>>> MD RAID0 discard support.
>>>
>>> The only functional change is support for discard and write same
>>> bio/request merging. Other than that there were a few tweaks for later
>>> kernels by both Mike and me.
>>>
>>> I put a writesame5 branch on:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git
>>>
>>> It's against block/for-3.7/core.
>>>
>>> [PATCH 1/8] block: Clean up special command handling logic
>>> [PATCH 2/8] block: Consolidate command flag and queue limit checks
>>> [PATCH 3/8] block: Implement support for WRITE SAME
>>> [PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME
>>> [PATCH 5/8] block: ioctl to zero block ranges
>>> [PATCH 6/8] scsi: Add a report opcode helper
>>> [PATCH 7/8] sd: Permit merged discard requests
>>> [PATCH 8/8] sd: Implement support for WRITE SAME
>>
>> Thanks Martin, I queued up 1..5 in for-3.7/core. I'm assuming James will
>> pick up the SCSI bits?
>
> Yes, but Linus gave me a rocket about post merge trees. Can you do this
> on a separate branch and I'll build on top of that?
He did, didn't he... You could just use for-3.7/core as of today;
there's not that much in there yet. Alternatively, I'd have to rebase
for-3.7/core and rip them out, put them in a separate branch, and pull
that back into for-3.7/core.
--
Jens Axboe