linux-block.vger.kernel.org archive mirror
* [PATCHSET 0/4] struct request optimizations
@ 2018-01-09 18:26 Jens Axboe
  2018-01-09 18:26 ` [PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:26 UTC (permalink / raw)
  To: linux-block

With the latest patchset from Tejun, we grew the request structure
a little bit. It's been quite a while since I've taken a look at
the layout of the structure; this patchset is a first attempt at
revisiting it.

One advantage of Tejun's patchset is that we no longer rely on
the atomic complete flag in blk-mq. We can use that to shuffle
some bits around and reclaim the full atomic_flags field.

Cache optimize the layout of struct request a bit, to group
things a little more logically. Not a huge shuffle, just a few
select members.

We end up doing better in synthetic testing after this, details
in the last patch.

-- 
Jens Axboe


* [PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT
  2018-01-09 18:26 [PATCHSET 0/4] struct request optimizations Jens Axboe
@ 2018-01-09 18:26 ` Jens Axboe
  2018-01-09 18:27 ` [PATCH 2/4] block: add accessors for setting/querying request deadline Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:26 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

We don't need this to be an atomic flag; it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep, which implies the
barrier needed to ensure we see the right state.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-debugfs.c | 1 -
 block/blk-mq.c         | 5 ++---
 block/blk.h            | 2 --
 include/linux/blkdev.h | 2 ++
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 8adc83786256..2a9c9f8b6162 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -294,7 +294,6 @@ static const char *const rqf_name[] = {
 #define RQAF_NAME(name) [REQ_ATOM_##name] = #name
 static const char *const rqaf_name[] = {
 	RQAF_NAME(COMPLETE),
-	RQAF_NAME(POLL_SLEPT),
 };
 #undef RQAF_NAME
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9aa24c9508f9..faa31814983c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -483,7 +483,6 @@ void blk_mq_free_request(struct request *rq)
 		blk_put_rl(blk_rq_rl(rq));
 
 	blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
-	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
@@ -2970,7 +2969,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	unsigned int nsecs;
 	ktime_t kt;
 
-	if (test_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags))
+	if (rq->rq_flags & RQF_MQ_POLL_SLEPT)
 		return false;
 
 	/*
@@ -2990,7 +2989,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (!nsecs)
 		return false;
 
-	set_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
+	rq->rq_flags |= RQF_MQ_POLL_SLEPT;
 
 	/*
 	 * This will be replaced with the stats tracking code, using
diff --git a/block/blk.h b/block/blk.h
index a68dbe312ea3..eb306c52121e 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -124,8 +124,6 @@ void blk_account_io_done(struct request *req);
  */
 enum rq_atomic_flags {
 	REQ_ATOM_COMPLETE = 0,
-
-	REQ_ATOM_POLL_SLEPT,
 };
 
 /*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 007a7cf1f262..ba31674d8581 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -127,6 +127,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_ZONE_WRITE_LOCKED	((__force req_flags_t)(1 << 19))
 /* timeout is expired */
 #define RQF_MQ_TIMEOUT_EXPIRED	((__force req_flags_t)(1 << 20))
+/* already slept for hybrid poll */
+#define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 21))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-- 
2.7.4


* [PATCH 2/4] block: add accessors for setting/querying request deadline
  2018-01-09 18:26 [PATCHSET 0/4] struct request optimizations Jens Axboe
  2018-01-09 18:26 ` [PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT Jens Axboe
@ 2018-01-09 18:27 ` Jens Axboe
  2018-01-09 18:40   ` Bart Van Assche
  2018-01-09 18:27 ` [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit Jens Axboe
  2018-01-09 18:27 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
  3 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:27 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

We reduce the resolution of request expiry, but since we're already
using jiffies for this, where the resolution depends on the kernel
configuration, and since the timeout resolution is coarse anyway,
that should be fine.
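
As a rough worked example (a hypothetical stand-alone sketch, not part of
the patch), masking off bit 0 rounds the stored deadline down to an even
jiffy, so at most one jiffy of precision is lost:

/* hypothetical user-space model of the masking; not the kernel helpers */
#include <assert.h>

static unsigned long deadline_word;		/* models rq->__deadline */

static void set_deadline(unsigned long time)
{
	deadline_word = time & ~0x1UL;		/* bit 0 reserved for other use */
}

static unsigned long get_deadline(void)
{
	return deadline_word & ~0x1UL;
}

int main(void)
{
	set_deadline(1001);			/* odd jiffy value */
	assert(get_deadline() == 1000);		/* rounded down by one jiffy */
	return 0;
}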

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c         |  2 +-
 block/blk-timeout.c    | 14 ++++++++------
 block/blk.h            | 13 +++++++++++++
 include/linux/blkdev.h |  4 +++-
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index faa31814983c..d875c51bcff8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -858,7 +858,7 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	while (true) {
 		start = read_seqcount_begin(&rq->gstate_seq);
 		gstate = READ_ONCE(rq->gstate);
-		deadline = rq->deadline;
+		deadline = blk_rq_deadline(rq);
 		if (!read_seqcount_retry(&rq->gstate_seq, start))
 			break;
 		cond_resched();
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index ebe99963386c..a05e3676d24a 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -112,7 +112,9 @@ static void blk_rq_timed_out(struct request *req)
 static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
 			  unsigned int *next_set)
 {
-	if (time_after_eq(jiffies, rq->deadline)) {
+	const unsigned long deadline = blk_rq_deadline(rq);
+
+	if (time_after_eq(jiffies, deadline)) {
 		list_del_init(&rq->timeout_list);
 
 		/*
@@ -120,8 +122,8 @@ static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout
 		 */
 		if (!blk_mark_rq_complete(rq))
 			blk_rq_timed_out(rq);
-	} else if (!*next_set || time_after(*next_timeout, rq->deadline)) {
-		*next_timeout = rq->deadline;
+	} else if (!*next_set || time_after(*next_timeout, deadline)) {
+		*next_timeout = deadline;
 		*next_set = 1;
 	}
 }
@@ -162,7 +164,7 @@ void blk_abort_request(struct request *req)
 		 * immediately and that scan sees the new timeout value.
 		 * No need for fancy synchronizations.
 		 */
-		req->deadline = jiffies;
+		blk_rq_set_deadline(req, jiffies);
 		mod_timer(&req->q->timeout, 0);
 	} else {
 		if (blk_mark_rq_complete(req))
@@ -213,7 +215,7 @@ void blk_add_timer(struct request *req)
 	if (!req->timeout)
 		req->timeout = q->rq_timeout;
 
-	req->deadline = jiffies + req->timeout;
+	blk_rq_set_deadline(req, jiffies + req->timeout);
 	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
 
 	/*
@@ -228,7 +230,7 @@ void blk_add_timer(struct request *req)
 	 * than an existing one, modify the timer. Round up to next nearest
 	 * second.
 	 */
-	expiry = blk_rq_timeout(round_jiffies_up(req->deadline));
+	expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req)));
 
 	if (!timer_pending(&q->timeout) ||
 	    time_before(expiry, q->timeout.expires)) {
diff --git a/block/blk.h b/block/blk.h
index eb306c52121e..8b26a8872e05 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -237,6 +237,19 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
 }
 
 /*
+ * Steal a bit from this field for legacy IO path atomic IO marking
+ */
+static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
+{
+	rq->__deadline = time & ~0x1;
+}
+
+static inline unsigned long blk_rq_deadline(struct request *rq)
+{
+	return rq->__deadline & ~0x1;
+}
+
+/*
  * Internal io_context interface
  */
 void get_io_context(struct io_context *ioc);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ba31674d8581..aa6698cf483c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -257,7 +257,9 @@ struct request {
 	struct u64_stats_sync aborted_gstate_sync;
 	u64 aborted_gstate;
 
-	unsigned long deadline;
+	/* access through blk_rq_set_deadline, blk_rq_deadline */
+	unsigned long __deadline;
+
 	struct list_head timeout_list;
 
 	/*
-- 
2.7.4


* [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit
  2018-01-09 18:26 [PATCHSET 0/4] struct request optimizations Jens Axboe
  2018-01-09 18:26 ` [PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT Jens Axboe
  2018-01-09 18:27 ` [PATCH 2/4] block: add accessors for setting/querying request deadline Jens Axboe
@ 2018-01-09 18:27 ` Jens Axboe
  2018-01-09 18:43   ` Bart Van Assche
  2018-01-09 18:27 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
  3 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:27 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field, which the previous patch already reserved.

Remove ->atomic_flags, since it's now unused.
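
As a rough illustration (again a hypothetical user-space model, not the
kernel helpers), the stolen bit and the masked deadline can share one word
without interfering, as long as readers of the deadline mask the bit off:

/* hypothetical model of sharing the word between the COMPLETE marker
 * and the deadline value; single-threaded, so plain ops stand in for
 * test_and_set_bit()/clear_bit() */
#include <assert.h>

#define COMPLETE_BIT	0x1UL

static unsigned long deadline_word;		/* models rq->__deadline */

static int mark_complete(void)			/* models test_and_set_bit(0, ...) */
{
	int was_set = (deadline_word & COMPLETE_BIT) != 0;

	deadline_word |= COMPLETE_BIT;
	return was_set;
}

static unsigned long get_deadline(void)
{
	return deadline_word & ~COMPLETE_BIT;	/* readers never see the flag */
}

int main(void)
{
	deadline_word = 2000 & ~COMPLETE_BIT;	/* set an (even) deadline */
	assert(!mark_complete());		/* EH timer vs completion: first grab wins */
	assert(mark_complete());		/* second caller sees it already set */
	assert(get_deadline() == 2000);		/* deadline value unaffected */
	return 0;
}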

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |  2 +-
 block/blk-mq-debugfs.c |  8 --------
 block/blk.h            | 19 +++++++++----------
 include/linux/blkdev.h |  2 --
 4 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f843ae4f858d..7ba607527487 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2853,7 +2853,7 @@ void blk_start_request(struct request *req)
 		wbt_issue(req->q->rq_wb, &req->issue_stat);
 	}
 
-	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
+	BUG_ON(blk_rq_is_complete(req));
 	blk_add_timer(req);
 }
 EXPORT_SYMBOL(blk_start_request);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 2a9c9f8b6162..ac99b78415ec 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -291,12 +291,6 @@ static const char *const rqf_name[] = {
 };
 #undef RQF_NAME
 
-#define RQAF_NAME(name) [REQ_ATOM_##name] = #name
-static const char *const rqaf_name[] = {
-	RQAF_NAME(COMPLETE),
-};
-#undef RQAF_NAME
-
 int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
 {
 	const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
@@ -313,8 +307,6 @@ int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
 	seq_puts(m, ", .rq_flags=");
 	blk_flags_show(m, (__force unsigned int)rq->rq_flags, rqf_name,
 		       ARRAY_SIZE(rqf_name));
-	seq_puts(m, ", .atomic_flags=");
-	blk_flags_show(m, rq->atomic_flags, rqaf_name, ARRAY_SIZE(rqaf_name));
 	seq_printf(m, ", .tag=%d, .internal_tag=%d", rq->tag,
 		   rq->internal_tag);
 	if (mq_ops->show_rq)
diff --git a/block/blk.h b/block/blk.h
index 8b26a8872e05..d5ae07cc4abb 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -120,24 +120,23 @@ void blk_account_io_completion(struct request *req, unsigned int bytes);
 void blk_account_io_done(struct request *req);
 
 /*
- * Internal atomic flags for request handling
- */
-enum rq_atomic_flags {
-	REQ_ATOM_COMPLETE = 0,
-};
-
-/*
  * EH timer and IO completion will both attempt to 'grab' the request, make
- * sure that only one of them succeeds
+ * sure that only one of them succeeds. Steal the bottom bit of the
+ * __deadline field for this.
  */
 static inline int blk_mark_rq_complete(struct request *rq)
 {
-	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
+	return test_and_set_bit(0, &rq->__deadline);
 }
 
 static inline void blk_clear_rq_complete(struct request *rq)
 {
-	clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
+	clear_bit(0, &rq->__deadline);
+}
+
+static inline bool blk_rq_is_complete(struct request *rq)
+{
+	return test_bit(0, &rq->__deadline);
 }
 
 /*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aa6698cf483c..d4b2f7bb18d6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -156,8 +156,6 @@ struct request {
 
 	int internal_tag;
 
-	unsigned long atomic_flags;
-
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
 	int tag;
-- 
2.7.4


* [PATCH 4/4] block: rearrange a few request fields for better cache layout
  2018-01-09 18:26 [PATCHSET 0/4] struct request optimizations Jens Axboe
                   ` (2 preceding siblings ...)
  2018-01-09 18:27 ` [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit Jens Axboe
@ 2018-01-09 18:27 ` Jens Axboe
  3 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:27 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.

Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.

This yields a 1.5-2% increase in IOPS for a null_blk test, both for
sync and for high thread count access. Sync test goes from 975K to
992K, 32-thread case from 20.8M to 21.2M IOPS.
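
One way to sanity-check a shuffle like this (a hypothetical debugging aid,
not part of the patch; pahole on the built objects shows the same thing) is
to print member offsets and the cache line each one lands in, assuming
64-byte lines:

/* hypothetical layout report; would need to be called from e.g. a
 * throwaway test module's init function */
#include <linux/blkdev.h>
#include <linux/kernel.h>

#define RQ_LINE(member)							\
	pr_info("%-12s offset %3zu, cache line %zu\n", #member,	\
		offsetof(struct request, member),			\
		offsetof(struct request, member) / 64)

static void rq_layout_report(void)
{
	RQ_LINE(q);
	RQ_LINE(cmd_flags);
	RQ_LINE(bio);
	RQ_LINE(biotail);
	RQ_LINE(queuelist);	/* should now sit with the queueing fields */
	RQ_LINE(csd);		/* completion data pushed toward the end */
}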

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/blkdev.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4b2f7bb18d6..1b2472f6662e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -141,12 +141,6 @@ typedef __u32 __bitwise req_flags_t;
  * especially blk_mq_rq_ctx_init() to take care of the added fields.
  */
 struct request {
-	struct list_head queuelist;
-	union {
-		call_single_data_t csd;
-		u64 fifo_time;
-	};
-
 	struct request_queue *q;
 	struct blk_mq_ctx *mq_ctx;
 
@@ -164,6 +158,8 @@ struct request {
 	struct bio *bio;
 	struct bio *biotail;
 
+	struct list_head queuelist;
+
 	/*
 	 * The hash is used inside the scheduler, and killed once the
 	 * request reaches the dispatch list. The ipi_list is only used
@@ -260,6 +256,11 @@ struct request {
 
 	struct list_head timeout_list;
 
+	union {
+		call_single_data_t csd;
+		u64 fifo_time;
+	};
+
 	/*
 	 * completion callback.
 	 */
-- 
2.7.4


* Re: [PATCH 2/4] block: add accessors for setting/querying request deadline
  2018-01-09 18:27 ` [PATCH 2/4] block: add accessors for setting/querying request deadline Jens Axboe
@ 2018-01-09 18:40   ` Bart Van Assche
  2018-01-09 18:41     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-01-09 18:40 UTC (permalink / raw)
  To: linux-block@vger.kernel.org, axboe@kernel.dk

On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
> +static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
> +{
> +	rq->__deadline = time & ~0x1;
> +}
> +
> +static inline unsigned long blk_rq_deadline(struct request *rq)
> +{
> +	return rq->__deadline & ~0x1;
> +}

Hello Jens,

The type of rq->__deadline is "unsigned long" but the type of the right-hand
side constant is int. Shouldn't an "UL" suffix be added to the RHS constant?

Thanks,

Bart.


* Re: [PATCH 2/4] block: add accessors for setting/querying request deadline
  2018-01-09 18:40   ` Bart Van Assche
@ 2018-01-09 18:41     ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:41 UTC (permalink / raw)
  To: Bart Van Assche, linux-block@vger.kernel.org

On 1/9/18 11:40 AM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>> +static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
>> +{
>> +	rq->__deadline = time & ~0x1;
>> +}
>> +
>> +static inline unsigned long blk_rq_deadline(struct request *rq)
>> +{
>> +	return rq->__deadline & ~0x1;
>> +}
> 
> Hello Jens,
> 
> The type of rq->__deadline is "unsigned long" but the type of the right-hand
> side constant is int. Shouldn't an "UL" suffix be added to the RHS constant?

Good catch, yeah you're right. I'll make that change, thanks.

-- 
Jens Axboe


* Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit
  2018-01-09 18:27 ` [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit Jens Axboe
@ 2018-01-09 18:43   ` Bart Van Assche
  2018-01-09 18:44     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-01-09 18:43 UTC (permalink / raw)
  To: linux-block@vger.kernel.org, axboe@kernel.dk

On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>  static inline int blk_mark_rq_complete(struct request *rq)
>  {
> -	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> +	return test_and_set_bit(0, &rq->__deadline);
>  }
>  
>  static inline void blk_clear_rq_complete(struct request *rq)
>  {
> -	clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> +	clear_bit(0, &rq->__deadline);
> +}
> +
> +static inline bool blk_rq_is_complete(struct request *rq)
> +{
> +	return test_bit(0, &rq->__deadline);
>  }

Hello Jens,

With this change setting or changing the deadline clears the COMPLETE flag.
Is that the intended behavior? If so, should perhaps a comment be added above
blk_rq_set_deadline()?

Thanks,

Bart.


* Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit
  2018-01-09 18:43   ` Bart Van Assche
@ 2018-01-09 18:44     ` Jens Axboe
  2018-01-09 18:52       ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:44 UTC (permalink / raw)
  To: Bart Van Assche, linux-block@vger.kernel.org

On 1/9/18 11:43 AM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>>  static inline int blk_mark_rq_complete(struct request *rq)
>>  {
>> -	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>> +	return test_and_set_bit(0, &rq->__deadline);
>>  }
>>  
>>  static inline void blk_clear_rq_complete(struct request *rq)
>>  {
>> -	clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>> +	clear_bit(0, &rq->__deadline);
>> +}
>> +
>> +static inline bool blk_rq_is_complete(struct request *rq)
>> +{
>> +	return test_bit(0, &rq->__deadline);
>>  }
> 
> Hello Jens,
> 
> With this change setting or changing the deadline clears the COMPLETE flag.
> Is that the intended behavior? If so, should perhaps a comment be added above
> blk_rq_set_deadline()?

Yeah, it's intentional. I can add a comment to that effect. It's only done
before queueing - except for the case where we force a timeout, but for that
it's only on the blk-mq side, which doesn't care.

-- 
Jens Axboe


* Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit
  2018-01-09 18:44     ` Jens Axboe
@ 2018-01-09 18:52       ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-09 18:52 UTC (permalink / raw)
  To: Bart Van Assche, linux-block@vger.kernel.org

On 1/9/18 11:44 AM, Jens Axboe wrote:
> On 1/9/18 11:43 AM, Bart Van Assche wrote:
>> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>>>  static inline int blk_mark_rq_complete(struct request *rq)
>>>  {
>>> -	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>>> +	return test_and_set_bit(0, &rq->__deadline);
>>>  }
>>>  
>>>  static inline void blk_clear_rq_complete(struct request *rq)
>>>  {
>>> -	clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>>> +	clear_bit(0, &rq->__deadline);
>>> +}
>>> +
>>> +static inline bool blk_rq_is_complete(struct request *rq)
>>> +{
>>> +	return test_bit(0, &rq->__deadline);
>>>  }
>>
>> Hello Jens,
>>
>> With this change setting or changing the deadline clears the COMPLETE flag.
>> Is that the intended behavior? If so, should perhaps a comment be added above
>> blk_rq_set_deadline()?
> 
> Yeah, it's intentional. I can add a comment to that effect. It's only done
> before queueing - except for the case where we force a timeout, but for that
> it's only on the blk-mq side, which doesn't care.

Since we clear it when we init the request, we could also just leave the
bit intact when setting the deadline. That's probably the safer choice:

static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)  
{                                                                               
        rq->__deadline = (time & ~0x1UL) | (rq->__deadline & 0x1UL);
}

I'll test that; previous testing didn't find anything wrong with clearing
the bit, but this does seem safer.

-- 
Jens Axboe


* [PATCH 4/4] block: rearrange a few request fields for better cache layout
  2018-01-10  0:29 [PATCHSET v2 0/4] struct request optimizations Jens Axboe
@ 2018-01-10  0:29 ` Jens Axboe
  2018-01-10 18:34   ` Bart Van Assche
  2018-01-10 18:43   ` Omar Sandoval
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-10  0:29 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, bart.vanassche, Jens Axboe

Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.

Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.

This yields a 1.5-2% increase in IOPS for a null_blk test, both for
sync and for high thread count access. Sync test goes from 975K to
992K, 32-thread case from 20.8M to 21.2M IOPS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c         | 19 ++++++++++---------
 include/linux/blkdev.h | 28 +++++++++++++++-------------
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7248ee043651..ec128001ea8b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,8 +270,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
 	struct request *rq = tags->static_rqs[tag];
 
-	rq->rq_flags = 0;
-
 	if (data->flags & BLK_MQ_REQ_INTERNAL) {
 		rq->tag = -1;
 		rq->internal_tag = tag;
@@ -285,26 +283,23 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		data->hctx->tags->rqs[rq->tag] = rq;
 	}
 
-	INIT_LIST_HEAD(&rq->queuelist);
 	/* csd/requeue_work/fifo_time is initialized before use */
 	rq->q = data->q;
 	rq->mq_ctx = data->ctx;
+	rq->rq_flags = 0;
+	rq->cpu = -1;
 	rq->cmd_flags = op;
 	if (data->flags & BLK_MQ_REQ_PREEMPT)
 		rq->rq_flags |= RQF_PREEMPT;
 	if (blk_queue_io_stat(data->q))
 		rq->rq_flags |= RQF_IO_STAT;
-	rq->cpu = -1;
+	/* do not touch atomic flags, it needs atomic ops against the timer */
+	INIT_LIST_HEAD(&rq->queuelist);
 	INIT_HLIST_NODE(&rq->hash);
 	RB_CLEAR_NODE(&rq->rb_node);
 	rq->rq_disk = NULL;
 	rq->part = NULL;
 	rq->start_time = jiffies;
-#ifdef CONFIG_BLK_CGROUP
-	rq->rl = NULL;
-	set_start_time_ns(rq);
-	rq->io_start_time_ns = 0;
-#endif
 	rq->nr_phys_segments = 0;
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 	rq->nr_integrity_segments = 0;
@@ -321,6 +316,12 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->end_io_data = NULL;
 	rq->next_rq = NULL;
 
+#ifdef CONFIG_BLK_CGROUP
+	rq->rl = NULL;
+	set_start_time_ns(rq);
+	rq->io_start_time_ns = 0;
+#endif
+
 	data->ctx->rq_dispatched[op_is_sync(op)]++;
 	return rq;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4b2f7bb18d6..71a9371c8182 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -141,12 +141,6 @@ typedef __u32 __bitwise req_flags_t;
  * especially blk_mq_rq_ctx_init() to take care of the added fields.
  */
 struct request {
-	struct list_head queuelist;
-	union {
-		call_single_data_t csd;
-		u64 fifo_time;
-	};
-
 	struct request_queue *q;
 	struct blk_mq_ctx *mq_ctx;
 
@@ -164,6 +158,8 @@ struct request {
 	struct bio *bio;
 	struct bio *biotail;
 
+	struct list_head queuelist;
+
 	/*
 	 * The hash is used inside the scheduler, and killed once the
 	 * request reaches the dispatch list. The ipi_list is only used
@@ -211,19 +207,16 @@ struct request {
 	struct hd_struct *part;
 	unsigned long start_time;
 	struct blk_issue_stat issue_stat;
-#ifdef CONFIG_BLK_CGROUP
-	struct request_list *rl;		/* rl this rq is alloced from */
-	unsigned long long start_time_ns;
-	unsigned long long io_start_time_ns;    /* when passed to hardware */
-#endif
 	/* Number of scatter-gather DMA addr+len pairs after
 	 * physical address coalescing is performed.
 	 */
 	unsigned short nr_phys_segments;
+
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 	unsigned short nr_integrity_segments;
 #endif
 
+	unsigned short write_hint;
 	unsigned short ioprio;
 
 	unsigned int timeout;
@@ -232,8 +225,6 @@ struct request {
 
 	unsigned int extra_len;	/* length of alignment and padding */
 
-	unsigned short write_hint;
-
 	/*
 	 * On blk-mq, the lower bits of ->gstate (generation number and
 	 * state) carry the MQ_RQ_* state value and the upper bits the
@@ -260,6 +251,11 @@ struct request {
 
 	struct list_head timeout_list;
 
+	union {
+		call_single_data_t csd;
+		u64 fifo_time;
+	};
+
 	/*
 	 * completion callback.
 	 */
@@ -268,6 +264,12 @@ struct request {
 
 	/* for bidi */
 	struct request *next_rq;
+
+#ifdef CONFIG_BLK_CGROUP
+	struct request_list *rl;		/* rl this rq is alloced from */
+	unsigned long long start_time_ns;
+	unsigned long long io_start_time_ns;    /* when passed to hardware */
+#endif
 };
 
 static inline bool blk_rq_is_scsi(struct request *rq)
-- 
2.7.4


* Re: [PATCH 4/4] block: rearrange a few request fields for better cache layout
  2018-01-10  0:29 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
@ 2018-01-10 18:34   ` Bart Van Assche
  2018-01-10 18:43   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2018-01-10 18:34 UTC (permalink / raw)
  To: linux-block@vger.kernel.org, axboe@kernel.dk; +Cc: osandov@fb.com

On Tue, 2018-01-09 at 17:29 -0700, Jens Axboe wrote:
> Move completion related items (like the call single data) near the
> end of the struct, instead of mixing them in with the initial
> queueing related fields.
> 
> Move queuelist below the bio structures. Then we have all
> queueing related bits in the first cache line.
> 
> This yields a 1.5-2% increase in IOPS for a null_blk test, both for
> sync and for high thread count access. Sync test goes from 975K to
> 992K, 32-thread case from 20.8M to 21.2M IOPS.

That's a nice result!

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


* Re: [PATCH 4/4] block: rearrange a few request fields for better cache layout
  2018-01-10  0:29 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
  2018-01-10 18:34   ` Bart Van Assche
@ 2018-01-10 18:43   ` Omar Sandoval
  2018-01-10 18:45     ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2018-01-10 18:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, bart.vanassche

On Tue, Jan 09, 2018 at 05:29:27PM -0700, Jens Axboe wrote:
> Move completion related items (like the call single data) near the
> end of the struct, instead of mixing them in with the initial
> queueing related fields.
> 
> Move queuelist below the bio structures. Then we have all
> queueing related bits in the first cache line.
> 
> This yields a 1.5-2% increase in IOPS for a null_blk test, both for
> sync and for high thread count access. Sync test goes from 975K to
> 992K, 32-thread case from 20.8M to 21.2M IOPS.

One nit below, otherwise

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq.c         | 19 ++++++++++---------
>  include/linux/blkdev.h | 28 +++++++++++++++-------------
>  2 files changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7248ee043651..ec128001ea8b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -270,8 +270,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
>  	struct request *rq = tags->static_rqs[tag];
>  
> -	rq->rq_flags = 0;
> -
>  	if (data->flags & BLK_MQ_REQ_INTERNAL) {
>  		rq->tag = -1;
>  		rq->internal_tag = tag;
> @@ -285,26 +283,23 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  		data->hctx->tags->rqs[rq->tag] = rq;
>  	}
>  
> -	INIT_LIST_HEAD(&rq->queuelist);
>  	/* csd/requeue_work/fifo_time is initialized before use */
>  	rq->q = data->q;
>  	rq->mq_ctx = data->ctx;
> +	rq->rq_flags = 0;
> +	rq->cpu = -1;
>  	rq->cmd_flags = op;
>  	if (data->flags & BLK_MQ_REQ_PREEMPT)
>  		rq->rq_flags |= RQF_PREEMPT;
>  	if (blk_queue_io_stat(data->q))
>  		rq->rq_flags |= RQF_IO_STAT;
> -	rq->cpu = -1;
> +	/* do not touch atomic flags, it needs atomic ops against the timer */

This comment was just removed in a previous patch but it snuck back in.


* Re: [PATCH 4/4] block: rearrange a few request fields for better cache layout
  2018-01-10 18:43   ` Omar Sandoval
@ 2018-01-10 18:45     ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-01-10 18:45 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, osandov, bart.vanassche

On 1/10/18 11:43 AM, Omar Sandoval wrote:
>> -	INIT_LIST_HEAD(&rq->queuelist);
>>  	/* csd/requeue_work/fifo_time is initialized before use */
>>  	rq->q = data->q;
>>  	rq->mq_ctx = data->ctx;
>> +	rq->rq_flags = 0;
>> +	rq->cpu = -1;
>>  	rq->cmd_flags = op;
>>  	if (data->flags & BLK_MQ_REQ_PREEMPT)
>>  		rq->rq_flags |= RQF_PREEMPT;
>>  	if (blk_queue_io_stat(data->q))
>>  		rq->rq_flags |= RQF_IO_STAT;
>> -	rq->cpu = -1;
>> +	/* do not touch atomic flags, it needs atomic ops against the timer */
> 
> This comment was just removed in a previous patch but it snuck back in.

Eagle eyes - thanks, I will kill it.

-- 
Jens Axboe


end of thread, other threads:[~2018-01-10 18:45 UTC | newest]

Thread overview: 14+ messages
2018-01-09 18:26 [PATCHSET 0/4] struct request optimizations Jens Axboe
2018-01-09 18:26 ` [PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT Jens Axboe
2018-01-09 18:27 ` [PATCH 2/4] block: add accessors for setting/querying request deadline Jens Axboe
2018-01-09 18:40   ` Bart Van Assche
2018-01-09 18:41     ` Jens Axboe
2018-01-09 18:27 ` [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit Jens Axboe
2018-01-09 18:43   ` Bart Van Assche
2018-01-09 18:44     ` Jens Axboe
2018-01-09 18:52       ` Jens Axboe
2018-01-09 18:27 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2018-01-10  0:29 [PATCHSET v2 0/4] struct request optimizations Jens Axboe
2018-01-10  0:29 ` [PATCH 4/4] block: rearrange a few request fields for better cache layout Jens Axboe
2018-01-10 18:34   ` Bart Van Assche
2018-01-10 18:43   ` Omar Sandoval
2018-01-10 18:45     ` Jens Axboe
