* [PATCH 0/2] Some performance fixes for fastpath
@ 2016-03-08 17:34 Jon Derrick
2016-03-08 17:34 ` [PATCH 1/2] Block: Disable polling stats when iostats are disabled Jon Derrick
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
0 siblings, 2 replies; 10+ messages in thread
From: Jon Derrick @ 2016-03-08 17:34 UTC (permalink / raw)
This is a small set of hopefully uncontroversial, performance-related
patches. Local testing on new hardware showed a 5-10MBps increase in
4k and 1M sequential synchronous null reads from an NVMe device.
Jon Derrick (2):
Block: Disable polling stats when iostats are disabled
NVMe: Remove unused sq_head read in completion path
block/blk-core.c | 7 +++++--
drivers/nvme/host/pci.c | 2 --
2 files changed, 5 insertions(+), 4 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] Block: Disable polling stats when iostats are disabled
2016-03-08 17:34 [PATCH 0/2] Some performance fixes for fastpath Jon Derrick
@ 2016-03-08 17:34 ` Jon Derrick
2016-03-08 17:38 ` Sagi Grimberg
2016-03-08 17:40 ` Jens Axboe
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
1 sibling, 2 replies; 10+ messages in thread
From: Jon Derrick @ 2016-03-08 17:34 UTC (permalink / raw)
Extends iostats to encompass polling statistics to save a few cycles
when disabled.
Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
---
block/blk-core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index ab51685..354d03b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3350,13 +3350,16 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
while (!need_resched()) {
unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
+ int io_stat = blk_queue_io_stat(q);
int ret;
- hctx->poll_invoked++;
+ if (io_stat)
+ hctx->poll_invoked++;
ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
if (ret > 0) {
- hctx->poll_success++;
+ if (io_stat)
+ hctx->poll_success++;
set_current_state(TASK_RUNNING);
return true;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] NVMe: Remove unused sq_head read in completion path
2016-03-08 17:34 [PATCH 0/2] Some performance fixes for fastpath Jon Derrick
2016-03-08 17:34 ` [PATCH 1/2] Block: Disable polling stats when iostats are disabled Jon Derrick
@ 2016-03-08 17:34 ` Jon Derrick
2016-03-08 17:40 ` Sagi Grimberg
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Jon Derrick @ 2016-03-08 17:34 UTC (permalink / raw)
Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
---
drivers/nvme/host/pci.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 74514c7..e9f18e1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -134,7 +134,6 @@ struct nvme_queue {
u32 __iomem *q_db;
u16 q_depth;
s16 cq_vector;
- u16 sq_head;
u16 sq_tail;
u16 cq_head;
u16 qid;
@@ -719,7 +718,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
if ((status & 1) != phase)
break;
- nvmeq->sq_head = le16_to_cpu(cqe.sq_head);
if (++head == nvmeq->q_depth) {
head = 0;
phase = !phase;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 1/2] Block: Disable polling stats when iostats are disabled
2016-03-08 17:34 ` [PATCH 1/2] Block: Disable polling stats when iostats are disabled Jon Derrick
@ 2016-03-08 17:38 ` Sagi Grimberg
2016-03-08 17:58 ` Jon Derrick
2016-03-08 17:40 ` Jens Axboe
1 sibling, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2016-03-08 17:38 UTC (permalink / raw)
> Extends iostats to encompass polling statistics to save a few cycles
> when disabled.
>
> Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
> ---
> block/blk-core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ab51685..354d03b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3350,13 +3350,16 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
> while (!need_resched()) {
> unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
> struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
> + int io_stat = blk_queue_io_stat(q);
> int ret;
>
> - hctx->poll_invoked++;
> + if (io_stat)
> + hctx->poll_invoked++;
>
> ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> if (ret > 0) {
> - hctx->poll_success++;
> + if (io_stat)
> + hctx->poll_success++;
> set_current_state(TASK_RUNNING);
> return true;
> }
I fail to see how replacing incrementation with a branch statement helps
performance or even not making it worse...
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] NVMe: Remove unused sq_head read in completion path
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
@ 2016-03-08 17:40 ` Sagi Grimberg
2016-03-08 17:41 ` Jens Axboe
2016-03-09 7:54 ` Johannes Thumshirn
2 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2016-03-08 17:40 UTC (permalink / raw)
Looks fine,
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] Block: Disable polling stats when iostats are disabled
2016-03-08 17:34 ` [PATCH 1/2] Block: Disable polling stats when iostats are disabled Jon Derrick
2016-03-08 17:38 ` Sagi Grimberg
@ 2016-03-08 17:40 ` Jens Axboe
2016-03-08 18:02 ` Jon Derrick
1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2016-03-08 17:40 UTC (permalink / raw)
On 03/08/2016 10:34 AM, Jon Derrick wrote:
> Extends iostats to encompass polling statistics to save a few cycles
> when disabled.
>
> Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
> ---
> block/blk-core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ab51685..354d03b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3350,13 +3350,16 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
> while (!need_resched()) {
> unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
> struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
> + int io_stat = blk_queue_io_stat(q);
> int ret;
>
> - hctx->poll_invoked++;
> + if (io_stat)
> + hctx->poll_invoked++;
>
> ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> if (ret > 0) {
> - hctx->poll_success++;
> + if (io_stat)
> + hctx->poll_success++;
> set_current_state(TASK_RUNNING);
> return true;
> }
>
Not sure this is a great idea. First of all, the poll stats are per
hardware queue. How many submission queues and CPUs do you have in your
setup? For most cases, I'd assume there'd be a 1:1 mapping between the
two, which makes the stats essentially free. And secondly, even for a
less optimal mapping, the poll stats are a lot cheaper than the io
stats. So bundling them together might not make a ton of sense.
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] NVMe: Remove unused sq_head read in completion path
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
2016-03-08 17:40 ` Sagi Grimberg
@ 2016-03-08 17:41 ` Jens Axboe
2016-03-09 7:54 ` Johannes Thumshirn
2 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2016-03-08 17:41 UTC (permalink / raw)
On 03/08/2016 10:34 AM, Jon Derrick wrote:
> Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
> ---
> drivers/nvme/host/pci.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 74514c7..e9f18e1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -134,7 +134,6 @@ struct nvme_queue {
> u32 __iomem *q_db;
> u16 q_depth;
> s16 cq_vector;
> - u16 sq_head;
> u16 sq_tail;
> u16 cq_head;
> u16 qid;
> @@ -719,7 +718,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>
> if ((status & 1) != phase)
> break;
> - nvmeq->sq_head = le16_to_cpu(cqe.sq_head);
> if (++head == nvmeq->q_depth) {
> head = 0;
> phase = !phase;
This seems like a no-brainer, good spotting.
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] Block: Disable polling stats when iostats are disabled
2016-03-08 17:38 ` Sagi Grimberg
@ 2016-03-08 17:58 ` Jon Derrick
0 siblings, 0 replies; 10+ messages in thread
From: Jon Derrick @ 2016-03-08 17:58 UTC (permalink / raw)
>
> I fail to see how replacing incrementation with a branch statement helps
> performance or even not making it worse...
It may have been coincidental in my setup - I had assumed the test on the local io_stat was cheaper than the cost of following hctx to then increment (or not) poll_invoked.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] Block: Disable polling stats when iostats are disabled
2016-03-08 17:40 ` Jens Axboe
@ 2016-03-08 18:02 ` Jon Derrick
0 siblings, 0 replies; 10+ messages in thread
From: Jon Derrick @ 2016-03-08 18:02 UTC (permalink / raw)
> Not sure this is a great idea. First of all, the poll stats are per hardware
> queue. How many submission queues and CPUs do you have in your setup? For
31 hq, 64 cpus (so actually just 31 sq)- but the test I ran was:
taskset -c 1 fio --name=global --gtod_reduce=1 --filename=/dev/nvme0n1 --bs=4k --rw=read --ioengine=sync --iodepth=1 --numjobs=1 --direct=1 --name=job
I was a bit surprised at the delta I saw, but it may have been coincidental
> most cases, I'd assume there'd be a 1:1 mapping between the two, which makes
> the stats essentially free. And secondly, even for a less optimal mapping,
> the poll stats are a lot cheaper than the io stats. So bundling them
> together might not make a ton of sense.
Fair enough :)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] NVMe: Remove unused sq_head read in completion path
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
2016-03-08 17:40 ` Sagi Grimberg
2016-03-08 17:41 ` Jens Axboe
@ 2016-03-09 7:54 ` Johannes Thumshirn
2 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2016-03-09 7:54 UTC (permalink / raw)
On Tue, Mar 08, 2016@10:34:54AM -0700, Jon Derrick wrote:
> Signed-off-by: Jon Derrick <jonathan.derrick at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn Storage
jthumshirn at suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-03-09 7:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-08 17:34 [PATCH 0/2] Some performance fixes for fastpath Jon Derrick
2016-03-08 17:34 ` [PATCH 1/2] Block: Disable polling stats when iostats are disabled Jon Derrick
2016-03-08 17:38 ` Sagi Grimberg
2016-03-08 17:58 ` Jon Derrick
2016-03-08 17:40 ` Jens Axboe
2016-03-08 18:02 ` Jon Derrick
2016-03-08 17:34 ` [PATCH 2/2] NVMe: Remove unused sq_head read in completion path Jon Derrick
2016-03-08 17:40 ` Sagi Grimberg
2016-03-08 17:41 ` Jens Axboe
2016-03-09 7:54 ` Johannes Thumshirn
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.