linux-block.vger.kernel.org archive mirror
* [PATCHSET v4] Various block optimizations
@ 2018-11-17 21:43 Jens Axboe
  2018-11-17 21:43 ` [PATCH 1/7] block: avoid ordered task state change for polled IO Jens Axboe
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme

Some of these are optimizations; the latter part is prep work
for supporting polling with aio.

Patches against my for-4.21/block branch. These patches can also
be found in my mq-perf branch. Some of them are prep patches for
the aio poll work, which can be found in my aio-poll branch.
These will get posted soon.

Changes since v3:

- Fix tiny race in iocb flag checking for plugging
- Improve poll-many implementation, including nvme-rdma
- Get rid of unneeded smp_rmb()
- Drop merged patches

Changes since v2:

- Include polled swap IO in the poll optimizations
- Get rid of unnecessary write barrier for DIO wakeup
- Fix a potential stall if need_resched() was set and preempt
  wasn't enabled
- Provide separate mq_ops for NVMe with poll queues
- Drop q->mq_ops patch
- Rebase on top of for-4.21/block

Changes since v1:

- Improve nvme irq disabling for polled IO
- Fix barriers in the ordered wakeup for polled O_DIRECT
- Add patch to allow polling to find any command that is done
- Add patch to control whether polling spins or not
- Have async O_DIRECT mark a bio as pollable
- Don't plug for polling


 block/blk-core.c                  |  4 +-
 block/blk-mq.c                    | 91 +++++++++++++++++++++------------------
 drivers/nvme/host/fc.c            | 33 --------------
 drivers/nvme/host/multipath.c     |  6 +--
 drivers/nvme/host/pci.c           | 22 +++++-----
 drivers/nvme/host/rdma.c          | 38 ++++++++--------
 drivers/nvme/target/io-cmd-bdev.c |  2 +-
 fs/block_dev.c                    | 11 +++--
 fs/direct-io.c                    |  2 +-
 fs/iomap.c                        |  5 ++-
 include/linux/blk-mq.h            |  2 +-
 include/linux/blkdev.h            |  4 +-
 mm/page_io.c                      |  5 ++-
 13 files changed, 101 insertions(+), 124 deletions(-)

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] block: avoid ordered task state change for polled IO
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  7:58   ` Christoph Hellwig
  2018-11-17 21:43 ` [PATCH 2/7] block: have ->poll_fn() return number of entries polled Jens Axboe
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

For the core poll helper, the task state setting doesn't need
to imply any atomics, as it's the current task itself that is
being modified and we're not going to sleep.

For IRQ driven completions, the wakeup path has the necessary
barriers, so we don't need the heavy-handed version of the task
state setting.
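
To illustrate, this is just the shape of the wait loop the patch
relies on, with 'done' and 'poll_once()' as stand-ins, not code from
the patch itself:

        for (;;) {
                /* plain store: we only change our own task state */
                __set_current_state(TASK_UNINTERRUPTIBLE);

                if (READ_ONCE(done))
                        break;

                /*
                 * Busy poll; if nothing is found, sleep. The IRQ
                 * completion path's wake_up provides the barriers
                 * we'd otherwise get from set_current_state().
                 */
                if (!poll_once())
                        io_schedule();
        }
        __set_current_state(TASK_RUNNING);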

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 4 ++--
 fs/block_dev.c | 7 +++++--
 fs/iomap.c     | 3 ++-
 mm/page_io.c   | 3 ++-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b246ed44c0..7fc4abb4cc36 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3331,12 +3331,12 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 		ret = q->mq_ops->poll(hctx, rq->tag);
 		if (ret > 0) {
 			hctx->poll_success++;
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 			return true;
 		}
 
 		if (signal_pending_state(state, current))
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 
 		if (current->state == TASK_RUNNING)
 			return true;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4d79bc80fb41..64ba27b8b754 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -237,9 +237,11 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 
 	qc = submit_bio(&bio);
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+
 		if (!READ_ONCE(bio.bi_private))
 			break;
+
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc))
 			io_schedule();
@@ -415,7 +417,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		return -EIOCBQUEUED;
 
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+
 		if (!READ_ONCE(dio->waiter))
 			break;
 
diff --git a/fs/iomap.c b/fs/iomap.c
index b0462b363bad..c5df035ace6f 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1888,7 +1888,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			return -EIOCBQUEUED;
 
 		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+
 			if (!READ_ONCE(dio->submit.waiter))
 				break;
 
diff --git a/mm/page_io.c b/mm/page_io.c
index 57572ff46016..a7271fa481f6 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -405,7 +405,8 @@ int swap_readpage(struct page *page, bool synchronous)
 	bio_get(bio);
 	qc = submit_bio(bio);
 	while (synchronous) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+
 		if (!READ_ONCE(bio->bi_private))
 			break;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
  2018-11-17 21:43 ` [PATCH 1/7] block: avoid ordered task state change for polled IO Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  7:58   ` Christoph Hellwig
  2018-11-25 10:52   ` Sagi Grimberg
  2018-11-17 21:43 ` [PATCH 3/7] nvme-fc: remove unused poll implementation Jens Axboe
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

We currently only really support sync poll, ie poll with 1
IO in flight. This prepares us for supporting async poll.

Note that the returned value isn't necessarily 100% accurate.
If poll races with IRQ completion, we assume that the fact
that the task is now runnable means we found at least one
entry. In reality it could be more than one, or none at all.
This is fine; the caller just needs to take that into account.
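
For a caller this just means treating the return value as a count
instead of a bool. Roughly, as a sketch of a hypothetical async
caller ('completed' is a stand-in, this is not code from the series):

        int nr = q->poll_fn(q, cookie);

        if (nr > 0)
                completed += nr;        /* may be off if we raced with IRQ completion */
        /* nr == 0: nothing found on this pass, caller decides whether to retry */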

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c                | 18 +++++++++---------
 drivers/nvme/host/multipath.c |  4 ++--
 include/linux/blkdev.h        |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7fc4abb4cc36..52b1c97cd7c6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -38,7 +38,7 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -3305,7 +3305,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	return true;
 }
 
-static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
@@ -3318,7 +3318,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	 * straight to the busy poll loop.
 	 */
 	if (blk_mq_poll_hybrid_sleep(q, hctx, rq))
-		return true;
+		return 1;
 
 	hctx->poll_considered++;
 
@@ -3332,30 +3332,30 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 		if (ret > 0) {
 			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
-			return true;
+			return ret;
 		}
 
 		if (signal_pending_state(state, current))
 			__set_current_state(TASK_RUNNING);
 
 		if (current->state == TASK_RUNNING)
-			return true;
+			return 1;
 		if (ret < 0)
 			break;
 		cpu_relax();
 	}
 
 	__set_current_state(TASK_RUNNING);
-	return false;
+	return 0;
 }
 
-static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_mq_hw_ctx *hctx;
 	struct request *rq;
 
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		return false;
+		return 0;
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 	if (!blk_qc_t_is_internal(cookie))
@@ -3369,7 +3369,7 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 		 * so we should be safe with just the NULL check.
 		 */
 		if (!rq)
-			return false;
+			return 0;
 	}
 
 	return __blk_mq_poll(hctx, rq);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index b82b0d3ca39a..65539c8df11d 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,11 +220,11 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	return ret;
 }
 
-static bool nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
+static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
 {
 	struct nvme_ns_head *head = q->queuedata;
 	struct nvme_ns *ns;
-	bool found = false;
+	int found = 0;
 	int srcu_idx;
 
 	srcu_idx = srcu_read_lock(&head->srcu);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1ad6eafc43f2..e97c0a3b2262 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -283,7 +283,7 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
-typedef bool (poll_q_fn) (struct request_queue *q, blk_qc_t);
+typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/7] nvme-fc: remove unused poll implementation
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
  2018-11-17 21:43 ` [PATCH 1/7] block: avoid ordered task state change for polled IO Jens Axboe
  2018-11-17 21:43 ` [PATCH 2/7] block: have ->poll_fn() return number of entries polled Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  7:59   ` Christoph Hellwig
  2018-11-17 21:43 ` [PATCH 4/7] blk-mq: when polling for IO, look for any completion Jens Axboe
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

This relies on the fc target ops setting ->poll_queue, which
nobody does. Otherwise it just checks if something has
completed, which isn't very useful.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/fc.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 98c3c77f48f6..de797c641265 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2302,38 +2302,6 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return nvme_fc_start_fcp_op(ctrl, queue, op, data_len, io_dir);
 }
 
-static struct blk_mq_tags *
-nvme_fc_tagset(struct nvme_fc_queue *queue)
-{
-	if (queue->qnum == 0)
-		return queue->ctrl->admin_tag_set.tags[queue->qnum];
-
-	return queue->ctrl->tag_set.tags[queue->qnum - 1];
-}
-
-static int
-nvme_fc_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
-
-{
-	struct nvme_fc_queue *queue = hctx->driver_data;
-	struct nvme_fc_ctrl *ctrl = queue->ctrl;
-	struct request *req;
-	struct nvme_fc_fcp_op *op;
-
-	req = blk_mq_tag_to_rq(nvme_fc_tagset(queue), tag);
-	if (!req)
-		return 0;
-
-	op = blk_mq_rq_to_pdu(req);
-
-	if ((atomic_read(&op->state) == FCPOP_STATE_ACTIVE) &&
-		 (ctrl->lport->ops->poll_queue))
-		ctrl->lport->ops->poll_queue(&ctrl->lport->localport,
-						 queue->lldd_handle);
-
-	return ((atomic_read(&op->state) != FCPOP_STATE_ACTIVE));
-}
-
 static void
 nvme_fc_submit_async_event(struct nvme_ctrl *arg)
 {
@@ -2404,7 +2372,6 @@ static const struct blk_mq_ops nvme_fc_mq_ops = {
 	.init_request	= nvme_fc_init_request,
 	.exit_request	= nvme_fc_exit_request,
 	.init_hctx	= nvme_fc_init_hctx,
-	.poll		= nvme_fc_poll,
 	.timeout	= nvme_fc_timeout,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/7] blk-mq: when polling for IO, look for any completion
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
                   ` (2 preceding siblings ...)
  2018-11-17 21:43 ` [PATCH 3/7] nvme-fc: remove unused poll implementation Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  8:02   ` Christoph Hellwig
  2018-11-17 21:43 ` [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll() Jens Axboe
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

If we want to support async IO polling, then we have to allow
finding completions that aren't just for the one we are
looking for. Always pass in -1 to the mq_ops->poll() helper,
and have that return how many events were found in this poll
loop.
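
In driver terms the change boils down to something like this
(condensed sketch with cqe_pending()/reap_cqe()/struct cqe as
stand-ins; see the actual nvme changes below):

        /* before: stop as soon as the tag we were asked about shows up */
        while (!found && cqe_pending(q))
                found = (reap_cqe(q)->command_id == tag);

        /* after: consume everything pending and count what we saw */
        while (cqe_pending(q)) {
                struct cqe *cqe = reap_cqe(q);

                if (tag == -1U || cqe->command_id == tag)
                        found++;
        }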

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c           | 69 +++++++++++++++++++++++-----------------
 drivers/nvme/host/pci.c  | 14 ++++----
 drivers/nvme/host/rdma.c | 36 ++++++++++-----------
 3 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 52b1c97cd7c6..3ca00d712158 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3266,9 +3266,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	 *  0:	use half of prev avg
 	 * >0:	use this specific value
 	 */
-	if (q->poll_nsec == -1)
-		return false;
-	else if (q->poll_nsec > 0)
+	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
 		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
@@ -3305,21 +3303,36 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	return true;
 }
 
-static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
+static bool blk_mq_poll_hybrid(struct request_queue *q,
+			       struct blk_mq_hw_ctx *hctx, blk_qc_t cookie)
+{
+	struct request *rq;
+
+	if (q->poll_nsec == -1)
+		return false;
+
+	if (!blk_qc_t_is_internal(cookie))
+		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
+	else {
+		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
+		/*
+		 * With scheduling, if the request has completed, we'll
+		 * get a NULL return here, as we clear the sched tag when
+		 * that happens. The request still remains valid, like always,
+		 * so we should be safe with just the NULL check.
+		 */
+		if (!rq)
+			return false;
+	}
+
+	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+}
+
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
 
-	/*
-	 * If we sleep, have the caller restart the poll loop to reset
-	 * the state. Like for the other success return cases, the
-	 * caller is responsible for checking if the IO completed. If
-	 * the IO isn't complete, we'll get called again and will go
-	 * straight to the busy poll loop.
-	 */
-	if (blk_mq_poll_hybrid_sleep(q, hctx, rq))
-		return 1;
-
 	hctx->poll_considered++;
 
 	state = current->state;
@@ -3328,7 +3341,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 		hctx->poll_invoked++;
 
-		ret = q->mq_ops->poll(hctx, rq->tag);
+		ret = q->mq_ops->poll(hctx, -1U);
 		if (ret > 0) {
 			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
@@ -3352,27 +3365,23 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct request *rq;
 
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		return 0;
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
-	if (!blk_qc_t_is_internal(cookie))
-		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
-	else {
-		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
-		/*
-		 * With scheduling, if the request has completed, we'll
-		 * get a NULL return here, as we clear the sched tag when
-		 * that happens. The request still remains valid, like always,
-		 * so we should be safe with just the NULL check.
-		 */
-		if (!rq)
-			return 0;
-	}
 
-	return __blk_mq_poll(hctx, rq);
+	/*
+	 * If we sleep, have the caller restart the poll loop to reset
+	 * the state. Like for the other success return cases, the
+	 * caller is responsible for checking if the IO completed. If
+	 * the IO isn't complete, we'll get called again and will go
+	 * straight to the busy poll loop.
+	 */
+	if (blk_mq_poll_hybrid(q, hctx, cookie))
+		return 1;
+
+	return __blk_mq_poll(hctx);
 }
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 89874e23e422..1742c8ab8196 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1012,15 +1012,15 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 	}
 }
 
-static inline bool nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
-		u16 *end, int tag)
+static inline int nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
+				  u16 *end, unsigned int tag)
 {
-	bool found = false;
+	int found = 0;
 
 	*start = nvmeq->cq_head;
-	while (!found && nvme_cqe_pending(nvmeq)) {
-		if (nvmeq->cqes[nvmeq->cq_head].command_id == tag)
-			found = true;
+	while (nvme_cqe_pending(nvmeq)) {
+		if (tag == -1U || nvmeq->cqes[nvmeq->cq_head].command_id == tag)
+			found++;
 		nvme_update_cq_head(nvmeq);
 	}
 	*end = nvmeq->cq_head;
@@ -1062,7 +1062,7 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 {
 	u16 start, end;
-	bool found;
+	int found;
 
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d181cafedc58..53e44efc6d32 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1409,12 +1409,11 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg)
 	WARN_ON_ONCE(ret);
 }
 
-static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
-		struct nvme_completion *cqe, struct ib_wc *wc, int tag)
+static void nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
+		struct nvme_completion *cqe, struct ib_wc *wc)
 {
 	struct request *rq;
 	struct nvme_rdma_request *req;
-	int ret = 0;
 
 	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
 	if (!rq) {
@@ -1422,7 +1421,7 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 			"tag 0x%x on QP %#x not found\n",
 			cqe->command_id, queue->qp->qp_num);
 		nvme_rdma_error_recovery(queue->ctrl);
-		return ret;
+		return;
 	}
 	req = blk_mq_rq_to_pdu(rq);
 
@@ -1437,6 +1436,8 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 			nvme_rdma_error_recovery(queue->ctrl);
 		}
 	} else if (req->mr) {
+		int ret;
+
 		ret = nvme_rdma_inv_rkey(queue, req);
 		if (unlikely(ret < 0)) {
 			dev_err(queue->ctrl->ctrl.device,
@@ -1445,19 +1446,14 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 			nvme_rdma_error_recovery(queue->ctrl);
 		}
 		/* the local invalidation completion will end the request */
-		return 0;
+		return;
 	}
 
-	if (refcount_dec_and_test(&req->ref)) {
-		if (rq->tag == tag)
-			ret = 1;
+	if (refcount_dec_and_test(&req->ref))
 		nvme_end_request(rq, req->status, req->result);
-	}
-
-	return ret;
 }
 
-static int __nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc, int tag)
+static void __nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct nvme_rdma_qe *qe =
 		container_of(wc->wr_cqe, struct nvme_rdma_qe, cqe);
@@ -1465,11 +1461,10 @@ static int __nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc, int tag)
 	struct ib_device *ibdev = queue->device->dev;
 	struct nvme_completion *cqe = qe->data;
 	const size_t len = sizeof(struct nvme_completion);
-	int ret = 0;
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		nvme_rdma_wr_error(cq, wc, "RECV");
-		return 0;
+		return;
 	}
 
 	ib_dma_sync_single_for_cpu(ibdev, qe->dma, len, DMA_FROM_DEVICE);
@@ -1484,16 +1479,15 @@ static int __nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc, int tag)
 		nvme_complete_async_event(&queue->ctrl->ctrl, cqe->status,
 				&cqe->result);
 	else
-		ret = nvme_rdma_process_nvme_rsp(queue, cqe, wc, tag);
+		nvme_rdma_process_nvme_rsp(queue, cqe, wc);
 	ib_dma_sync_single_for_device(ibdev, qe->dma, len, DMA_FROM_DEVICE);
 
 	nvme_rdma_post_recv(queue, qe);
-	return ret;
 }
 
 static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
-	__nvme_rdma_recv_done(cq, wc, -1);
+	__nvme_rdma_recv_done(cq, wc);
 }
 
 static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
@@ -1758,10 +1752,12 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 		struct ib_cqe *cqe = wc.wr_cqe;
 
 		if (cqe) {
-			if (cqe->done == nvme_rdma_recv_done)
-				found |= __nvme_rdma_recv_done(cq, &wc, tag);
-			else
+			if (cqe->done == nvme_rdma_recv_done) {
+				__nvme_rdma_recv_done(cq, &wc);
+				found++;
+			} else {
 				cqe->done(cq, &wc);
+			}
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll()
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
                   ` (3 preceding siblings ...)
  2018-11-17 21:43 ` [PATCH 4/7] blk-mq: when polling for IO, look for any completion Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  8:02   ` Christoph Hellwig
  2018-11-17 21:43 ` [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
  2018-11-17 21:43 ` [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once Jens Axboe
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

We always pass in -1 now and none of the callers use the tag value,
so remove the parameter.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c           | 2 +-
 drivers/nvme/host/pci.c  | 8 ++++----
 drivers/nvme/host/rdma.c | 2 +-
 include/linux/blk-mq.h   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ca00d712158..10a14baa04ff 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3341,7 +3341,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 
 		hctx->poll_invoked++;
 
-		ret = q->mq_ops->poll(hctx, -1U);
+		ret = q->mq_ops->poll(hctx);
 		if (ret > 0) {
 			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1742c8ab8196..1280cc327096 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1075,14 +1075,14 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 	return found;
 }
 
-static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+static int nvme_poll(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 
-	return __nvme_poll(nvmeq, tag);
+	return __nvme_poll(nvmeq, -1);
 }
 
-static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 	u16 start, end;
@@ -1092,7 +1092,7 @@ static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 		return 0;
 
 	spin_lock(&nvmeq->cq_lock);
-	found = nvme_process_cq(nvmeq, &start, &end, tag);
+	found = nvme_process_cq(nvmeq, &start, &end, -1);
 	spin_unlock(&nvmeq->cq_lock);
 
 	nvme_complete_cqes(nvmeq, start, end);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 53e44efc6d32..4b2ba563acdb 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1741,7 +1741,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_IOERR;
 }
 
-static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_rdma_queue *queue = hctx->driver_data;
 	struct ib_cq *cq = queue->ib_cq;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 929e8abc5535..ca0520ca6437 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -132,7 +132,7 @@ typedef void (exit_request_fn)(struct blk_mq_tag_set *set, struct request *,
 typedef bool (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
 		bool);
 typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);
-typedef int (poll_fn)(struct blk_mq_hw_ctx *, unsigned int);
+typedef int (poll_fn)(struct blk_mq_hw_ctx *);
 typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
 typedef bool (busy_fn)(struct request_queue *);
 typedef void (complete_fn)(struct request *);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
                   ` (4 preceding siblings ...)
  2018-11-17 21:43 ` [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll() Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  8:05   ` Christoph Hellwig
  2018-11-17 21:43 ` [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once Jens Axboe
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

blk_poll() has always kept spinning until it found an IO. This is
fine for SYNC polling, since we need to find one request we have
pending, but in preparation for ASYNC polling it can be beneficial
to just check if we have any entries available or not.

Existing callers are converted to pass in 'spin == true', to retain
the old behavior.
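
The intended difference, roughly (the async caller is a sketch only
and doesn't exist in this series; complete_ready_iocbs() is a
stand-in):

        /* sync caller (e.g. the O_DIRECT wait loop): spin until something completes */
        if (!blk_poll(bdev_get_queue(bdev), qc, true))
                io_schedule();

        /* future async caller: one opportunistic sweep, no spinning */
        if (blk_poll(q, cookie, false))
                complete_ready_iocbs();         /* stand-in for the aio side */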

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c                  |  4 ++--
 block/blk-mq.c                    | 10 +++++-----
 drivers/nvme/host/multipath.c     |  4 ++--
 drivers/nvme/target/io-cmd-bdev.c |  2 +-
 fs/block_dev.c                    |  4 ++--
 fs/direct-io.c                    |  2 +-
 fs/iomap.c                        |  2 +-
 include/linux/blkdev.h            |  4 ++--
 mm/page_io.c                      |  2 +-
 9 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0b684a520a11..ccf40f853afd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1284,14 +1284,14 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
-bool blk_poll(struct request_queue *q, blk_qc_t cookie)
+bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	if (!q->poll_fn || !blk_qc_t_valid(cookie))
 		return false;
 
 	if (current->plug)
 		blk_flush_plug_list(current->plug, false);
-	return q->poll_fn(q, cookie);
+	return q->poll_fn(q, cookie, spin);
 }
 EXPORT_SYMBOL_GPL(blk_poll);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 10a14baa04ff..0a2847d9248b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -38,7 +38,7 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -3328,7 +3328,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
+static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, bool spin)
 {
 	struct request_queue *q = hctx->queue;
 	long state;
@@ -3353,7 +3353,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 
 		if (current->state == TASK_RUNNING)
 			return 1;
-		if (ret < 0)
+		if (ret < 0 || !spin)
 			break;
 		cpu_relax();
 	}
@@ -3362,7 +3362,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
 	return 0;
 }
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
+static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 
@@ -3381,7 +3381,7 @@ static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	if (blk_mq_poll_hybrid(q, hctx, cookie))
 		return 1;
 
-	return __blk_mq_poll(hctx);
+	return __blk_mq_poll(hctx, spin);
 }
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 65539c8df11d..c83bb3302684 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,7 +220,7 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	return ret;
 }
 
-static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
+static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc, bool spin)
 {
 	struct nvme_ns_head *head = q->queuedata;
 	struct nvme_ns *ns;
@@ -230,7 +230,7 @@ static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
 	srcu_idx = srcu_read_lock(&head->srcu);
 	ns = srcu_dereference(head->current_path[numa_node_id()], &head->srcu);
 	if (likely(ns && nvme_path_is_optimized(ns)))
-		found = ns->queue->poll_fn(q, qc);
+		found = ns->queue->poll_fn(q, qc, spin);
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return found;
 }
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c1ec3475a140..f6971b45bc54 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -116,7 +116,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 
 	cookie = submit_bio(bio);
 
-	blk_poll(bdev_get_queue(req->ns->bdev), cookie);
+	blk_poll(bdev_get_queue(req->ns->bdev), cookie, true);
 }
 
 static void nvmet_bdev_execute_flush(struct nvmet_req *req)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 64ba27b8b754..d233a59ea364 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -243,7 +243,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 			break;
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		    !blk_poll(bdev_get_queue(bdev), qc, true))
 			io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);
@@ -423,7 +423,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 			break;
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		    !blk_poll(bdev_get_queue(bdev), qc, true))
 			io_schedule();
 	}
 	__set_current_state(TASK_RUNNING);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index ea07d5a34317..a5a4e5a1423e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -518,7 +518,7 @@ static struct bio *dio_await_one(struct dio *dio)
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
 		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(dio->bio_disk->queue, dio->bio_cookie))
+		    !blk_poll(dio->bio_disk->queue, dio->bio_cookie, true))
 			io_schedule();
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
diff --git a/fs/iomap.c b/fs/iomap.c
index c5df035ace6f..74c1f37f0fd6 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1896,7 +1896,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			if (!(iocb->ki_flags & IOCB_HIPRI) ||
 			    !dio->submit.last_queue ||
 			    !blk_poll(dio->submit.last_queue,
-					 dio->submit.cookie))
+					 dio->submit.cookie, true))
 				io_schedule();
 		}
 		__set_current_state(TASK_RUNNING);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e97c0a3b2262..c85ea63c6406 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -283,7 +283,7 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
-typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t);
+typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t, bool spin);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
@@ -867,7 +867,7 @@ extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
 
-bool blk_poll(struct request_queue *q, blk_qc_t cookie);
+bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
 
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
diff --git a/mm/page_io.c b/mm/page_io.c
index a7271fa481f6..5bdfd21c1bd9 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -410,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous)
 		if (!READ_ONCE(bio->bi_private))
 			break;
 
-		if (!blk_poll(disk->queue, qc))
+		if (!blk_poll(disk->queue, qc, true))
 			break;
 	}
 	__set_current_state(TASK_RUNNING);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once
  2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
                   ` (5 preceding siblings ...)
  2018-11-17 21:43 ` [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
@ 2018-11-17 21:43 ` Jens Axboe
  2018-11-19  8:05   ` Christoph Hellwig
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-17 21:43 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: Jens Axboe

Right now we immediately bail if need_resched() is true, but
we need to do at least one loop in case we have entries waiting.
So just invert the need_resched() check, putting it at the
bottom of the loop.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a2847d9248b..4769c975b8c8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3336,7 +3336,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, bool spin)
 	hctx->poll_considered++;
 
 	state = current->state;
-	while (!need_resched()) {
+	do {
 		int ret;
 
 		hctx->poll_invoked++;
@@ -3356,7 +3356,7 @@ static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx, bool spin)
 		if (ret < 0 || !spin)
 			break;
 		cpu_relax();
-	}
+	} while (!need_resched());
 
 	__set_current_state(TASK_RUNNING);
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] block: avoid ordered task state change for polled IO
  2018-11-17 21:43 ` [PATCH 1/7] block: avoid ordered task state change for polled IO Jens Axboe
@ 2018-11-19  7:58   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  7:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Sat, Nov 17, 2018 at 02:43:48PM -0700, Jens Axboe wrote:
> For the core poll helper, the task state setting doesn't need
> to imply any atomics, as it's the current task itself that is
> being modified and we're not going to sleep.
> 
> For IRQ driven completions, the wakeup path has the necessary
> barriers, so we don't need the heavy-handed version of the task
> state setting.

Looks good (except for the horizontally challenged commit log):

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-17 21:43 ` [PATCH 2/7] block: have ->poll_fn() return number of entries polled Jens Axboe
@ 2018-11-19  7:58   ` Christoph Hellwig
  2018-11-25 10:52   ` Sagi Grimberg
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  7:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] nvme-fc: remove unused poll implementation
  2018-11-17 21:43 ` [PATCH 3/7] nvme-fc: remove unused poll implementation Jens Axboe
@ 2018-11-19  7:59   ` Christoph Hellwig
  2018-11-19 15:19     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  7:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Sat, Nov 17, 2018 at 02:43:50PM -0700, Jens Axboe wrote:
> This relies on the fc target ops setting ->poll_queue, which
> nobody does. Otherwise it just checks if something has
> completed, which isn't very useful.

Please also remove the unused ->poll_queue method.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] blk-mq: when polling for IO, look for any completion
  2018-11-17 21:43 ` [PATCH 4/7] blk-mq: when polling for IO, look for any completion Jens Axboe
@ 2018-11-19  8:02   ` Christoph Hellwig
  2018-11-19 15:20     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 52b1c97cd7c6..3ca00d712158 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3266,9 +3266,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
>  	 *  0:	use half of prev avg
>  	 * >0:	use this specific value
>  	 */
> -	if (q->poll_nsec == -1)
> -		return false;
> -	else if (q->poll_nsec > 0)
> +	if (q->poll_nsec > 0)
>  		nsecs = q->poll_nsec;
>  	else
>  		nsecs = blk_mq_poll_nsecs(q, hctx, rq);

The above comment now doesn't match the code here as the -1 case
is handled elsewhere.

> +static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct request_queue *q = hctx->queue;
>  	long state;

Can you merge __blk_mq_poll into blk_mq_poll now that blk_mq_poll
is pretty trivial?

>  static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>  {
> -	__nvme_rdma_recv_done(cq, wc, -1);
> +	__nvme_rdma_recv_done(cq, wc);
>  }

__nvme_rdma_recv_done and nvme_rdma_recv_done can be merged now.

>  static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
> @@ -1758,10 +1752,12 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>  		struct ib_cqe *cqe = wc.wr_cqe;
>  
>  		if (cqe) {
> -			if (cqe->done == nvme_rdma_recv_done)
> -				found |= __nvme_rdma_recv_done(cq, &wc, tag);
> -			else
> +			if (cqe->done == nvme_rdma_recv_done) {
> +				__nvme_rdma_recv_done(cq, &wc);
> +				found++;
> +			} else {
>  				cqe->done(cq, &wc);
> +			}

And we should probably look into separate poll queues for RDMA as well
while we're at it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll()
  2018-11-17 21:43 ` [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll() Jens Axboe
@ 2018-11-19  8:02   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Sat, Nov 17, 2018 at 02:43:52PM -0700, Jens Axboe wrote:
> We always pass in -1 now and none of the callers use the tag value,
> so remove the parameter.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not
  2018-11-17 21:43 ` [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
@ 2018-11-19  8:05   ` Christoph Hellwig
  2018-11-19 15:21     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

> -bool blk_poll(struct request_queue *q, blk_qc_t cookie)
> +bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)

I find the parameter name a little confusing.  Maybe wait_for_request,
although I don't particularly like that one either.  But we really need
to document the parameter well here, no matter what we end up naming
it.  And we should use a consistent name through the whole stack.
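
Something like this is what I mean, whatever the parameter ends up
being called (just a sketch of the kind of kerneldoc I'd like to see,
not a naming suggestion):

/**
 * blk_poll - poll for IO completions
 * @q:      the queue
 * @cookie: cookie passed back at submission time
 * @spin:   whether to keep polling until at least one completion is found
 *
 * Returns true if at least one request completed during the poll.
 */
bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin);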

> index c1ec3475a140..f6971b45bc54 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -116,7 +116,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  
>  	cookie = submit_bio(bio);
>  
> -	blk_poll(bdev_get_queue(req->ns->bdev), cookie);
> +	blk_poll(bdev_get_queue(req->ns->bdev), cookie, true);

This opportunistic poll is pretty bogus now as we never set the HIPRI
flag and it should probably be removed in a prep patch.  We should then
later try to use a scheme similar to your aio polling for the nvme
target as well.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once
  2018-11-17 21:43 ` [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once Jens Axboe
@ 2018-11-19  8:05   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-11-19  8:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-nvme

On Sat, Nov 17, 2018 at 02:43:54PM -0700, Jens Axboe wrote:
> Right now we immediately bail if need_resched() is true, but
> we need to do at least one loop in case we have entries waiting.
> So just invert the need_resched() check, putting it at the
> bottom of the loop.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] nvme-fc: remove unused poll implementation
  2018-11-19  7:59   ` Christoph Hellwig
@ 2018-11-19 15:19     ` Jens Axboe
  2018-11-19 19:00       ` James Smart
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-19 15:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/19/18 12:59 AM, Christoph Hellwig wrote:
> On Sat, Nov 17, 2018 at 02:43:50PM -0700, Jens Axboe wrote:
>> This relies on the fc target ops setting ->poll_queue, which
>> nobody does. Otherwise it just checks if something has
>> completed, which isn't very useful.
> 
> Please also remove the unused ->poll_queue method.
> 
> Otherwise looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Sure, will do.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] blk-mq: when polling for IO, look for any completion
  2018-11-19  8:02   ` Christoph Hellwig
@ 2018-11-19 15:20     ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-11-19 15:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/19/18 1:02 AM, Christoph Hellwig wrote:
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 52b1c97cd7c6..3ca00d712158 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3266,9 +3266,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
>>  	 *  0:	use half of prev avg
>>  	 * >0:	use this specific value
>>  	 */
>> -	if (q->poll_nsec == -1)
>> -		return false;
>> -	else if (q->poll_nsec > 0)
>> +	if (q->poll_nsec > 0)
>>  		nsecs = q->poll_nsec;
>>  	else
>>  		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
> 
> The above comment now doesn't match the code here as the -1 case
> is handled elsewhere.

Good point, I'll fix the comment.

>> +static int __blk_mq_poll(struct blk_mq_hw_ctx *hctx)
>>  {
>>  	struct request_queue *q = hctx->queue;
>>  	long state;
> 
> Can you merge __blk_mq_poll into blk_mq_poll now that blk_mq_poll
> is pretty trivial?
> 
>>  static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>  {
>> -	__nvme_rdma_recv_done(cq, wc, -1);
>> +	__nvme_rdma_recv_done(cq, wc);
>>  }
> 
> __nvme_rdma_recv_done and nvme_rdma_recv_done can be merged now.

I'll fold them.

>>  static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
>> @@ -1758,10 +1752,12 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>>  		struct ib_cqe *cqe = wc.wr_cqe;
>>  
>>  		if (cqe) {
>> -			if (cqe->done == nvme_rdma_recv_done)
>> -				found |= __nvme_rdma_recv_done(cq, &wc, tag);
>> -			else
>> +			if (cqe->done == nvme_rdma_recv_done) {
>> +				__nvme_rdma_recv_done(cq, &wc);
>> +				found++;
>> +			} else {
>>  				cqe->done(cq, &wc);
>> +			}
> 
> And we should probably look into separate poll queues for RDMA as well
> while we're at it.

I'll leave that as an exercise for someone else :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not
  2018-11-19  8:05   ` Christoph Hellwig
@ 2018-11-19 15:21     ` Jens Axboe
  2018-11-20  9:11       ` Sagi Grimberg
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-19 15:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme

On 11/19/18 1:05 AM, Christoph Hellwig wrote:
>> -bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>> +bool blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> 
> I find the parameter name a little confusing.  Maybe wait_for_request,
> although I don't particularly like that one either.  But we really need
> to document the parameter well here, no matter what we end up naming
> it.  And we should use a consistent name through the whole stack.

spin_until_found? I don't like using 'wait*' as we're not waiting.

>> index c1ec3475a140..f6971b45bc54 100644
>> --- a/drivers/nvme/target/io-cmd-bdev.c
>> +++ b/drivers/nvme/target/io-cmd-bdev.c
>> @@ -116,7 +116,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>  
>>  	cookie = submit_bio(bio);
>>  
>> -	blk_poll(bdev_get_queue(req->ns->bdev), cookie);
>> +	blk_poll(bdev_get_queue(req->ns->bdev), cookie, true);
> 
> This opportunistic poll is pretty bogus now as we never set the HIPRI
> flag and it should probably be removed in a prep patch.  We should then
> later try to use a scheme similar to your aio polling for the nvme
> target as well.

I'll kill it in a pre-patch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] nvme-fc: remove unused poll implementation
  2018-11-19 15:19     ` Jens Axboe
@ 2018-11-19 19:00       ` James Smart
  0 siblings, 0 replies; 24+ messages in thread
From: James Smart @ 2018-11-19 19:00 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig; +Cc: linux-block, linux-nvme



On 11/19/2018 7:19 AM, Jens Axboe wrote:
> On 11/19/18 12:59 AM, Christoph Hellwig wrote:
>> On Sat, Nov 17, 2018 at 02:43:50PM -0700, Jens Axboe wrote:
>>> This relies on the fc target ops setting ->poll_queue, which
>>> nobody does. Otherwise it just checks if something has
>>> completed, which isn't very useful.
>> Please also remove the unused ->poll_queue method.
>>
>> Otherwise looks fine:
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Sure, will do.
>

Looks fine on my end

Reviewed-by:  James Smart <jsmart2021@gmail.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not
  2018-11-19 15:21     ` Jens Axboe
@ 2018-11-20  9:11       ` Sagi Grimberg
  0 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2018-11-20  9:11 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig; +Cc: linux-block, linux-nvme


>> This opportunistic poll is pretty bogus now as we never set the HIPRI
>> flag and it should probably be removed in a prep patch.  We should then
>> later try to use a scheme similar to your aio polling for the nvme
>> target as well.
> 
> I'll kill it in a pre-patch.

Agreed..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-17 21:43 ` [PATCH 2/7] block: have ->poll_fn() return number of entries polled Jens Axboe
  2018-11-19  7:58   ` Christoph Hellwig
@ 2018-11-25 10:52   ` Sagi Grimberg
  2018-11-25 13:41     ` Jens Axboe
  1 sibling, 1 reply; 24+ messages in thread
From: Sagi Grimberg @ 2018-11-25 10:52 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme

> We currently only really support sync poll, ie poll with 1
> IO in flight. This prepares us for supporting async poll.

Hey Jens,

So are we sure that this is fine to simply replace the
poll functionality? You say that we support poll with
only 1 I/O in flight, but is that entirely true,
semantically speaking?

The interface is selective polling, which means that
the user can have more than a single I/O inflight but wants
to poll for a specific one that it really cares about.

Would this introduce a regression for users that rely
on preadv2 to basically "complete *this* IO as fast as possible"?

Note that I like the proposed direction; I'm merely questioning
if it is OK to simply change how selective polling worked until
today instead of introducing a separate blk_poll_all(q) (but with
a better name) which would be wired up to aio polling and friends.
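
Something along the lines of (the name is just a straw man):

        /* poll for any completion on this queue, not a specific cookie */
        int blk_poll_all(struct request_queue *q);

with the existing selective blk_poll() left as-is for preadv2 and
friends.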

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-25 10:52   ` Sagi Grimberg
@ 2018-11-25 13:41     ` Jens Axboe
  2018-11-27 10:10       ` Sagi Grimberg
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-11-25 13:41 UTC (permalink / raw)
  To: Sagi Grimberg, linux-block, linux-nvme

On 11/25/18 3:52 AM, Sagi Grimberg wrote:
>> We currently only really support sync poll, ie poll with 1
>> IO in flight. This prepares us for supporting async poll.
> 
> Hey Jens,
> 
> So are we sure that this is fine to simply replace the
> poll functionality? You say that we support poll with
> only 1 I/O in flight, but is that entirely true,
> semantically speaking?

It is, unless you have multiple threads all doing polling.  Which is
pretty inefficient, as I'm sure you know.

> The interface is selective polling, which means that
> the user can have more than a single I/O inflight but wants
> to poll for a specific one that it really cares about.
> 
> Would this introduce a regression for users that rely
> on preadv2 to basically "complete *this* IO as fast as possible"?

No, it'll be exactly the same, and believe me, I've done plenty of
performance testing to ensure that it works well.  In fact, with the
changes queued up and this on top, polling is both faster and more
efficient than it ever has been before, for both the classic
preadv2/pwritev2 and the async polled IO.

> Note that I like the proposed direction; I'm merely questioning
> if it is OK to simply change how selective polling worked until
> today instead of introducing a separate blk_poll_all(q) (but with
> a better name) which would be wired up to aio polling and friends.

I think that would be useless. For SYNC type polling with one thread, it
doesn't matter if we're looking for a particular IO, or just any IO.
Once you're into two threads, you're already wasting huge amounts of CPU,
just to get to QD=2. The poll loop handles finding each other's IOs just
fine, so there's no worry on that side.

Polling was originally designed for the strictly SYNC interfaces, and
since we had the cookie anyway, it was tempting to look for a specific
IO. I think that was a mistake, as it assumed that we'd never do async
polling.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-25 13:41     ` Jens Axboe
@ 2018-11-27 10:10       ` Sagi Grimberg
  2018-11-27 15:20         ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Sagi Grimberg @ 2018-11-27 10:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme


>>> We currently only really support sync poll, ie poll with 1
>>> IO in flight. This prepares us for supporting async poll.
>>
>> Hey Jens,
>>
>> So are we sure that this is fine to simply replace the
>> poll functionality? You say that we support poll with
>> only 1 I/O in flight, but is that entirely true,
>> semantically speaking?
> 
> It is, unless you have multiple threads all doing polling.  Which is
> pretty inefficient, as I'm sure you know.

I am, but I wasn't referring to multiple threads.


>> The interface is selective polling, which means that
>> the user can have more than a single I/O inflight but wants
>> to poll for a specific one that it really cares about.
>>
>> Would this introduce a regression for users that rely
>> on preadv2 to basically "complete *this* IO as fast as possible"?
> 
> No, it'll be exactly the same, and believe me, I've done plenty of
> performance testing to ensure that it works well.  In fact, with the
> changes queued up and this on top, polling is both faster and more
> efficient than it ever has been before, for both the classic
> preadv2/pwritev2 and the async polled IO.

I'm not arguing that this hasn't been tested; I was referring to the
following use-case: an app queues, say, 16 I/Os with io_submit and then
issues preadv2 for an "important" 512B sector.
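
Roughly this kind of pattern (userspace sketch, setup and error
handling omitted; ctx/iocbs/fd/buf/important_offset are placeholders):

        /* 16 "background" IOs via libaio */
        io_submit(ctx, 16, iocbs);

        /* the one latency-critical read, polled for via the selective interface */
        struct iovec iov = { .iov_base = buf, .iov_len = 512 };
        preadv2(fd, &iov, 1, important_offset, RWF_HIPRI);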

With this proposed change, is poll_fn more likely to return with the
first completion it sees rather than continue polling until it sees the
specific I/O it polled for?

> I think that would be useless. For SYNC type polling with one thread, it
> doesn't matter if we're looking for a particular IO, or just any IO.
>> Once you're into two threads, you're already wasting huge amounts of CPU,
>> just to get to QD=2. The poll loop handles finding each other's IOs just
> fine, so there's no worry on that side.
> 
> Polling was originally designed for the strictly SYNC interfaces, and
> since we had the cookie anyway, it was tempting to look for a specific
> IO. I think that was a mistake, as it assumed that we'd never do async
> polling.

Not arguing. Just that now that we already have the selective interface
in place, I'm asking whether it is OK to change it (IFF the above
question is indeed real), esoteric as this use-case may be...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] block: have ->poll_fn() return number of entries polled
  2018-11-27 10:10       ` Sagi Grimberg
@ 2018-11-27 15:20         ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-11-27 15:20 UTC (permalink / raw)
  To: Sagi Grimberg, linux-block, linux-nvme

On 11/27/18 3:10 AM, Sagi Grimberg wrote:
>> No, it'll be exactly the same, and believe me, I've done plenty of
>> performance testing to ensure that it works well.  In fact, with the
>> changes queued up and this on top, polling is both faster and more
>> efficient than it ever has been before, for both the classic
>> preadv2/pwritev2 and the async polled IO.
> 
> I'm not arguing that this hasn't been tested; I was referring to the
> following use-case: an app queues, say, 16 I/Os with io_submit and then
> issues preadv2 for an "important" 512B sector.
> 
> With this proposed change, is poll_fn more likely to return with the
> first completion it sees rather than continue polling until it sees the
> specific I/O it polled for?

Any IO completion will make it return; that was the case before and that
is still the case. If it isn't your specific IO, then you get to go
around the loop again.

>> I think that would be useless. For SYNC type polling with one thread, it
>> doesn't matter if we're looking for a particular IO, or just any IO.
>> Once your into two threads, you're already wasting huge amounts of CPU,
>> just to get to QD=2. The poll loop handles finding each others IOs just
>> fine, so there's no worry on that side.
>>
>> Polling was originally designed for the strictly SYNC interfaces, and
>> since we had the cookie anyway, it was tempting to look for a specific
>> IO. I think that was a mistake, as it assumed that we'd never do async
>> polling.
> 
> Not arguing. Just that now that we already have the selective interface
> in place, I'm asking whether it is OK to change it (IFF the above
> question is indeed real), esoteric as this use-case may be...

It won't cause any real change for the sync use case. There's no
difference between multiple sync users of polling and a mixed async/sync
polled use case in that regard.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

Thread overview: 24+ messages
2018-11-17 21:43 [PATCHSET v4] Various block optimizations Jens Axboe
2018-11-17 21:43 ` [PATCH 1/7] block: avoid ordered task state change for polled IO Jens Axboe
2018-11-19  7:58   ` Christoph Hellwig
2018-11-17 21:43 ` [PATCH 2/7] block: have ->poll_fn() return number of entries polled Jens Axboe
2018-11-19  7:58   ` Christoph Hellwig
2018-11-25 10:52   ` Sagi Grimberg
2018-11-25 13:41     ` Jens Axboe
2018-11-27 10:10       ` Sagi Grimberg
2018-11-27 15:20         ` Jens Axboe
2018-11-17 21:43 ` [PATCH 3/7] nvme-fc: remove unused poll implementation Jens Axboe
2018-11-19  7:59   ` Christoph Hellwig
2018-11-19 15:19     ` Jens Axboe
2018-11-19 19:00       ` James Smart
2018-11-17 21:43 ` [PATCH 4/7] blk-mq: when polling for IO, look for any completion Jens Axboe
2018-11-19  8:02   ` Christoph Hellwig
2018-11-19 15:20     ` Jens Axboe
2018-11-17 21:43 ` [PATCH 5/7] blk-mq: remove 'tag' parameter from mq_ops->poll() Jens Axboe
2018-11-19  8:02   ` Christoph Hellwig
2018-11-17 21:43 ` [PATCH 6/7] block: make blk_poll() take a parameter on whether to spin or not Jens Axboe
2018-11-19  8:05   ` Christoph Hellwig
2018-11-19 15:21     ` Jens Axboe
2018-11-20  9:11       ` Sagi Grimberg
2018-11-17 21:43 ` [PATCH 7/7] blk-mq: ensure mq_ops ->poll() is entered at least once Jens Axboe
2018-11-19  8:05   ` Christoph Hellwig
