* [PATCH] NVMe: fix retry/error logic in nvme_queue_rq()
@ 2014-12-11 20:33 Jens Axboe
  2014-12-11 20:54 ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2014-12-11 20:33 UTC (permalink / raw)


Hi,

The logic around retrying and erroring IO in nvme_queue_rq() is broken
in a few ways:

- If we fail allocating dma memory for a discard, we return retry. We
  have the 'iod' stored in ->special, but we free the 'iod', so the
  retry picks up a pointer to freed memory.

- For a normal request, if we fail dma mapping or setting up prps, we
  have the same iod situation. Additionally, we haven't set the callback
  for the request yet, so we also potentially leak IOMMU resources.

Get rid of the ->special 'iod' store. The retry is uncommon enough that
it's not worth optimizing for or holding on to resources to attempt to
speed it up. Additionally, it's usually best practice to free any
request related resources when doing retries.
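
The unwind pattern described above can be sketched in userspace C. This
is only an illustration, not the driver code: every name in it
(queue_rq_sketch, struct io_ctx, struct fail_plan, the QUEUE_* values)
is hypothetical, and the DMA mapping is simulated with a counter. The
point it shows is that each exit path frees the per-request context and
that the mapping is undone before a retry is reported.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for BLK_MQ_RQ_QUEUE_{OK,BUSY,ERROR}. */
enum queue_status { QUEUE_OK, QUEUE_BUSY, QUEUE_ERROR };

/* Hypothetical per-request context, standing in for struct nvme_iod. */
struct io_ctx {
	bool mapped;		/* true while a simulated DMA mapping is held */
};

/* Counters that let a caller check that nothing leaked. */
static int live_ctx;
static int live_maps;

/* Knobs that force individual steps to fail, for demonstration only. */
struct fail_plan {
	bool alloc_fails;	/* transient: out of memory */
	bool sg_fails;		/* permanent: request cannot be mapped to segments */
	bool map_fails;		/* transient: out of IOMMU space */
	bool prps_fails;	/* transient: out of PRP list memory */
};

static struct io_ctx *ctx_alloc(const struct fail_plan *p)
{
	struct io_ctx *ctx;

	if (p->alloc_fails)
		return NULL;
	ctx = calloc(1, sizeof(*ctx));
	if (ctx)
		live_ctx++;
	return ctx;
}

static void ctx_free(struct io_ctx *ctx)
{
	free(ctx);
	live_ctx--;
}

static enum queue_status queue_rq_sketch(const struct fail_plan *p)
{
	struct io_ctx *ctx = ctx_alloc(p);

	if (!ctx)
		return QUEUE_BUSY;	/* nothing allocated, nothing to undo */

	if (p->sg_fails)
		goto error_cmd;		/* retrying would not help */

	if (p->map_fails)
		goto retry_cmd;		/* mapping never happened */
	ctx->mapped = true;
	live_maps++;

	if (p->prps_fails) {
		/* Undo the mapping here; no completion callback will do it. */
		ctx->mapped = false;
		live_maps--;
		goto retry_cmd;
	}

	/*
	 * Success. In a real driver the completion path would now own ctx;
	 * the sketch releases it immediately to stay leak free.
	 */
	live_maps--;
	ctx_free(ctx);
	return QUEUE_OK;

error_cmd:
	ctx_free(ctx);
	return QUEUE_ERROR;
retry_cmd:
	ctx_free(ctx);
	return QUEUE_BUSY;
}
```

Calling queue_rq_sketch() with different fail_plan settings and then
checking that live_ctx and live_maps are both zero confirms that no
path holds on to resources across a retry.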

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 95f2310255ce..e92bdf4c68fc 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -621,24 +621,15 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
 	struct nvme_iod *iod;
 	int psegs = req->nr_phys_segments;
-	int result = BLK_MQ_RQ_QUEUE_BUSY;
 	enum dma_data_direction dma_dir;
 	unsigned size = !(req->cmd_flags & REQ_DISCARD) ? blk_rq_bytes(req) :
 						sizeof(struct nvme_dsm_range);
 
-	/*
-	 * Requeued IO has already been prepped
-	 */
-	iod = req->special;
-	if (iod)
-		goto submit_iod;
-
 	iod = nvme_alloc_iod(psegs, size, ns->dev, GFP_ATOMIC);
 	if (!iod)
-		return result;
+		return BLK_MQ_RQ_QUEUE_BUSY;
 
 	iod->private = req;
-	req->special = iod;
 
 	if (req->cmd_flags & REQ_DISCARD) {
 		void *range;
@@ -651,7 +642,7 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 						GFP_ATOMIC,
 						&iod->first_dma);
 		if (!range)
-			goto finish_cmd;
+			goto retry_cmd;
 		iod_list(iod)[0] = (__le64 *)range;
 		iod->npages = 0;
 	} else if (psegs) {
@@ -659,22 +650,22 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 		sg_init_table(iod->sg, psegs);
 		iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
-		if (!iod->nents) {
-			result = BLK_MQ_RQ_QUEUE_ERROR;
-			goto finish_cmd;
-		}
+		if (!iod->nents)
+			goto error_cmd;
 
 		if (!dma_map_sg(nvmeq->q_dmadev, iod->sg, iod->nents, dma_dir))
-			goto finish_cmd;
+			goto retry_cmd;
 
-		if (blk_rq_bytes(req) != nvme_setup_prps(nvmeq->dev, iod,
-						blk_rq_bytes(req), GFP_ATOMIC))
-			goto finish_cmd;
+		if (blk_rq_bytes(req) !=
+		    nvme_setup_prps(nvmeq->dev, iod, blk_rq_bytes(req), GFP_ATOMIC)) {
+			dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->sg,
+					iod->nents, dma_dir);
+			goto retry_cmd;
+		}
 	}
 
 	blk_mq_start_request(req);
 
- submit_iod:
 	nvme_set_info(cmd, iod, req_completion);
 	spin_lock_irq(&nvmeq->q_lock);
 	if (req->cmd_flags & REQ_DISCARD)
@@ -688,10 +679,12 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
- finish_cmd:
-	nvme_finish_cmd(nvmeq, req->tag, NULL);
+ error_cmd:
 	nvme_free_iod(nvmeq->dev, iod);
-	return result;
+	return BLK_MQ_RQ_QUEUE_ERROR;
+ retry_cmd:
+	nvme_free_iod(nvmeq->dev, iod);
+	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
 static int nvme_process_cq(struct nvme_queue *nvmeq)

Signed-off-by: Jens Axboe <axboe at fb.com>

-- 
Jens Axboe


* [PATCH] NVMe: fix retry/error logic in nvme_queue_rq()
  2014-12-11 20:33 [PATCH] NVMe: fix retry/error logic in nvme_queue_rq() Jens Axboe
@ 2014-12-11 20:54 ` Keith Busch
  2014-12-11 20:58   ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2014-12-11 20:54 UTC (permalink / raw)


The real reason for reusing the nvme_iod was we needed some structure
to exist for the life of the command to track total time so we don't
retry indefinitely. Keeping the dma mapping was just a bonus.

Total time is tracked with "req->start_time" now, so nothing wrong with
removing the iod reuse.
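
As a rough illustration of bounding retries by total elapsed time, here
is a small userspace sketch. It is an assumption about the shape of
such a check, not code taken from the driver: struct req_sketch,
may_retry() and the budget parameter are hypothetical names, and the
tick counter stands in for whatever clock the request's start time is
recorded in.

```c
#include <stdbool.h>

/* Hypothetical request: only the field this sketch needs. */
struct req_sketch {
	unsigned long start_time;	/* tick count when the request was first issued */
};

/*
 * Return true while the request is still inside its retry budget.
 * The unsigned subtraction yields the elapsed ticks even if the
 * counter has wrapped around since start_time was recorded.
 */
static bool may_retry(const struct req_sketch *req, unsigned long now,
		      unsigned long budget)
{
	return (now - req->start_time) < budget;
}
```

Because the start time lives in the request itself, no separate
per-command structure has to survive between attempts for this check
to work.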

Acked-by: Keith Busch <keith.busch at intel.com>

On Thu, 11 Dec 2014, Jens Axboe wrote:
> Hi,
>
> The logic around retrying and erroring IO in nvme_queue_rq() is broken
> in a few ways:
>
> - If we fail allocating dma memory for a discard, we return retry. We
>  have the 'iod' stored in ->special, but we free the 'iod'.
>
> - For a normal request, if we fail dma mapping or setting up prps, we
>  have the same iod situation. Additionally, we haven't set the callback
>  for the request yet, so we also potentially leak IOMMU resources.
>
> Get rid of the ->special 'iod' store. The retry is uncommon enough that
> it's not worth optimizing for or holding on to resources to attempt to
> speed it up. Additionally, it's usually best practice to free any
> request related resources when doing retries.
>
> Signed-off-by: Jens Axboe <axboe at fb.com>


* [PATCH] NVMe: fix retry/error logic in nvme_queue_rq()
  2014-12-11 20:54 ` Keith Busch
@ 2014-12-11 20:58   ` Jens Axboe
  0 siblings, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2014-12-11 20:58 UTC (permalink / raw)


On 12/11/2014 01:54 PM, Keith Busch wrote:
> The real reason for reusing the nvme_iod was we needed some structure
> to exist for the life of the command to track total time so we don't
> retry indefinitely. Keeping the dma mapping was just a bonus.

Before blk-mq, you could have cases where ->make_request_fn() was
entered and you'd have to defer due to being out of commands, for
instance. That's no longer the case. Right now it should only happen if
we run into resource constraints, on the IOMMU or memory side. Now we
are definitely better off freeing everything. Besides, we didn't retain
the full state, so the retry was buggy.

> Total time is tracked with "req->start_time" now, so nothing wrong with
> removing the iod reuse.
> 
> Acked-by: Keith Busch <keith.busch at intel.com>

Thanks!

-- 
Jens Axboe
