Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion

From: Chao Leng <lengchao@huawei.com>
To: Sagi Grimberg <sagi@grimberg.me>, <linux-nvme@lists.infradead.org>
Cc: kbusch@kernel.org, axboe@fb.com, linux-block@vger.kernel.org,
	hch@lst.de, axboe@kernel.dk
Subject: Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
Date: Thu, 14 Jan 2021 14:55:21 +0800	[thread overview]
Message-ID: <7b12be41-0fcd-5a22-0e01-8cd4ac9cde5b@huawei.com> (raw)
In-Reply-To: <07e41b4f-914a-11e8-5638-e2d6408feb3f@grimberg.me>


On 2021/1/14 8:19, Sagi Grimberg wrote:
> 
>> When a request is queued failed, blk_status_t is directly returned
>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>> In two scenarios, the request should be retried and may succeed.
>> First, if work with nvme multipath, the request may be retried
>> successfully in another path, because the error is probably related to
>> the path. Second, if work without multipath software, the request may
>> be retried successfully after error recovery.
>> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>> scenario the request will be repeated freed in tear down.
>> If a non-resource error occurs in queue_rq, should directly call
>> nvme_complete_rq to complete request and set the state of request to
>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>> the request.
>>
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>> ---
>>   drivers/nvme/host/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index df9f6f4549f1..4a89bf44ecdc 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>   unmap_qe:
>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>                   DMA_TO_DEVICE);
>> -    return ret;
>> +    return nvme_try_complete_failed_req(rq, ret);
> 
> I don't understand this. There are errors that may not be related to
> anything that is pathing related (sw bug, memory leak, mapping error,
> etc, etc) why should we return this one-shot error?
Although fail over retry is not required, if we return the error to
blk-mq, a low probability crash may happen. because blk-mq do not set
the state of request to MQ_RQ_COMPLETE before complete the request,
the request may be freed asynchronously such as in nvme_submit_user_cmd.
If race with error recovery, request double completion may happens.

So we can not return the error to blk-mq if the blk_status_t is not
BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.
> .

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme