From: Chao Leng <lengchao@huawei.com>
To: Sagi Grimberg <sagi@grimberg.me>, <linux-nvme@lists.infradead.org>
Cc: kbusch@kernel.org, axboe@fb.com, linux-block@vger.kernel.org,
hch@lst.de, axboe@kernel.dk
Subject: Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
Date: Mon, 18 Jan 2021 11:22:16 +0800
Message-ID: <0b5c8e31-8dc2-994a-1710-1b1be07549c9@huawei.com>
In-Reply-To: <4ff22d33-12fa-1f70-3606-54821f314c45@grimberg.me>
On 2021/1/16 9:18, Sagi Grimberg wrote:
>
>>>>>> When a request is queued failed, blk_status_t is directly returned
>>>>>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>>>>>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>>>>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>>>>> In two scenarios, the request should be retried and may succeed.
>>>>>> First, if work with nvme multipath, the request may be retried
>>>>>> successfully in another path, because the error is probably related to
>>>>>> the path. Second, if work without multipath software, the request may
>>>>>> be retried successfully after error recovery.
>>>>>> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
>>>>>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>>>>>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>>>>>> scenario the request will be repeated freed in tear down.
>>>>>> If a non-resource error occurs in queue_rq, should directly call
>>>>>> nvme_complete_rq to complete request and set the state of request to
>>>>>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>>>>>> the request.
>>>>>>
>>>>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>>>>> ---
>>>>>> drivers/nvme/host/rdma.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>>>>> --- a/drivers/nvme/host/rdma.c
>>>>>> +++ b/drivers/nvme/host/rdma.c
>>>>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>> unmap_qe:
>>>>>> ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>>>>> DMA_TO_DEVICE);
>>>>>> - return ret;
>>>>>> + return nvme_try_complete_failed_req(rq, ret);
>>>>>
>>>>> I don't understand this. There are errors that may not be related to
>>>>> anything that is pathing related (sw bug, memory leak, mapping error,
>>>>> etc, etc) why should we return this one-shot error?
>>>> Although fail over retry is not required, if we return the error to
>>>> blk-mq, a low probability crash may happen. because blk-mq do not set
>>>> the state of request to MQ_RQ_COMPLETE before complete the request,
>>>> the request may be freed asynchronously such as in nvme_submit_user_cmd.
>>>> If race with error recovery, request double completion may happens.
>>>
>>> Then fix that, don't work around it.
>> I'm not trying to work around it. The purpose of this is to solve
>> the problem of nvme native multipathing at the same time.
>
> Please explain how this is an nvme-multipath issue?
>
>>>
>>>>
>>>> So we can not return the error to blk-mq if the blk_status_t is not
>>>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.
>>>
>>> This is not something we should be handling in nvme. block drivers
>>> should be able to fail queue_rq, and this all should live in the
>>> block layer.
>> Of course, it is also an idea to repair the block drivers directly.
>> However, block layer is unaware of nvme native multipathing,
>
> Nor it should be
>
>> will cause the request return error which should be avoided.
>
> Not sure I understand..
> requests should failover for path related errors,
> what queue_rq errors are expected to be failed over from your
> perspective?
Failing over only for path-related errors would be the best choice, but
it is almost impossible to achieve.
The probability of non-path-related errors is very low. These errors do
not require a failover retry, but the only cost of retrying them is that
the request is completed with an error somewhat later (after several
retries). It's not the best choice, but I think it's acceptable, because
HBA drivers do not have path-related error codes, only general error
codes. It is difficult to identify whether a general error code is
path-related.
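To illustrate the cost argument above, here is a minimal userspace model,
not kernel code. Every name in it (sketch_result,
sketch_submit_with_failover, the max_retries parameter) is hypothetical,
and it assumes each path either always works or always fails. It only
shows that a path-related fault is hidden by failover, while a
non-path-related fault still ends in an error, just after more attempts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of submitting one request; not a kernel type. */
struct sketch_result {
    bool ok;        /* did the request eventually succeed? */
    int attempts;   /* submissions made before the request completed */
};

/*
 * Model of failover retry: one initial attempt plus up to max_retries
 * retries, rotating over the available paths. path_ok[i] says whether
 * path i can currently carry the request (assumed static here).
 */
static struct sketch_result
sketch_submit_with_failover(const bool *path_ok, int npaths, int max_retries)
{
    struct sketch_result res = { .ok = false, .attempts = 0 };

    assert(npaths > 0 && max_retries >= 0);
    for (int i = 0; i <= max_retries; i++) {
        res.attempts++;
        if (path_ok[i % npaths]) {
            res.ok = true;
            break;
        }
    }
    return res;
}
```

With one faulty path out of two, the model succeeds on the second
attempt. With a fault that affects every path, it fails after
max_retries + 1 attempts, which is the "bit later" error completion
described above.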
>
>> The scenario: use two HBAs for nvme native multipath, and then one HBA
>> fault,
>
> What is the specific error the driver sees?
The path-related error code depends closely on the HBA driver
implementation. In general it is EIO. I don't think it's a good idea to
assume which general error code the driver returns in the event of a
path error.
>
>> the blk_status_t of queue_rq is BLK_STS_IOERR, blk-mq will call
>> blk_mq_end_request to complete the request which bypass nvme native
>> multipath. We expect the request fail over to normal HBA, but the request
>> is directly completed with BLK_STS_IOERR.
>> The two scenarios can be fixed by directly completing the request in queue_rq.
> Well, certainly this one-shot always return 0 and complete the command
> with HOST_PATH error is not a good approach IMO
So what's the better option? Just complete the request with a host path
error for any error returned by the HBA driver other than ENOMEM and
EAGAIN?
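To make that question concrete, the option could be sketched roughly as
below. This is a standalone userspace illustration, not the patch and not
an existing kernel helper. The enum, its values, and
sketch_classify_queue_error are hypothetical names, and the sketch
assumes the HBA driver reports failures as negative errno values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical dispositions for a failed queue_rq; not kernel definitions. */
enum sketch_disposition {
    SKETCH_REQUEUE,         /* resource shortage: let blk-mq retry later */
    SKETCH_HOST_PATH_ERROR  /* complete in the driver as a host path error */
};

/*
 * Classify a negative errno returned by a (hypothetical) HBA driver call.
 * Only ENOMEM and EAGAIN are treated as resource errors; every other
 * error is treated as possibly path-related, since a general code such
 * as EIO cannot be told apart from a path failure.
 */
static enum sketch_disposition sketch_classify_queue_error(int err)
{
    assert(err < 0);
    switch (err) {
    case -ENOMEM:
    case -EAGAIN:
        return SKETCH_REQUEUE;
    default:
        return SKETCH_HOST_PATH_ERROR;
    }
}
```

Under this rule -EIO maps to SKETCH_HOST_PATH_ERROR, so the request
could still be failed over or retried instead of being ended directly
with BLK_STS_IOERR.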
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme