From: Chao Leng <lengchao@huawei.com>
To: Sagi Grimberg <sagi@grimberg.me>, <linux-nvme@lists.infradead.org>
Cc: kbusch@kernel.org, axboe@fb.com, linux-block@vger.kernel.org,
hch@lst.de, axboe@kernel.dk
Subject: Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
Date: Thu, 21 Jan 2021 09:34:30 +0800 [thread overview]
Message-ID: <6bfca033-8fda-4ace-c05f-285fccb070fd@huawei.com> (raw)
In-Reply-To: <2ed5391c-fe43-f512-adf0-214effd5d599@grimberg.me>
On 2021/1/21 5:35, Sagi Grimberg wrote:
>
>>>>> is not something we should be handling in nvme. block drivers
>>>>> should be able to fail queue_rq, and this all should live in the
>>>>> block layer.
>>>> Of course, it is also an idea to repair the block drivers directly.
>>>> However, the block layer is unaware of nvme native multipathing,
>>>
>>> Nor it should be
>>>
>>>> which will cause the request to return an error; that should be avoided.
>>>
>>> Not sure I understand..
>>> requests should failover for path related errors,
>>> what queue_rq errors are expected to be failed over from your
>>> perspective?
>> Although failing over only for path-related errors would be the best
>> choice, it is almost impossible to achieve. The probability of
>> non-path-related errors is very low, and although these errors do not
>> require a failover retry, the only cost of retrying them is that the
>> request completes with an error after a somewhat longer delay (after
>> several retries). That is not ideal, but I think it is acceptable,
>> because HBA drivers do not have path-related error codes, only general
>> error codes, and it is difficult to tell whether a general error code
>> is path-related.
>
> If we have a SW bug or breakage that can happen occasionally, this can
> result in a constant failover rather than a simple failure. This is just
> not a good approach IMO.
>
>>>> The scenario: use two HBAs for nvme native multipath, and then one HBA
>>>> fault,
>>>
>>> What is the specific error the driver sees?
>> The path-related error code is closely tied to the HBA driver
>> implementation; in general it is EIO. I don't think it's a good idea to
>> assume which general error code a driver returns in the event of a path
>> error.
>
> But assuming every error is a path error a good idea?
Of course not. But according to the old code logic, any error other than
-ENOMEM and -EAGAIN from an HBA driver is assumed to be a path error, and
I think that is reasonable.
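The classification described above can be sketched as follows. This is an illustrative user-space model, not the actual kernel code; the helper name is_path_error() is made up for this example:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch of the old logic discussed above: any queue_rq
 * submission error from the HBA driver other than -ENOMEM or -EAGAIN
 * is treated as a path-related error.  -ENOMEM and -EAGAIN mean
 * "resource pressure, requeue and retry later", not a broken path;
 * everything else (typically -EIO) is assumed path-related because
 * HBA drivers only report general error codes. */
static bool is_path_error(int err)
{
    return err != -ENOMEM && err != -EAGAIN;
}
```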
>
>>>> the blk_status_t of queue_rq is BLK_STS_IOERR, and blk-mq will call
>>>> blk_mq_end_request to complete the request, which bypasses nvme native
>>>> multipath. We expect the request to fail over to the healthy HBA, but
>>>> the request is directly completed with BLK_STS_IOERR.
>>>> Both scenarios can be fixed by completing the request directly in queue_rq.
>>> Well, this one-shot "always return 0 and complete the command with a
>>> HOST_PATH error" is certainly not a good approach IMO
>> So what's the better option? Just complete the request with host path
>> error for non-ENOMEM and EAGAIN returned by the HBA driver?
>
> Well, the correct thing to do here would be to clone the bio and
> fail over if the end_io error status is BLK_STS_IOERR. That sucks
> because it adds overhead, but this proposal doesn't sit well; it
> looks wrong to me.
>
> Alternatively, a more creative idea would be to encode the error
> status somehow in the cookie returned from submit_bio, but that
> also feels like a small(er) hack.
If the HBA driver returns an error other than -ENOMEM and -EAGAIN,
queue_rq directly calls nvme_complete_rq with NVME_SC_HOST_PATH_ERROR,
like nvmf_fail_nonready_command does. nvme_complete_rq then decides
whether to retry, fail over, or end the request. This may not be the
best approach, but there seems to be no better choice.
I will try to send the patch v2.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme