Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: jianchao.w.wang@oracle.com (jianchao.wang)
Subject: [PATCH 8/9] nvme-pci: break up nvme_timeout and nvme_dev_disable
Date: Mon, 12 Feb 2018 15:51:48 +0800	[thread overview]
Message-ID: <b8748837-616e-7e92-bf58-38246ae8ca0f@oracle.com> (raw)
In-Reply-To: <992179f4-becc-f0bf-867a-e49adbd756ee@oracle.com>

Hi Sagi

Just make some supplement here.

On 02/12/2018 10:16 AM, jianchao.wang wrote:
>> I think this is going in the wrong direction. Every state that is needed
>> to handle serialization should be done in core ctrl state. Moreover,
>> please try to avoid handling this locally in nvme-pci, place common
>> helpers in nvme-core and use them. I won't be surprised if fc would
>> need some of these.
>>
>> Also, can we try and not disable the controller from nvme_timeout?
> In fact, that is what this patch what to do. For the previous outstanding requests,
> this patch return BLK_EH_NOT_HANDLED and defer the work to nvme_dev_disable.
> 
>  I'm
>> not sure I understand why is this needed at all. What's wrong with
>> scheduling a controller reset/delete? Why is something like
>> nvme_pci_disable_ctrl_directly needed?
> Keith used to point out to me that, we cannot complete and free a request
> before we close the controller and pci master bus, otherwise, there will
> be somethings wrong in DMA accessing, because when we complete a request, 
> the associated DMA mapping will be freed.
> 
> For the previous outstanding requests, this patch could grab them with blk_abort_request
> in nvme_dev_disable, and complete them after we disable/shutdown the controller.
> 
> But for the adminq requests in nvme_dev_disable and nvme_reset_work, we cannot do this.
> We cannot schedule another reset_work->nvme_dev_disable to do that, because we are in it.
> So I use this nvme_pci_disable_ctrl_directly which looks like very ugly, to disable the
> controller and then we could complete the request with failure to move progress forward.
> 
>> I'd like to see an effort to consolidate error handling paths rather
>> than enhancing the current skew...
> Yes, absolutely. That is also what I expect. :)
> 
> This patch has two aspects:
> 1. grab all the previous outstanding requests with blk_abort_request.
>    It is safe when race with the irq completion path. And then complete them
>    after we disable/shutdown the controller completely.
>    I think this part could be placed in nvme ctrl core.

The 'grab' here is to avoid the timeout path to be triggered during nvme_dev_disable.
This is important, because nvme_timeout may issue IOs on adminq or invoke nvme_pci_disable_ctrl_directly
which could race with nvme_dev_disable.

> 2. avoid nvme_timeout invoke nvme_dev_disable. this is the most tricky part.

And also, this will introduce some dangerous scenarios. I have reported some of them before.

>    as I shared above, we have to _disable_ the controller _before_ we compete the adminq request
>    from the nvme_dev_disable and nvme_reset_work. Consequently, we cannot do as the
>    nvme_rdma_timeout, schedule a recovery work and then return. 

Actually, the nvme_timeout have a similar pattern with nvme_rdma_timeout.
When adminq/IOq request timeout, we can schedule reset_work for it. But if the requests from the reset_work
procedure timeout, we cannot schedule reset_work any more. At the moment, we have to close the controller
directly and fail the requests.

Looking forward some advice on this.
That's really appreciated.

Thanks
Jianchao

  reply	other threads:[~2018-02-12  7:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-11  9:38 [PATCH V3 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable Jianchao Wang
2018-02-11  9:38 ` [PATCH 1/9] nvme: fix the dangerous reference of namespaces list Jianchao Wang
2018-02-11 11:14   ` Sagi Grimberg
2018-02-11 11:15   ` Sagi Grimberg
2018-02-11  9:38 ` [PATCH 2/9] nvme: fix the deadlock in nvme_update_formats Jianchao Wang
2018-02-11 11:16   ` Sagi Grimberg
2018-02-12  1:40     ` jianchao.wang
2018-02-11  9:38 ` [PATCH 3/9] nvme: change namespaces_mutext to namespaces_rwsem Jianchao Wang
2018-02-11 11:17   ` Sagi Grimberg
2018-02-12  2:33     ` jianchao.wang
2018-02-11  9:38 ` [PATCH 4/9] nvme-pci: quiesce IO queues prior to disabling device HMB accesses Jianchao Wang
2018-02-11 11:19   ` Sagi Grimberg
2018-02-12  2:17     ` jianchao.wang
2018-02-11  9:38 ` [PATCH 5/9] nvme-pci: suspend queues based on online_queues Jianchao Wang
2018-02-11  9:38 ` [PATCH 6/9] nvme-pci: drain the entered requests after ctrl is shutdown Jianchao Wang
2018-02-11  9:38 ` [PATCH 7/9] blk-mq: make blk_mq_rq_update_aborted_gstate a external interface Jianchao Wang
2018-02-11  9:38 ` [PATCH 8/9] nvme-pci: break up nvme_timeout and nvme_dev_disable Jianchao Wang
2018-02-11 11:36   ` Sagi Grimberg
2018-02-12  2:16     ` jianchao.wang
2018-02-12  7:51       ` jianchao.wang [this message]
2018-02-11  9:38 ` [PATCH 9/9] nvme-pci: discard wait timeout when delete cq/sq Jianchao Wang
2018-02-11 11:24   ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8748837-616e-7e92-bf58-38246ae8ca0f@oracle.com \
    --to=jianchao.w.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox