From: jianchao.w.wang@oracle.com (jianchao.wang)
Subject: [PATCH] nvme-pci: fix the timeout case when reset is ongoing
Date: Fri, 5 Jan 2018 13:36:11 +0800
Message-ID: <496ee23f-2dd2-9c7b-17a2-23fe02ec00e2@oracle.com>
In-Reply-To: <20180104103546.GA5109@lst.de>

Hi Christoph

Many thanks for your kind response.

On 01/04/2018 06:35 PM, Christoph Hellwig wrote:
> On Wed, Jan 03, 2018 at 06:31:44AM +0800, Jianchao Wang wrote:
>> NVME_CTRL_RESETTING used to indicate the range of nvme initializing
>> strictly in fd634f41(nvme: merge probe_work and reset_work), but it
>> is not now. The NVME_CTRL_RESETTING is set before queue the
>> reset_work, there could be a big gap before the reset work handles
>> the outstanding requests. So when the NVME_CTRL_RESETTING is set,
>> nvme_timeout will not only meet the admin requests from the
>> initializing procedure, but also the IO and admin requests from
>> previous work before nvme_dev_disable is invoked.
>>
>> To fix it, introduce a flag NVME_DEV_FLAG_INITIALIZING to mark the
>> range of initializing. When this flag is not set, handle the expired
>> requests as nvme_cancel_request. Otherwise, the requests should be
>> from the initializing procedure. Handle them as before. Because the
>> nvme_reset_work will see the error and disable the dev itself, so
>> discard the nvme_dev_disable here.
> 
> Instead of a parallel set of states we'll need to split
> NVME_CTRL_RESET into NVME_CTRL_RESET_SCHEDULED and NVME_CTRL_RESETTING.
> 
> And if my memory doesn't fail me we were already considering that a while
> ago.
> 
Yes, it is indeed more reasonable to split the current NVME_CTRL_RESETTING into
two states, but the nvme_dev_disable() call in nvme_reset_work() should be the boundary.
After that call, all the in-flight requests have been requeued and the request queue is
quiesced, so the nvme driver is in a clean state. So the new state may be something like
NEW_CTRL_RESET_PREPARE. :)
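
To make the proposed boundary concrete, here is a rough, self-contained sketch
in plain C. It is not driver code: the enum values, the TIMEOUT_* actions and
the helper names are all hypothetical and chosen only for illustration, and the
state transitions are heavily simplified. The only points taken from the
discussion above are that requests expiring before nvme_dev_disable() has run
should be handled as cancelled, and that timeouts after it should come from the
initializing procedure and be left for nvme_reset_work to notice.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller states (illustrative names only). */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_RESET_PREPARE,	/* reset requested, nvme_dev_disable() not done yet */
	CTRL_RESETTING,		/* after nvme_dev_disable(): initializing only */
};

/* Hypothetical outcomes of the timeout handler. */
enum timeout_action {
	TIMEOUT_DEFAULT,	/* not in reset: normal handling, not modelled here */
	TIMEOUT_CANCEL,		/* old IO/admin request: handle as cancelled */
	TIMEOUT_FAIL_INIT,	/* init command: fail it, reset work sees the error */
};

/* Assumed entry point: a reset is requested while the controller is live. */
static bool schedule_reset(enum ctrl_state *state)
{
	if (*state != CTRL_LIVE)
		return false;
	*state = CTRL_RESET_PREPARE;
	return true;
}

/*
 * Assumed boundary: called once the disable step has requeued the in-flight
 * requests and quiesced the request queue.
 */
static bool disable_done(enum ctrl_state *state)
{
	if (*state != CTRL_RESET_PREPARE)
		return false;
	*state = CTRL_RESETTING;
	return true;
}

/* Decide how an expired request is treated in each state. */
static enum timeout_action timeout_action_for(enum ctrl_state state)
{
	switch (state) {
	case CTRL_RESET_PREPARE:
		return TIMEOUT_CANCEL;
	case CTRL_RESETTING:
		return TIMEOUT_FAIL_INIT;
	default:
		return TIMEOUT_DEFAULT;
	}
}
```

With this split, the timeout handler no longer needs a separate flag to tell
old requests from initializing commands; the state alone carries that
information.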

Thanks
Jianchao 
