From mboxrd@z Thu Jan  1 00:00:00 1970
From: keith.busch@intel.com (Keith Busch)
Date: Sun, 3 Jan 2016 15:43:31 +0000
Subject: [PATCH 5/5] NVMe: IO queue deletion re-write
In-Reply-To: <20160103114052.GA24893@infradead.org>
References: <1451496471-29370-1-git-send-email-keith.busch@intel.com>
 <1451496471-29370-6-git-send-email-keith.busch@intel.com>
 <20151230180430.GA12828@infradead.org>
 <20151230190706.GC12454@localhost.localdomain>
 <20160102170730.GA30184@infradead.org>
 <20160102213008.GA10969@localhost.localdomain>
 <20160103114052.GA24893@infradead.org>
Message-ID: <20160103154331.GA31375@localhost.localdomain>

On Sun, Jan 03, 2016 at 03:40:52AM -0800, Christoph Hellwig wrote:
> How about something like the lightly tested patch below.  It uses
> synchronous command submission, but schedules a work item on the
> system unbound workqueue for each queue, allowing the scheduler
> to execute them in parallel.

This works if everything else works, but the failure cases are the hard
ones. This'll deadlock if the controller stops responding during a reset,
which might be why the reset occurred in the first place, and we can't
invoke another reset to clean up a failed reset.

We can use "wait_event_timeout" to fix the deadlock in the reset handler.
The handler will cancel IOs, ending work queue items waiting for command
responses. But that's only half of it. You'll also need something to end
work waiting for a request when more queues exist than admin tags.
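
To illustrate the bounded wait, here is a minimal userspace sketch using
pthreads. It is only an analogy, not kernel code and not a proposed patch:
"struct del_wait", "wait_done_timeout" and "complete_del" are hypothetical
names made up for this example. Its return convention also differs from the
kernel's wait_event_timeout, which returns 0 when the timeout elapses with
the condition still false.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the state one queue deletion waits on. */
struct del_wait {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;		/* set once the "controller" has answered */
};

/*
 * Wait until w->done is set or timeout_ms milliseconds have passed.
 * Returns 0 if the deletion completed, ETIMEDOUT if the deadline passed
 * first, so the caller can give up instead of blocking forever.
 */
static int wait_done_timeout(struct del_wait *w, long timeout_ms)
{
	struct timespec ts;
	int ret = 0;

	/* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	/* Loop to tolerate spurious wakeups; stop on timeout or error. */
	while (!w->done && ret == 0)
		ret = pthread_cond_timedwait(&w->cond, &w->lock, &ts);
	if (w->done)
		ret = 0;	/* completion wins over a late timeout */
	pthread_mutex_unlock(&w->lock);
	return ret;
}

/* Called by whoever observes the response: mark done and wake waiters. */
static void complete_del(struct del_wait *w)
{
	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}
```

A waiter whose completion never arrives gets ETIMEDOUT back after the
deadline, while one that was completed returns 0 right away. This covers
only the first half of the problem above; it does nothing for work that is
still blocked waiting to allocate a request.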