From mboxrd@z Thu Jan 1 00:00:00 1970
From: hch@lst.de (Christoph Hellwig)
Date: Thu, 6 Oct 2016 11:34:49 +0200
Subject: [PATCH 1/2] nvme: don't schedule multiple resets
In-Reply-To: <1475699566-5284-1-git-send-email-keith.busch@intel.com>
References: <1475699566-5284-1-git-send-email-keith.busch@intel.com>
Message-ID: <20161006093449.GB4999@lst.de>

On Wed, Oct 05, 2016 at 04:32:45PM -0400, Keith Busch wrote:
> The queue_work only fails if the work is pending, but not yet running. If
> the work is running, the work item would get requeued, triggering a
> double reset. If the first reset fails for any reason, the second
> reset triggers:
>
> 	WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)
>
> Hitting that schedules controller deletion for a second time, which
> potentially takes a reference on the device that is being deleted.
> If the reset occurs at the same time as a hot removal event, this causes
> a double-free.
>
> This patch has the reset helper function check if the work is busy
> prior to queueing, and changes all places that schedule resets to use
> this function. Since most users don't want to sync with that work, the
> "flush_work" is moved to the only caller that wants to sync.

Looks fine. I actually have something very similar in an old branch,
except that I also moved nvme_reset to common code and made the fabrics
drivers use it. I'll really need to get back to that stuff..

Reviewed-by: Christoph Hellwig
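
For illustration only: the race described in the quoted commit message can be modeled in plain userspace C. This is a toy sketch, not the patch itself and not kernel code. The names toy_work, toy_queue_work, toy_work_busy, toy_reset and toy_start are all hypothetical, and the -1 return value is an assumption standing in for whatever error the real helper returns. The model only captures the two states the message talks about: a work item that is pending (queued, not yet started) and one that is running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a work item; NOT the kernel's struct work_struct. */
struct toy_work {
    bool pending;  /* queued, handler has not started yet */
    bool running;  /* handler is currently executing */
    int  queued;   /* number of times queueing succeeded */
};

/* Models the behaviour described in the commit message:
 * queueing fails only while the item is still pending. */
bool toy_queue_work(struct toy_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->queued++;
    return true;
}

/* Busy means pending OR running, which is the extra check
 * the reset helper is described as making before queueing. */
bool toy_work_busy(const struct toy_work *w)
{
    return w->pending || w->running;
}

/* Hypothetical reset helper: refuse if a reset is already in flight.
 * The -1 is a placeholder error value (an assumption). */
int toy_reset(struct toy_work *w)
{
    if (toy_work_busy(w))
        return -1;
    return toy_queue_work(w) ? 0 : -1;
}

/* A worker picks up the item: pending -> running. */
void toy_start(struct toy_work *w)
{
    assert(w->pending);
    w->pending = false;
    w->running = true;
}
```

In this model, calling toy_queue_work() directly after toy_start() succeeds a second time, which is the double reset the message describes. Calling toy_reset() instead is rejected while the first reset is either pending or running.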