From mboxrd@z Thu Jan 1 00:00:00 1970 From: keith.busch@intel.com (Keith Busch) Date: Wed, 25 Nov 2015 18:24:40 +0000 Subject: [PATCH] NVMe: Shutdown fixes In-Reply-To: <20151125175512.GA23638@infradead.org> References: <1448407814-12544-1-git-send-email-keith.busch@intel.com> <20151125085537.GA19066@infradead.org> <20151125160544.GA2587@localhost.localdomain> <20151125164257.GB4321@infradead.org> <20151125165703.GC2587@localhost.localdomain> <20151125170842.GA29291@infradead.org> <20151125172845.GD2587@localhost.localdomain> <20151125175512.GA23638@infradead.org> Message-ID: <20151125182439.GE2587@localhost.localdomain> On Wed, Nov 25, 2015@09:55:12AM -0800, Christoph Hellwig wrote: > On Wed, Nov 25, 2015@05:28:45PM +0000, Keith Busch wrote: > > > So let's get rid of the nvme_end_req thing and return the real errors. > > > > nvme_end_req _does_ set the real error. > > > > We can't set it in the timeout handler. How would you even know to > > set an error? > > Because the command timed out, which means we're done with it. That's > the whole point of the timeout - after that the command is fenced off > and we are done with it. I suppose some users with their erasure codes prefer to see failure status quicker than wait for a slower success. > > Maybe the controller posted success status, so we should > > let nvme_dev_shutdown reap the command and set req->errors. > > If the controller had posted success status in time we would not have > timed out the command. Right, I'm just concerned with the race. There's no reason the controller couldn't post completion at the same time as our completely arbitrary timeout kicks in. > What you basically seem to care about is handling the race where the > controller still completes the command after the timeout handler is > running, and you want the controller to win over the timeout handler. Yes, preferring not to fake a status if a real one is available. > But as I said above the race also works the other way around: the > controller completed the command just before we shut down the > controller, and now we overwrite req->errors with the wrong value > in nvme_cancel_queue_ios. Oh I see. I thought the tag wouldn't be seen by nvme_clear_queue's busy iterator if the request's CQ entry was seen through process_cq, but I see now that I was wrong about that if it's the timed out request. Okay, so the status is overridden to "abort" in my patch. I think that's the status you wanted to see, but it's far removed from where we're returning "BLK_EH_HANDLED". > > Such devices return status 6, internal device error, and the senario > > requires user management to rectify. > > Eww, ok. So what do we want to do with the devices? Just ensure the > admin quue is up to be able to send vendor specific admin commands to > fix it? Yep. > > How do you explicitly check for a timeout? We used to have negative > > error codes for these, but you changed that to fake NVMe status codes. > > The aborted one is pretty obvious for NVMe. Anyway, I'm finally > understanding what the magic negative values are for. Without the long > back story that you actually want to support broken devices that need > to come up without I/O queues and thus we have to ignore the Set > Features field there was absolutely no way to figure them out. > > I'm perfectly happy to add back a: > > + /* Linux driver specific value: */ > + NVME_SC_TIMEOUT = -1; > > and use it specificly for this case. But only as a trade in for > documenting in detail why we even bother with all this special casing. Cool, sounds good.