From mboxrd@z Thu Jan 1 00:00:00 1970
From: hch@infradead.org (Christoph Hellwig)
Date: Thu, 1 Oct 2015 00:39:51 -0700
Subject: [PATCH 04/10] blk-mq: kill undead requests during CPU hotplug notify
In-Reply-To:
References: <1443380518-6829-1-git-send-email-hch@lst.de> <1443380518-6829-5-git-send-email-hch@lst.de> <20150928174648.GA2136@lst.de>
Message-ID: <20151001073951.GA1989@infradead.org>

On Mon, Sep 28, 2015 at 06:15:47PM +0000, Keith Busch wrote:
> >My impression was that's it's flakey to broken already and we don't
> >change that situation. With my changes we'll mark it as completed
> >and if the command comes in during the small hotplug CPU window the
> >completion handler will see it already completed and ignore the
> >actual hardware completion.
>
> It's not only during the window that there is a problem. Without
> a controller reset, the driver and drive will be permanently out of
> sync with the block layer after a hot cpu event, so we'll never have a
> successful async event notification.
>
> Yes, the original was a kludge, but worked.
>
> It'd be really cool if we can run the blk-mq cpu mapping on unfrozen
> queues. It doesn't look safe, though.

I've looked into AENs a bit more, and the situation is worse than I
thought: AENs can't even be aborted on most devices I have access to
(after hacking the driver to allow aborts on admin commands), so we
can't even cancel them on a queue freeze.

So I'm going to look into moving them entirely into the nvme driver
and removing the REQ_NO_TIMEOUT hacks in the block layer. Given that
we only have one AEN request, and it has a fixed tag number, there
shouldn't be any need to abuse blk-mq as a tag allocator for them.
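
To illustrate the last point, here is a minimal user-space sketch of
handling a single AEN command with a fixed command ID inside the driver,
outside the tag range given to blk-mq. This is not the actual nvme
driver code: the queue depth, the struct and every function name
(admin_queue, submit_aen, handle_completion, ...) are hypothetical and
chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical admin queue depth, for illustration only. */
#define ADMIN_Q_DEPTH  32
/* Command IDs 0..BLK_TAG_DEPTH-1 would be handed out by blk-mq. */
#define BLK_TAG_DEPTH  (ADMIN_Q_DEPTH - 1)
/* The last command ID is reserved for the single AEN command. */
#define AEN_CMD_ID     BLK_TAG_DEPTH

struct admin_queue {
	bool aen_outstanding;         /* AEN command currently at the device? */
	unsigned int aen_events;      /* AEN completions handled by the driver */
	unsigned int blk_completions; /* completions passed to the block layer */
};

/* The AEN is recognised purely by its fixed command ID. */
static bool is_aen_cmd_id(uint16_t cmd_id)
{
	return cmd_id == AEN_CMD_ID;
}

/*
 * Submit the AEN command without allocating a block layer request.
 * Returns the command ID used, or -1 if the AEN is already in flight.
 */
static int submit_aen(struct admin_queue *q)
{
	if (q->aen_outstanding)
		return -1;
	q->aen_outstanding = true;
	return AEN_CMD_ID;
}

/*
 * Completion path: AEN completions are handled in the driver and never
 * reach the block layer, so no request and no timeout is involved.
 */
static void handle_completion(struct admin_queue *q, uint16_t cmd_id)
{
	assert(cmd_id < ADMIN_Q_DEPTH);

	if (is_aen_cmd_id(cmd_id)) {
		q->aen_outstanding = false;
		q->aen_events++;
		return;
	}

	/* Everything else is an ordinary tagged request. */
	q->blk_completions++;
}
```

In this sketch the block layer only ever sees command IDs below
BLK_TAG_DEPTH, so a queue freeze or CPU hotplug remap has no AEN request
to cancel or time out. A real driver would presumably resubmit the AEN
after each completion.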