From mboxrd@z Thu Jan  1 00:00:00 1970
From: hch@lst.de (Christoph Hellwig)
Date: Tue, 29 May 2018 15:47:57 +0200
Subject: [PATCH 02/10] nvme: ANA transition timeout handling
In-Reply-To: <20180529153441.6eeff725@pentland.suse.de>
References: <20180529101431.62271-1-hare@suse.de>
 <20180529101431.62271-3-hare@suse.de> <20180529124729.GC7376@lst.de>
 <20180529153441.6eeff725@pentland.suse.de>
Message-ID: <20180529134757.GA9038@lst.de>

On Tue, May 29, 2018@03:34:41PM +0200, Hannes Reinecke wrote:
> In an ideal world, yes.
> But what happens if we don't?

Then the whole ANA scheme doesn't work.

> We don't stop processing. We just record the error so that we can
> retrigger the ANA log page scan.

But showing a change state is in no way an error.

> The idea of the patch is to start off a delayed workqueue function to
> ensure we're catching ANA transition timeout errors.
> 
> The workqueue function will be cancelled if we get an AEN, but gives it
> another go at reading the ANA log if the transition timeout expires.

But a delayed work on it's own isn't going to work.  For each group
we need to record when it did transition to the change state, and then
check for ANATT expiry vs that particular value.  And we really don't
need to queue reading a log page for that.  We need to do this algorithm:

 1) when transitioning any group to change state set a timer to fire
   at now + ANATT
 2) when transitioning away from change state cancel the timer
 3) when the timer fires reset the controller as it is toast

Given that nvme_reset_ctrl is irq save it might be easiest to actually
use real Linux timers for that.  It sounds like a waste to have an
array of those for each group, but opencoding time expiration like
in the blk-mq eh code just seems like an invitation for subtle bugs.