From mboxrd@z Thu Jan 1 00:00:00 1970 From: willy@linux.intel.com (Matthew Wilcox) Date: Tue, 19 Feb 2013 10:05:38 -0500 Subject: [PATCH 2/2] nvme: fix the placement of set_current_state to TASK_INTERRUPTIBLE In-Reply-To: <51103A15.8010508@linux.intel.com> References: <51103A15.8010508@linux.intel.com> Message-ID: <20130219150538.GA4530@linux.intel.com> On Mon, Feb 04, 2013@02:45:41PM -0800, Arjan van de Ven wrote: > the nvme driver has a kthread that processes certain events periodically. > However, the kthread also gets woken for certain urgent actions. > > The current code does not use the current_state logic correctly; > it calls set_current_state(TASK_INTERRUPTIBLE); right before doing > a schedule_timeout(), with the result that there is a race condition > where a wakeup can get lost, and thus delayed by one second. > > This patch moves the set_current_state(TASK_INTERRUPTIBLE); to before > the place where the queue of outstanding work is checked, to close > the race condition The thing is, that's not a queue of outstanding work, that's the list of devices that exist in the system. Your patch makes the kthread do all of its work in the TASK_INTERRUPTIBLE state, which I don't think is right either (is it?) We could add a flag to indicate that there's urgent work to be done (checked after setting TASK_INTERRUPTIBLE), or just live with the occasional race. Alternatively, we could get rid of the kthread in favour of timers for cancelling commands and a workqueue for dealing with urgent work. I have this on my long-term todo list, but in the absence of having hardware to test with on large-scale systems, I didn't want to make such a large change.