From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [PATCH RFC] fs/aio: fix sleeping while TASK_INTERRUPTIBLE
Date: Mon, 29 Dec 2014 10:08:14 -0500
Message-ID: <1419865694.13012.17@mail.thefacebook.com>
References: <20141223001619.GA26385@ret.masoncoding.com>
 <20141225025641.GC29607@moria.home.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Cc: , , Peter Zijlstra
To: Kent Overstreet
Return-path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:48091 "EHLO
 mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751562AbaL2PIp (ORCPT );
 Mon, 29 Dec 2014 10:08:45 -0500
In-Reply-To: <20141225025641.GC29607@moria.home.lan>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Dec 24, 2014 at 9:56 PM, Kent Overstreet wrote:
> On Mon, Dec 22, 2014 at 07:16:25PM -0500, Chris Mason wrote:
>> The 3.19 merge window brought in a great new warning to catch someone
>> calling might_sleep with their state != TASK_RUNNING. The idea was to
>> find buggy code locking mutexes after calling prepare_to_wait(), kind
>> of like this:
>
> Ben just told me about this issue.
>
> IMO, the way the code is structured now is correct, I would argue the
> problem is with the way wait_event() works - the way they have to mess
> with the global-ish task state when adding a wait_queue_t to a
> wait_queue_head (who came up with these names?)

Grin, probably related to the guy who made closure_wait() not actually
wait.

The advantage of the wait_queue_head_t setup is that it's a very well
understood mechanism for sleeping on something without missing wakeups.
The locking overhead of the waitqueues can be a problem with lots of
waiters on the same queue, but otherwise the overhead is low.

I think closures are too big a hammer for this problem, unless
benchmarks show we need the lockless lists (I really like that part).

I do hesitate to make big changes here because debugging AIO hangs is
horrible. The code is only tested by a few workloads, and we can go a
long time before problems are noticed. When people do hit bugs, we only
notice the ones where applications pile up in getevents. Otherwise it's
just strange performance changes that we can't explain because they are
hidden in the app's AIO state machine.

When I first looked at the warning, I didn't realize that might_sleep
and friends were setting a preempted flag to make sure the task wasn't
removed from the runqueue. So I thought we'd potentially sleep forever
(thanks Peter for details++).

The real risk here is burning CPU in the running state, potentially a
lot of it if the mutex is highly contended. We've probably been hitting
this for a while, but since we test AIO performance with fast storage,
the burning just made us look faster.

-chris
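
Below is a minimal sketch of the pattern under discussion. It is not
the actual fs/aio.c code; the wait queue, mutex, and flag names
(demo_wq, demo_lock, demo_done) are made up for illustration. A mutex
is taken between prepare_to_wait() and schedule(), which is what the
new might_sleep() warning catches, and which turns the wait into a busy
spin rather than a hang:

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/mutex.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
static DEFINE_MUTEX(demo_lock);
static int demo_done;

static void demo_buggy_wait(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		/* queue ourselves and set TASK_INTERRUPTIBLE */
		prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);

		/*
		 * BUG: mutex_lock() calls might_sleep(), which triggers
		 * the new warning, and if it blocks it returns with the
		 * task back in TASK_RUNNING.  The schedule() below then
		 * comes straight back, so the loop spins and burns CPU
		 * instead of sleeping.
		 */
		mutex_lock(&demo_lock);
		if (demo_done) {
			mutex_unlock(&demo_lock);
			break;
		}
		mutex_unlock(&demo_lock);

		schedule();
	}
	finish_wait(&demo_wq, &wait);
}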
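
For contrast, the well-understood wait_queue_head_t form referred to
above: the condition is re-checked after prepare_to_wait(), and nothing
that can sleep runs before schedule(), so a wake_up() racing with the
check simply sets the task back to TASK_RUNNING and the wakeup cannot
be missed. Again only a sketch, reusing the made-up names from the
previous one:

/* sleeping side: nothing that can sleep between prepare_to_wait() and schedule() */
static void demo_correct_wait(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);
		if (demo_done)	/* checked *after* we are queued */
			break;
		schedule();	/* really sleeps only if no wakeup has raced in */
	}
	finish_wait(&demo_wq, &wait);
}

/* waking side */
static void demo_complete(void)
{
	demo_done = 1;
	wake_up(&demo_wq);
}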