From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [PATCH RFC] fs/aio: fix sleeping while TASK_INTERRUPTIBLE
Date: Mon, 29 Dec 2014 10:08:14 -0500
Message-ID: <1419865694.13012.17@mail.thefacebook.com>
References: <20141223001619.GA26385@ret.masoncoding.com>
 <20141225025641.GC29607@moria.home.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Cc: , , Peter Zijlstra
To: Kent Overstreet
Return-path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:48091 "EHLO
 mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751562AbaL2PIp (ORCPT );
 Mon, 29 Dec 2014 10:08:45 -0500
In-Reply-To: <20141225025641.GC29607@moria.home.lan>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Dec 24, 2014 at 9:56 PM, Kent Overstreet wrote:
> On Mon, Dec 22, 2014 at 07:16:25PM -0500, Chris Mason wrote:
>> The 3.19 merge window brought in a great new warning to catch someone
>> calling might_sleep with their state != TASK_RUNNING. The idea was to
>> find buggy code locking mutexes after calling prepare_to_wait(), kind
>> of like this:
>
> Ben just told me about this issue.
>
> IMO, the way the code is structured now is correct, I would argue the
> problem is with the way wait_event() works - the way they have to mess
> with the global-ish task state when adding a wait_queue_t to a
> wait_queue_head (who came up with these names?)

Grin, probably related to the guy who made closure_wait() not actually
wait.

The advantage of the wait_queue_head_t setup is that it's a very well
understood mechanism for sleeping on something without missing wakeups.
The locking overhead of the waitqueues can be a problem with lots of
waiters on the same queue, but otherwise the overhead is low.

I think closures are too big a hammer for this problem, unless
benchmarks show we need the lockless lists (I really like that part).

I do hesitate to make big changes here because debugging AIO hangs is
horrible. The code is only tested by a few workloads, and we can go a
long time before problems are noticed. When people do hit bugs, we only
notice the ones where applications pile up in getevents. Otherwise it's
just strange performance changes that we can't explain because they are
hidden in the app's AIO state machine.

When I first looked at the warning, I didn't realize that might_sleep
and friends were setting a preempted flag to make sure the task wasn't
removed from the runqueue. So I thought we'd potentially sleep forever
(thanks Peter for details++).

The real risk here is burning CPU in the running state, potentially a
lot of it if the mutex is highly contended. We've probably been hitting
this for a while, but since we test AIO performance with fast storage,
the burning just made us look faster.

-chris
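
Below is a minimal sketch of the pattern under discussion. It is not
the actual fs/aio.c code; the wait queue, mutex, and flag names
(demo_wq, demo_lock, demo_done) are made up for illustration. A mutex
is taken between prepare_to_wait() and schedule(), which is what the
new might_sleep() warning catches, and which turns the wait into a busy
spin rather than a hang:

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/mutex.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
static DEFINE_MUTEX(demo_lock);
static int demo_done;

static void demo_buggy_wait(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		/* queue ourselves and set TASK_INTERRUPTIBLE */
		prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);

		/*
		 * BUG: mutex_lock() calls might_sleep(), which triggers
		 * the new warning, and if it blocks it returns with the
		 * task back in TASK_RUNNING.  The schedule() below then
		 * comes straight back, so the loop spins and burns CPU
		 * instead of sleeping.
		 */
		mutex_lock(&demo_lock);
		if (demo_done) {
			mutex_unlock(&demo_lock);
			break;
		}
		mutex_unlock(&demo_lock);

		schedule();
	}
	finish_wait(&demo_wq, &wait);
}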
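
For contrast, the well-understood wait_queue_head_t form referred to
above: the condition is re-checked after prepare_to_wait(), and nothing
that can sleep runs before schedule(), so a wake_up() racing with the
check simply sets the task back to TASK_RUNNING and the wakeup cannot
be missed. Again only a sketch, reusing the made-up names from the
previous one:

/* sleeping side: nothing that can sleep between prepare_to_wait() and schedule() */
static void demo_correct_wait(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);
		if (demo_done)	/* checked *after* we are queued */
			break;
		schedule();	/* really sleeps only if no wakeup has raced in */
	}
	finish_wait(&demo_wq, &wait);
}

/* waking side */
static void demo_complete(void)
{
	demo_done = 1;
	wake_up(&demo_wq);
}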