From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758446AbcHCVbP (ORCPT ); Wed, 3 Aug 2016 17:31:15 -0400
Received: from mx1.redhat.com ([209.132.183.28]:33668 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758214AbcHCVbJ (ORCPT ); Wed, 3 Aug 2016 17:31:09 -0400
Date: Wed, 3 Aug 2016 23:30:06 +0200
From: Oleg Nesterov
To: Bart Van Assche
Cc: Peter Zijlstra, "mingo@kernel.org", Andrew Morton, Johannes Weiner, Neil Brown, Michael Shaver, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs
Message-ID: <20160803213006.GA11712@redhat.com>
References: <20160803181128.GH6879@twins.programming.kicks-ass.net> <11007730-3fa5-139a-8091-655743894ae8@sandisk.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <11007730-3fa5-139a-8091-655743894ae8@sandisk.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 03 Aug 2016 21:30:10 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Bart,

I too can't understand the problem. Perhaps you missed the fact that
abort_exclusive_wait() does everything under wait_queue_head_t->lock ?

On 08/03, Bart Van Assche wrote:
>
> try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() not.
> My assumption is that the following sequence of events leads to the lockup
> that I had mentioned in the description of my patch:
> * try_to_wake_up() is called for the task that will execute
>   abort_exclusive_wait().
> * After try_to_wake_up() has checked task_struct.state and before
>   autoremove_wake_function() has tried to remove the task from the wait
>   queue, abort_exclusive_wait() is executed for the same task.
But we do not care if we race with another try_to_wake_up(), or even with
another exclusive wake_up_nr(wq)/whatever, unless wq is the same.

And if wq is the same, then wake_up_nr() will do try_to_wake_up/autoremove
either before or after abort_exclusive_wait(), because wake_up_nr() takes the
same wq->lock. This means that abort_exclusive_wait() can't be called
"After try_to_wake_up()" and "before autoremove_wake_function()".

Oleg.
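
The serialization argument above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code: the
types and functions (struct waiter, struct wait_queue, wq_wake_up_one(),
wq_abort_exclusive_wait(), run_order()) are hypothetical names, a pthread
mutex stands in for wq->lock, and the "pass the wakeup on to the next
waiter" branch is an assumed simplification of what the abort path does when
the waiter has already been removed. Because both paths hold the same lock
for their whole critical section, only two orders are possible, and
run_order() plays out each of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

/* Hypothetical stand-in for a wait queue entry. */
struct waiter {
    bool queued;  /* still linked on the wait queue? */
    bool woken;   /* has a waker marked this waiter runnable? */
};

/* Hypothetical stand-in for a wait queue head. */
struct wait_queue {
    pthread_mutex_t lock;  /* plays the role of wq->lock */
    struct waiter *entries[MAX_WAITERS];
    size_t count;
};

struct outcome {
    bool a_woken;
    bool b_woken;
    size_t left;  /* waiters still queued at the end */
};

static void wq_add(struct wait_queue *wq, struct waiter *w)
{
    pthread_mutex_lock(&wq->lock);
    assert(wq->count < MAX_WAITERS);
    w->queued = true;
    w->woken = false;
    wq->entries[wq->count++] = w;
    pthread_mutex_unlock(&wq->lock);
}

/* Caller holds wq->lock. Unlink w if it is on the queue. */
static void remove_locked(struct wait_queue *wq, struct waiter *w)
{
    for (size_t i = 0; i < wq->count; i++) {
        if (wq->entries[i] != w)
            continue;
        for (size_t j = i + 1; j < wq->count; j++)
            wq->entries[j - 1] = wq->entries[j];
        wq->count--;
        w->queued = false;
        return;
    }
}

/* Caller holds wq->lock. Wake the first waiter and autoremove it. */
static void wake_one_locked(struct wait_queue *wq)
{
    if (wq->count == 0)
        return;
    struct waiter *w = wq->entries[0];
    w->woken = true;
    remove_locked(wq, w);
}

/* Waker side: wake + autoremove happen inside one critical section. */
static void wq_wake_up_one(struct wait_queue *wq)
{
    pthread_mutex_lock(&wq->lock);
    wake_one_locked(wq);
    pthread_mutex_unlock(&wq->lock);
}

/* Abort side: takes the same lock, so it runs before or after the waker. */
static void wq_abort_exclusive_wait(struct wait_queue *wq, struct waiter *w)
{
    pthread_mutex_lock(&wq->lock);
    if (w->queued)
        remove_locked(wq, w);   /* not woken yet: just unlink */
    else if (wq->count > 0)
        wake_one_locked(wq);    /* assumed: pass the wakeup on */
    pthread_mutex_unlock(&wq->lock);
}

/* Play out one of the two orders the shared lock permits. */
struct outcome run_order(bool wake_first)
{
    struct wait_queue wq = { .lock = PTHREAD_MUTEX_INITIALIZER, .count = 0 };
    struct waiter a, b;

    wq_add(&wq, &a);  /* a is the waiter that will abort */
    wq_add(&wq, &b);

    if (wake_first) {
        wq_wake_up_one(&wq);
        wq_abort_exclusive_wait(&wq, &a);
    } else {
        wq_abort_exclusive_wait(&wq, &a);
        wq_wake_up_one(&wq);
    }

    struct outcome out = { a.woken, b.woken, wq.count };
    return out;
}
```

In this model, run_order(true) and run_order(false) both end with the second
waiter woken and nothing left on the queue. The interleaving described in the
quoted text would need the abort path to run between the wake and the
autoremove, and the model has no place for that because both steps sit
inside the same locked region.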