From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757389Ab1LWPr1 (ORCPT ); Fri, 23 Dec 2011 10:47:27 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47721 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751614Ab1LWPrZ (ORCPT ); Fri, 23 Dec 2011 10:47:25 -0500 Date: Fri, 23 Dec 2011 16:41:37 +0100 From: Oleg Nesterov To: Peter Zijlstra Cc: Yasunori Goto , Ingo Molnar , Hiroyuki KAMEZAWA , Motohiro Kosaki , Linux Kernel ML Subject: Re: [BUG] TASK_DEAD task is able to be woken up in special condition Message-ID: <20111223154137.GA27901@redhat.com> References: <20111222094241.C691.E1E9C6FF@jp.fujitsu.com> <1324633794.24803.48.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1324633794.24803.48.camel@twins> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/23, Peter Zijlstra wrote: > > On Thu, 2011-12-22 at 09:42 +0900, Yasunori Goto wrote: > > > > ----------------------------------+----------------------------- > > | > > CPU A | CPU B > > ----------------------------------+----------------------------- > > TASK A calls exit().... > > > > do_exit() > > > > exit_mm() > > down_read(mm->mmap_sem); > > > > rwsem_down_failed_common() > > > > set TASK_UNINTERRUPTIBLE > > set waiter.task <= task A > > list_add to sem->wait_list > > : > > raw_spin_unlock_irq() > > (I/O interruption occured) > > > > __rwsem_do_wake(mmap_sem) > > > > list_del(&waiter->list); > > waiter->task = NULL > > wake_up_process(task A) > > try_to_wake_up() > > (task is still > > TASK_UNINTERRUPTIBLE) > > p->on_rq is still 1.) > > > > ttwu_do_wakeup() > > (*A) > > : > > (I/O interruption handler finished) > > > > if (!waiter.task) > > schedule() is not called > > due to waiter.task is NULL. > > > > tsk->state = TASK_RUNNING > > > > : > > check_preempt_curr(); > > : > > task->state = TASK_DEAD > > (*B) > > <--- set TASK_RUNNING (*C) > > > > > > > > schedule() > > (exit task is running again) > > BUG_ON() is called! > > -------------------------------------------------------- > > > > > This is very bad senario. > > But, I suppose this phenomenon is able to occur on a guest system of > > virtual machine too. > > > > Please fix it. > > > > I suppose task->pi_lock should be held when task->state is changed to > > TASK_DEAD like the following patch (not tested yet). > > Because try_to_wake_up() hold it before checking task state. > > I don't think this can actually happen, note the raw_spin_unlock_wait() > in do_exit() long before setting TASK_DEAD, that should synchronize > against the in-progress wakeup and ensure its finished and has set > TASK_RUNNING. How? raw_spin_unlock_wait() is calles before exit_mm(), then the exiting task plays with its ->state. IIRC, this was already discussed a bit. Say, try_to_wake_up(TASK_INTERRUPTIBLE) can wakeup a TASK_UNINTERRUPTIBLE task if it temporary sets INTERRUPTIBLE but doesn't call schedule() in this state. May be ttwu_do_wakeup() should do cmpxchg(old_state, RUNNING) ? Oleg.