Date: Thu, 25 Mar 2010 11:14:20 +0100
From: Oleg Nesterov
To: Miao Xie
Cc: Peter Zijlstra, Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Paul Menage, "Rafael J. Wysocki", Tejun Heo, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code
Message-ID: <20100325101420.GA30779@redhat.com>
In-Reply-To: <4BAAD1EB.4020201@cn.fujitsu.com>

On 03/25, Miao Xie wrote:
>
> on 2010-3-15 17:10, Oleg Nesterov wrote:
> > This patch just states the fact the cpusets/cpuhotplug interaction is
> > broken and removes the deadlockable code which only pretends to work.
> >
> > - cpuset_lock() doesn't really work. It is needed for
> >   cpuset_cpus_allowed_locked() but we can't take this lock in
> >   try_to_wake_up()->select_fallback_rq() path.
> >
> > - cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
> >   callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
> >   stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
> >   cpuset_lock() and hangs forever because CPU is already dead and thus
> >   T can't be scheduled.
>
> The problem what you said don't exist, because the kernel already move T to
> the active cpu when preparing to turn off a CPU.
We need cpuset_lock() to move T. Please look at _cpu_down().

OK. A task T holds callback_mutex, and it is bound to CPU 1.
_cpu_down(cpu => 1) is called by the task X.

_cpu_down()->stop_machine() spawns rt-threads for each cpu; the thread
running on CPU 1 preempts T and calls take_cpu_down(), which removes
CPU 1 from the online/active masks.

X continues and does raw_notifier_call_chain(CPU_DEAD). This calls
migration_call(CPU_DEAD), and _this_ is what moves the tasks off the
dead CPU.

migration_call(CPU_DEAD) calls cpuset_lock() and deadlocks: it waits
for callback_mutex, which T holds, but T can never be scheduled again
because the only CPU it is bound to is already dead.

See?

Oleg.
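To make the cycle explicit, here is a minimal user-space sketch of the scenario above. It is a hypothetical model, not kernel code: struct hotplug_model, t_can_run() and hotplug_deadlocks() are invented for illustration. It only tracks whether T holds callback_mutex, whether T is bound to CPU 1, and whether CPU 1 is online, and it reports a hang when migration_call(CPU_DEAD) waits for the mutex while T waits for a CPU that only the migration could give it.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified model of the cpu_down() scenario.
 * None of these names exist in the kernel; the model only tracks the
 * three facts that matter for the deadlock.
 */
struct hotplug_model {
    bool cpu1_online;   /* cleared by take_cpu_down() under stop_machine() */
    bool t_on_cpu1;     /* T is bound to CPU 1 until it is migrated away */
    bool t_holds_mutex; /* T holds callback_mutex */
};

/* T can run, and so eventually drop callback_mutex, only on a live CPU. */
static bool t_can_run(const struct hotplug_model *m)
{
    return !m->t_on_cpu1 || m->cpu1_online;
}

/*
 * Returns true if neither migration_call(CPU_DEAD) nor T can make
 * progress, i.e. the modelled system hangs.
 */
static bool hotplug_deadlocks(bool t_holds_mutex, bool t_bound_to_cpu1)
{
    struct hotplug_model m = {
        .cpu1_online = true,
        .t_on_cpu1 = t_bound_to_cpu1,
        .t_holds_mutex = t_holds_mutex,
    };

    /* stop_machine() preempts T; take_cpu_down() removes CPU 1. */
    m.cpu1_online = false;

    /* X runs the CPU_DEAD notifiers; migration_call() needs cpuset_lock(). */
    for (;;) {
        if (!m.t_holds_mutex)
            return false; /* lock taken, T can be moved to a live CPU */
        if (!t_can_run(&m))
            return true;  /* migration waits for T, T waits for a CPU */
        m.t_holds_mutex = false; /* T runs elsewhere and drops the mutex */
    }
}
```

In this model hotplug_deadlocks(true, true) reports the hang; if T does not hold callback_mutex, or is not bound to the CPU going down, one side can always make progress.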