From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933116Ab0CKPl2 (ORCPT );
	Thu, 11 Mar 2010 10:41:28 -0500
Received: from casper.infradead.org ([85.118.1.10]:56091 "EHLO casper.infradead.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932198Ab0CKPl0 (ORCPT );
	Thu, 11 Mar 2010 10:41:26 -0500
Subject: Re: Q: select_fallback_rq() && cpuset_lock()
From: Peter Zijlstra
To: Oleg Nesterov
Cc: Ingo Molnar, Lai Jiangshan, Tejun Heo, linux-kernel@vger.kernel.org
In-Reply-To: <20100311152201.GA13888@redhat.com>
References: <20100309180615.GA11681@redhat.com>
	 <1268239242.5279.46.camel@twins>
	 <20100310173018.GA1294@redhat.com>
	 <1268244075.5279.53.camel@twins>
	 <20100310183259.GA23648@redhat.com>
	 <20100311145248.GA12907@redhat.com>
	 <20100311152201.GA13888@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 11 Mar 2010 16:41:18 +0100
Message-ID: <1268322078.5037.118.camel@laptop>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.3
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2010-03-11 at 16:22 +0100, Oleg Nesterov wrote:
> On 03/11, Oleg Nesterov wrote:
> >
> > How can we fix this later? Perhaps we can change
> > cpuset_track_online_cpus(CPU_DEAD) to scan all affected cpusets and
> > fixup the tasks with the wrong ->cpus_allowed == cpu_possible_mask.
>
> Wait. We need to fix the CPU_DEAD case anyway?
>
> Hmm. 6ad4c18884e864cf4c77f9074d3d1816063f99cd
> "sched: Fix balance vs hotplug race" did s/CPU_DEAD/CPU_DOWN_PREPARE/
> in cpuset_track_online_cpus(). This doesn't look exactly right to me,
> we shouldn't do remove_tasks_in_empty_cpuset() at CPU_DOWN_PREPARE
> stage, it can fail. Sure, tough luck for those few tasks.
>
> Otoh. This means that move_task_of_dead_cpu() can never see the
> task without active cpus in ->cpus_allowed, it is called later by
> CPU_DEAD.
> So, cpuset_lock() is not needed at all.

Right... so the whole problem is that cpumask ops are terribly expensive
since we got this CONFIG_CPUMASK_OFFSTACK muck, so we try to reduce those
ops in the regular scheduling paths. In the patch you referenced above the
tradeoff was between fixing up the sched_domains too often vs adding a
cpumask_and in a hot path; guess who won ;-)