From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261161AbVFASzW (ORCPT ); Wed, 1 Jun 2005 14:55:22 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261247AbVFASxC (ORCPT ); Wed, 1 Jun 2005 14:53:02 -0400 Received: from mail.ccur.com ([208.248.32.212]:32115 "EHLO flmx.iccur.com") by vger.kernel.org with ESMTP id S261161AbVFASld (ORCPT ); Wed, 1 Jun 2005 14:41:33 -0400 Subject: Re: SD_SHARE_CPUPOWER breaks scheduler fairness From: Steve Rotolo Reply-To: steve.rotolo@ccur.com To: Con Kolivas Cc: linux-kernel@vger.kernel.org, bugsy@ccur.com In-Reply-To: <200506020047.16752.kernel@kolivas.org> References: <1117561608.1439.168.camel@whiz> <200506011249.47655.kernel@kolivas.org> <1117636171.22879.29.camel@bonefish> <200506020047.16752.kernel@kolivas.org> Content-Type: text/plain Organization: Concurrent Computer Corporation Message-Id: <1117651285.22879.73.camel@bonefish> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.5 (1.4.5-1) Date: Wed, 01 Jun 2005 14:41:25 -0400 Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 01 Jun 2005 18:41:25.0880 (UTC) FILETIME=[84053F80:01C566D9] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2005-06-01 at 10:47, Con Kolivas wrote: > I didn't miss the point, but I guess I should have made that clear too. > > The number of tasks seen running on that sibling is still the same even if the > queue is forced to be idle (witness by top thinking the load is 1 on that > sibling even if it also shows quite a lot of idle time). It should therefore > not attract any more tasks to itself. > The task that is there will be trapped based on the fact that there is only > one task _only_ if the other sibling is indefinitely running real time tasks, > and _if_ there are other physical cpus we can use we should try to schedule > the trapped task away. If we have N physical cpus (and N*2 logical), and we > are running N real time threads I don't think we should expect to run > SCHED_NORMAL tasks as well. If we have 1) then > we should still be able to run SCHED_NORMAL tasks, I agree. I'm a little > reluctant to tackle this at this stage with the number of SMP balancing > things already queued for -mm, but making a sibling appear more heavily laden > when "pegged" (nr_running + 1) should suffice. > Consider what happens if: - you have 2 physical cpus, 4 logical cpus - you have 40 running SCHED_NORMAL tasks on a well balanced system -- roughly 10 on each runqueue - start up a spinning SCHED_FIFO task on cpu 0 Assuming that cpu 1 is the sibling of 0, cpu 1 now has 10 SCHED_NORMAL tasks that are totally screwed -- they will never, ever, run anywhere, period. Now consider what happens if I start up 40 more SCHED_NORMAL tasks. The load-balancer will kindly place 10 of them on cpu 1's runqueue so they too can be screwed for all eternity. Nice. One more thing: I *think* wake_idle() tends to wake tasks to idle cpus regardless of the idle cpu's runqueue length. This is why I say the idle cpu becomes a magnet for even more tasks, until the balancer straightens things out again. I guess the bottom-line is: given N logical cpus, 1/N of all SCHED_NORMAL tasks may get stuck on a sibling cpu with no chance to run. All it takes is one spinning SCHED_FIFO task. Sounds like a bug. -- Steve