From: Chris Snook
Date: Sat, 28 Jul 2007 22:40:45 -0400
To: Tong Li
CC: "Bill Huey (hui)", Ingo Molnar, linux-kernel@vger.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS
Message-ID: <46ABFE2D.1060505@redhat.com>
References: <20070725120358.GA30755@elte.hu> <20070725192442.GC4463@elte.hu>
 <20070726213154.GA26569@elte.hu> <1185487225.3122.11.camel@tongli.jf.intel.com>
 <46AA287D.8070200@redhat.com> <46AA8171.2060400@redhat.com>
 <20070728005438.GE32582@gnuppy.monkey.org> <46AAB106.2030104@redhat.com>

Tong Li wrote:
> On Fri, 27 Jul 2007, Chris Snook wrote:
>
>> Bill Huey (hui) wrote:
>>> You have to consider the target for this kind of code. There are
>>> applications where you need something that falls within a constant
>>> error bound. According to the numbers, the current CFS rebalancing
>>> logic doesn't achieve that to any degree of rigor. So CFS is ok for
>>> SCHED_OTHER, but not for anything more strict than that.
>>
>> I've said from the beginning that I think that anyone who desperately
>> needs perfect fairness should be explicitly enforcing it with the aid
>> of realtime priorities.
>> The problem is that configuring and tuning a realtime application is
>> a pain, and people want to be able to approximate this behavior
>> without doing a whole lot of dirty work themselves. I believe that
>> CFS can and should be enhanced to ensure SMP-fairness over
>> potentially short, user-configurable intervals, even for SCHED_OTHER.
>> I do not, however, believe that we should take it to the extreme of
>> wasting CPU cycles on migrations that will not improve performance
>> for *any* task, just to avoid letting some tasks get ahead of others.
>> We should be as fair as possible but no fairer. If we've already made
>> it as fair as possible, we should account for the margin of error and
>> correct for it the next time we rebalance. We should not burn the
>> surplus just to get rid of it.
>
> Proportional-share scheduling actually has one of its roots in real-time
> and having a p-fair scheduler is essential for real-time apps (soft
> real-time).

Sounds like another scheduler class might be in order. I find CFS to be
fair enough for most purposes. If the code that gives us near-perfect
fairness at the expense of efficiency only runs when tasks have been given
boosted priority by a privileged user, and only on the CPUs that have such
tasks queued on them, the run time overhead and code complexity become
much smaller concerns.

>>
>> On a non-NUMA box with single-socket, non-SMT processors, a constant
>> error bound is fine. Once we add SMT, go multi-core, go NUMA, and add
>> inter-chassis interconnects on top of that, we need to multiply this
>> error bound at each stage in the hierarchy, or else we'll end up
>> wasting CPU cycles on migrations that actually hurt the processes
>> they're supposed to be helping, and hurt everyone else even more. I
>> believe we should enforce an error bound that is proportional to
>> migration cost.
>>
>
> I think we are actually in agreement.
> When I say constant bound, it can certainly be a constant that's
> determined based on inputs from the memory hierarchy. The point is
> that it needs to be a constant independent of things like # of tasks.

Agreed.

>> But this patch is only relevant to SCHED_OTHER. The realtime
>> scheduler doesn't have a concept of fairness, just priorities. That's
>> why each realtime priority level has its own separate runqueue.
>> Realtime schedulers are supposed to be dumb as a post, so they cannot
>> heuristically decide to do anything other than precisely what you
>> configured them to do, and so they don't get in the way when you're
>> context switching a million times a second.
>
> Are you referring to hard real-time? As I said, an infrastructure that
> enables p-fair scheduling, EDF, or things alike is the foundation for
> real-time. I designed DWRR, however, with a target of non-RT apps,
> although I was hoping the research results might be applicable to RT.

I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers,
which are (intentionally) dumb as a post, allowing userspace to manage CPU
time explicitly. Proportionally fair scheduling is a cool capability, but
not a design goal of those schedulers.

-- Chris
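
The idea in this thread of an error bound that is multiplied at each
stage of the hierarchy, so that it stays proportional to migration cost
and independent of the number of tasks, could be sketched roughly as
follows. This is only an illustration: the level names, the per-stage
factors, and both function names are hypothetical and are not taken from
CFS, the DWRR patch, or any kernel API.

```c
#include <assert.h>

/* Hypothetical topology stages, innermost (cheapest migration) first. */
enum topo_level { LVL_SMT, LVL_CORE, LVL_NODE, LVL_CHASSIS, LVL_COUNT };

/*
 * Hypothetical per-stage factors: how much more expensive a migration
 * becomes when it has to cross this stage. Real values would have to be
 * measured or derived from the memory hierarchy.
 */
static const unsigned long stage_factor[LVL_COUNT] = { 1, 2, 4, 8 };

/*
 * Allowed fairness error (in ns of CPU time) for balancing across the
 * given level: the base bound multiplied by the factor of every stage
 * up to and including that level. It depends only on the topology,
 * not on the number of tasks.
 */
static unsigned long error_bound_ns(unsigned long base_ns, enum topo_level lvl)
{
    unsigned long bound = base_ns;
    int i;

    assert(lvl < LVL_COUNT);
    for (i = 0; i <= (int)lvl; i++)
        bound *= stage_factor[i];
    return bound;
}

/*
 * Migrate only when the observed imbalance exceeds the bound for the
 * level the migration would cross; a smaller imbalance is left in place
 * to be corrected at a later rebalance.
 */
static int should_migrate(unsigned long imbalance_ns, unsigned long base_ns,
                          enum topo_level lvl)
{
    return imbalance_ns > error_bound_ns(base_ns, lvl);
}
```

With a base bound of 1000 ns and the factors above, a 5000 ns imbalance
would justify a migration between cores (bound 2000 ns) but not between
NUMA nodes (bound 8000 ns).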