From: Catalin Marinas
Subject: Re: [0/11] Energy-aware scheduling use-cases and scheduler issues
Date: Mon, 13 Jan 2014 12:04:18 +0000
Message-ID: <20140113120418.GD11805@arm.com>
In-Reply-To: <20140112164759.GB5008@mgross-Lenovo-Yoga-2-Pro>
References: <1387557951-21750-1-git-send-email-morten.rasmussen@arm.com>
 <20131222162744.GB3250@mgross-Lenovo-Yoga-2-Pro>
 <20131230121010.GA2936@e103034-lin>
 <20140112164759.GB5008@mgross-Lenovo-Yoga-2-Pro>
List-Id: linux-pm@vger.kernel.org
To: mark gross
Cc: Morten Rasmussen, "peterz@infradead.org", "mingo@kernel.org",
 "rjw@rjwysocki.net", "vincent.guittot@linaro.org",
 "linux-pm@vger.kernel.org"

On Sun, Jan 12, 2014 at 04:47:59PM +0000, mark gross wrote:
> On Mon, Dec 30, 2013 at 12:10:10PM +0000, Morten Rasmussen wrote:
> > I was hoping that we could come up with a fairly simplistic energy
> > model that could guide the scheduling decisions based on data
> > provided by the vendor. I would start with something very simple and
> > see how far we can get and which data is necessary.
>
> I keep flip flopping in my mind over what is more important: energy
> modelling or latency/performance measuring.

Both ;)

> I mean, one way to look at the world is: given a workload with minimal
> latency and throughput expectations, we need to deliver those first
> and then optimise power.

I agree, that's why I don't think blindly packing tasks to the left
would work either for power or for performance.

> With poor load balancing we do not deliver on performance
> expectations, typically in the areas of latencies.
> Note, Linux does well on throughput IMO because that is easier to
> measure with kstats and other sampling.
>
> What sorts of missing things do we need to measure and understand when
> wrong choices are getting made? What basic information do we need to
> capture to know whether we are doing a good job or not?

I think whatever power awareness we add to the scheduler should aim to
optimise the power consumption (based on some simple model measuring
the idle time/states and transitions on certain platforms and
estimating the energy) but with *minimal* effect on latency and
throughput. Standard latency/performance benchmarks should always be
run to ensure there are no regressions.

Morten's use-cases try to describe scenarios where the scheduler can do
better from a power perspective but without (drastically) affecting
other parameters. If you have a predictable workload, the scheduler can
make the right decision to optimise for power while keeping the latency
under control. The problem is when the workload changes: latency is
affected if tasks need to migrate or the CPU frequency needs to be
increased (and for the latter we currently rely on a cpufreq governor
or driver to detect the workload change, which introduces additional
latency).

Given these pretty independent cpufreq decisions, the best heuristic
for now wrt latency is probably to spread the workload among all the
CPUs and leave enough room for workload changes. But even with latency
under certain limits, you may for example have small threads (like
audio decoding) that would still fit on a CPU running at the minimal
P-state, with the risk of a big sudden change in the workload of such a
thread. That's a trade-off between optimising for performance and
optimising for power. A power-aware scheduler does not aim to trade
latency or throughput for power; the question is rather how well it
copes with workload unpredictability and what margins are guaranteed.
IMHO, adding power awareness to the scheduler could be done in two
(main) ways:

1. Heuristics like packing small tasks, with tunables for what "small"
   actually means and how many such "small" tasks to pack; such
   parameters would be specific to each SoC.

2. A power model in the scheduler (I proposed a simplistic one at the
   end of last year) where the scheduler can associate an energy cost
   with its actions (e.g. migrating a task to a CPU) and try to
   optimise the overall system energy consumption while preserving
   latency and throughput.

I consider the second approach better since you can extend it to other
things like power budgets. But it doesn't always go down well with
hardware people who don't want to expose real numbers (they don't even
need to be real W or J, just some relative numbers).

-- 
Catalin