From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756894AbbJ2NC0 (ORCPT ); Thu, 29 Oct 2015 09:02:26 -0400
Received: from mail-wi0-f178.google.com ([209.85.212.178]:36589 "EHLO
	mail-wi0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751432AbbJ2NCY (ORCPT ); Thu, 29 Oct 2015 09:02:24 -0400
Subject: Re: [PATCH 1/3] cpuidle,x86: increase forced cut-off for polling to 20us
To: Rik van Riel, linux-kernel@vger.kernel.org
References: <1446072416-13622-1-git-send-email-riel@redhat.com>
 <1446072416-13622-2-git-send-email-riel@redhat.com>
 <5631F24E.2060508@linaro.org> <563208FC.4060407@redhat.com>
Cc: arjan@linux.intel.com, khilman@ti.com, len.brown@intel.com,
 rafael.j.wysocki@intel.com, javi.merino@arm.com,
 tuukka.tikkanen@linaro.org
From: Daniel Lezcano
Message-ID: <563218DD.302@linaro.org>
Date: Thu, 29 Oct 2015 14:02:21 +0100
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <563208FC.4060407@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/29/2015 12:54 PM, Rik van Riel wrote:
> On 10/29/2015 06:17 AM, Daniel Lezcano wrote:
>> On 10/28/2015 11:46 PM, riel@redhat.com wrote:
>>> From: Rik van Riel
>>>
>>> The cpuidle menu governor has a forced cut-off for polling at 5us,
>>> in order to deal with firmware that gives the OS bad information
>>> on cpuidle states, leading to the system spending way too much time
>>> in polling.
>>
>> Maybe I am misunderstanding your explanation but it is not how I read
>> the code.
>>
>> The default idle state is C1 (hlt) if no other state suits the
>> constraint. If a timer is happening really soon, then set the default
>> idle state to POLL if no other idle state suits the constraint.
>>
>> That applies only on x86.
>
> With the current code, the default idle state is C1 (hlt) even if
> C1 does not suit the constraint.
>
>> This is not related to break-even but exit latency.
>
> Why would we not care about break-even for C1?
>
> On systems where going into C1 for too-short periods wastes
> power, why would we waste the power when we expect a very
> short sleep?
>
>> IMO, we should just drop this 5us and the POLL state selection in the
>> menu governor, as we have had hyper fast C1 exit for a while, except
>> on a few embedded processors where polling is not adequate.
>
> We have hyper fast C1 exit on Nehalem and newer high performance
> chips. On those chips, we will pick C1 (or deeper) when we have
> an expected sleep time of just a few microseconds.
>
> However, on Atom, and for the paravirt cpuidle driver I am
> working on, C1 exit latency and target residency are higher
> than the cut-off hardcoded in the menu governor.
>
>> Furthermore, the number of times the poll state is selected vs the other
>> states is negligible.
>
> And it will continue to be with this patch, on CPUs with
> hyper fast C1 exit.
>
> Which makes me confused about what you are objecting to,
> since the system should continue to behave the way you want,
> with the patch applied.

Ok, I don't object to the correctness of your patch, but to the reasoning
behind this small optimization, which brings a lot of mess into the
cpuidle code. As you are touching this part of the code, I am taking the
opportunity to raise a discussion about it.

From my POV, the poll state is *not* an idle state. It is like a vehicle
burnout [1]. But it is inserted into the idle state tables using a trick
with the CPUIDLE_DRIVER_STATE_START macro, which has already led to some
bugs.

So instead of falling back to the poll state under certain circumstances,
I propose we extract this state from the idle state table and let the
menu governor fail to choose a state (or not).

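
To make the idea concrete, here is a minimal sketch (not kernel code) of a selection routine that is allowed to fail once polling is no longer an entry in the table. The struct, the table values, and select_idle_state() are hypothetical names and numbers invented for illustration, and the loop assumes the table is ordered from the shallowest to the deepest state.

```c
#include <stddef.h>

/* Hypothetical, simplified descriptor; not the kernel's struct cpuidle_state. */
struct idle_state {
	const char *name;
	unsigned int exit_latency_us;     /* time needed to wake up */
	unsigned int target_residency_us; /* minimum sleep for the state to pay off */
};

/*
 * Illustrative table with made-up numbers. It holds real idle states only:
 * polling is deliberately not an entry, so there is no index offset to
 * skip over it.
 */
static const struct idle_state demo_states[] = {
	{ "C1", 20, 40 },
	{ "C2", 100, 300 },
};

#define DEMO_NR_STATES (sizeof(demo_states) / sizeof(demo_states[0]))

/*
 * Return the index of the deepest state that fits both the predicted
 * sleep length and the latency requirement, or -1 when no state fits.
 * Assumes states[] is ordered from shallowest to deepest.
 */
static int select_idle_state(const struct idle_state *states, size_t count,
			     unsigned int predicted_us,
			     unsigned int latency_req_us)
{
	int chosen = -1;

	for (size_t i = 0; i < count; i++) {
		if (states[i].target_residency_us > predicted_us)
			continue; /* sleep too short for this state to pay off */
		if (states[i].exit_latency_us > latency_req_us)
			continue; /* wake-up would be too slow */
		chosen = (int)i;
	}

	return chosen;
}
```

A return value of -1 only says that no real idle state fits; what to do in that case is left to whoever called the selection.
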

From the caller, we decide what to do (poll or C1) if the idle state
selection fails, or we choose to poll *before*, like what we already have
in the idle loop in kernel/sched/idle.c:

	if (cpu_idle_force_poll || tick_check_broadcast_expired())
		cpu_idle_poll();
	else
		cpuidle_idle_call();

This way, we:

1) factor out the idle state selection with find_deepest_idle_state
2) remove the CPUIDLE_DRIVER_STATE_START macro
3) concentrate the optimization logic outside of a governor, which
   will benefit all architectures

Does it make sense?

  -- Daniel

[1] https://en.wikipedia.org/wiki/Burnout_%28vehicle%29

-- 
Linaro.org │ Open source software for ARM SoCs