From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f67.google.com ([209.85.218.67]:41518 "EHLO
	mail-oi0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932122AbdKQMVy (ORCPT ); Fri, 17 Nov 2017 07:21:54 -0500
Subject: Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path
To: Thomas Gleixner
Cc: Peter Zijlstra, Quan Xu, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML,
	virtualization@lists.linux-foundation.org, x86@kernel.org,
	xen-devel@lists.xenproject.org, Yang Zhang, Ingo Molnar,
	"H. Peter Anvin", Borislav Petkov, Kyle Huey, Len Brown,
	Andy Lutomirski, Tom Lendacky, Tobias Klauser, Daniel Lezcano
References: <1510567565-5118-1-git-send-email-quan.xu0@gmail.com>
	<1510567565-5118-4-git-send-email-quan.xu0@gmail.com>
	<20171115121152.gqug5wzerlo3eimd@hirez.programming.kicks-ass.net>
	<46086489-5a01-16e1-9314-70ae53c01952@gmail.com>
	<564b8a6e-8ddd-4e3d-c670-10f1697e6c06@gmail.com>
From: Quan Xu
Message-ID:
Date: Fri, 17 Nov 2017 20:21:42 +0800
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 2017-11-17 19:36, Thomas Gleixner wrote:
> On Fri, 17 Nov 2017, Quan Xu wrote:
>> On 2017-11-16 17:53, Thomas Gleixner wrote:
>>> That's just plain wrong. We don't want to see any of this PARAVIRT crap in
>>> anything outside the architecture/hypervisor interfacing code which really
>>> needs it.
>>>
>>> The problem can and must be solved at the generic level in the first place
>>> to gather the data which can be used to make such decisions.
>>>
>>> How that information is used might be either completely generic or requires
>>> system specific variants. But as long as we don't have any information at
>>> all we cannot discuss that.
>>>
>>> Please sit down and write up which data needs to be considered to make
>>> decisions about probabilistic polling. Then we need to compare and contrast
>>> that with the data which is necessary to make power/idle state decisions.
>>>
>>> I would be very surprised if this data would not overlap by at least 90%.
>>>
>> 1. which data needs to considerd to make decisions about probabilistic polling
>>
>> I really need to write up which data needs to considerd to make
>> decisions about probabilistic polling. At last several months,
>> I always focused on the data _from idle to reschedule_, then to bypass
>> the idle loops. unfortunately, this makes me touch scheduler/idle/nohz
>> code inevitably.
>>
>> with tglx's suggestion, the data which is necessary to make power/idle
>> state decisions, is the last idle state's residency time. IIUC this data
>> is duration from idle to wakeup, which maybe by reschedule irq or other irq.
> That's part of the picture, but not complete.

tglx, could you share more? I am very curious about it.

>> I also test that the reschedule irq overlap by more than 90% (trace the
>> need_resched status after cpuidle_idle_call), when I run ctxsw/netperf for
>> one minute.
>>
>> as the overlap, I think I can input the last idle state's residency time
>> to make decisions about probabilistic polling, as @dev->last_residency does.
>> it is much easier to get data.
> That's only true for your particular use case.
>
>> 2. do a HV specific idle driver (function)
>>
>> so far, power management is not exposed to guest.. idle is simple for KVM
>> guest,
>> calling "sti" / "hlt"(cpuidle_idle_call() --> default_idle_call())..
>> thanks Xen guys, who has implemented the paravirt framework. I can implement
>> it
>> as easy as following:
>>
>>             --- a/arch/x86/kernel/kvm.c
> Your email client is using a very strange formatting.

My bad, I inserted spaces to highlight the code.
> This is definitely better than what you proposed so far and implementing it
> as a prove of concept seems to be worthwhile.
>
> But I doubt that this is the final solution. It's not generic and not
> necessarily suitable for all use case scenarios.
>
>

Yes, I am exhausted :):)

Could you tell me what the gap is to being generic and suitable for all
use case scenarios? Is it the lack of irq/idle predictors?

I really want to upstream it for all public cloud users/providers.

As the KVM host has a similar one, is it possible to upstream it under the
following conditions?

    1). add a QEMU configuration option to enable or disable it, disabled by default.
    2). add some "TODO" comments near the code.
    3). ...

Anyway, thanks for your help.

Quan
Alibaba Cloud