From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler
Date: Mon, 24 Sep 2012 17:58:27 +0200
Message-ID: <50608323.9000603@redhat.com>
References: <20120921115942.27611.67488.sendpatchset@codeblue> <20120921120019.27611.66093.sendpatchset@codeblue> <50607BBE.8070507@redhat.com> <1348500861.11847.72.camel@twins> <50607F9B.7090701@redhat.com> <1348501929.11847.81.camel@twins>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Raghavendra K T , "H. Peter Anvin" , Marcelo Tosatti , Ingo Molnar , Rik van Riel , Srikar , "Nikunj A. Dadhania" , KVM , Jiannan Ouyang , chegu vinod , "Andrew M. Theurer" , LKML , Srivatsa Vaddagiri , Gleb Natapov
To: Peter Zijlstra
Return-path:
In-Reply-To: <1348501929.11847.81.camel@twins>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 09/24/2012 05:52 PM, Peter Zijlstra wrote:
> On Mon, 2012-09-24 at 17:43 +0200, Avi Kivity wrote:
>> Wouldn't this correspond to the scheduler interrupt firing and causing a
>> reschedule? I thought the timer was programmed for exactly the point in
>> time that CFS considers the right time for a switch. But I'm basing
>> this on my mental model of CFS, not CFS itself.
>
> No, we tried this for hrtimer kernels for a while, but programming
> hrtimers the whole time (every actual task-switch) turns out to be far
> too expensive. So we're back to HZ ticks and 'polling' the preemption
> state.

Ok, so I wasn't completely off base. With HZ=1000, we can be at most a
millisecond faster than the interrupt-driven schedule(), and we need to
be a lot faster.

> Even if we remove all the hrtimer infrastructure overhead (can do with a
> few hacks) setting the hardware requires going out to the LAPIC, which
> is stupid slow.
>
> Some hardware actually has fast/reliable/usable timers, sadly none of it
> is popular.

There is the TSC deadline timer mode of newer Intels. Programming the
timer is a simple wrmsr, and it will fire immediately if it already
expired. Unfortunately on AMDs it is not available, and on virtual
hardware it will be slow (~1-2 usec).

-- 
error compiling committee.c: too many arguments to function
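
For illustration, here is a rough back-of-the-envelope model of the two
schemes discussed above: tick-based polling of the preemption state versus
a one-shot deadline timer that fires immediately when its deadline has
already passed. This is only a sketch under simplifying assumptions. It is
not CFS or KVM code, the function names are hypothetical, and it ignores
the cost of programming the timer itself.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: time of the first periodic tick at or after t_ns,
 * assuming ticks fire at exact multiples of the tick period (1/hz). */
static uint64_t next_tick_ns(uint64_t t_ns, unsigned int hz)
{
    uint64_t period = NSEC_PER_SEC / hz;

    return ((t_ns + period - 1) / period) * period;
}

/* Tick polling: a switch that would ideally happen at ideal_ns is only
 * noticed at the next tick, so this is how late the switch ends up. */
static uint64_t tick_poll_delay_ns(uint64_t ideal_ns, unsigned int hz)
{
    return next_tick_ns(ideal_ns, hz) - ideal_ns;
}

/* One-shot deadline timer model: fires at the deadline, or right away
 * (at now_ns) if the deadline has already passed when it is armed. */
static uint64_t deadline_fire_ns(uint64_t now_ns, uint64_t deadline_ns)
{
    return deadline_ns > now_ns ? deadline_ns : now_ns;
}
```

In this model, with hz = 1000 the value returned by tick_poll_delay_ns()
is always below 1,000,000 ns, which is the "at most a millisecond" bound
mentioned above. deadline_fire_ns() never fires late relative to a
deadline that is still in the future.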