From: Peter Zijlstra
To: Avi Kivity
Cc: Raghavendra K T, "H. Peter Anvin", Marcelo Tosatti, Ingo Molnar,
	Rik van Riel, Srikar, "Nikunj A. Dadhania", KVM, Jiannan Ouyang,
	chegu vinod, "Andrew M. Theurer", LKML, Srivatsa Vaddagiri,
	Gleb Natapov, Andrew Jones
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit
 scenarios in PLE handler
Date: Mon, 24 Sep 2012 18:03:20 +0200
Message-ID: <1348502600.11847.90.camel@twins>
In-Reply-To: <50608176.1040805@redhat.com>
References: <20120921115942.27611.67488.sendpatchset@codeblue>
	<1348486479.11847.46.camel@twins>
	<50604988.2030506@linux.vnet.ibm.com>
	<1348490165.11847.58.camel@twins>
	<50606050.309@linux.vnet.ibm.com>
	<1348494895.11847.64.camel@twins>
	<50608176.1040805@redhat.com>

On Mon, 2012-09-24 at 17:51 +0200, Avi Kivity wrote:
> On 09/24/2012 03:54 PM, Peter Zijlstra wrote:
> > On Mon, 2012-09-24 at 18:59 +0530, Raghavendra K T wrote:
> >> However Rik had a genuine concern in the cases where the runqueue
> >> is not equally distributed and the lock holder might actually be on
> >> a different runqueue but not running.
> >
> > Load should eventually get distributed equally -- that's what the
> > load-balancer is for -- so this is a temporary situation.
>
> What's the expected latency? This is the whole problem.
> Eventually the scheduler would pick the lock holder as well; the
> problem is that it's on the millisecond scale while lock hold times
> are on the microsecond scale, leading to a 1000x slowdown.

Yeah, I know.. Heisenberg's uncertainty applied to SMP computing
becomes something like: accurate or fast, never both.

> If we want to yield, we really want to boost someone.

Now if only you knew which someone ;-) This non-modified guest
nonsense is such a snake pit.. but you know how I feel about all that.

> > We already try and favour the non-running vcpu in this case; that's
> > what yield_to_task_fair() is about. If it's still not eligible to
> > run, tough luck.
>
> Crazy idea: instead of yielding, just run that other vcpu in the
> thread that would otherwise spin. I can see about a million
> objections to this already though.

Yah.. you want me to list a few? :-)

It would require synchronization with the other cpu to pull its task
-- one really wants to avoid it also running it. Do this at a high
enough frequency and you're dead too.

Anyway, you can do this inside the KVM stuff: simply flip the vcpu
state associated with a vcpu thread and use the preemption notifiers
to sort things out against the scheduler, or somesuch.

> >> Do you think instead of using rq->nr_running, we could get a
> >> global sense of load using avenrun (something like
> >> avenrun/num_online_cpus)?
> >
> > To what purpose? Also, global stuff is expensive, so you should try
> > and stay away from it as hard as you possibly can.
>
> Spinning is also expensive. How about we do the global stuff every N
> times, to amortize the cost (and reduce contention)?

Nah, spinning isn't expensive; it's a waste of time -- a similar end
result for someone who wants to do useful work, but not for the same
cause.
Pick an N and I'll come up with a scenario for which it's wrong ;-)

Anyway, it's an ugly problem, and one I really want to contain inside
the insanity that created it (virt); let's not taint the rest of the
kernel more than we need to.