From mboxrd@z Thu Jan 1 00:00:00 1970
From: Raghavendra K T
Subject: Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
Date: Fri, 25 Jan 2013 21:24:50 +0530
Message-ID: <5102AACA.3040406@linux.vnet.ibm.com>
References: <20130122073854.24731.9426.sendpatchset@codeblue.in.ibm.com>
 <20130122073913.24731.65118.sendpatchset@codeblue.in.ibm.com>
 <20130124103213.GD27602@gmail.com>
 <20130125104025.GA14978@linux.vnet.ibm.com>
 <20130125104737.GA23332@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Peter Zijlstra, Avi Kivity, "H. Peter Anvin", Thomas Gleixner,
 Gleb Natapov, Ingo Molnar, Marcelo Tosatti, Rik van Riel, Srikar,
 "Nikunj A. Dadhania", KVM, Jiannan Ouyang, Chegu Vinod,
 "Andrew M. Theurer", LKML, Srivatsa Vaddagiri, Andrew Jones
To: Ingo Molnar
Return-path:
In-Reply-To: <20130125104737.GA23332@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 01/25/2013 04:17 PM, Ingo Molnar wrote:
>
> * Raghavendra K T wrote:
>
>> * Ingo Molnar [2013-01-24 11:32:13]:
>>
>>>
>>> * Raghavendra K T wrote:
>>>
>>>> From: Peter Zijlstra
>>>>
>>>> In case of undercommitted scenarios, especially in large guests,
>>>> yield_to overhead is significantly high. When the run queue length of
>>>> source and target is one, take the opportunity to bail out and return
>>>> -ESRCH. This return condition can be further exploited to quickly come
>>>> out of the PLE handler.
>>>>
>>>> (History: Raghavendra initially worked on breaking out of the kvm ple
>>>> handler upon seeing source runqueue length = 1, but it had to export
>>>> rq length.)
>>>> Peter came up with the elegant idea of returning -ESRCH in scheduler core.
>>>>
>>>> Signed-off-by: Peter Zijlstra
>>>> Raghavendra, Checking the rq length of target vcpu condition added. (thanks Avi)
>>>> Reviewed-by: Srikar Dronamraju
>>>> Signed-off-by: Raghavendra K T
>>>> Acked-by: Andrew Jones
>>>> Tested-by: Chegu Vinod
>>>> ---
>>>>
>>>>  kernel/sched/core.c | 25 +++++++++++++++++++------
>>>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 2d8927f..fc219a5 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
>>>>   * It's the caller's job to ensure that the target task struct
>>>>   * can't go away on us before we can do any checks.
>>>>   *
>>>> - * Returns true if we indeed boosted the target task.
>>>> + * Returns:
>>>> + *	true (>0) if we indeed boosted the target task.
>>>> + *	false (0) if we failed to boost the target.
>>>> + *	-ESRCH if there's no task to yield to.
>>>>   */
>>>>  bool __sched yield_to(struct task_struct *p, bool preempt)
>>>>  {
>>>> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
>>>>
>>>>  again:
>>>>  	p_rq = task_rq(p);
>>>> +	/*
>>>> +	 * If we're the only runnable task on the rq and target rq also
>>>> +	 * has only one task, there's absolutely no point in yielding.
>>>> +	 */
>>>> +	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
>>>> +		yielded = -ESRCH;
>>>> +		goto out_irq;
>>>> +	}
>>>
>>> Looks good to me in principle.
>>>
>>> Would be nice to get more consistent benchmark numbers. Once
>>> those are unambiguously showing that this is a win:
>>>
>>> Acked-by: Ingo Molnar
>>>
>>
>> I ran the test with kernbench and sysbench again on 32 core mx3850
>> machine with 32 vcpu guests. Results show definite improvements.
>>
>> ebizzy and dbench show similar improvement for 1x overcommit
>> (note that the stdev for 1x in dbench is lower; the improvement is now
>> seen at only 20%)
>>
>> [ all the experiments are taken out of 8 run averages ].
>>
>> The patches benefit large guest undercommit scenarios, so I believe
>> the performance improvement with larger guests is even more significant.
>> [ Chegu Vinod's results show performance close to the no-PLE case ].
>> Unfortunately I do not have a machine to test larger guests (>32).
>>
>> Ingo, please let me know if this is okay with you.
>>
>> base kernel = 3.8.0-rc4
>>
>> +-----------+-----------+-----------+------------+-----------+
>>                kernbench (time in sec, lower is better)
>> +-----------+-----------+-----------+------------+-----------+
>>            base       stdev     patched       stdev    %improve
>> +-----------+-----------+-----------+------------+-----------+
>> 1x      46.6028      1.8672     42.4494      1.1390     8.91234
>> 2x      99.9074      9.1859     90.4050      2.6131     9.51121
>> +-----------+-----------+-----------+------------+-----------+
>> +-----------+-----------+-----------+------------+-----------+
>>                sysbench (time in sec, lower is better)
>> +-----------+-----------+-----------+------------+-----------+
>>            base       stdev     patched       stdev    %improve
>> +-----------+-----------+-----------+------------+-----------+
>> 1x      18.7402      0.3764     17.7431      0.3589     5.32065
>> 2x      13.2238      0.1935     13.0096      0.3152     1.61981
>> +-----------+-----------+-----------+------------+-----------+
>>
>> +-----------+-----------+-----------+------------+-----------+
>>                ebizzy (records/sec, higher is better)
>> +-----------+-----------+-----------+------------+-----------+
>>            base       stdev     patched       stdev    %improve
>> +-----------+-----------+-----------+------------+-----------+
>> 1x    2421.9000     19.1801   5883.1000    112.7243   142.91259
>> +-----------+-----------+-----------+------------+-----------+
>>
>> +-----------+-----------+-----------+------------+-----------+
>>                dbench (throughput MB/sec, higher is better)
>> +-----------+-----------+-----------+------------+-----------+
>>            base       stdev     patched       stdev    %improve
>> +-----------+-----------+-----------+------------+-----------+
>> 1x   11675.9900    857.4154  14103.5000    215.8425    20.79061
>> +-----------+-----------+-----------+------------+-----------+
>
> The numbers look pretty convincing, thanks. The workloads were
> CPU bound most of the time, right?

Yes. CPU bound most of the time. I also used tmpfs to reduce I/O
overhead (for dbench).
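
For anyone following the thread, the bail-out condition in the quoted hunk
can be looked at in isolation. What follows is only a userspace sketch and
not the kernel code: struct rq_model and yield_to_model() are hypothetical
names made up for illustration, the int return type is an assumption so that
-ESRCH can be represented, and the "return 1" stands in for all the locking
and boosting that the real yield_to() does.

```c
#include <errno.h>

/* Hypothetical stand-in for a runqueue; only the field the check reads. */
struct rq_model {
	unsigned int nr_running;
};

/*
 * Models only the early exit added by the patch.
 * Returns -ESRCH when both the source and the target runqueue hold exactly
 * one runnable task, since there is then nothing to yield to.
 * Otherwise returns 1, which here merely pretends the boost succeeded.
 */
static int yield_to_model(const struct rq_model *rq,
			  const struct rq_model *p_rq)
{
	if (rq->nr_running == 1 && p_rq->nr_running == 1)
		return -ESRCH;

	return 1;
}
```

A caller such as a spin handler could then treat -ESRCH as a hint that the
guest is likely undercommitted and stop looking for further yield
candidates; how exactly the PLE handler uses it is left to patch 2/2.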