From: Andrew Jones <drjones@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
Rik van Riel <riel@redhat.com>,
Srikar <srikar@linux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
chegu vinod <chegu_vinod@hp.com>,
"Andrew M. Theurer" <habanero@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
Gleb Natapov <gleb@redhat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
Date: Wed, 26 Sep 2012 14:57:28 +0200 [thread overview]
Message-ID: <20120926125727.GA7633@turtle.usersys.redhat.com> (raw)
In-Reply-To: <1348490165.11847.58.camel@twins>
On Mon, Sep 24, 2012 at 02:36:05PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-09-24 at 17:22 +0530, Raghavendra K T wrote:
> > On 09/24/2012 05:04 PM, Peter Zijlstra wrote:
> > > On Fri, 2012-09-21 at 17:29 +0530, Raghavendra K T wrote:
> > >> In some special scenarios like #vcpu<= #pcpu, PLE handler may
> > >> prove very costly, because there is no need to iterate over vcpus
> > >> and do unsuccessful yield_to burning CPU.
> > >
> > > What's the costly thing? The vm-exit, the yield (which should be a nop
> > > if its the only task there) or something else entirely?
> > >
> > Both vmexit and yield_to() actually,
> >
> > because unsuccessful yield_to() overall is costly in PLE handler.
> >
> > This is because when we have large guests, say 32/16 vcpus, and one
> > vcpu is holding lock, rest of the vcpus waiting for the lock, when they
> > do PL-exit, each of the vcpu try to iterate over rest of vcpu list in
> > the VM and try to do directed yield (unsuccessful). (O(n^2) tries).
> >
> > this results is fairly high amount of cpu burning and double run queue
> > lock contention.
> >
> > (if they were spinning probably lock progress would have been faster).
> > As Avi/Chegu Vinod had felt it is better to avoid vmexit itself, which
> > seems little complex to achieve currently.
>
> OK, so the vmexit stays and we need to improve yield_to.
Can't we do this check sooner as well, as it only requires per-cpu data?
If we do it way back in kvm_vcpu_on_spin, then we avoid get_pid_task()
and a bunch of read barriers from kvm_for_each_vcpu. Also, moving the test
into kvm code would allow us to do other kvm things as a result of the
check in order to avoid some vmexits. It looks like we should be able to
avoid some without much complexity by just making a per-vm ple_window
variable, and then, when we hit the nr_running == 1 condition, also doing
vmcs_write32(PLE_WINDOW, (kvm->ple_window += PLE_WINDOW_BUMP))
Reset the window to the default value when we successfully yield (and
maybe we should limit the number of bumps).
Drew
>
> How about something like the below, that would allow breaking out of the
> for-each-vcpu loop and simply going back into the vm, right?
>
> ---
> kernel/sched/core.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b38f00e..5d5b355 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4272,7 +4272,10 @@ EXPORT_SYMBOL(yield);
> * It's the caller's job to ensure that the target task struct
> * can't go away on us before we can do any checks.
> *
> - * Returns true if we indeed boosted the target task.
> + * Returns:
> + * true (>0) if we indeed boosted the target task.
> + * false (0) if we failed to boost the target.
> + * -ESRCH if there's no task to yield to.
> */
> bool __sched yield_to(struct task_struct *p, bool preempt)
> {
> @@ -4284,6 +4287,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> local_irq_save(flags);
> rq = this_rq();
>
> + /*
> + * If we're the only runnable task on the rq, there's absolutely no
> + * point in yielding.
> + */
> + if (rq->nr_running == 1) {
> + yielded = -ESRCH;
> + goto out_irq;
> + }
> +
> again:
> p_rq = task_rq(p);
> double_rq_lock(rq, p_rq);
> @@ -4293,13 +4305,13 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> }
>
> if (!curr->sched_class->yield_to_task)
> - goto out;
> + goto out_unlock;
>
> if (curr->sched_class != p->sched_class)
> - goto out;
> + goto out_unlock;
>
> if (task_running(p_rq, p) || p->state)
> - goto out;
> + goto out_unlock;
>
> yielded = curr->sched_class->yield_to_task(rq, p, preempt);
> if (yielded) {
> @@ -4312,11 +4324,12 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> resched_task(p_rq->curr);
> }
>
> -out:
> +out_unlock:
> double_rq_unlock(rq, p_rq);
> +out_irq:
> local_irq_restore(flags);
>
> - if (yielded)
> + if (yielded > 0)
> schedule();
>
> return yielded;
>
next prev parent reply other threads:[~2012-09-26 12:57 UTC|newest]
Thread overview: 126+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-21 11:59 [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler Raghavendra K T
2012-09-21 12:00 ` [PATCH RFC 1/2] kvm: Handle undercommitted guest case " Raghavendra K T
2012-09-21 13:02 ` Rik van Riel
2012-09-21 17:24 ` Raghavendra K T
2012-09-24 15:41 ` Avi Kivity
2012-09-24 16:06 ` Avi Kivity
2012-09-24 16:14 ` Peter Zijlstra
2012-09-24 16:25 ` Avi Kivity
2012-09-25 8:09 ` Raghavendra K T
2012-09-25 8:54 ` Avi Kivity
2012-09-25 13:49 ` Raghavendra K T
2012-09-27 7:44 ` Gleb Natapov
2012-09-27 8:59 ` Avi Kivity
2012-09-27 9:11 ` Gleb Natapov
2012-09-27 9:33 ` Avi Kivity
2012-09-27 9:58 ` Gleb Natapov
2012-09-27 10:04 ` Avi Kivity
2012-09-27 10:08 ` Gleb Natapov
2012-09-27 10:15 ` Avi Kivity
[not found] ` <CAJocwcf+8u84_yDC-PK0Yni93YSTWzYvr69nq6b3pNv1MwVJzQ@mail.gmail.com>
2012-09-27 8:50 ` Avi Kivity
2012-09-27 11:26 ` Raghavendra K T
2012-09-27 12:06 ` Avi Kivity
2012-09-28 18:18 ` Konrad Rzeszutek Wilk
2012-09-30 8:16 ` Avi Kivity
[not found] ` <CAJocwcc19F+PtsQ5okGMvYeVnkEigpZRpwWY9JgeRPFqfcVoXA@mail.gmail.com>
2012-09-28 6:16 ` Raghavendra K T
2012-09-30 8:18 ` Avi Kivity
2012-09-30 11:07 ` Gleb Natapov
2012-09-30 11:13 ` Avi Kivity
2012-10-03 14:17 ` Raghavendra K T
2012-10-03 14:56 ` Avi Kivity
2012-10-04 7:29 ` Gleb Natapov
2012-10-05 8:36 ` Raghavendra K T
2012-10-07 9:51 ` Avi Kivity
2012-09-25 7:36 ` Raghavendra K T
2012-09-25 8:12 ` Avi Kivity
2012-09-25 14:21 ` Takuya Yoshikawa
2012-09-27 8:43 ` Avi Kivity
2012-10-03 12:22 ` Raghavendra K T
2012-10-03 17:05 ` Avi Kivity
2012-10-04 10:49 ` Raghavendra K T
2012-10-04 12:41 ` Avi Kivity
2012-10-04 13:07 ` Peter Zijlstra
2012-10-04 15:00 ` Avi Kivity
2012-10-09 18:51 ` Raghavendra K T
2012-10-10 2:59 ` Andrew Theurer
2012-10-10 17:54 ` Raghavendra K T
2012-10-10 18:03 ` David Ahern
2012-10-10 18:14 ` Raghavendra K T
2012-10-10 19:36 ` Andrew Theurer
2012-10-15 12:10 ` Raghavendra K T
2012-10-15 14:34 ` Andrew Theurer
2012-10-19 8:30 ` Raghavendra K T
2012-10-19 13:31 ` Andrew Theurer
2012-10-10 14:24 ` Andrew Theurer
2012-10-10 17:43 ` Raghavendra K T
2012-10-10 19:27 ` Andrew Theurer
2012-10-11 17:13 ` Raghavendra K T
2012-10-11 10:39 ` Nikunj A Dadhania
2012-10-18 12:39 ` Avi Kivity
2012-10-19 8:19 ` Raghavendra K T
2012-10-04 14:41 ` Andrew Theurer
2012-10-05 9:06 ` Raghavendra K T
2012-10-05 9:02 ` Raghavendra K T
2012-09-24 11:33 ` Peter Zijlstra
2012-09-24 11:40 ` Raghavendra K T
2012-09-21 12:00 ` [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario " Raghavendra K T
2012-09-21 13:22 ` Rik van Riel
2012-09-21 13:46 ` Takuya Yoshikawa
2012-09-21 13:52 ` Rik van Riel
2012-09-21 17:45 ` Raghavendra K T
2012-09-24 13:43 ` Takuya Yoshikawa
2012-09-24 15:26 ` Avi Kivity
2012-09-24 15:34 ` Peter Zijlstra
2012-09-24 15:43 ` Avi Kivity
2012-09-24 15:52 ` Peter Zijlstra
2012-09-24 15:58 ` Avi Kivity
2012-09-24 16:05 ` Peter Zijlstra
2012-09-24 16:10 ` Avi Kivity
2012-09-24 16:13 ` Peter Zijlstra
2012-09-24 16:21 ` Avi Kivity
2012-09-25 10:11 ` Avi Kivity
2012-09-21 13:18 ` [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios " Chegu Vinod
2012-09-21 17:36 ` Raghavendra K T
2012-09-24 8:42 ` Dor Laor
2012-09-24 12:02 ` Raghavendra K T
2012-09-25 15:00 ` Dor Laor
2012-09-26 12:27 ` Konrad Rzeszutek Wilk
2012-09-27 10:07 ` Raghavendra K T
2012-09-27 9:49 ` Raghavendra K T
2012-09-27 10:28 ` Andrew Jones
2012-09-27 10:44 ` Avi Kivity
2012-09-27 11:31 ` Raghavendra K T
2012-09-27 10:33 ` Dor Laor
2012-09-24 11:34 ` Peter Zijlstra
2012-09-24 11:52 ` Raghavendra K T
2012-09-24 12:36 ` Peter Zijlstra
2012-09-24 13:29 ` Raghavendra K T
2012-09-24 13:54 ` Peter Zijlstra
2012-09-24 14:16 ` Raghavendra K T
2012-09-25 13:40 ` Raghavendra K T
2012-09-27 8:36 ` Avi Kivity
2012-09-27 11:23 ` Raghavendra K T
2012-09-27 12:03 ` Avi Kivity
2012-09-27 12:25 ` Andrew Theurer
2012-09-28 5:38 ` Raghavendra K T
2012-09-28 5:45 ` H. Peter Anvin
2012-09-28 6:03 ` Raghavendra K T
2012-09-28 8:38 ` Peter Zijlstra
2012-09-28 11:40 ` Andrew Theurer
2012-09-28 14:11 ` Raghavendra K T
2012-09-28 14:13 ` Peter Zijlstra
2012-09-30 8:24 ` Avi Kivity
2012-10-03 14:29 ` Raghavendra K T
2012-10-03 17:25 ` Avi Kivity
2012-10-04 10:56 ` Raghavendra K T
2012-10-04 12:44 ` Avi Kivity
2012-10-05 9:04 ` Raghavendra K T
2012-09-24 15:51 ` Avi Kivity
2012-09-24 16:03 ` Peter Zijlstra
2012-09-24 16:20 ` Avi Kivity
2012-09-26 13:20 ` Andrew Jones
2012-09-26 13:26 ` Peter Zijlstra
2012-09-26 13:39 ` Andrew Jones
2012-09-26 13:45 ` Peter Zijlstra
2012-09-26 12:57 ` Andrew Jones [this message]
2012-09-27 10:21 ` Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120926125727.GA7633@turtle.usersys.redhat.com \
--to=drjones@redhat.com \
--cc=avi@redhat.com \
--cc=chegu_vinod@hp.com \
--cc=gleb@redhat.com \
--cc=habanero@linux.vnet.ibm.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=nikunj@linux.vnet.ibm.com \
--cc=ouyang@cs.pitt.edu \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=srivatsa.vaddagiri@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.