* [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
@ 2008-05-08 13:20 Christian Borntraeger
2008-05-08 13:33 ` Jeremy Fitzhardinge
2008-05-09 1:10 ` Rusty Russell
0 siblings, 2 replies; 6+ messages in thread
From: Christian Borntraeger @ 2008-05-08 13:20 UTC (permalink / raw)
To: Rusty Russell; +Cc: Ingo Molnar, virtualization, linux-kernel, kvm-devel
On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than host cpus. For example, 32 guest cpus on 1 host cpu triggered the
hang quite often. I could also reproduce the problem on a 4-way z/VM host with
a 64-way guest.
It turned out that the guest was consuming all available cpus mostly for
spinning on scheduler locks like rq->lock. This is expected as the threads are
calling yield all the time.
The problem is that the host scheduling decisions, together with the guest
scheduling decisions and the spinlocks not being fair, managed to create an
interesting scenario similar to a livelock. (Sometimes the hang resolved
itself after some minutes.)
Changing stop_machine to yield the cpu to the hypervisor when yielding inside
the guest fixed the problem for me. While I am not completely happy with this
patch, I think it causes no harm and it really improves the situation for me.
I used cpu_relax for yielding to the hypervisor. Does that work on all
architectures?
p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use
stop_machine_run and both triggered the problem after some retries.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
kernel/stop_machine.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Index: kvm/kernel/stop_machine.c
===================================================================
--- kvm.orig/kernel/stop_machine.c
+++ kvm/kernel/stop_machine.c
@@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
 		 * help our sisters onto their CPUs. */
 		if (!prepared && !irqs_disabled)
 			yield();
-		else
-			cpu_relax();
+		cpu_relax();
 	}
 
 	/* Ack: we are exiting. */
@@ -106,8 +105,10 @@ static int stop_machine(void)
 	}
 
 	/* Wait for them all to come to life. */
-	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
+	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
 		yield();
+		cpu_relax();
+	}
 
 	/* If some failed, kill them all. */
 	if (ret < 0) {
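
The patched wait loop can be modelled in user space. What follows is a minimal
sketch, not kernel code: NUM_THREADS, thread_ack, relax_hint() and
wait_for_acks() are hypothetical names, POSIX threads stand in for the
stopmachine kernel threads, and sched_yield() stands in for the kernel's
yield(). On x86 relax_hint() issues a pause; elsewhere it is only a compiler
barrier, so it does not hand the cpu to a hypervisor.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NUM_THREADS 4		/* hypothetical; plays the role of stopmachine_num_threads */

static atomic_int thread_ack;	/* plays the role of stopmachine_thread_ack */

/* User-space stand-in for cpu_relax(): a spin-wait hint on x86,
 * a plain compiler barrier on other architectures. */
static inline void relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	atomic_signal_fence(memory_order_seq_cst);
#endif
}

static void *worker(void *arg)
{
	(void)arg;
	atomic_fetch_add(&thread_ack, 1);	/* ack: this thread is alive */
	return NULL;
}

/* Start the workers, then wait for every ack the way the patched loop does.
 * Returns the number of acks seen, or -1 if a thread could not be created. */
static int wait_for_acks(void)
{
	pthread_t tid[NUM_THREADS];
	int i, started = 0;

	atomic_store(&thread_ack, 0);
	for (i = 0; i < NUM_THREADS; i++) {
		if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
			break;
		started++;
	}

	/* Wait for them all to come to life. */
	while (atomic_load(&thread_ack) != started) {
		sched_yield();	/* let other runnable threads have the cpu */
		relax_hint();	/* then tell the cpu we are busy-waiting */
	}

	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return started == NUM_THREADS ? atomic_load(&thread_ack) : -1;
}
```

Calling wait_for_acks() from main() should return NUM_THREADS once every
worker has acknowledged. Older toolchains may need -pthread when linking.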
* Re: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
2008-05-08 13:20 [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly Christian Borntraeger
@ 2008-05-08 13:33 ` Jeremy Fitzhardinge
2008-05-08 14:41 ` Christian Borntraeger
2008-05-09 1:10 ` Rusty Russell
1 sibling, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-08 13:33 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Rusty Russell, kvm-devel, Ingo Molnar, linux-kernel,
virtualization
Christian Borntraeger wrote:
> On kvm I have seen some rare hangs in stop_machine when I used more guest
> cpus than host cpus. For example, 32 guest cpus on 1 host cpu triggered the
> hang quite often. I could also reproduce the problem on a 4-way z/VM host with
> a 64-way guest.
>
I think that's one of those "don't do that then" cases ;)
> It turned out that the guest was consuming all available cpus mostly for
> spinning on scheduler locks like rq->lock. This is expected as the threads are
> calling yield all the time.
> The problem is that the host scheduling decisions, together with the guest
> scheduling decisions and the spinlocks not being fair, managed to create an
> interesting scenario similar to a livelock. (Sometimes the hang resolved
> itself after some minutes.)
>
I think x86 (at least) is now using ticket locks, which are fair. Which
kernel are you seeing this problem on?
> Changing stop_machine to yield the cpu to the hypervisor when yielding inside
> the guest fixed the problem for me. While I am not completely happy with this
> patch, I think it causes no harm and it really improves the situation for me.
>
> I used cpu_relax for yielding to the hypervisor, does that work on all
> architectures?
>
On x86, cpu_relax is just a "pause" instruction ("rep;nop"). We don't
hook it in paravirt_ops, and while VT/SVM can be used to fault into the
hypervisor on this instruction, I don't know if kvm actually does so.
Either way, it wouldn't work for VMI, Xen or lguest.
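
To make the point concrete, here is a small user-space sketch of a spin-wait
built on that instruction. It is an illustration under assumptions, not the
kernel's implementation: relax_hint() and spin_until_set() are hypothetical
names, and the non-x86 branch is only a compiler barrier. Nothing in it hands
the cpu back to a hypervisor; the pause is purely a hint to the processor.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for an x86-style cpu_relax(): "rep; nop" is the encoding of
 * the pause instruction. Other architectures get a compiler barrier. */
static inline void relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Spin until *flag becomes true or max_spins iterations have elapsed.
 * Returns the number of relax hints that were issued. */
static unsigned long spin_until_set(atomic_bool *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (!atomic_load(flag) && spins < max_spins) {
		relax_hint();
		spins++;
	}
	return spins;
}
```

With a flag that is already set the function returns 0 immediately; with a
flag that nobody ever sets it burns through all max_spins iterations, which
is the time a descheduled lock holder could have used.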
J
* Re: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
2008-05-08 13:33 ` Jeremy Fitzhardinge
@ 2008-05-08 14:41 ` Christian Borntraeger
2008-05-08 14:58 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 6+ messages in thread
From: Christian Borntraeger @ 2008-05-08 14:41 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Rusty Russell, kvm-devel, Ingo Molnar, linux-kernel,
virtualization
On Thursday, 8 May 2008, Jeremy Fitzhardinge wrote:
> Christian Borntraeger wrote:
> > On kvm I have seen some rare hangs in stop_machine when I used more guest
> > cpus than host cpus. For example, 32 guest cpus on 1 host cpu triggered the
> > hang quite often. I could also reproduce the problem on a 4-way z/VM host with
> > a 64-way guest.
> >
>
> I think that's one of those "don't do that then" cases ;)
I really like 64 guest cpus as a good testcase for all kinds of things.
>
> I think x86 (at least) is now using ticket locks, which are fair. Which
> kernel are you seeing this problem on?
Sorry, forgot to mention. It's kvm.git from 2 days ago on s390.
* Re: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
2008-05-08 14:41 ` Christian Borntraeger
@ 2008-05-08 14:58 ` Jeremy Fitzhardinge
2008-05-08 16:23 ` Christian Borntraeger
0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-08 14:58 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Rusty Russell, kvm-devel, Ingo Molnar, linux-kernel,
virtualization
Christian Borntraeger wrote:
> I really like 64 guest cpus as a good testcase for all kinds of things.
>
Sure, I do the same kind of thing.
>> I think x86 (at least) is now using ticket locks, which are fair. Which
>> kernel are you seeing this problem on?
>>
>
> Sorry, forgot to mention. It's kvm.git from 2 days ago on s390.
>
And on s390 cpu_relax yields the vcpu? That's not common behaviour
across architectures.
J
* Re: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
2008-05-08 14:58 ` Jeremy Fitzhardinge
@ 2008-05-08 16:23 ` Christian Borntraeger
0 siblings, 0 replies; 6+ messages in thread
From: Christian Borntraeger @ 2008-05-08 16:23 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Rusty Russell, kvm-devel, Ingo Molnar, linux-kernel,
virtualization
On Thursday, 8 May 2008, Jeremy Fitzhardinge wrote:
> > Sorry, forgot to mention. It's kvm.git from 2 days ago on s390.
> >
>
> And on s390 cpu_relax yields the vcpu? That's not common behaviour
> across architectures.
Yes, cpu_relax on s390 calls diagnose 44. Diagnose 44 translates to yield on
z/VM and LPAR. Guessing from the number of the diagnose, I think it was used
on z/VM for timeslice yielding long before Linux came to s390.
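
The contrast between a relax that is only a cpu hint and a relax that gives
the timeslice away can be sketched in user space. This is an illustration
only, under loud assumptions: relax_fn, platform_relax, relax_hint_only(),
relax_yield() and busy_wait() are hypothetical names, the counters exist just
to make the behaviour observable, and sched_yield() is a stand-in for a
hypervisor yield such as diagnose 44, which cannot be issued from an ordinary
process.

```c
#include <sched.h>

/* Hypothetical hook type: how this platform "relaxes" in a busy-wait loop. */
typedef void (*relax_fn)(void);

static unsigned long hint_calls;	/* illustration only */
static unsigned long yield_calls;	/* illustration only */

/* Hint-only flavour: the waiter keeps its timeslice and keeps spinning. */
static void relax_hint_only(void)
{
	__asm__ __volatile__("" ::: "memory");	/* compiler barrier */
	hint_calls++;
}

/* Yielding flavour: the waiter gives its timeslice away. sched_yield()
 * stands in for a yield to the hypervisor. */
static void relax_yield(void)
{
	sched_yield();
	yield_calls++;
}

/* Selected once per platform; defaults to the hint-only flavour. */
static relax_fn platform_relax = relax_hint_only;

/* A busy-wait loop that does not care which flavour is installed. */
static void busy_wait(unsigned long iterations)
{
	while (iterations--)
		platform_relax();
}
```

Setting platform_relax = relax_yield switches every waiter to the yielding
flavour without touching the loops themselves, which is why a change to a
shared relax primitive behaves so differently from one architecture to the
next.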
* Re: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
2008-05-08 13:20 [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly Christian Borntraeger
2008-05-08 13:33 ` Jeremy Fitzhardinge
@ 2008-05-09 1:10 ` Rusty Russell
1 sibling, 0 replies; 6+ messages in thread
From: Rusty Russell @ 2008-05-09 1:10 UTC (permalink / raw)
To: Christian Borntraeger; +Cc: kvm-devel, linux-kernel, virtualization
On Thursday 08 May 2008 23:20:38 Christian Borntraeger wrote:
> Changing stop_machine to yield the cpu to the hypervisor when yielding
> inside the guest fixed the problem for me. While I am not completely happy
> with this patch, I think it causes no harm and it really improves the
> situation for me.
Yes, this change is harmless. I'm reworking (i.e. rewriting) stop_machine at
the moment to simplify it, and as a side effect it won't be yielding. (The
yield is almost useless, since there's nothing at the same priority as this
thread anyway.)
I've included this patch for my next push to Linus.
Thanks,
Rusty.