* [PATCH] vhost: don't forget to schedule()
From: Nadav Har'El @ 2012-02-27 13:07 UTC (permalink / raw)
To: kvm; +Cc: mst, abelg
This is a tiny, but important, patch to vhost.
Vhost's worker thread only called schedule() when it had no work to do, and
it wanted to go to sleep. But if there's always work to do, e.g., the guest
is running a network-intensive program like netperf with small message sizes,
schedule() was *never* called. This had several negative implications (on
non-preemptive kernels):
1. Time spent doing work was not properly accounted to the "vhost" process
(ps and top would wrongly show it using zero CPU time).
2. Sometimes error messages about RCU timeouts would be printed, if the
core running the vhost thread didn't schedule() for a very long time.
3. Worst of all, a vhost thread would "hog" the core. If several vhost
threads need to share the same core, typically one would get most of the
CPU time (and its associated guest most of the performance), while the
others hardly get any work done.
The trivial solution is to add

	if (need_resched())
		schedule();

after doing every piece of work. This will not do the heavy schedule() all
the time, only when the timer interrupt has decided a reschedule is
warranted (so need_resched() returns true).
Thanks to Abel Gordon for this patch.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
vhost.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c14c42b..ae66278 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -222,6 +222,8 @@ static int vhost_worker(void *data)
if (work) {
__set_current_state(TASK_RUNNING);
work->fn(work);
+ if (need_resched())
+ schedule();
} else
schedule();
--
Nadav Har'El | Monday, Feb 27 2012,
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Ways to Relieve Stress #10: Make up a
http://nadav.harel.org.il |language and ask people for directions.
* Re: [PATCH] vhost: don't forget to schedule()
From: Avi Kivity @ 2012-02-28 13:07 UTC (permalink / raw)
To: Nadav Har'El; +Cc: kvm, mst, abelg
On 02/27/2012 03:07 PM, Nadav Har'El wrote:
> This is a tiny, but important, patch to vhost.
>
> Vhost's worker thread only called schedule() when it had no work to do, and
> it wanted to go to sleep. But if there's always work to do, e.g., the guest
> is running a network-intensive program like netperf with small message sizes,
> schedule() was *never* called. This had several negative implications (on
> non-preemptive kernels):
>
> 1. Passing time was not properly accounted to the "vhost" process (ps and
> top would wrongly show it using zero CPU time).
>
> 2. Sometimes error messages about RCU timeouts would be printed, if the
> core running the vhost thread didn't schedule() for a very long time.
>
> 3. Worst of all, a vhost thread would "hog" the core. If several vhost
> threads need to share the same core, typically one would get most of the
> CPU time (and its associated guest most of the performance), while the
> others hardly get any work done.
>
> The trivial solution is to add
>
> if (need_resched())
> schedule();
>
> After doing every piece of work. This will not do the heavy schedule() all
> the time, just when the timer interrupt decided a reschedule is warranted
> (so need_resched returns true).
>
> Thanks to Abel Gordon for this patch.
>
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
> vhost.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c14c42b..ae66278 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -222,6 +222,8 @@ static int vhost_worker(void *data)
> if (work) {
> __set_current_state(TASK_RUNNING);
> work->fn(work);
> + if (need_resched())
> + schedule();
>
This is cond_resched(), no?
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] vhost: don't forget to schedule()
From: Nadav Har'El @ 2012-02-28 14:00 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, mst, abelg
On Tue, Feb 28, 2012, Avi Kivity wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> > + if (need_resched())
> > + schedule();
>
> This is cond_resched(), no?
It indeed looks similar, but it appears there are some slightly
different things happening in the two cases, especially on a preemptive
kernel... Unfortunately, I am not astute (or experienced) enough to tell
which of the two idioms is better or more appropriate for this case.
The idiom that I used seemed right, and worked in my tests.
Moreover, I noticed it was also used in vmx.c. Also, vhost.c was already
calling schedule(), not cond_resched(), so I thought it made sense to
call the same thing...
But I now see that in kvm_main.c, there's also this:
if (!need_resched())
return;
cond_resched();
Which seems to combine both idioms ;-) Can anybody shed light on
the right way to do it?
--
Nadav Har'El | Tuesday, Feb 28 2012,
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I am logged in, therefore I am.
http://nadav.harel.org.il |
* Re: [PATCH] vhost: don't forget to schedule()
From: Avi Kivity @ 2012-02-28 14:29 UTC (permalink / raw)
To: Nadav Har'El; +Cc: kvm, mst, abelg
On 02/28/2012 04:00 PM, Nadav Har'El wrote:
> On Tue, Feb 28, 2012, Avi Kivity wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> > > + if (need_resched())
> > > + schedule();
> >
> > This is cond_resched(), no?
>
> It indeed looks similar, but it appears there are some slightly
> different things happening in both cases, especially for a preemptive
> kernel... Unfortunately, I am not astute (or experienced) enough to tell
> which of the two idioms are better or more appropriate for this case.
I'd have expected that cond_resched() is a no-op with preemptible
kernels, but I see this is not the case.
>
> The idiom that I used seemed right, and seemed to work in my tests.
> Moreover I also noticed it was used in vmx.c. Also, vhost.c was already
> calling schedule(), not cond_resched(), so I thought it made sense to
> call the same thing...
>
> But I now see that in kvm_main.c, there's also this:
>
> if (!need_resched())
> return;
> cond_resched();
>
> Which seems to combine both idioms ;-) Can anybody shed a light on what
> is the right way to do it?
>
It's bogus. Look at commit 3fca03653010:
Author: Yaozu Dong <eddie.dong@intel.com>
Date: Wed Apr 25 16:49:19 2007 +0300
KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles
By checking if a reschedule is needed, we avoid dropping the vcpu.
[With changes by me, based on Anthony Liguori's observations]
Signed-off-by: Avi Kivity <avi@qumranet.com>
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 03c0ee7..f535635 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1590,6 +1590,8 @@ static int set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
void kvm_resched(struct kvm_vcpu *vcpu)
{
+ if (!need_resched())
+ return;
vcpu_put(vcpu);
cond_resched();
vcpu_load(vcpu);
At that time, it made sense to do the extra check to avoid the expensive
vcpu_put()/vcpu_load(). Later, preempt notifiers made them redundant
(15ad71460d75) and they were removed, but the extra check remained.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] vhost: don't forget to schedule()
From: Nadav Har'El @ 2012-03-04 9:10 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, mst, abelg
On Tue, Feb 28, 2012, Avi Kivity wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> > > > + if (need_resched())
> > > > + schedule();
> > > This is cond_resched(), no?
> >
> > It indeed looks similar, but it appears there are some slightly
> > different things happening in both cases, especially for a preemptive
>...
> I'd have expected that cond_resched() is a no-op with preemptible
> kernels, but I see this is not the case.
Hi. This discussion is already getting several orders of magnitude longer
than the patch :-)
Would you like me to send a new one-line patch calling cond_resched()
instead of need_resched()/schedule()? Or is there anything else I can do?
> > But I now see that in kvm_main.c, there's also this:
> >
> > if (!need_resched())
> > return;
> > cond_resched();
>...
>
> It's bogus. Look at commit 3fca03653010:
>...
> + if (!need_resched())
> + return;
> vcpu_put(vcpu);
> cond_resched();
> vcpu_load(vcpu);
>
> at that time, it made sense to do the extra check to avoid the expensive
> vcpu_put/vcpu_load. Later preempt notifiers made them redundant
> (15ad71460d75), and they were removed, but the extra check remained.
Do you want a patch to remove this extra check? Or you can just remove
it yourself?
Thanks,
Nadav.
--
Nadav Har'El | Sunday, Mar 4 2012,
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |As every cat owner knows, nobody owns a
http://nadav.harel.org.il |cat.
* Re: [PATCH] vhost: don't forget to schedule()
From: Michael S. Tsirkin @ 2012-03-04 9:16 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Avi Kivity, kvm, abelg
On Sun, Mar 04, 2012 at 11:10:01AM +0200, Nadav Har'El wrote:
> On Tue, Feb 28, 2012, Avi Kivity wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> > > > > + if (need_resched())
> > > > > + schedule();
> > > > This is cond_resched(), no?
> > >
> > > It indeed looks similar, but it appears there are some slightly
> > > different things happening in both cases, especially for a preemptive
> >...
> > I'd have expected that cond_resched() is a no-op with preemptible
> > kernels, but I see this is not the case.
>
> Hi. This discussion is already getting several orders of magnitude longer
> than the patch :-)
>
> Would you like me to send a new one-line patch calling "cond_resched()"
> instead of need_resched/schedule? Or anything else that I can do?
The argument is mostly about what's faster, right?
You can push the argument along if you run some benchmarks to show the
performance impact of the proposed variants.
Measure bandwidth (using e.g. netperf) divided by host CPU utilization
(note: netperf reports guest CPU utilization, which is mostly not
interesting; use something like mpstat to measure on the host),
and compare between baseline, your patch, cond_resched(), and whatever
else you come up with.
> > > But I now see that in kvm_main.c, there's also this:
> > >
> > > if (!need_resched())
> > > return;
> > > cond_resched();
> >...
> >
> > It's bogus. Look at commit 3fca03653010:
> >...
> > + if (!need_resched())
> > + return;
> > vcpu_put(vcpu);
> > cond_resched();
> > vcpu_load(vcpu);
> >
> > at that time, it made sense to do the extra check to avoid the expensive
> > vcpu_put/vcpu_load. Later preempt notifiers made them redundant
> > (15ad71460d75), and they were removed, but the extra check remained.
>
> Do you want a patch to remove this extra check? Or you can just remove
> it yourself?
>
> Thanks,
> Nadav.
>
> --
> Nadav Har'El | Sunday, Mar 4 2012,
> nyh@math.technion.ac.il |-----------------------------------------
> Phone +972-523-790466, ICQ 13349191 |As every cat owner knows, nobody owns a
> http://nadav.harel.org.il |cat.
* Re: [PATCH] vhost: don't forget to schedule()
From: Nadav Har'El @ 2012-03-18 13:55 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, kvm, abelg
On Sun, Mar 04, 2012, Michael S. Tsirkin wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> On Sun, Mar 04, 2012 at 11:10:01AM +0200, Nadav Har'El wrote:
> > On Tue, Feb 28, 2012, Avi Kivity wrote about "Re: [PATCH] vhost: don't forget to schedule()":
> > > > > > + if (need_resched())
> > > > > > + schedule();
> > > > > This is cond_resched(), no?
>...
> > Hi. This discussion is already getting several orders of magnitude longer
> > than the patch :-)
>...
>
> The argument is mostly about what's faster, right?
> You can push the argument along if you run some benchmarks to show the
> performance impact of the proposed variants.
Hi,
I did some tests using netperf TCP_STREAM, message size 1K, over a
10Gbps network. Each of the numbers below is an average of several runs.
Test 1: "First, do no harm" - verify that with just one guest, when these
extra reschedule attempts can't help, they at least don't hurt.
* Unmodified vhost-net: 3135 Mbps.
* My patch, with if (need_resched()) schedule(): 3088 Mbps.
The 1.5% average performance decrease is very small, and close to the
measurement variation (1.4%), so I'd say it's negligible.
* Avi's proposed alternative, with cond_resched(), had, surprisingly, a
very high variation in the results. Two measurements yielded an
average of 3038 Mbps (close to, but slightly lower than, the above), but
6 other measurements yielded significantly lower numbers, between 2300
and 2875.
So it seems my patch does not hurt performance, and we don't need to
find a "better" version for performance's sake.
The cond_resched() version was worse - it had slightly worse maximum
performance and a high variance, so I don't recommend it. That being
said, I don't have a good explanation of why cond_resched() is bad.
Test 2: Demonstrate that the patch is actually important.
Here we run two or three guests doing the same Netperf to a different
server, with all 2 or 3 vhost threads' affinity set to the same core.
* Behaviour of the unmodified vhost-net was terrible, and demonstrates
what this bug fixes:
- For two guests, one yielded a throughput of 3172 Mbps, but the second
yielded only 0.5 Mbps (!!), because its vhost thread hardly ran at all.
For three guests, again, one guest yielded 3202 Mbps, but the second
and third each yielded less than 1Mbps.
- "top" and similar utilities did not show the vhost threads at all,
because CPU time is not accounted if you never schedule().
* Performance with our patch (if (need_resched()) schedule()) fixes both
problems:
- For two guests, one yields 1557 Mbps, the second 1531 Mbps.
Fair, and efficient (1557+1531=3088), as expected.
For three guests, the numbers are 1032+1028+1045=3105.
- "top" shows each vhost thread taking close to the expected half or
third of the core they share.
* With the alternative cond_resched(), again both problems are fixed,
but performance is less consistent, varying by almost 20% from run to
run. Again, I can't explain why.
All in all, I think this proves that the patch I sent, as is, is
important, does no harm, and is definitely not worse (and perhaps
better) than the proposed alternative implementation with cond_resched().
So please apply.
Thanks,
Nadav.
--
Nadav Har'El | Sunday, Mar 18 2012,
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |If glory comes after death, I'm not in a
http://nadav.harel.org.il |hurry. (Latin proverb)