From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Srikar <srikar@linux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
chegu vinod <chegu_vinod@hp.com>,
"Andrew M. Theurer" <habanero@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
Gleb Natapov <gleb@redhat.com>
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
Date: Wed, 3 Oct 2012 17:52:09 +0530
Message-ID: <20121003122209.GA9076@linux.vnet.ibm.com>
In-Reply-To: <50607F1F.2040704@redhat.com>
* Avi Kivity <avi@redhat.com> [2012-09-24 17:41:19]:
> On 09/21/2012 08:24 PM, Raghavendra K T wrote:
> > On 09/21/2012 06:32 PM, Rik van Riel wrote:
> >> On 09/21/2012 08:00 AM, Raghavendra K T wrote:
> >>> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> >>>
> >>> When the total number of VCPUs in the system is less than or equal to
> >>> the number of physical CPUs, PLE exits become costly, since each VCPU
> >>> can have a dedicated PCPU, and trying to find a target VCPU to yield_to
> >>> just burns time in the PLE handler.
> >>>
> >>> This patch reduces overhead by simply returning in such scenarios,
> >>> after checking the length of the current CPU's runqueue.
> >>
> >> I am not convinced this is the way to go.
> >>
> >> The VCPU that is holding the lock, and is not releasing it,
> >> probably got scheduled out. That implies that VCPU is on a
> >> runqueue with at least one other task.
> >
> > I see your point here, we have two cases:
> >
> > case 1)
> >
> > rq1 : vcpu1->wait(lockA) (spinning)
> > rq2 : vcpu2->holding(lockA) (running)
> >
> > Here, ideally, vcpu1 should not enter the PLE handler, since it would
> > surely get the lock within ple_window cycles (assuming ple_window is
> > tuned perfectly for that workload).
> >
> > Maybe this explains why we are not seeing a benefit with kernbench.
> >
> > On the other hand, since we cannot have a ple_window perfectly tuned
> > for all types of workloads, we gain for those workloads that need more
> > than 4096 cycles. I wonder whether that is what we are seeing in the
> > cases that benefited.
>
> Maybe we need to increase the ple window regardless. 4096 cycles is 2
> microseconds or less (call it t_spin). The overhead from
> kvm_vcpu_on_spin() and the associated task switches is at least a few
> microseconds, increasing as contention is added (call it t_yield). The
> time for a natural context switch is several milliseconds (call it
> t_slice). There is also the time the lock holder owns the lock,
> assuming no contention (t_hold).
>
> If t_yield > t_spin, then in the undercommitted case it dominates
> t_spin. If t_hold > t_spin we lose badly.
>
> If t_spin > t_yield, then the undercommitted case doesn't suffer as much
> as most of the spinning happens in the guest instead of the host, so it
> can pick up the unlock timely. We don't lose too much in the
> overcommitted case provided the values aren't too far apart (say a
> factor of 3).
>
> Obviously t_spin must be significantly smaller than t_slice, otherwise
> it accomplishes nothing.
>
> Regarding t_hold: if it is small, then a larger t_spin helps avoid false
> exits. If it is large, then we're not very sensitive to t_spin. It
> doesn't matter if it takes us 2 usec or 20 usec to yield, if we end up
> yielding for several milliseconds.
>
> So I think it's worth trying again with ple_window of 20000-40000.
>
Hi Avi,
I ran several benchmarks with larger ple_window values, and the results do
not seem to encourage increasing ple_window.
Results on a 16-core PLE machine with a 16-vCPU guest:
base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple_window = 8k
base_pleopt_16k = base kernel + ple_window = 16k
base_pleopt_32k = base kernel + ple_window = 32k
Percentage improvement of each benchmark relative to base_pleopt
(ple_window = 4096); negative values are regressions:
                 base_pleopt_8k  base_pleopt_16k  base_pleopt_32k
-----------------------------------------------------------------
kernbench_1x           -5.54915        -15.94529        -44.31562
kernbench_2x           -7.89399        -17.75039        -37.73498
-----------------------------------------------------------------
sysbench_1x             0.45955         -0.98778          0.05252
sysbench_2x             1.44071         -0.81625          1.35620
sysbench_3x             0.45549          1.51795         -0.41573
-----------------------------------------------------------------
hackbench_1x           -3.80272        -13.91456        -40.79059
hackbench_2x           -4.78999         -7.61382         -7.24475
-----------------------------------------------------------------
ebizzy_1x              -2.54626        -16.86050        -38.46109
ebizzy_2x              -8.75526        -19.29116        -48.33314
-----------------------------------------------------------------
I also collected perf top output to analyse the difference. The difference
comes from TLB flushing (and also from spinlocks).
Ebizzy run for 4k ple_window:
-  87.20%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 52.89% release_pages
         + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 86.03% default_send_IPI_mask_allbutself_phys
      + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
        smp_call_function_many
Ebizzy run for 32k ple_window:
-  91.40%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 53.13% release_pages
         + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
        smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 90.76% default_send_IPI_mask_allbutself_phys
      +  9.24% default_send_IPI_mask_sequence_phys
Below are the detailed results:
patch = base_pleopt_8k
+-----------+-----------+-----------+------------+-----------+
       base      stddev       patch       stddev    %improve
+-----------+-----------+-----------+------------+-----------+
 kernbench
    41.0027      0.7990     43.2780       0.5180    -5.54915
    89.2983      1.2406     96.3475       1.8891    -7.89399
+-----------+-----------+-----------+------------+-----------+
 sysbench
     9.9010      0.0558      9.8555       0.1246     0.45955
    19.7611      0.4290     19.4764       0.0835     1.44071
    29.1775      0.9903     29.0446       0.8641     0.45549
+-----------+-----------+-----------+------------+-----------+
 hackbench
    77.1580      1.9787     80.0921       2.9696    -3.80272
   239.2490      1.5660    250.7090       2.6074    -4.78999
+-----------+-----------+-----------+------------+-----------+
 ebizzy
  4256.2500    186.8053   4147.8750     206.1840    -2.54626
  2197.2500     93.1048   2004.8750      85.7995    -8.75526
+-----------+-----------+-----------+------------+-----------+
patch = base_pleopt_16k
+-----------+-----------+-----------+------------+-----------+
       base      stddev       patch       stddev    %improve
+-----------+-----------+-----------+------------+-----------+
 kernbench
    41.0027      0.7990     47.5407       0.5739   -15.94529
    89.2983      1.2406    105.1491       1.2244   -17.75039
+-----------+-----------+-----------+------------+-----------+
 sysbench
     9.9010      0.0558      9.9988       0.1106    -0.98778
    19.7611      0.4290     19.9224       0.9016    -0.81625
    29.1775      0.9903     28.7346       0.2788     1.51795
+-----------+-----------+-----------+------------+-----------+
 hackbench
    77.1580      1.9787     87.8942       2.2132   -13.91456
   239.2490      1.5660    257.4650       5.3674    -7.61382
+-----------+-----------+-----------+------------+-----------+
 ebizzy
  4256.2500    186.8053   3538.6250     101.1165   -16.86050
  2197.2500     93.1048   1773.3750      91.8414   -19.29116
+-----------+-----------+-----------+------------+-----------+
patch = base_pleopt_32k
+-----------+-----------+-----------+------------+-----------+
       base      stddev       patch       stddev    %improve
+-----------+-----------+-----------+------------+-----------+
 kernbench
    41.0027      0.7990     59.1733       0.8102   -44.31562
    89.2983      1.2406    122.9950       1.5534   -37.73498
+-----------+-----------+-----------+------------+-----------+
 sysbench
     9.9010      0.0558      9.8958       0.0593     0.05252
    19.7611      0.4290     19.4931       0.1767     1.35620
    29.1775      0.9903     29.2988       1.0420    -0.41573
+-----------+-----------+-----------+------------+-----------+
 hackbench
    77.1580      1.9787    108.6312      13.1500   -40.79059
   239.2490      1.5660    256.5820       2.2722    -7.24475
+-----------+-----------+-----------+------------+-----------+
 ebizzy
  4256.2500    186.8053   2619.2500      80.8150   -38.46109
  2197.2500     93.1048   1135.2500      22.2887   -48.33314
+-----------+-----------+-----------+------------+-----------+