From: Avi Kivity <avi@redhat.com>
To: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
peterz@infradead.org, linux-kernel@vger.kernel.org,
vatsa@linux.vnet.ibm.com, bharata@linux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
Date: Mon, 02 Jan 2012 11:37:22 +0200
Message-ID: <4F017AD2.3090504@redhat.com>
In-Reply-To: <87pqf5mqg4.fsf@abhimanyu.in.ibm.com>
On 12/31/2011 04:21 AM, Nikunj A Dadhania wrote:
> Here is the results collected from the 64bit VM runs.
Thanks, the data is clearer now.
> Avi, x2apic is enabled in the both guest/host.
>
> One more change in the test setup is I am creating and destroying the VM
> for each benchmark run. Earlier, I used to create 2/4/8 VMs and run 5
> benchmarks one by one(VM was not fresh for some benchmark)
>
> PLE - Test Setup:
> =================
> - x3850x5 machine - PLE enabled
> - 8 CPUs (HT disabled)
> - 264GB memory
> - VM details:
> - Guest kernel: 2.6.32 based enterprise kernel
> - 1024MB memory
> - 8 VCPUs
> - During gang runs, vcpus are pinned
>
> Results:
> * GangVsBase - Gang vs Baseline kernel
> * GangVsPin - Gang vs Baseline kernel + vcpus pinned
> * V1 - Using set_next_buddy
> * V2 - Using set_gang_buddy
> * Results are % improvement/degradation
> +-------------+-----------------------+----------------------+
> | | V1 | V2 |
> + Benchmarks +-----------+-----------+-----------+----------+
> | | GngVsBase | GngVsPin | GngVsBase | GngVsPin |
> +-------------+-----------+-----------+-----------+----------+
> | kbench-2vm | -4 | -5 | -1 | -1 |
> | kbench-4vm | -13 | -3 | 3 | 12 |
> | kbench-8vm | -11 | 0 | -5 | 5 |
> +-------------+-----------+-----------+-----------+----------+
> | ebizzy-2vm | -1 | -2 | 17 | 16 |
> | ebizzy-4vm | 4 | 6 | 58 | 61 |
> | ebizzy-8vm | 3 | 25 | 68 | 103 |
> +-------------+-----------+-----------+-----------+----------+
> | specjbb-2vm | -7 | 0 | -6 | 1 |
> | specjbb-4vm | 19 | 30 | -5 | 3 |
> | specjbb-8vm | -6 | 1 | 5 | 15 |
> +-------------+-----------+-----------+-----------+----------+
> | hbench-2vm | -1 | -6 | 18 | 14 |
> | hbench-4vm | -64 | -9 | -2 | 31 |
> | hbench-8vm | -28 | 10 | 32 | 53 |
> +-------------+-----------+-----------+-----------+----------+
> | dbench-2vm | -3 | -5 | -2 | -3 |
> | dbench-4vm | 9 | 0 | 3 | -5 |
> | dbench-8vm | -3 | -23 | -8 | -26 |
> +-------------+-----------+-----------+-----------+----------+
>
> The best and worst case in V2(GangVsBase).
>
> ebizzy 8vm (improved 68%)
> +------------+--------------------+--------------------+----------+
> | Ebizzy |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | ebizzy| 2531.75 | 4268.12 | 68 |
> | EbzyUser| 32.60 | 60.70 | 86 |
> | EbzySys| 165.48 | 171.05 | -3 |
> | EbzyReal| 60.00 | 60.00 | 0 |
> | BwUsage| 568645533105.00 | 767186043286.00 | 34 |
> | HostIdle| 89.00 | 89.00 | 0 |
> | UsrTime| 2.00 | 4.00 | 100 |
> | SysTime| 12.00 | 13.00 | -8 |
> | IOWait| 3.00 | 4.00 | -33 |
> | IdleTime| 81.00 | 77.00 | -4 |
> | TPS| 12.00 | 12.00 | 0 |
> +-----------------------------------------------------------------+
>
> GangV2:
> 27.45% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 12.12% ebizzy [kernel.kallsyms] [k] clear_page
> 9.22% ebizzy [kernel.kallsyms] [k] __do_page_fault
> 6.91% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 4.06% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
> 4.04% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
>
> GangBase:
> 45.08% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 15.38% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 7.00% ebizzy [kernel.kallsyms] [k] clear_page
> 4.88% ebizzy [kernel.kallsyms] [k] __do_page_fault
Looping in flush_tlb_others(). Rik, what trace can we run to find out
why PLE directed yield isn't working as expected?
>
> dbench 8vm (degraded -8%)
> +------------+--------------------+--------------------+----------+
> | Dbench |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | dbench| 2.27 | 2.09 | -8 |
> | BwUsage| 138973336762.00 | 187382519973.00 | 34 |
> | HostIdle| 95.00 | 93.00 | 2 |
> | IOWait| 20.00 | 19.00 | 5 |
> | IdleTime| 78.00 | 78.00 | 0 |
> | TPS| 13.00 | 14.00 | 7 |
> | CacheMisses| 81611667.00 | 72959014.00 | 10 |
> | CacheRefs| 4990591975.00 | 4624251595.00 | -7 |
> |BranchMisses| 812569051.00 | 1162137278.00 | -43 |
> | Branches| 20196543212.00 | 30318934960.00 | 50 |
> |Instructions| 99519592926.00 | 152169154440.00 | -52 |
> | Cycles| 265699995531.00 | 330718402913.00 | -24 |
> | PageFlt| 36083.00 | 35897.00 | 0 |
> | ContextSW| 3170710.00 | 8304284.00 | -161 |
> | CPUMigrat| 63387.00 | 155521.00 | -145 |
> +-----------------------------------------------------------------+
> dbench needs some more love, i will get the perf top caller for
> that.
>
> non-PLE - Test Setup:
> =====================
> - x3650 M2 machine
> - 8 CPUs (HT disabled)
> - 64GB memory
> - VM details:
> - Guest kernel: 2.6.32 based enterprise kernel
> - 1024MB memory
> - 8 VCPUs
> - During gang runs, vcpus are pinned
>
> Results:
> * GangVsBase - Gang vs Baseline kernel
> * GangVsPin - Gang vs Baseline kernel + vcpus pinned
> * V1 - using set_next_buddy
> * V2 - using set_gang_buddy
> * Results are % improvement/degradation
> +-------------+-----------------------+----------------------+
> | | V1 | V2 |
> + Benchmarks +-----------+-----------+-----------+----------+
> | | GngVsBase | GngVsPin | GngVsBase | GngVsPin |
> +-------------+-----------+-----------+-----------+----------+
> | kbench-2vm | 0 | 2 | -7 | -5 |
> | kbench-4vm | 2 | -3 | 7 | 2 |
> | kbench-8vm | 0 | -1 | -1 | -3 |
> +-------------+-----------+-----------+-----------+----------+
> | ebizzy-2vm | 221 | 109 | 241 | 122 |
> | ebizzy-4vm | 215 | 173 | 366 | 304 |
> | ebizzy-8vm | 225 | 88 | 331 | 149 |
> +-------------+-----------+-----------+-----------+----------+
> | specjbb-2vm | -5 | -3 | -7 | -5 |
> | specjbb-4vm | 29 | -4 | 3 | -23 |
> | specjbb-8vm | 6 | -6 | 16 | 2 |
> +-------------+-----------+-----------+-----------+----------+
> | hbench-2vm | -16 | 2 | 15 | 29 |
> | hbench-4vm | -25 | 2 | 32 | 47 |
> | hbench-8vm | -46 | -19 | 35 | 47 |
> +-------------+-----------+-----------+-----------+----------+
> | dbench-2vm | 0 | 1 | -5 | -3 |
> | dbench-4vm | -9 | -4 | -2 | 2 |
> | dbench-8vm | -52 | 17 | -30 | 69 |
> +-------------+-----------+-----------+-----------+----------+
>
> The best and worst case in V2(GangVsBase).
>
> ebizzy 8vm (improved 331%)
> +------------+--------------------+--------------------+----------+
> | Ebizzy |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | ebizzy| 719.50 | 3101.38 | 331 |
> | EbzyUser| 3.79 | 58.04 | 1432 |
> | EbzySys| 66.61 | 140.04 | -110 |
> | EbzyReal| 60.00 | 60.00 | 0 |
> | BwUsage| 526550032993.00 | 652012141757.00 | 23 |
> | HostIdle| 59.00 | 62.00 | -5 |
> | SysTime| 5.00 | 11.00 | -120 |
> | IOWait| 4.00 | 4.00 | 0 |
> | IdleTime| 89.00 | 79.00 | -11 |
> | TPS| 11.00 | 12.00 | 9 |
> +-----------------------------------------------------------------+
>
> GangV2:
> 27.96% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 12.13% ebizzy [kernel.kallsyms] [k] clear_page
> 11.66% ebizzy [kernel.kallsyms] [k] __bitmap_empty
> 11.54% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 5.93% ebizzy [kernel.kallsyms] [k] __do_page_fault
>
> GangBase;
> 36.34% ebizzy [kernel.kallsyms] [k] __bitmap_empty
> 35.95% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 8.52% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
Same thing. __bitmap_empty() is likely the cpumask_empty() called from
flush_tlb_others_ipi(), so roughly 70% of the time is spent in this loop.
Xen works around this particular busy loop by having a hypercall for
flushing the tlb, but this is very fragile (and broken wrt
get_user_pages_fast() IIRC).
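
For illustration only, here is a toy model of that wait loop. It is not
the kernel code: the tick-based scheduling, the function name spin_ticks
and the example schedules are all assumptions made up for this sketch.
The initiating vcpu sends the flush IPI and then polls a pending mask
until every target has cleared its bit, so it spins for as long as any
target vcpu is not running. The last line just adds up the two hottest
symbols from the GangBase profile above.

```python
def spin_ticks(targets, running):
    """Count the ticks the initiator spins before all targets have acked.

    targets: set of vcpu ids that must acknowledge the flush.
    running: list of sets; running[t] holds the vcpus on a cpu at tick t
             (a hypothetical schedule, not measured data).
    A target acks, i.e. clears its bit, on the first tick it is running.
    """
    pending = set(targets)      # stands in for the flush cpumask
    ticks = 0
    for on_cpu in running:
        pending -= on_cpu       # running targets handle the IPI
        if not pending:         # models the cpumask_empty() check
            return ticks
        ticks += 1              # initiator burns another tick spinning
    raise RuntimeError("some target never ran in this schedule")


# All targets co-scheduled (the gang case): no spinning at all.
gang = spin_ticks({1, 2, 3}, [{1, 2, 3}])

# Targets scheduled one after another: the initiator spins until the
# last one finally gets a cpu.
staggered = spin_ticks({1, 2, 3}, [{1}, {1, 2}, {2}, {3}])

# __bitmap_empty + flush_tlb_others_ipi from the GangBase profile.
loop_share = 36.34 + 35.95
```

In this model the spin time is set entirely by the last target to get
scheduled, which is the case gang scheduling (or a working directed
yield) is supposed to shorten.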
>
> dbench 8vm (degraded -30%)
> +------------+--------------------+--------------------+----------+
> | Dbench |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | dbench| 2.01 | 1.38 | -30 |
> | BwUsage| 100408068913.00 | 176095548113.00 | 75 |
> | HostIdle| 82.00 | 74.00 | 9 |
> | IOWait| 25.00 | 23.00 | 8 |
> | IdleTime| 74.00 | 71.00 | -4 |
> | TPS| 13.00 | 13.00 | 0 |
> | CacheMisses| 137351386.00 | 267116184.00 | -94 |
> | CacheRefs| 4347880250.00 | 5830408064.00 | 34 |
> |BranchMisses| 602120546.00 | 1110592466.00 | -84 |
> | Branches| 22275747114.00 | 39163309805.00 | 75 |
> |Instructions| 107942079625.00 | 195313721170.00 | -80 |
> | Cycles| 271014283494.00 | 481886203993.00 | -77 |
> | PageFlt| 44373.00 | 47679.00 | -7 |
> | ContextSW| 3318033.00 | 11598234.00 | -249 |
> | CPUMigrat| 82475.00 | 423066.00 | -412 |
> +-----------------------------------------------------------------+
>
Rik, what's going on? ContextSW is relatively low in the base load; it
looks like PLE is asleep at the wheel.
--
error compiling committee.c: too many arguments to function