From: Avi Kivity <avi@redhat.com>
To: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
peterz@infradead.org, linux-kernel@vger.kernel.org,
vatsa@linux.vnet.ibm.com, bharata@linux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
Date: Mon, 02 Jan 2012 11:37:22 +0200
Message-ID: <4F017AD2.3090504@redhat.com>
In-Reply-To: <87pqf5mqg4.fsf@abhimanyu.in.ibm.com>

On 12/31/2011 04:21 AM, Nikunj A Dadhania wrote:
> Here are the results collected from the 64-bit VM runs.
Thanks, the data is clearer now.
> Avi, x2apic is enabled in both guest and host.
>
> One more change in the test setup: I am now creating and destroying the VM
> for each benchmark run. Earlier, I used to create 2/4/8 VMs and run the 5
> benchmarks one by one (so the VM was not fresh for some benchmarks).
>
> PLE - Test Setup:
> =================
> - x3850x5 machine - PLE enabled
> - 8 CPUs (HT disabled)
> - 264GB memory
> - VM details:
> - Guest kernel: 2.6.32 based enterprise kernel
> - 1024MB memory
> - 8 VCPUs
> - During gang runs, vcpus are pinned
>
> Results:
> * GangVsBase - Gang vs Baseline kernel
> * GangVsPin - Gang vs Baseline kernel + vcpus pinned
> * V1 - Using set_next_buddy
> * V2 - Using set_gang_buddy
> * Results are % improvement/degradation
> +-------------+-----------------------+----------------------+
> | | V1 | V2 |
> + Benchmarks +-----------+-----------+-----------+----------+
> | | GngVsBase | GngVsPin | GngVsBase | GngVsPin |
> +-------------+-----------+-----------+-----------+----------+
> | kbench-2vm | -4 | -5 | -1 | -1 |
> | kbench-4vm | -13 | -3 | 3 | 12 |
> | kbench-8vm | -11 | 0 | -5 | 5 |
> +-------------+-----------+-----------+-----------+----------+
> | ebizzy-2vm | -1 | -2 | 17 | 16 |
> | ebizzy-4vm | 4 | 6 | 58 | 61 |
> | ebizzy-8vm | 3 | 25 | 68 | 103 |
> +-------------+-----------+-----------+-----------+----------+
> | specjbb-2vm | -7 | 0 | -6 | 1 |
> | specjbb-4vm | 19 | 30 | -5 | 3 |
> | specjbb-8vm | -6 | 1 | 5 | 15 |
> +-------------+-----------+-----------+-----------+----------+
> | hbench-2vm | -1 | -6 | 18 | 14 |
> | hbench-4vm | -64 | -9 | -2 | 31 |
> | hbench-8vm | -28 | 10 | 32 | 53 |
> +-------------+-----------+-----------+-----------+----------+
> | dbench-2vm | -3 | -5 | -2 | -3 |
> | dbench-4vm | 9 | 0 | 3 | -5 |
> | dbench-8vm | -3 | -23 | -8 | -26 |
> +-------------+-----------+-----------+-----------+----------+
>
> The best and worst cases in V2 (GangVsBase):
>
> ebizzy 8vm (improved 68%)
> +------------+--------------------+--------------------+----------+
> | Ebizzy |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | ebizzy| 2531.75 | 4268.12 | 68 |
> | EbzyUser| 32.60 | 60.70 | 86 |
> | EbzySys| 165.48 | 171.05 | -3 |
> | EbzyReal| 60.00 | 60.00 | 0 |
> | BwUsage| 568645533105.00 | 767186043286.00 | 34 |
> | HostIdle| 89.00 | 89.00 | 0 |
> | UsrTime| 2.00 | 4.00 | 100 |
> | SysTime| 12.00 | 13.00 | -8 |
> | IOWait| 3.00 | 4.00 | -33 |
> | IdleTime| 81.00 | 77.00 | -4 |
> | TPS| 12.00 | 12.00 | 0 |
> +-----------------------------------------------------------------+
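(A note on the reporting convention in these tables: positive numbers are
improvement, negative degradation, both relative to GangBase. For a
throughput metric such as ebizzy, (4268.12 - 2531.75) / 2531.75 * 100
~= 68.6, reported as 68; for a time/cost metric such as EbzySys the sign
is flipped, so (165.48 - 171.05) / 165.48 * 100 ~= -3.)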
>
> GangV2:
> 27.45% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 12.12% ebizzy [kernel.kallsyms] [k] clear_page
> 9.22% ebizzy [kernel.kallsyms] [k] __do_page_fault
> 6.91% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 4.06% ebizzy [kernel.kallsyms] [k] get_page_from_freelist
> 4.04% ebizzy [kernel.kallsyms] [k] ____pagevec_lru_add
>
> GangBase:
> 45.08% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 15.38% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 7.00% ebizzy [kernel.kallsyms] [k] clear_page
> 4.88% ebizzy [kernel.kallsyms] [k] __do_page_fault

Looping in flush_tlb_others(). Rik, what trace can we run to find out
why PLE-directed yield isn't working as expected?
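
For reference, here is the spin we are hitting; a simplified sketch of
flush_tlb_others_ipi() from arch/x86/mm/tlb.c of this era (abridged, not
verbatim):

	static void flush_tlb_others_ipi(const struct cpumask *cpumask,
					 struct mm_struct *mm, unsigned long va)
	{
		/* One flush slot per sender vector; caller disabled preemption. */
		unsigned int sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
		union smp_flush_state *f = &flush_state[sender];

		raw_spin_lock(&f->tlbstate_lock);
		f->flush_mm = mm;
		f->flush_va = va;
		if (cpumask_andnot(to_cpumask(f->flush_cpumask), cpumask,
				   cpumask_of(smp_processor_id()))) {
			apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
					    INVALIDATE_TLB_VECTOR_START + sender);

			/*
			 * Each target clears its bit in the IPI handler; the
			 * sender busy-waits until the mask drains.  If a target
			 * vcpu is preempted by the host, we spin here for its
			 * whole scheduling-out period -- this is the loop that
			 * dominates the GangBase profile above.
			 */
			while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
				cpu_relax();
		}
		f->flush_mm = NULL;
		f->flush_va = 0;
		raw_spin_unlock(&f->tlbstate_lock);
	}

cpumask_empty() is a thin wrapper around bitmap_empty(), which is why
__bitmap_empty also shows up as a hot symbol in the non-PLE profiles
further down.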
>
> dbench 8vm (degraded -8%)
> +------------+--------------------+--------------------+----------+
> | Dbench |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | dbench| 2.27 | 2.09 | -8 |
> | BwUsage| 138973336762.00 | 187382519973.00 | 34 |
> | HostIdle| 95.00 | 93.00 | 2 |
> | IOWait| 20.00 | 19.00 | 5 |
> | IdleTime| 78.00 | 78.00 | 0 |
> | TPS| 13.00 | 14.00 | 7 |
> | CacheMisses| 81611667.00 | 72959014.00 | 10 |
> | CacheRefs| 4990591975.00 | 4624251595.00 | -7 |
> |BranchMisses| 812569051.00 | 1162137278.00 | -43 |
> | Branches| 20196543212.00 | 30318934960.00 | 50 |
> |Instructions| 99519592926.00 | 152169154440.00 | -52 |
> | Cycles| 265699995531.00 | 330718402913.00 | -24 |
> | PageFlt| 36083.00 | 35897.00 | 0 |
> | ContextSW| 3170710.00 | 8304284.00 | -161 |
> | CPUMigrat| 63387.00 | 155521.00 | -145 |
> +-----------------------------------------------------------------+
> dbench needs some more love; I will get the perf top callers for
> that.
>
> non-PLE - Test Setup:
> =====================
> - x3650 M2 machine
> - 8 CPUs (HT disabled)
> - 64GB memory
> - VM details:
> - Guest kernel: 2.6.32 based enterprise kernel
> - 1024MB memory
> - 8 VCPUs
> - During gang runs, vcpus are pinned
>
> Results:
> * GangVsBase - Gang vs Baseline kernel
> * GangVsPin - Gang vs Baseline kernel + vcpus pinned
> * V1 - using set_next_buddy
> * V2 - using set_gang_buddy
> * Results are % improvement/degradation
> +-------------+-----------------------+----------------------+
> | | V1 | V2 |
> + Benchmarks +-----------+-----------+-----------+----------+
> | | GngVsBase | GngVsPin | GngVsBase | GngVsPin |
> +-------------+-----------+-----------+-----------+----------+
> | kbench-2vm | 0 | 2 | -7 | -5 |
> | kbench-4vm | 2 | -3 | 7 | 2 |
> | kbench-8vm | 0 | -1 | -1 | -3 |
> +-------------+-----------+-----------+-----------+----------+
> | ebizzy-2vm | 221 | 109 | 241 | 122 |
> | ebizzy-4vm | 215 | 173 | 366 | 304 |
> | ebizzy-8vm | 225 | 88 | 331 | 149 |
> +-------------+-----------+-----------+-----------+----------+
> | specjbb-2vm | -5 | -3 | -7 | -5 |
> | specjbb-4vm | 29 | -4 | 3 | -23 |
> | specjbb-8vm | 6 | -6 | 16 | 2 |
> +-------------+-----------+-----------+-----------+----------+
> | hbench-2vm | -16 | 2 | 15 | 29 |
> | hbench-4vm | -25 | 2 | 32 | 47 |
> | hbench-8vm | -46 | -19 | 35 | 47 |
> +-------------+-----------+-----------+-----------+----------+
> | dbench-2vm | 0 | 1 | -5 | -3 |
> | dbench-4vm | -9 | -4 | -2 | 2 |
> | dbench-8vm | -52 | 17 | -30 | 69 |
> +-------------+-----------+-----------+-----------+----------+
>
> The best and worst cases in V2 (GangVsBase):
>
> ebizzy 8vm (improved 331%)
> +------------+--------------------+--------------------+----------+
> | Ebizzy |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | ebizzy| 719.50 | 3101.38 | 331 |
> | EbzyUser| 3.79 | 58.04 | 1432 |
> | EbzySys| 66.61 | 140.04 | -110 |
> | EbzyReal| 60.00 | 60.00 | 0 |
> | BwUsage| 526550032993.00 | 652012141757.00 | 23 |
> | HostIdle| 59.00 | 62.00 | -5 |
> | SysTime| 5.00 | 11.00 | -120 |
> | IOWait| 4.00 | 4.00 | 0 |
> | IdleTime| 89.00 | 79.00 | -11 |
> | TPS| 11.00 | 12.00 | 9 |
> +-----------------------------------------------------------------+
>
> GangV2:
> 27.96% ebizzy libc-2.12.so [.] __memcpy_ssse3_back
> 12.13% ebizzy [kernel.kallsyms] [k] clear_page
> 11.66% ebizzy [kernel.kallsyms] [k] __bitmap_empty
> 11.54% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 5.93% ebizzy [kernel.kallsyms] [k] __do_page_fault
>
> GangBase:
> 36.34% ebizzy [kernel.kallsyms] [k] __bitmap_empty
> 35.95% ebizzy [kernel.kallsyms] [k] flush_tlb_others_ipi
> 8.52% ebizzy libc-2.12.so [.] __memcpy_ssse3_back

Same thing. __bitmap_empty() is likely the cpumask_empty() called from
flush_tlb_others_ipi(), so ~70% of the time is spent in this loop.

Xen works around this particular busy loop by having a hypercall for
flushing the TLB, but that approach is very fragile (and broken wrt
get_user_pages_fast(), IIRC).
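
For comparison, a rough sketch of the paravirt path (modeled on
xen_flush_tlb_others() in arch/x86/xen/mmu.c; abridged, not verbatim):
the whole flush is handed to the hypervisor as one mmuext op, so the
guest never busy-waits on a preempted vcpu:

	static void xen_flush_tlb_others(const struct cpumask *cpus,
					 struct mm_struct *mm, unsigned long va)
	{
		struct {
			struct mmuext_op op;
			DECLARE_BITMAP(mask, NR_CPUS);
		} *args;
		struct multicall_space mcs = xen_mc_entry(sizeof(*args));

		args = mcs.args;
		args->op.arg2.vcpumask = to_cpumask(args->mask);

		/* Flush everyone but ourselves; skip offline vcpus. */
		cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
		cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));

		if (va == TLB_FLUSH_ALL) {
			args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
		} else {
			args->op.cmd = MMUEXT_INVLPG_MULTI;
			args->op.arg1.linear_addr = va;
		}

		/*
		 * The hypervisor flushes (or defers flushing) each target
		 * vcpu itself -- no IPI, no ack mask, no spinning.
		 */
		MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
		xen_mc_issue(PARAVIRT_LAZY_MMU);
	}

The get_user_pages_fast() breakage, IIRC, is that gup_fast relies on
disabling interrupts to hold off the flush IPI while it walks the page
tables, and a hypercall-based flush simply bypasses that interlock.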
>
> dbench 8vm (degraded -30%)
> +------------+--------------------+--------------------+----------+
> | Dbench |
> +------------+--------------------+--------------------+----------+
> | Parameter | GangBase | Gang V2 | % imprv |
> +------------+--------------------+--------------------+----------+
> | dbench| 2.01 | 1.38 | -30 |
> | BwUsage| 100408068913.00 | 176095548113.00 | 75 |
> | HostIdle| 82.00 | 74.00 | 9 |
> | IOWait| 25.00 | 23.00 | 8 |
> | IdleTime| 74.00 | 71.00 | -4 |
> | TPS| 13.00 | 13.00 | 0 |
> | CacheMisses| 137351386.00 | 267116184.00 | -94 |
> | CacheRefs| 4347880250.00 | 5830408064.00 | 34 |
> |BranchMisses| 602120546.00 | 1110592466.00 | -84 |
> | Branches| 22275747114.00 | 39163309805.00 | 75 |
> |Instructions| 107942079625.00 | 195313721170.00 | -80 |
> | Cycles| 271014283494.00 | 481886203993.00 | -77 |
> | PageFlt| 44373.00 | 47679.00 | -7 |
> | ContextSW| 3318033.00 | 11598234.00 | -249 |
> | CPUMigrat| 82475.00 | 423066.00 | -412 |
> +-----------------------------------------------------------------+
>
Rik, what's going on? ContextSW is relatively low in the base run; it
looks like PLE is asleep at the wheel.
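
For reference, the path that should be kicking in: on a pause-loop exit
KVM calls kvm_vcpu_on_spin(), which attempts a directed yield to another
vcpu of the same guest. A sketch abridged from virt/kvm/kvm_main.c of
this era (not verbatim; the last_boosted_vcpu round-robin and RCU
bookkeeping are elided):

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		struct kvm *kvm = me->kvm;
		struct kvm_vcpu *vcpu;
		int i;

		/*
		 * Boost a vcpu that is runnable but not currently running;
		 * with luck it is the one we are spinning on (here, a vcpu
		 * that has not yet acked the flush IPI).
		 */
		kvm_for_each_vcpu(i, vcpu, kvm) {
			struct task_struct *task;

			if (vcpu == me)
				continue;
			if (waitqueue_active(&vcpu->wq))
				continue;	/* halted, not preempted */
			task = get_pid_task(vcpu->pid, PIDTYPE_PID);
			if (!task)
				continue;
			/* Directed yield: donate our timeslice to that task. */
			if (yield_to(task, 1)) {
				put_task_struct(task);
				break;
			}
			put_task_struct(task);
		}
	}

When this works, each burst of spinning converts into a context switch
toward the vcpu holding things up, so a healthy run should show far more
context switches than the base numbers here.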
--
error compiling committee.c: too many arguments to function