From: "David S. Ahern" <daahern@cisco.com>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Mon, 02 Jun 2008 10:42:22 -0600 [thread overview]
Message-ID: <484422EE.5090501@cisco.com> (raw)
In-Reply-To: <4841094A.8090507@qumranet.com>
Avi Kivity wrote:
> David S. Ahern wrote:
>>> I haven't been able to reproduce this:
>>>
>>>
>>>> [root@localhost root]# ps -elf | grep -E 'memuser|kscand'
>>>> 1 S root 7 1 1 75 0 - 0 schedu 10:07 ?
>>>> 00:00:26 [kscand]
>>>> 0 S root 1464 1 1 75 0 - 196986 schedu 10:20 pts/0
>>>> 00:00:21 ./memuser 768M 120 5 300
>>>> 0 S root 1465 1 0 75 0 - 98683 schedu 10:20 pts/0
>>>> 00:00:10 ./memuser 384M 300 10 600
>>>> 0 S root 2148 1293 0 75 0 - 922 pipe_w 10:48 pts/0
>>>> 00:00:00 grep -E memuser|kscand
>>>>
>>> The workload has been running for about half an hour, and kswapd cpu
>>> usage doesn't seem significant. This is a guest with 2GB of memory,
>>> running with my patch ported to kvm.git HEAD.
>>>
>>>
>>
>> I'm running on the per-page-pte-tracking branch, and I am still seeing
>> it.
>> I doubt you want to sit and watch the screen for an hour, so install
>> sysstat if it isn't already installed, change the sample rate to 1
>> minute (/etc/cron.d/sysstat), let the server run for a few hours, and
>> then run 'sar -u'. You'll see something like this:
>>
>> 10:12:11 AM LINUX RESTART
>>
>> 10:13:03 AM CPU %user %nice %system %iowait %idle
>> 10:14:01 AM all 0.08 0.00 2.08 0.35 97.49
>> 10:15:03 AM all 0.05 0.00 0.79 0.04 99.12
>> 10:15:59 AM all 0.15 0.00 1.52 0.06 98.27
>> 10:17:01 AM all 0.04 0.00 0.69 0.04 99.23
>> 10:17:59 AM all 0.01 0.00 0.39 0.00 99.60
>> 10:18:59 AM all 0.00 0.00 0.12 0.02 99.87
>> 10:20:02 AM all 0.18 0.00 14.62 0.09 85.10
>> 10:21:01 AM all 0.71 0.00 26.35 0.01 72.94
>> 10:22:02 AM all 0.67 0.00 10.61 0.00 88.72
>> 10:22:59 AM all 0.14 0.00 1.80 0.00 98.06
>> 10:24:03 AM all 0.13 0.00 0.50 0.00 99.37
>> 10:24:59 AM all 0.09 0.00 11.46 0.00 88.45
>> 10:26:03 AM all 0.16 0.00 0.69 0.03 99.12
>> 10:26:59 AM all 0.14 0.00 10.01 0.02 89.83
>> 10:28:03 AM all 0.57 0.00 2.20 0.03 97.20
>> Average: all 0.21 0.00 5.55 0.05 94.20
>>
>>
>> Every one of those jumps in %system time directly correlates with
>> kscand activity. Without the memuser programs running, the guest
>> %system time is <1%. The point of this silly memuser program is just
>> to use high memory -- let it age, then make it active again, sit
>> idle, repeat. If you run kvm_stat with -l in the host you'll see the
>> jump in pte writes/updates. An intern here added a timestamp to the
>> kvm_stat output for me, which helps directly correlate guest/host
>> data.
>>
>>
>> I also ran my real guest on the branch. Performance at boot through
>> the first 15 minutes was much better, but I'm still seeing recurring
>> hits every 5 minutes when kscand kicks in. Here's the data from the
>> guest for the first one which happened after 15 minutes of uptime:
>>
>> active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct
>> 24845, dj 59
>>
>> active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct
>> 40868, dj 103
>>
>> active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct
>> 45805, dj 1212
>>
>>
>
> We touched 90,000 ptes in 12 seconds. That's 8,000 ptes per second.
> Yet we see 180,000 page faults per second in the trace.
>
> Oh! Only 45K pages were direct, so the other 45K were shared, with
> perhaps many ptes. We should count ptes, not pages.
>
> Can you modify page_referenced() to count the numbers of ptes mapped (1
> for direct pages, nr_chains for indirect pages) and print the total
> deltas in active_anon_scan?
>
Here you go. I've shortened the line lengths to get them to squeeze into
80 columns:
anon_scan, all HighMem zone, 187,910 active pages at loop start:
count[12] 21462 -> 230, direct 20469, chains 3479, dj 58
count[11] 1338 -> 1162, direct 227, chains 26144, dj 59
count[8] 29397 -> 5410, direct 26115, chains 27617, dj 117
count[4] 35804 -> 25556, direct 31508, chains 82929, dj 256
count[3] 2738 -> 2207, direct 2680, chains 58, dj 7
count[0] 92580 -> 89509, direct 75024, chains 262834, dj 726
(age number is the index in [])
cache_scan, all HighMem zone, 48,298 active pages at loop start:
count[12] 3642 -> 2982, direct 499, chains 20022, dj 44
count[8] 11254 -> 11187, direct 7189, chains 9854, dj 37
count[4] 15709 -> 15702, direct 5071, chains 9388, dj 31
(with anon_cache_count bug fixed)
If you sum the direct pages and the chains count for each row and
convert dj into dt (dividing by HZ = 100), you get:
( 20469 + 3479 ) / 0.58 = 41289
( 227 + 26144 ) / 0.59 = 44696
( 26115 + 27617 ) / 1.17 = 45924
( 31508 + 82929 ) / 2.56 = 44701
( 2680 + 58 ) / 0.07 = 39114
( 75024 + 262834 ) / 7.26 = 46536
( 499 + 20022 ) / 0.44 = 46638
( 7189 + 9854 ) / 0.37 = 46062
( 5071 + 9388 ) / 0.31 = 46641
At 4 pte writes per direct page or chain entry, that comes to
~187,000/sec, which is close to the total collected by kvm_stat (data
width shrunk to fit in e-mail; hope this is still readable):
|---------- mmu_ ----------|----- pf_ -----|
cache flood pde_z pte_u pte_w shado fixed guest
267 271 95 21455 21842 285 22840 165
66 88 0 12102 12224 88 12458 0
2042 2133 0 178146 180515 2133 188089 387
1053 1212 0 187067 188485 1212 193011 8
4771 4811 88 185129 190998 4825 207490 448
910 824 7 183066 184050 824 195836 12
707 785 0 176381 177300 785 180350 6
1167 1144 0 189618 191014 1144 195902 10
4238 4193 87 188381 193590 4206 207030 465
1448 1400 7 187786 189509 1400 198688 21
982 971 0 187880 189076 971 198405 2
1165 1208 0 190007 191503 1208 195746 13
1106 1146 0 189144 190550 1146 195143 0
4767 4788 96 185802 191704 4802 206362 477
1388 1431 0 187387 188991 1431 195115 3
584 551 0 77176 77802 551 84829 10
12 7 0 3601 3609 7 13497 4
243 153 91 31085 31333 167 35059 879
21 18 6 3130 3155 18 3827 2
21 4 1 4665 4670 4 6825 9
>> The kvm_stat data for this time period is attached due to line lengths.
>>
>>
>> Also, I forgot to mention this before, but there is a bug in the
>> kscand code in the RHEL3U8 kernel. When it scans the cache list it
>> uses the count from the anonymous list:
>>
>> if (need_active_cache_scan(zone)) {
>> for (age = MAX_AGE-1; age >= 0; age--) {
>> scan_active_list(zone, age,
>> &zone->active_cache_list[age],
>> zone->active_anon_count[age]);
>> ^^^^^^^^^^^^^^^^^
>> if (current->need_resched)
>> schedule();
>> }
>> }
>>
>> When the anonymous count is higher it is scanning the cache list
>> repeatedly. An example of that was captured here:
>>
>> active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon
>> 111967, direct 626, dj 3
>>
>> count anon is active_anon_count[age] which at this moment was 111,967.
>> There were only 222 entries in the cache list, but the count value
>> passed to scan_active_list was 111,967. When the cache list has a lot
>> of direct pages, that causes a larger hit on kvm than needed. That
>> said, I have to live with the bug in the guest.
>>
>
> For debugging, can you fix it? It certainly has a large impact.
>
Yes, I have run a few tests with it fixed to get a ballpark on the
impact. The fix is included in the numbers above.
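For reference, the fix is just the one-line change to pass the cache-side
counter instead of the anon one (I'm assuming here the field in the RHEL3
source is named active_cache_count to mirror active_cache_list; adjust to
the actual name):

```
 	if (need_active_cache_scan(zone)) {
 		for (age = MAX_AGE-1; age >= 0; age--) {
 			scan_active_list(zone, age,
 				&zone->active_cache_list[age],
-				zone->active_anon_count[age]);
+				zone->active_cache_count[age]);
 			if (current->need_resched)
 				schedule();
 		}
 	}
```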
> Perhaps it is fixed in an update kernel. There's a 2.4.21-50.EL in the
> CentOS 3.8 update repos.
>