From: "David S. Ahern" <daahern@cisco.com>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Thu, 29 May 2008 10:42:22 -0600 [thread overview]
Message-ID: <483EDCEE.6070307@cisco.com> (raw)
In-Reply-To: <483D391F.7050007@qumranet.com>
[-- Attachment #1: Type: text/plain, Size: 5838 bytes --]
Avi Kivity wrote:
> David S. Ahern wrote:
>> The short answer is that I am still seeing large system-time hiccups in the
>> guests due to kscand in the guest scanning its active lists. I do see
>> better response with a KVM_MAX_PTE_HISTORY of 3 than with 4. (For
>> completeness I also tried a history of 2, but it performed worse than 3,
>> which is no surprise given its meaning.)
>>
>>
>> I have been able to scratch out a simplistic program that stimulates
>> kscand activity similar to what is going on in my real guest (see
>> attached). The program requests a memory allocation, initializes it (to
>> get it backed), and then in a loop sweeps through the memory in chunks,
>> similar to a program using parts of its memory here and there but
>> eventually accessing all of it.
>>
>> Start the RHEL3/CentOS 3 guest with *2GB* of RAM (or more). The key is
>> using a fair amount of highmem. Start a couple of instances of the
>> attached. For example, I've been using these 2:
>>
>> memuser 768M 120 5 300
>> memuser 384M 300 10 600
>>
>> Together these instances take up 1GB of RAM and once initialized
>> consume very little CPU. On kvm they make kscand and kswapd go nuts
>> every 5-15 minutes. For comparison, I do not see the same behavior for
>> an identical setup running on esx 3.5.
>>
>
> I haven't been able to reproduce this:
>
>> [root@localhost root]# ps -elf | grep -E 'memuser|kscand'
>> 1 S root     7    1  1  75 0 -      0 schedu 10:07 ?     00:00:26 [kscand]
>> 0 S root  1464    1  1  75 0 - 196986 schedu 10:20 pts/0 00:00:21 ./memuser 768M 120 5 300
>> 0 S root  1465    1  0  75 0 -  98683 schedu 10:20 pts/0 00:00:10 ./memuser 384M 300 10 600
>> 0 S root  2148 1293  0  75 0 -    922 pipe_w 10:48 pts/0 00:00:00 grep -E memuser|kscand
>
> The workload has been running for about half an hour, and kswapd cpu
> usage doesn't seem significant. This is a 2GB guest running with my
> patch ported to kvm.git HEAD.
>
I'm running on the per-page-pte-tracking branch, and I am still seeing it.
I doubt you want to sit and watch the screen for an hour, so install sysstat (if it isn't already installed), change the sample rate to 1 minute (/etc/cron.d/sysstat), let the server run for a few hours, and then run 'sar -u'. You'll see something like this:
10:12:11 AM       LINUX RESTART

10:13:03 AM       CPU     %user     %nice   %system   %iowait     %idle
10:14:01 AM       all      0.08      0.00      2.08      0.35     97.49
10:15:03 AM       all      0.05      0.00      0.79      0.04     99.12
10:15:59 AM       all      0.15      0.00      1.52      0.06     98.27
10:17:01 AM       all      0.04      0.00      0.69      0.04     99.23
10:17:59 AM       all      0.01      0.00      0.39      0.00     99.60
10:18:59 AM       all      0.00      0.00      0.12      0.02     99.87
10:20:02 AM       all      0.18      0.00     14.62      0.09     85.10
10:21:01 AM       all      0.71      0.00     26.35      0.01     72.94
10:22:02 AM       all      0.67      0.00     10.61      0.00     88.72
10:22:59 AM       all      0.14      0.00      1.80      0.00     98.06
10:24:03 AM       all      0.13      0.00      0.50      0.00     99.37
10:24:59 AM       all      0.09      0.00     11.46      0.00     88.45
10:26:03 AM       all      0.16      0.00      0.69      0.03     99.12
10:26:59 AM       all      0.14      0.00     10.01      0.02     89.83
10:28:03 AM       all      0.57      0.00      2.20      0.03     97.20
Average:          all      0.21      0.00      5.55      0.05     94.20
Every one of those jumps in %system time directly correlates with kscand activity. Without the memuser programs running, the guest %system time is <1%. The point of this silly memuser program is just to use high memory -- let it age, then make it active again, sit idle, repeat.

If you run kvm_stat with -l on the host, you'll see the jump in pte writes/updates. An intern here added a timestamp to the kvm_stat output for me, which helps to directly correlate guest/host data.
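For reference, the sweep loop memuser performs can be sketched like this (a minimal stand-in written for this mail, not the attached memuser source; the page size and chunking are illustrative):

```c
/* Minimal stand-in for the memuser workload: allocate a region, dirty
 * every page so it is really backed, then sweep it chunk by chunk so
 * pages keep cycling back onto the guest's active lists. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Allocate `bytes` and write every page so the memory is backed. */
static unsigned char *alloc_backed(size_t bytes)
{
    unsigned char *mem = malloc(bytes);
    if (mem)
        memset(mem, 1, bytes);
    return mem;
}

/* Touch one byte per page in [off, off + chunk); returns the offset of
 * the next chunk, wrapping to 0 at the end of the region. */
static size_t sweep_chunk(unsigned char *mem, size_t bytes,
                          size_t off, size_t chunk)
{
    size_t end = (chunk < bytes - off) ? off + chunk : bytes;
    size_t i;

    for (i = off; i < end; i += PAGE_SIZE)
        mem[i]++;
    return (end == bytes) ? 0 : end;
}
```

A driver would call alloc_backed() once with the requested size and then call sweep_chunk() in a loop with sleeps between chunks, matching the "use parts of memory here and there, eventually touch all of it" pattern described above.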
I also ran my real guest on the branch. Performance at boot and through the first 15 minutes was much better, but I'm still seeing recurring hits every 5 minutes when kscand kicks in. Here's the guest data for the first one, which happened after 15 minutes of uptime:
active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212
The kvm_stat data for this time period is attached due to line lengths.
Also, I forgot to mention this before, but there is a bug in the kscand code in the RHEL3U8 kernel: when it scans the cache list, it uses the count from the anonymous list:
if (need_active_cache_scan(zone)) {
        for (age = MAX_AGE-1; age >= 0; age--) {
                scan_active_list(zone, age,
                                 &zone->active_cache_list[age],
                                 zone->active_anon_count[age]);
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                if (current->need_resched)
                        schedule();
        }
}
When the anonymous count is higher, the cache list is scanned repeatedly. An example of that was captured here:
active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon 111967, direct 626, dj 3
count anon is active_anon_count[age], which at this moment was 111,967. There were only 222 entries in the cache list, but the count value passed to scan_active_list was 111,967. When the cache list has a lot of direct pages, that causes a larger hit on kvm than needed. That said, I have to live with the bug in the guest.
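Presumably the intended call passes the cache list's own count. A sketch of the one-line fix (the active_cache_count field name is my assumption about the RHEL3 source, not verified against it):

```
scan_active_list(zone, age,
                 &zone->active_cache_list[age],
                 zone->active_cache_count[age]);
```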
david
[-- Attachment #2: kvm_stat.kscand --]
[-- Type: text/plain, Size: 2650 bytes --]
kvm-69/kvm_stat -f 'mmu*|pf*' -l:
mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc  mmu_shado   pf_fixed   pf_guest
      182         18         18          0       5664       5682          0         18       5720         21
      211         59         59          0       7040       7105          0         59       7348         99
       81          0         48          0      45861      45909          0         48      45910          1
      209        683        814          0     178527     179405          0        814     181410          9
       67        111        320          0     175602     175922          0        320     177202          0
       28          0         29          0     181365     181394          0         29     181394          0
        7          0         22          0     181834     181856          0         22     181855          0
       35          0         14          0     180129     180143          0         14     180144          0
        7          0         10          0     179141     179151          0         10     179150          0
       35          0          3          0     181359     181361          0          3     181362          0
        7          0          4          0     181565     181570          0          4     181570          0
       21          0          3          0     181435     181437          0          3     181437          0
       21          0          4          0     181281     181286          0          4     181285          0
       21          0          3          0     179444     179447          0          3     179448          0
       91          0         61          0     179841     179902          0         61     179902          0
        7          0        247          0     176628     176875          0        247     176874          0
      313        478        133          1     100486     100604          0        133     126690         80
      162         21         18          0       6361       6379          0         18       6584          5
      294         40         23         21       9144       9188          0         25       9544         45
      143          5          1          0       5026       5027          0          1       5502          1
The above corresponds to the following from the guest:
active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212