From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern"
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels
 (specifically RHEL3)
Date: Thu, 29 May 2008 10:42:22 -0600
Message-ID: <483EDCEE.6070307@cisco.com>
References: <48054518.3000104@cisco.com> <4805BCF1.6040605@qumranet.com>
 <4807BD53.6020304@cisco.com> <48085485.3090205@qumranet.com>
 <480C188F.3020101@cisco.com> <480C5C39.4040300@qumranet.com>
 <480E492B.3060500@cisco.com> <480EEDA0.3080209@qumranet.com>
 <480F546C.2030608@cisco.com> <481215DE.3000302@cisco.com>
 <20080428181550.GA3965@dmt> <4816617F.3080403@cisco.com>
 <4817F30C.6050308@cisco.com> <48184228.2020701@qumranet.com>
 <481876A9.1010806@cisco.com> <48187903.2070409@qumranet.com>
 <4826E744.1080107@qumranet.com> <4826F668.6030305@qumranet.com>
 <48290FC2.4070505@cisco.com> <48294272.5020801@qumranet.com>
 <482B4D29.7010202@cisco.com> <482C1633.5070302@qumranet.com>
 <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com>
 <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com>
 <48318E64.8090706@qumranet.com> <4832DDEB.4000100@qumranet.com>
 <4835EEF5.9010600@cisco.com> <483D391F.7050007@qumranet.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------010809060909000601060204"
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from sj-iport-1.cisco.com ([171.71.176.70]:45747 "EHLO
 sj-iport-1.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1755827AbYE2QnT (ORCPT ); Thu, 29 May 2008 12:43:19 -0400
In-Reply-To: <483D391F.7050007@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is a multi-part message in MIME format.
--------------010809060909000601060204
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Avi Kivity wrote:
> David S. Ahern wrote:
>> The short answer is that I am still seeing large system time hiccups
>> in the guests due to kscand in the guest scanning its active lists.
>> I do see
>> better response for a KVM_MAX_PTE_HISTORY of 3 than with 4. (For
>> completeness I also tried a history of 2, but it performed worse than 3,
>> which is no surprise given the meaning of it.)
>>
>> I have been able to scratch out a simplistic program that stimulates
>> kscand activity similar to what is going on in my real guest (see
>> attached). The program requests a memory allocation, initializes it (to
>> get it backed) and then in a loop sweeps through the memory in chunks,
>> similar to a program using parts of its memory here and there but
>> eventually accessing all of it.
>>
>> Start the RHEL3/CentOS 3 guest with *2GB* of RAM (or more). The key is
>> using a fair amount of highmem. Start a couple of instances of the
>> attached. For example, I've been using these 2:
>>
>> memuser 768M 120 5 300
>> memuser 384M 300 10 600
>>
>> Together these instances take up 1GB of RAM and once initialized
>> consume very little CPU. On kvm they make kscand and kswapd go nuts
>> every 5-15 minutes. For comparison, I do not see the same behavior for
>> an identical setup running on esx 3.5.
>
> I haven't been able to reproduce this:
>
>> [root@localhost root]# ps -elf | grep -E 'memuser|kscand'
>> 1 S root      7     1  1  75  0 -      0 schedu 10:07 ?      00:00:26 [kscand]
>> 0 S root   1464     1  1  75  0 - 196986 schedu 10:20 pts/0  00:00:21 ./memuser 768M 120 5 300
>> 0 S root   1465     1  0  75  0 -  98683 schedu 10:20 pts/0  00:00:10 ./memuser 384M 300 10 600
>> 0 S root   2148  1293  0  75  0 -    922 pipe_w 10:48 pts/0  00:00:00 grep -E memuser|kscand
>
> The workload has been running for about half an hour, and kswapd cpu
> usage doesn't seem significant. This is a 2GB guest running with my
> patch ported to kvm.git HEAD. Guest has 2G of memory.

I'm running on the per-page-pte-tracking branch, and I am still seeing it.
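For reference, the attached memuser.c is not reproduced in this message; the sketch below is my own minimal reconstruction of the access pattern described above (allocate, fault every page in, then periodically sweep the region in chunks). The function names and parameters here are illustrative assumptions, not the actual attachment:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096UL

/* Allocate len bytes and write to every page so the region is actually
 * backed by (high)memory rather than left as untouched virtual space. */
static char *alloc_and_touch(size_t len)
{
    char *mem = malloc(len);
    if (mem)
        memset(mem, 0, len);    /* fault in every page */
    return mem;
}

/* One full sweep: touch one byte per page, chunk by chunk, optionally
 * sleeping between chunks so pages age out and are then re-activated,
 * which is what drives kscand's active-list scanning in the guest. */
static void sweep(char *mem, size_t len, size_t chunk, unsigned delay_sec)
{
    size_t off, i;

    for (off = 0; off + chunk <= len; off += chunk) {
        for (i = off; i < off + chunk; i += PAGE_SIZE)
            mem[i]++;
        if (delay_sec)
            sleep(delay_sec);
    }
}
```

A real memuser would presumably parse the size and interval arguments shown above (e.g. "768M 120 5 300") and call sweep() in an endless loop with a long idle period between sweeps.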
I doubt you want to sit and watch the screen for an hour, so install
sysstat if it is not already installed, change the sample rate to 1
minute (/etc/cron.d/sysstat), let the server run for a few hours, and
then run 'sar -u'. You'll see something like this:

10:12:11 AM       LINUX RESTART

10:13:03 AM       CPU     %user     %nice   %system   %iowait     %idle
10:14:01 AM       all      0.08      0.00      2.08      0.35     97.49
10:15:03 AM       all      0.05      0.00      0.79      0.04     99.12
10:15:59 AM       all      0.15      0.00      1.52      0.06     98.27
10:17:01 AM       all      0.04      0.00      0.69      0.04     99.23
10:17:59 AM       all      0.01      0.00      0.39      0.00     99.60
10:18:59 AM       all      0.00      0.00      0.12      0.02     99.87
10:20:02 AM       all      0.18      0.00     14.62      0.09     85.10
10:21:01 AM       all      0.71      0.00     26.35      0.01     72.94
10:22:02 AM       all      0.67      0.00     10.61      0.00     88.72
10:22:59 AM       all      0.14      0.00      1.80      0.00     98.06
10:24:03 AM       all      0.13      0.00      0.50      0.00     99.37
10:24:59 AM       all      0.09      0.00     11.46      0.00     88.45
10:26:03 AM       all      0.16      0.00      0.69      0.03     99.12
10:26:59 AM       all      0.14      0.00     10.01      0.02     89.83
10:28:03 AM       all      0.57      0.00      2.20      0.03     97.20
Average:          all      0.21      0.00      5.55      0.05     94.20

Every one of those jumps in %system time directly correlates to kscand
activity. Without the memuser programs running, the guest %system time
is <1%. The point of this silly memuser program is just to use high
memory -- let it age, then make it active again, sit idle, repeat. If
you run kvm_stat with -l in the host you'll see the jump in pte
writes/updates. An intern here added a timestamp to the kvm_stat output
for me, which helps to directly correlate guest/host data.

I also ran my real guest on the branch. Performance at boot through the
first 15 minutes was much better, but I'm still seeing recurring hits
every 5 minutes when kscand kicks in.
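The sysstat setup described above amounts to something like the following (the cron file path matches the RHEL3-era sysstat package; verify the sa1 path on your system):

```shell
# /etc/cron.d/sysstat -- change the sa1 entry to sample once a minute:
#   * * * * * root /usr/lib/sa/sa1 1 1
# After letting the server run for a few hours, summarize CPU usage:
sar -u
# Or watch live, one report every 60 seconds:
sar -u 60
```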
Here's the data from the guest for the first one, which happened after
15 minutes of uptime:

active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212

The kvm_stat data for this time period is attached due to line lengths.

Also, I forgot to mention this before, but there is a bug in the kscand
code in the RHEL3U8 kernel. When it scans the cache list it uses the
count from the anonymous list:

    if (need_active_cache_scan(zone)) {
        for (age = MAX_AGE-1; age >= 0; age--) {
            scan_active_list(zone, age,
                             &zone->active_cache_list[age],
                             zone->active_anon_count[age]);
                                    ^^^^^^^^^^^^^^^^^
            if (current->need_resched)
                schedule();
        }
    }

When the anonymous count is higher, it ends up scanning the cache list
repeatedly. An example of that was captured here:

active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon 111967, direct 626, dj 3

count anon is active_anon_count[age], which at this moment was 111,967.
There were only 222 entries in the cache list, but the count value
passed to scan_active_list was 111,967. When the cache list has a lot of
direct pages, that causes a larger hit on kvm than needed. That said, I
have to live with the bug in the guest.
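For completeness, the presumed fix is to pass the cache list its own count. Assuming the zone struct carries an active_cache_count[] array parallel to active_anon_count[] (I have not verified the field name against the RHEL3U8 source), the corrected call would look something like:

```
scan_active_list(zone, age,
                 &zone->active_cache_list[age],
                 zone->active_cache_count[age]);  /* cache count, not anon */
```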
david

--------------010809060909000601060204
Content-Type: text/plain;
 name="kvm_stat.kscand"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="kvm_stat.kscand"

kvm-69/kvm_stat -f 'mmu*|pf*' -l:

 mmio_exit mmu_cache mmu_flood mmu_pde_z mmu_pte_u mmu_pte_w mmu_recyc mmu_shado  pf_fixed  pf_guest
       182        18        18         0      5664      5682         0        18      5720        21
       211        59        59         0      7040      7105         0        59      7348        99
        81         0        48         0     45861     45909         0        48     45910         1
       209       683       814         0    178527    179405         0       814    181410         9
        67       111       320         0    175602    175922         0       320    177202         0
        28         0        29         0    181365    181394         0        29    181394         0
         7         0        22         0    181834    181856         0        22    181855         0
        35         0        14         0    180129    180143         0        14    180144         0
         7         0        10         0    179141    179151         0        10    179150         0
        35         0         3         0    181359    181361         0         3    181362         0
         7         0         4         0    181565    181570         0         4    181570         0
        21         0         3         0    181435    181437         0         3    181437         0
        21         0         4         0    181281    181286         0         4    181285         0
        21         0         3         0    179444    179447         0         3    179448         0
        91         0        61         0    179841    179902         0        61    179902         0
         7         0       247         0    176628    176875         0       247    176874         0
       313       478       133         1    100486    100604         0       133    126690        80
       162        21        18         0      6361      6379         0        18      6584         5
       294        40        23        21      9144      9188         0        25      9544        45
       143         5         1         0      5026      5027         0         1      5502         1

The above corresponds to the following from the guest:

active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212

--------------010809060909000601060204--