From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels
 (specifically RHEL3)
Date: Sat, 31 May 2008 11:16:10 +0300
Message-ID: <4841094A.8090507@qumranet.com>
References: <48054518.3000104@cisco.com> <4805BCF1.6040605@qumranet.com>
 <4807BD53.6020304@cisco.com> <48085485.3090205@qumranet.com>
 <480C188F.3020101@cisco.com> <480C5C39.4040300@qumranet.com>
 <480E492B.3060500@cisco.com> <480EEDA0.3080209@qumranet.com>
 <480F546C.2030608@cisco.com> <481215DE.3000302@cisco.com>
 <20080428181550.GA3965@dmt> <4816617F.3080403@cisco.com>
 <4817F30C.6050308@cisco.com> <48184228.2020701@qumranet.com>
 <481876A9.1010806@cisco.com> <48187903.2070409@qumranet.com>
 <4826E744.1080107@qumranet.com> <4826F668.6030305@qumranet.com>
 <48290FC2.4070505@cisco.com> <48294272.5020801@qumranet.com>
 <482B4D29.7010202@cisco.com> <482C1633.5070302@qumranet.com>
 <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com>
 <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com>
 <48318E64.8090706@qumranet.com> <4832DDEB.4000100@qumranet.com>
 <4835EEF5.9010600@cisco.com> <483D391F.7050007@qumranet.com>
 <483EDCEE.6070307@cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: "David S. Ahern"
Return-path:
Received: from bzq-179-150-194.static.bezeqint.net ([212.179.150.194]:32418
 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751290AbYEaIQK (ORCPT ); Sat, 31 May 2008 04:16:10 -0400
In-Reply-To: <483EDCEE.6070307@cisco.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

David S. Ahern wrote:
>> I haven't been able to reproduce this:
>>
>>> [root@localhost root]# ps -elf | grep -E 'memuser|kscand'
>>> 1 S root 7 1 1 75 0 - 0 schedu 10:07 ?
>>> 00:00:26 [kscand]
>>> 0 S root 1464 1 1 75 0 - 196986 schedu 10:20 pts/0
>>> 00:00:21 ./memuser 768M 120 5 300
>>> 0 S root 1465 1 0 75 0 - 98683 schedu 10:20 pts/0
>>> 00:00:10 ./memuser 384M 300 10 600
>>> 0 S root 2148 1293 0 75 0 - 922 pipe_w 10:48 pts/0
>>> 00:00:00 grep -E memuser|kscand
>>>
>> The workload has been running for about half an hour, and kswapd cpu
>> usage doesn't seem significant. This is a 2GB guest running with my
>> patch ported to kvm.git HEAD.
>>
>
> I'm running on the per-page-pte-tracking branch, and I am still seeing it.
>
> I doubt you want to sit and watch the screen for an hour, so install
> sysstat if it isn't already installed, change the sample rate to 1 minute
> (/etc/cron.d/sysstat), let the server run for a few hours, and then run
> 'sar -u'. You'll see something like this:
>
> 10:12:11 AM LINUX RESTART
>
> 10:13:03 AM CPU %user %nice %system %iowait %idle
> 10:14:01 AM all 0.08 0.00 2.08 0.35 97.49
> 10:15:03 AM all 0.05 0.00 0.79 0.04 99.12
> 10:15:59 AM all 0.15 0.00 1.52 0.06 98.27
> 10:17:01 AM all 0.04 0.00 0.69 0.04 99.23
> 10:17:59 AM all 0.01 0.00 0.39 0.00 99.60
> 10:18:59 AM all 0.00 0.00 0.12 0.02 99.87
> 10:20:02 AM all 0.18 0.00 14.62 0.09 85.10
> 10:21:01 AM all 0.71 0.00 26.35 0.01 72.94
> 10:22:02 AM all 0.67 0.00 10.61 0.00 88.72
> 10:22:59 AM all 0.14 0.00 1.80 0.00 98.06
> 10:24:03 AM all 0.13 0.00 0.50 0.00 99.37
> 10:24:59 AM all 0.09 0.00 11.46 0.00 88.45
> 10:26:03 AM all 0.16 0.00 0.69 0.03 99.12
> 10:26:59 AM all 0.14 0.00 10.01 0.02 89.83
> 10:28:03 AM all 0.57 0.00 2.20 0.03 97.20
> Average: all 0.21 0.00 5.55 0.05 94.20
>
> Every one of those jumps in %system time correlates directly with kscand
> activity. Without the memuser programs running, the guest's %system time
> is <1%. The point of this silly memuser program is just to use high
> memory: let it age, then make it active again, sit idle, repeat. If you
> run kvm_stat with -l in the host you'll see the jump in pte
> writes/updates.
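[The memuser program itself is not included in the thread, but the behavior described (allocate high memory, let it age, touch it again, sit idle, repeat) is easy to sketch. The following is a hypothetical reconstruction, not the original tool; the function names and argument handling are guesses:]

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Touch one byte in every page of buf so the guest kernel marks the pages
 * accessed; under kvm, each pass also generates the shadow pte write/update
 * traffic that kvm_stat reports. Returns the number of pages touched. */
static size_t touch_pages(char *buf, size_t len, size_t pagesize)
{
    size_t touched = 0;
    for (size_t off = 0; off < len; off += pagesize, touched++)
        buf[off] = (char)off;
    return touched;
}

/* Allocate len bytes, then alternate touching every page (re-activating
 * them) with idle sleeps (letting kscand age them), cycles times.
 * The real memuser presumably loops forever. Returns 0 on success. */
static int memuser_run(size_t len, unsigned int idle_secs, int cycles)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = malloc(len);
    if (!buf)
        return -1;
    memset(buf, 0, len);                 /* populate all the pages once */
    for (int i = 0; i < cycles; i++) {
        touch_pages(buf, len, pagesize); /* re-activate the aged pages */
        sleep(idle_secs);                /* idle so kscand can age them */
    }
    free(buf);
    return 0;
}
```

[A main() wrapping memuser_run() with sizes like 768M and sleep intervals like 120/300 seconds would reproduce the command lines shown in the ps output above.]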
> An intern here added a timestamp to the kvm_stat output for me, which
> helps to directly correlate guest and host data.
>
> I also ran my real guest on the branch. Performance at boot and through
> the first 15 minutes was much better, but I'm still seeing recurring hits
> every 5 minutes when kscand kicks in. Here's the data from the guest for
> the first one, which happened after 15 minutes of uptime:
>
> active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
>
> active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
>
> active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212
>

We touched 90,000 ptes in 12 seconds. That's about 7,500 ptes per second. Yet we see 180,000 page faults per second in the trace. Oh! Only 45K pages were direct, so the other 45K were shared, with perhaps many ptes each. We should count ptes, not pages. Can you modify page_referenced() to count the number of ptes mapped (1 for direct pages, nr_chains for indirect pages) and print the total deltas in active_anon_scan?

> The kvm_stat data for this time period is attached due to line lengths.
>
> Also, I forgot to mention this before, but there is a bug in the kscand
> code in the RHEL3U8 kernel. When it scans the cache list it uses the
> count from the anonymous list:
>
> if (need_active_cache_scan(zone)) {
>         for (age = MAX_AGE-1; age >= 0; age--) {
>                 scan_active_list(zone, age,
>                                  &zone->active_cache_list[age],
>                                  zone->active_anon_count[age]);
>                                  ^^^^^^^^^^^^^^^^^
>                 if (current->need_resched)
>                         schedule();
>         }
> }
>
> When the anonymous count is higher than the cache count, it scans the
> cache list repeatedly. An example of that was captured here:
>
> active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon 111967, direct 626, dj 3
>
> count anon is active_anon_count[age], which at this moment was 111,967.
> There were only 222 entries in the cache list, but the count value
> passed to scan_active_list was 111,967.
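[For reference, the fix would be to pass the cache list's own count instead of the anonymous count. A sketch against the quoted RHEL3 fragment, assuming the zone carries an active_cache_count[] array parallel to active_anon_count[] (the "count[age] 222 -> 179" debug output suggests such a per-age cache count exists):]

```c
if (need_active_cache_scan(zone)) {
        for (age = MAX_AGE-1; age >= 0; age--) {
                scan_active_list(zone, age,
                                 &zone->active_cache_list[age],
                                 zone->active_cache_count[age]); /* was active_anon_count */
                if (current->need_resched)
                        schedule();
        }
}
```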
> When the cache list has a lot of direct pages, that causes a larger hit
> on kvm than needed. That said, I have to live with the bug in the guest.
>

For debugging, can you fix it? It certainly has a large impact. Perhaps it is fixed in an update kernel; there's a 2.4.21-50.EL in the CentOS 3.8 update repos.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.