From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern"
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Thu, 22 May 2008 16:08:53 -0600
Message-ID: <4835EEF5.9010600@cisco.com>
References: <48054518.3000104@cisco.com> <4805BCF1.6040605@qumranet.com>
 <4807BD53.6020304@cisco.com> <48085485.3090205@qumranet.com>
 <480C188F.3020101@cisco.com> <480C5C39.4040300@qumranet.com>
 <480E492B.3060500@cisco.com> <480EEDA0.3080209@qumranet.com>
 <480F546C.2030608@cisco.com> <481215DE.3000302@cisco.com>
 <20080428181550.GA3965@dmt> <4816617F.3080403@cisco.com>
 <4817F30C.6050308@cisco.com> <48184228.2020701@qumranet.com>
 <481876A9.1010806@cisco.com> <48187903.2070409@qumranet.com>
 <4826E744.1080107@qumranet.com> <4826F668.6030305@qumranet.com>
 <48290FC2.4070505@cisco.com> <48294272.5020801@qumranet.com>
 <482B4D29.7010202@cisco.com> <482C1633.5070302@qumranet.com>
 <482E5F9C.6000207@cisco.com> <482FCEE1.5040306@qumranet.com>
 <4830F90A.1020809@cisco.com> <4830FE8D.6010006@cisco.com>
 <48318E64.8090706@qumranet.com> <4832DDEB.4000100@qumranet.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------090400040701000009020707"
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from sj-iport-2.cisco.com ([171.71.176.71]:23675 "EHLO
 sj-iport-2.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1758837AbYEVWJo (ORCPT );
 Thu, 22 May 2008 18:09:44 -0400
In-Reply-To: <4832DDEB.4000100@qumranet.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is a multi-part message in MIME format.
--------------090400040701000009020707
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

The short answer is that I am still seeing large system time hiccups in
the guests due to kscand in the guest scanning its active lists. I do
see better response with a KVM_MAX_PTE_HISTORY of 3 than with 4.
(For completeness, I also tried a history of 2, but it performed worse
than 3, which is no surprise given what the parameter means.)

I have been able to scratch out a simplistic program that stimulates
kscand activity similar to what is going on in my real guest (see
attached). The program requests a memory allocation, initializes it (to
get it backed), and then in a loop sweeps through the memory in chunks,
similar to a program using parts of its memory here and there but
eventually accessing all of it.

Start the RHEL3/CentOS 3 guest with *2GB* of RAM (or more). The key is
using a fair amount of highmem. Start a couple of instances of the
attached. For example, I've been using these 2:

memuser 768M 120 5 300
memuser 384M 300 10 600

Together these instances take up 1GB of RAM and once initialized consume
very little CPU. On kvm they make kscand and kswapd go nuts every 5-15
minutes. For comparison, I do not see the same behavior for an identical
setup running on esx 3.5.

david

Avi Kivity wrote:
> Avi Kivity wrote:
>>
>> There are (at least) three options available:
>> - detect and special-case this scenario
>> - change the flood detector to be per page table instead of per vcpu
>> - change the flood detector to look at a list of recently used page
>> tables instead of the last page table
>>
>> I'm having a hard time trying to pick between the second and third
>> options.
>>
>
> The answer turns out to be "yes", so here's a patch that adds a pte
> access history table for each shadowed guest page-table. Let me know if
> it helps. Benchmarking a variety of workloads on all guests supported
> by kvm is left as an exercise for the reader, but I suspect the patch
> will either improve things all around, or can be modified to do so.
--------------090400040701000009020707
Content-Type: text/x-csrc;
 name="memuser.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="memuser.c"

/* simple program to malloc memory, initialize it, and
 * then repetitively use it to keep it active.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libgen.h>
#include <sys/time.h>
#include <time.h>

/* goal is to sweep memory every T1 sec by accessing a
 * percentage at a time and sleeping T2 sec in between accesses.
 * Once all the memory has been accessed, sleep for T3 sec
 * before starting the cycle over.
 */
#define T1 180
#define T2 5
#define T3 300

const char *timestamp(void);

void usage(const char *prog)
{
	fprintf(stderr, "\nusage: %s memlen{M|K} [t1 t2 t3]\n", prog);
}

int main(int argc, char *argv[])
{
	int len;
	char *endp;
	int factor, endp_len;
	int start, incr;
	int t1 = T1, t2 = T2, t3 = T3;
	char *mem;
	char c = 0;

	if (argc < 2) {
		usage(basename(argv[0]));
		return 1;
	}

	/*
	 * determine memory to request
	 */
	len = (int) strtol(argv[1], &endp, 0);
	factor = 1;
	endp_len = strlen(endp);
	if ((endp_len == 1) && ((*endp == 'M') || (*endp == 'm')))
		factor = 1024 * 1024;
	else if ((endp_len == 1) && ((*endp == 'K') || (*endp == 'k')))
		factor = 1024;
	else if (endp_len) {
		fprintf(stderr, "invalid memory len.\n");
		return 1;
	}
	len *= factor;
	if (len == 0) {
		fprintf(stdout, "memory len is 0.\n");
		return 1;
	}

	/*
	 * convert times if given
	 */
	if (argc > 2) {
		if (argc < 5) {
			usage(basename(argv[0]));
			return 1;
		}
		t1 = atoi(argv[2]);
		t2 = atoi(argv[3]);
		t3 = atoi(argv[4]);
	}

	/*
	 * amount of memory to sweep at one time
	 */
	if (t1 && t2)
		incr = len / t1 * t2;
	else
		incr = len;

	mem = (char *) malloc(len);
	if (mem == NULL) {
		fprintf(stderr, "malloc failed\n");
		return 1;
	}
	printf("memory allocated. initializing to 0\n");
	memset(mem, 0, len);

	start = 0;
	printf("%s starting memory update.\n", timestamp());
	while (1) {
		c++;
		if (c == 0x7f)
			c = 0;
		memset(mem + start, c, incr);
		start += incr;
		if ((start >= len) || ((start + incr) >= len)) {
			printf("%s scan complete. sleeping %d\n",
			       timestamp(), t3);
			start = 0;
			sleep(t3);
			printf("%s starting memory update.\n", timestamp());
		} else if (t2)
			sleep(t2);
	}

	return 0;
}

const char *timestamp(void)
{
	static char date[64];
	struct timeval now;
	struct tm ltime;

	memset(date, 0, sizeof(date));
	if (gettimeofday(&now, NULL) == 0) {
		if (localtime_r(&now.tv_sec, &ltime))
			strftime(date, sizeof(date), "%m/%d %H:%M:%S", &ltime);
	}

	if (strlen(date) == 0)
		strcpy(date, "unknown");

	return date;
}
--------------090400040701000009020707--