From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version) Date: Thu, 26 Nov 2009 11:11:09 +0100 Message-ID: <4B0E543D.6050807@redhat.com> References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jiri Zupka , autotest , kvm To: dlaor@redhat.com Return-path: Received: from mx1.redhat.com ([209.132.183.28]:61829 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755027AbZKZKLH (ORCPT ); Thu, 26 Nov 2009 05:11:07 -0500 In-Reply-To: <4B09618E.8010504@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: Hello Dor, Thank you for your review. I have few questions about your comments: ----------- snip ----------- >> + stat +=3D "Guests memsh =3D {" >> + for vm in lvms: >> + if vm.is_dead(): >> + logging.info("Trying to get informations of death VM: %s" >> + % vm.name) >> + continue > > You can fail the entire test. Afterwards it will be hard to find the = issue. > Well if it's what the community wants, we can change it. We just didn't= =20 want to lose information about the rest of the systems. Perhaps we can=20 set some DIE flag and after collecting all statistics raise an Error. ----------- snip ----------- >> + def get_true_pid(vm): >> + pid =3D vm.process.get_pid() >> + for i in range(1,10): >> + pid =3D pid + 1 > > What are you trying to do here? It's seems like a nasty hack that mig= ht > fail on load. > Yes and I'm really sorry for this ugly hack. The qemu command has=20 changed since the first patch was made. Nowadays the "vm.pid" returns=20 PID of the command itself, not the actual qemu process. We need to have the PID of the actual qemu process, which is executed b= y=20 the command with PID "vm.pid". That's why first I try finding the qemu=20 process as the following "vm.pid" PID. I haven't found another solution= =20 yet (in case we don't want to change the qemu command back in the=20 framework). We have tested this solution under heavy process load and either first=20 or second part always finds the right value. ----------- snip ----------- >> + if (params['ksm_test_size'] =3D=3D "paralel") : >> + vmsc =3D 1 >> + overcommit =3D 1 >> + mem =3D host_mem >> + # 32bit system adjustment >> + if not params['image_name'].endswith("64"): >> + logging.debug("Probably i386 guest architecture, "\ >> + "max allocator mem =3D 2G") > > Better not to relay on the guest name. You can test percentage of the > guest mem. > What do you mean by "percentage of the guest mem"? This adjustment is=20 made because the maximum memory for 1 process in 32 bit OS is 2GB. Testing of the 'image_name' showed to be most reliable method we found. ----------- snip ----------- >> + # Guest can have more than 2G but kvm mem + 1MB (allocator itself) >> + # can't >> + if (host_mem> 2048): >> + mem =3D 2047 >> + >> + >> + if os.popen("uname -i").readline().startswith("i386"): >> + logging.debug("Host is i386 architecture, max guest mem is 2G") > > There are bigger 32 bit guests. > How do you mean this note? We are testing whether the host machine is 3= 2=20 bit. If so, the maximum process allocation is 2GB (similar case to 32=20 bit guest) but this time the whole qemu process (2GB qemu machine + 64=20 MB qemu overhead) can't exceeded 2GB. Still the maximum memory used in test is the same (as we increase the V= M=20 count - host_mem =3D quest_mem * vm_count; quest_mem is decreased,=20 vm_count is increased) ----------- snip ----------- >> + >> + # Copy the allocator.c into guests > > .py > yes indeed. ----------- snip ----------- >> + # Let kksmd works (until shared mem rich expected value) >> + shm =3D 0 >> + i =3D 0 >> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm) >> + while shm< ksm_size: >> + if i> 64: >> + logging.info(get_stat(lvms)) >> + raise error.TestError("SHM didn't merged the memory until "\ >> + "the DL on guest: %s"% (vm.name)) >> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >> + time.sleep(ksm_size / 200 * perf_ratio) >> + try: >> + shm =3D int(os.popen(cmd).readline().split()[2]) >> + shm =3D shm * 4 / 1024 >> + i =3D i + 1 > > Either you have nice statistic calculation function or not. > I vote for the first case. > Yes, we are using the statistics function for the output. But in this=20 case we just need to know the shm value, not to log anything. If this is a big problem even for others, we can split the statistics=20 function into 2: int =3D _get_stat(vm) - returns shm value string =3D get_stat(vm) - Uses _get_stats and creates a nice log output ----------- snip ----------- >> + """ Check if memory in max loading guest is allright""" >> + logging.info("Starting phase 3b") >> + >> + """ Kill rest of machine""" > > We should have a function for it for all kvm autotest > you think lsessions[i].close() instead of (status,data) =3D=20 lsessions[i].get_command_status_output("exit;",20)? Yes, it would be better. >> + for i in range(last_vm+1, vmsc): >> + (status,data) =3D lsessions[i].get_command_status_output("exit;",2= 0) >> + if i =3D=3D (vmsc-1): >> + logging.info(get_stat([lvms[i]])) >> + lvms[i].destroy(gracefully =3D False) ----------- snip ----------- >> + def phase_paralel(): >> + """ Paralel page spliting """ >> + logging.info("Phase 1: Paralel page spliting") >> + # We have to wait until allocator is finished (it waits 5 seconds = to >> + # clean the socket >> + > > The whole function is very similar to phase_separate_first_guest plea= se > refactor them. > Yes, those functions are a bit similar. On the other hand there are som= e=20 differences. Instead of creating more complex function we agreed to=20 split them for better readability of the code. ----------- snip ----------- >> + while shm< ksm_size: >> + if i> 64: >> + logging.info(get_stat(lvms)) >> + raise error.TestError("SHM didn't merged the memory until DL") >> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >> + time.sleep(ksm_size / 200 * perf_ratio) >> + try: >> + shm =3D int(os.popen(cmd).readline().split()[2]) >> + shm =3D shm * 4 / 1024 >> + except: >> + raise error.TestError("Could not fetch shmem info from proc") > > Didn't you needed to increase i? > yes, you are right. This line somehow disappeard "i =3D i + 1"... ----------- snip ----------- >> + def compare_page(self,original,inmem): >> + """ >> + compare memory >> + > > > Why do you need it? Is it to really check ksm didn't do damage? > Interesting, I never doubted ksm for that. Actually it is good idea t= o > test... > We were asked to do so (be paranoid, everything could happened). We can= =20 make this voluntary in the config. Once again thanks, I'm looking forward to your replay. Best regards, Luk=C3=A1=C5=A1 Doktor