From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dor Laor Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version) Date: Sun, 29 Nov 2009 18:17:14 +0200 Message-ID: <4B129E8A.5070402@redhat.com> References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com> <4B0E543D.6050807@redhat.com> Reply-To: dlaor@redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jiri Zupka , autotest , kvm To: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= Return-path: Received: from mx1.redhat.com ([209.132.183.28]:63536 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750918AbZK2QQ5 (ORCPT ); Sun, 29 Nov 2009 11:16:57 -0500 In-Reply-To: <4B0E543D.6050807@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 11/26/2009 12:11 PM, Luk=C3=A1=C5=A1 Doktor wrote: > Hello Dor, > > Thank you for your review. I have few questions about your comments: > > ----------- snip ----------- >>> + stat +=3D "Guests memsh =3D {" >>> + for vm in lvms: >>> + if vm.is_dead(): >>> + logging.info("Trying to get informations of death VM: %s" >>> + % vm.name) >>> + continue >> >> You can fail the entire test. Afterwards it will be hard to find the >> issue. >> > > Well if it's what the community wants, we can change it. We just didn= 't > want to lose information about the rest of the systems. Perhaps we ca= n > set some DIE flag and after collecting all statistics raise an Error. I don't think we need to continue testing if some thing as basic as VM=20 died upon us. > > ----------- snip ----------- >>> + def get_true_pid(vm): >>> + pid =3D vm.process.get_pid() >>> + for i in range(1,10): >>> + pid =3D pid + 1 >> >> What are you trying to do here? It's seems like a nasty hack that mi= ght >> fail on load. >> qemu has -pifile option. It works fine. > > Yes and I'm really sorry for this ugly hack. The qemu command has > changed since the first patch was made. Nowadays the "vm.pid" returns > PID of the command itself, not the actual qemu process. > We need to have the PID of the actual qemu process, which is executed= by > the command with PID "vm.pid". That's why first I try finding the qem= u > process as the following "vm.pid" PID. I haven't found another soluti= on > yet (in case we don't want to change the qemu command back in the > framework). > We have tested this solution under heavy process load and either firs= t > or second part always finds the right value. > > ----------- snip ----------- >>> + if (params['ksm_test_size'] =3D=3D "paralel") : >>> + vmsc =3D 1 >>> + overcommit =3D 1 >>> + mem =3D host_mem >>> + # 32bit system adjustment >>> + if not params['image_name'].endswith("64"): >>> + logging.debug("Probably i386 guest architecture, "\ >>> + "max allocator mem =3D 2G") >> >> Better not to relay on the guest name. You can test percentage of th= e >> guest mem. >> > > What do you mean by "percentage of the guest mem"? This adjustment is > made because the maximum memory for 1 process in 32 bit OS is 2GB. > Testing of the 'image_name' showed to be most reliable method we foun= d. It's not that important but it should be a convention of kvm autotest. If that's acceptable, fine, otherwise, each VM will define it in the=20 config file > > ----------- snip ----------- >>> + # Guest can have more than 2G but kvm mem + 1MB (allocator itself= ) >>> + # can't >>> + if (host_mem> 2048): >>> + mem =3D 2047 >>> + >>> + >>> + if os.popen("uname -i").readline().startswith("i386"): >>> + logging.debug("Host is i386 architecture, max guest mem is 2G") >> >> There are bigger 32 bit guests. >> > How do you mean this note? We are testing whether the host machine is= 32 > bit. If so, the maximum process allocation is 2GB (similar case to 32 > bit guest) but this time the whole qemu process (2GB qemu machine + 6= 4 > MB qemu overhead) can't exceeded 2GB. > Still the maximum memory used in test is the same (as we increase the= VM > count - host_mem =3D quest_mem * vm_count; quest_mem is decreased, > vm_count is increased) i386 guests with PAE mode (additional 4 bits) can have up to 16G ram on= =20 theory. > > ----------- snip ----------- >>> + >>> + # Copy the allocator.c into guests >> >> .py >> > yes indeed. > > ----------- snip ----------- >>> + # Let kksmd works (until shared mem rich expected value) >>> + shm =3D 0 >>> + i =3D 0 >>> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm) >>> + while shm< ksm_size: >>> + if i> 64: >>> + logging.info(get_stat(lvms)) >>> + raise error.TestError("SHM didn't merged the memory until "\ >>> + "the DL on guest: %s"% (vm.name)) >>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >>> + time.sleep(ksm_size / 200 * perf_ratio) >>> + try: >>> + shm =3D int(os.popen(cmd).readline().split()[2]) >>> + shm =3D shm * 4 / 1024 >>> + i =3D i + 1 >> >> Either you have nice statistic calculation function or not. >> I vote for the first case. >> > > Yes, we are using the statistics function for the output. But in this > case we just need to know the shm value, not to log anything. > If this is a big problem even for others, we can split the statistics > function into 2: > int =3D _get_stat(vm) - returns shm value > string =3D get_stat(vm) - Uses _get_stats and creates a nice log outp= ut > > ----------- snip ----------- >>> + """ Check if memory in max loading guest is allright""" >>> + logging.info("Starting phase 3b") >>> + >>> + """ Kill rest of machine""" >> >> We should have a function for it for all kvm autotest >> > > you think lsessions[i].close() instead of (status,data) =3D > lsessions[i].get_command_status_output("exit;",20)? > Yes, it would be better. > >>> + for i in range(last_vm+1, vmsc): >>> + (status,data) =3D lsessions[i].get_command_status_output("exit;",= 20) >>> + if i =3D=3D (vmsc-1): >>> + logging.info(get_stat([lvms[i]])) >>> + lvms[i].destroy(gracefully =3D False) > > ----------- snip ----------- >>> + def phase_paralel(): >>> + """ Paralel page spliting """ >>> + logging.info("Phase 1: Paralel page spliting") >>> + # We have to wait until allocator is finished (it waits 5 seconds= to >>> + # clean the socket >>> + >> >> The whole function is very similar to phase_separate_first_guest ple= ase >> refactor them. >> > Yes, those functions are a bit similar. On the other hand there are s= ome > differences. Instead of creating more complex function we agreed to > split them for better readability of the code. The trouble is that re-implementing it make the code longer and more=20 error prone > > ----------- snip ----------- >>> + while shm< ksm_size: >>> + if i> 64: >>> + logging.info(get_stat(lvms)) >>> + raise error.TestError("SHM didn't merged the memory until DL") >>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >>> + time.sleep(ksm_size / 200 * perf_ratio) >>> + try: >>> + shm =3D int(os.popen(cmd).readline().split()[2]) >>> + shm =3D shm * 4 / 1024 >>> + except: >>> + raise error.TestError("Could not fetch shmem info from proc") >> >> Didn't you needed to increase i? >> > yes, you are right. This line somehow disappeard "i =3D i + 1"... Does it pass all the tests? > > ----------- snip ----------- >>> + def compare_page(self,original,inmem): >>> + """ >>> + compare memory >>> + >> >> >> Why do you need it? Is it to really check ksm didn't do damage? >> Interesting, I never doubted ksm for that. Actually it is good idea = to >> test... >> > > We were asked to do so (be paranoid, everything could happened). We c= an > make this voluntary in the config. I guess it just add sanity, better leave it on always, I guess it's not= =20 that time consuming. > > > Once again thanks, I'm looking forward to your replay. > > Best regards, > Luk=C3=A1=C5=A1 Doktor > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html