From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version) Date: Tue, 01 Dec 2009 13:23:23 +0100 Message-ID: <4B150ABB.5030600@redhat.com> References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com> <4B0E543D.6050807@redhat.com> <4B129E8A.5070402@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jiri Zupka , autotest , kvm To: dlaor@redhat.com Return-path: Received: from mx1.redhat.com ([209.132.183.28]:11761 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752389AbZLAMXW (ORCPT ); Tue, 1 Dec 2009 07:23:22 -0500 In-Reply-To: <4B129E8A.5070402@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: Dne 29.11.2009 17:17, Dor Laor napsal(a): > On 11/26/2009 12:11 PM, Luk=C3=A1=C5=A1 Doktor wrote: >> Hello Dor, >> >> Thank you for your review. I have few questions about your comments: >> >> ----------- snip ----------- >>>> + stat +=3D "Guests memsh =3D {" >>>> + for vm in lvms: >>>> + if vm.is_dead(): >>>> + logging.info("Trying to get informations of death VM: %s" >>>> + % vm.name) >>>> + continue >>> >>> You can fail the entire test. Afterwards it will be hard to find th= e >>> issue. >>> >> >> Well if it's what the community wants, we can change it. We just did= n't >> want to lose information about the rest of the systems. Perhaps we c= an >> set some DIE flag and after collecting all statistics raise an Error= =2E > > I don't think we need to continue testing if some thing as basic as V= M > died upon us. OK, we are going to change this. > >> >> ----------- snip ----------- >>>> + def get_true_pid(vm): >>>> + pid =3D vm.process.get_pid() >>>> + for i in range(1,10): >>>> + pid =3D pid + 1 >>> >>> What are you trying to do here? It's seems like a nasty hack that m= ight >>> fail on load. >>> > > > qemu has -pifile option. It works fine. > Oh my, I haven't thought on this. Of course I'm going to use -pidfile=20 instead of this silly thing... >> >> Yes and I'm really sorry for this ugly hack. The qemu command has >> changed since the first patch was made. Nowadays the "vm.pid" return= s >> PID of the command itself, not the actual qemu process. >> We need to have the PID of the actual qemu process, which is execute= d by >> the command with PID "vm.pid". That's why first I try finding the qe= mu >> process as the following "vm.pid" PID. I haven't found another solut= ion >> yet (in case we don't want to change the qemu command back in the >> framework). >> We have tested this solution under heavy process load and either fir= st >> or second part always finds the right value. >> >> ----------- snip ----------- >>>> + if (params['ksm_test_size'] =3D=3D "paralel") : >>>> + vmsc =3D 1 >>>> + overcommit =3D 1 >>>> + mem =3D host_mem >>>> + # 32bit system adjustment >>>> + if not params['image_name'].endswith("64"): >>>> + logging.debug("Probably i386 guest architecture, "\ >>>> + "max allocator mem =3D 2G") >>> >>> Better not to relay on the guest name. You can test percentage of t= he >>> guest mem. >>> >> >> What do you mean by "percentage of the guest mem"? This adjustment i= s >> made because the maximum memory for 1 process in 32 bit OS is 2GB. >> Testing of the 'image_name' showed to be most reliable method we fou= nd. > > > It's not that important but it should be a convention of kvm autotest= =2E > If that's acceptable, fine, otherwise, each VM will define it in the > config file > Yes kvm-autotest definitely need a way to decide whether this is 32 or=20 64 bit guest. I'll send a separate email to KVM-autotest mailing list t= o=20 let others express their opinions. >> >> ----------- snip ----------- >>>> + # Guest can have more than 2G but kvm mem + 1MB (allocator itsel= f) >>>> + # can't >>>> + if (host_mem> 2048): >>>> + mem =3D 2047 >>>> + >>>> + >>>> + if os.popen("uname -i").readline().startswith("i386"): >>>> + logging.debug("Host is i386 architecture, max guest mem is 2G") >>> >>> There are bigger 32 bit guests. >>> >> How do you mean this note? We are testing whether the host machine i= s 32 >> bit. If so, the maximum process allocation is 2GB (similar case to 3= 2 >> bit guest) but this time the whole qemu process (2GB qemu machine + = 64 >> MB qemu overhead) can't exceeded 2GB. >> Still the maximum memory used in test is the same (as we increase th= e VM >> count - host_mem =3D quest_mem * vm_count; quest_mem is decreased, >> vm_count is increased) > > i386 guests with PAE mode (additional 4 bits) can have up to 16G ram = on > theory. > OK so we should first check whether PAE is on and separate into 3 group= s=20 (64bit-unlimited, PAE-16G, 32bit-2G). >> >> ----------- snip ----------- >>>> + >>>> + # Copy the allocator.c into guests >>> >>> .py >>> >> yes indeed. >> >> ----------- snip ----------- >>>> + # Let kksmd works (until shared mem rich expected value) >>>> + shm =3D 0 >>>> + i =3D 0 >>>> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm) >>>> + while shm< ksm_size: >>>> + if i> 64: >>>> + logging.info(get_stat(lvms)) >>>> + raise error.TestError("SHM didn't merged the memory until "\ >>>> + "the DL on guest: %s"% (vm.name)) >>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >>>> + time.sleep(ksm_size / 200 * perf_ratio) >>>> + try: >>>> + shm =3D int(os.popen(cmd).readline().split()[2]) >>>> + shm =3D shm * 4 / 1024 >>>> + i =3D i + 1 >>> >>> Either you have nice statistic calculation function or not. >>> I vote for the first case. >>> >> >> Yes, we are using the statistics function for the output. But in thi= s >> case we just need to know the shm value, not to log anything. >> If this is a big problem even for others, we can split the statistic= s >> function into 2: >> int =3D _get_stat(vm) - returns shm value >> string =3D get_stat(vm) - Uses _get_stats and creates a nice log out= put >> >> ----------- snip ----------- >>>> + """ Check if memory in max loading guest is allright""" >>>> + logging.info("Starting phase 3b") >>>> + >>>> + """ Kill rest of machine""" >>> >>> We should have a function for it for all kvm autotest >>> >> >> you think lsessions[i].close() instead of (status,data) =3D >> lsessions[i].get_command_status_output("exit;",20)? >> Yes, it would be better. >> >>>> + for i in range(last_vm+1, vmsc): >>>> + (status,data) =3D lsessions[i].get_command_status_output("exit;"= ,20) >>>> + if i =3D=3D (vmsc-1): >>>> + logging.info(get_stat([lvms[i]])) >>>> + lvms[i].destroy(gracefully =3D False) >> >> ----------- snip ----------- >>>> + def phase_paralel(): >>>> + """ Paralel page spliting """ >>>> + logging.info("Phase 1: Paralel page spliting") >>>> + # We have to wait until allocator is finished (it waits 5 second= s to >>>> + # clean the socket >>>> + >>> >>> The whole function is very similar to phase_separate_first_guest pl= ease >>> refactor them. >>> >> Yes, those functions are a bit similar. On the other hand there are = some >> differences. Instead of creating more complex function we agreed to >> split them for better readability of the code. > > The trouble is that re-implementing it make the code longer and more > error prone > OK, we'll create only one function. >> >> ----------- snip ----------- >>>> + while shm< ksm_size: >>>> + if i> 64: >>>> + logging.info(get_stat(lvms)) >>>> + raise error.TestError("SHM didn't merged the memory until DL") >>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio)) >>>> + time.sleep(ksm_size / 200 * perf_ratio) >>>> + try: >>>> + shm =3D int(os.popen(cmd).readline().split()[2]) >>>> + shm =3D shm * 4 / 1024 >>>> + except: >>>> + raise error.TestError("Could not fetch shmem info from proc") >>> >>> Didn't you needed to increase i? >>> >> yes, you are right. This line somehow disappeard "i =3D i + 1"... > > Does it pass all the tests? > Not incrementing the "i" could lead to infinite loop but only when KSM=20 fails. That's why we haven't found the mistake by testing... Well fix=20 this altogether with other modifications. >> >> ----------- snip ----------- >>>> + def compare_page(self,original,inmem): >>>> + """ >>>> + compare memory >>>> + >>> >>> >>> Why do you need it? Is it to really check ksm didn't do damage? >>> Interesting, I never doubted ksm for that. Actually it is good idea= to >>> test... >>> >> >> We were asked to do so (be paranoid, everything could happened). We = can >> make this voluntary in the config. > > I guess it just add sanity, better leave it on always, I guess it's n= ot > that time consuming. > +1 (Actually using python this is kind of time consuming. But compare to=20 the whole test duration it's nothing.) >> >> >> Once again thanks, I'm looking forward to your replay. >> >> Best regards, >> Luk=C3=A1=C5=A1 Doktor >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >