From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dor Laor <dlaor@redhat.com>
Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)
Date: Sun, 29 Nov 2009 18:17:14 +0200
Message-ID: <4B129E8A.5070402@redhat.com>
References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com> <4B0E543D.6050807@redhat.com>
Reply-To: dlaor@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Jiri Zupka <jzupka@redhat.com>,
	autotest <autotest@test.kernel.org>, kvm <kvm@vger.kernel.org>
To: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= <ldoktor@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:63536 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750918AbZK2QQ5 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sun, 29 Nov 2009 11:16:57 -0500
In-Reply-To: <4B0E543D.6050807@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/26/2009 12:11 PM, Luk=C3=A1=C5=A1 Doktor wrote:
> Hello Dor,
>
> Thank you for your review. I have few questions about your comments:
>
> ----------- snip -----------
>>> + stat +=3D "Guests memsh =3D {"
>>> + for vm in lvms:
>>> + if vm.is_dead():
>>> + logging.info("Trying to get informations of death VM: %s"
>>> + % vm.name)
>>> + continue
>>
>> You can fail the entire test. Afterwards it will be hard to find the
>> issue.
>>
>
> Well if it's what the community wants, we can change it. We just didn=
't
> want to lose information about the rest of the systems. Perhaps we ca=
n
> set some DIE flag and after collecting all statistics raise an Error.

I don't think we need to continue testing if some thing as basic as VM=20
died upon us.

>
> ----------- snip -----------
>>> + def get_true_pid(vm):
>>> + pid =3D vm.process.get_pid()
>>> + for i in range(1,10):
>>> + pid =3D pid + 1
>>
>> What are you trying to do here? It's seems like a nasty hack that mi=
ght
>> fail on load.
>>


qemu has -pifile option. It works fine.

>
> Yes and I'm really sorry for this ugly hack. The qemu command has
> changed since the first patch was made. Nowadays the "vm.pid" returns
> PID of the command itself, not the actual qemu process.
> We need to have the PID of the actual qemu process, which is executed=
 by
> the command with PID "vm.pid". That's why first I try finding the qem=
u
> process as the following "vm.pid" PID. I haven't found another soluti=
on
> yet (in case we don't want to change the qemu command back in the
> framework).
> We have tested this solution under heavy process load and either firs=
t
> or second part always finds the right value.
>
> ----------- snip -----------
>>> + if (params['ksm_test_size'] =3D=3D "paralel") :
>>> + vmsc =3D 1
>>> + overcommit =3D 1
>>> + mem =3D host_mem
>>> + # 32bit system adjustment
>>> + if not params['image_name'].endswith("64"):
>>> + logging.debug("Probably i386 guest architecture, "\
>>> + "max allocator mem =3D 2G")
>>
>> Better not to relay on the guest name. You can test percentage of th=
e
>> guest mem.
>>
>
> What do you mean by "percentage of the guest mem"? This adjustment is
> made because the maximum memory for 1 process in 32 bit OS is 2GB.
> Testing of the 'image_name' showed to be most reliable method we foun=
d.


It's not that important but it should be a convention of kvm autotest.
If that's acceptable, fine, otherwise, each VM will define it in the=20
config file

>
> ----------- snip -----------
>>> + # Guest can have more than 2G but kvm mem + 1MB (allocator itself=
)
>>> + # can't
>>> + if (host_mem> 2048):
>>> + mem =3D 2047
>>> +
>>> +
>>> + if os.popen("uname -i").readline().startswith("i386"):
>>> + logging.debug("Host is i386 architecture, max guest mem is 2G")
>>
>> There are bigger 32 bit guests.
>>
> How do you mean this note? We are testing whether the host machine is=
 32
> bit. If so, the maximum process allocation is 2GB (similar case to 32
> bit guest) but this time the whole qemu process (2GB qemu machine + 6=
4
> MB qemu overhead) can't exceeded 2GB.
> Still the maximum memory used in test is the same (as we increase the=
 VM
> count - host_mem =3D quest_mem * vm_count; quest_mem is decreased,
> vm_count is increased)

i386 guests with PAE mode (additional 4 bits) can have up to 16G ram on=
=20
theory.

>
> ----------- snip -----------
>>> +
>>> + # Copy the allocator.c into guests
>>
>> .py
>>
> yes indeed.
>
> ----------- snip -----------
>>> + # Let kksmd works (until shared mem rich expected value)
>>> + shm =3D 0
>>> + i =3D 0
>>> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm)
>>> + while shm< ksm_size:
>>> + if i> 64:
>>> + logging.info(get_stat(lvms))
>>> + raise error.TestError("SHM didn't merged the memory until "\
>>> + "the DL on guest: %s"% (vm.name))
>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>>> + time.sleep(ksm_size / 200 * perf_ratio)
>>> + try:
>>> + shm =3D int(os.popen(cmd).readline().split()[2])
>>> + shm =3D shm * 4 / 1024
>>> + i =3D i + 1
>>
>> Either you have nice statistic calculation function or not.
>> I vote for the first case.
>>
>
> Yes, we are using the statistics function for the output. But in this
> case we just need to know the shm value, not to log anything.
> If this is a big problem even for others, we can split the statistics
> function into 2:
> int =3D _get_stat(vm) - returns shm value
> string =3D get_stat(vm) - Uses _get_stats and creates a nice log outp=
ut
>
> ----------- snip -----------
>>> + """ Check if memory in max loading guest is allright"""
>>> + logging.info("Starting phase 3b")
>>> +
>>> + """ Kill rest of machine"""
>>
>> We should have a function for it for all kvm autotest
>>
>
> you think lsessions[i].close() instead of (status,data) =3D
> lsessions[i].get_command_status_output("exit;",20)?
> Yes, it would be better.
>
>>> + for i in range(last_vm+1, vmsc):
>>> + (status,data) =3D lsessions[i].get_command_status_output("exit;",=
20)
>>> + if i =3D=3D (vmsc-1):
>>> + logging.info(get_stat([lvms[i]]))
>>> + lvms[i].destroy(gracefully =3D False)
>
> ----------- snip -----------
>>> + def phase_paralel():
>>> + """ Paralel page spliting """
>>> + logging.info("Phase 1: Paralel page spliting")
>>> + # We have to wait until allocator is finished (it waits 5 seconds=
 to
>>> + # clean the socket
>>> +
>>
>> The whole function is very similar to phase_separate_first_guest ple=
ase
>> refactor them.
>>
> Yes, those functions are a bit similar. On the other hand there are s=
ome
> differences. Instead of creating more complex function we agreed to
> split them for better readability of the code.

The trouble is that re-implementing it make the code longer and more=20
error prone

>
> ----------- snip -----------
>>> + while shm< ksm_size:
>>> + if i> 64:
>>> + logging.info(get_stat(lvms))
>>> + raise error.TestError("SHM didn't merged the memory until DL")
>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>>> + time.sleep(ksm_size / 200 * perf_ratio)
>>> + try:
>>> + shm =3D int(os.popen(cmd).readline().split()[2])
>>> + shm =3D shm * 4 / 1024
>>> + except:
>>> + raise error.TestError("Could not fetch shmem info from proc")
>>
>> Didn't you needed to increase i?
>>
> yes, you are right. This line somehow disappeard "i =3D i + 1"...

Does it pass all the tests?

>
> ----------- snip -----------
>>> + def compare_page(self,original,inmem):
>>> + """
>>> + compare memory
>>> +
>>
>>
>> Why do you need it? Is it to really check ksm didn't do damage?
>> Interesting, I never doubted ksm for that. Actually it is good idea =
to
>> test...
>>
>
> We were asked to do so (be paranoid, everything could happened). We c=
an
> make this voluntary in the config.

I guess it just add sanity, better leave it on always, I guess it's not=
=20
that time consuming.

>
>
> Once again thanks, I'm looking forward to your replay.
>
> Best regards,
> Luk=C3=A1=C5=A1 Doktor
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html