From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= <ldoktor@redhat.com>
Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)
Date: Thu, 26 Nov 2009 11:11:09 +0100
Message-ID: <4B0E543D.6050807@redhat.com>
References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Jiri Zupka <jzupka@redhat.com>,
	autotest <autotest@test.kernel.org>, kvm <kvm@vger.kernel.org>
To: dlaor@redhat.com
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:61829 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755027AbZKZKLH (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 26 Nov 2009 05:11:07 -0500
In-Reply-To: <4B09618E.8010504@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Hello Dor,

Thank you for your review. I have few questions about your comments:

----------- snip -----------
>> + stat +=3D "Guests memsh =3D {"
>> + for vm in lvms:
>> + if vm.is_dead():
>> + logging.info("Trying to get informations of death VM: %s"
>> + % vm.name)
>> + continue
>
> You can fail the entire test. Afterwards it will be hard to find the =
issue.
>

Well if it's what the community wants, we can change it. We just didn't=
=20
want to lose information about the rest of the systems. Perhaps we can=20
set some DIE flag and after collecting all statistics raise an Error.

----------- snip -----------
>> + def get_true_pid(vm):
>> + pid =3D vm.process.get_pid()
>> + for i in range(1,10):
>> + pid =3D pid + 1
>
> What are you trying to do here? It's seems like a nasty hack that mig=
ht
> fail on load.
>

Yes and I'm really sorry for this ugly hack. The qemu command has=20
changed since the first patch was made. Nowadays the "vm.pid" returns=20
PID of the command itself, not the actual qemu process.
We need to have the PID of the actual qemu process, which is executed b=
y=20
the command with PID "vm.pid". That's why first I try finding the qemu=20
process as the following "vm.pid" PID. I haven't found another solution=
=20
yet (in case we don't want to change the qemu command back in the=20
framework).
We have tested this solution under heavy process load and either first=20
or second part always finds the right value.

----------- snip -----------
>> + if (params['ksm_test_size'] =3D=3D "paralel") :
>> + vmsc =3D 1
>> + overcommit =3D 1
>> + mem =3D host_mem
>> + # 32bit system adjustment
>> + if not params['image_name'].endswith("64"):
>> + logging.debug("Probably i386 guest architecture, "\
>> + "max allocator mem =3D 2G")
>
> Better not to relay on the guest name. You can test percentage of the
> guest mem.
>

What do you mean by "percentage of the guest mem"? This adjustment is=20
made because the maximum memory for 1 process in 32 bit OS is 2GB.
Testing of the 'image_name' showed to be most reliable method we found.

----------- snip -----------
>> + # Guest can have more than 2G but kvm mem + 1MB (allocator itself)
>> + # can't
>> + if (host_mem> 2048):
>> + mem =3D 2047
>> +
>> +
>> + if os.popen("uname -i").readline().startswith("i386"):
>> + logging.debug("Host is i386 architecture, max guest mem is 2G")
>
> There are bigger 32 bit guests.
>
How do you mean this note? We are testing whether the host machine is 3=
2=20
bit. If so, the maximum process allocation is 2GB (similar case to 32=20
bit guest) but this time the whole qemu process (2GB qemu machine + 64=20
MB qemu overhead) can't exceeded 2GB.
Still the maximum memory used in test is the same (as we increase the V=
M=20
count - host_mem =3D quest_mem * vm_count; quest_mem is decreased,=20
vm_count is increased)

----------- snip -----------
>> +
>> + # Copy the allocator.c into guests
>
> .py
>
yes indeed.

----------- snip -----------
>> + # Let kksmd works (until shared mem rich expected value)
>> + shm =3D 0
>> + i =3D 0
>> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm)
>> + while shm< ksm_size:
>> + if i> 64:
>> + logging.info(get_stat(lvms))
>> + raise error.TestError("SHM didn't merged the memory until "\
>> + "the DL on guest: %s"% (vm.name))
>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>> + time.sleep(ksm_size / 200 * perf_ratio)
>> + try:
>> + shm =3D int(os.popen(cmd).readline().split()[2])
>> + shm =3D shm * 4 / 1024
>> + i =3D i + 1
>
> Either you have nice statistic calculation function or not.
> I vote for the first case.
>

Yes, we are using the statistics function for the output. But in this=20
case we just need to know the shm value, not to log anything.
If this is a big problem even for others, we can split the statistics=20
function into 2:
int =3D _get_stat(vm) - returns shm value
string =3D get_stat(vm) - Uses _get_stats and creates a nice log output

----------- snip -----------
>> + """ Check if memory in max loading guest is allright"""
>> + logging.info("Starting phase 3b")
>> +
>> + """ Kill rest of machine"""
>
> We should have a function for it for all kvm autotest
>

you think lsessions[i].close() instead of (status,data) =3D=20
lsessions[i].get_command_status_output("exit;",20)?
Yes, it would be better.

>> + for i in range(last_vm+1, vmsc):
>> + (status,data) =3D lsessions[i].get_command_status_output("exit;",2=
0)
>> + if i =3D=3D (vmsc-1):
>> + logging.info(get_stat([lvms[i]]))
>> + lvms[i].destroy(gracefully =3D False)

----------- snip -----------
>> + def phase_paralel():
>> + """ Paralel page spliting """
>> + logging.info("Phase 1: Paralel page spliting")
>> + # We have to wait until allocator is finished (it waits 5 seconds =
to
>> + # clean the socket
>> +
>
> The whole function is very similar to phase_separate_first_guest plea=
se
> refactor them.
>
Yes, those functions are a bit similar. On the other hand there are som=
e=20
differences. Instead of creating more complex function we agreed to=20
split them for better readability of the code.

----------- snip -----------
>> + while shm< ksm_size:
>> + if i> 64:
>> + logging.info(get_stat(lvms))
>> + raise error.TestError("SHM didn't merged the memory until DL")
>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>> + time.sleep(ksm_size / 200 * perf_ratio)
>> + try:
>> + shm =3D int(os.popen(cmd).readline().split()[2])
>> + shm =3D shm * 4 / 1024
>> + except:
>> + raise error.TestError("Could not fetch shmem info from proc")
>
> Didn't you needed to increase i?
>
yes, you are right. This line somehow disappeard "i =3D i + 1"...

----------- snip -----------
>> + def compare_page(self,original,inmem):
>> + """
>> + compare memory
>> +
>
>
> Why do you need it? Is it to really check ksm didn't do damage?
> Interesting, I never doubted ksm for that. Actually it is good idea t=
o
> test...
>

We were asked to do so (be paranoid, everything could happened). We can=
=20
make this voluntary in the config.


Once again thanks, I'm looking forward to your replay.

Best regards,
Luk=C3=A1=C5=A1 Doktor