From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?UTF-8?B?THVrw6HFoSBEb2t0b3I=?= <ldoktor@redhat.com>
Subject: Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)
Date: Tue, 01 Dec 2009 13:23:23 +0100
Message-ID: <4B150ABB.5030600@redhat.com>
References: <2093284271.132771258469355013.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com> <4B09618E.8010504@redhat.com> <4B0E543D.6050807@redhat.com> <4B129E8A.5070402@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Jiri Zupka <jzupka@redhat.com>,
	autotest <autotest@test.kernel.org>, kvm <kvm@vger.kernel.org>
To: dlaor@redhat.com
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:11761 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752389AbZLAMXW (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 1 Dec 2009 07:23:22 -0500
In-Reply-To: <4B129E8A.5070402@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Dne 29.11.2009 17:17, Dor Laor napsal(a):
> On 11/26/2009 12:11 PM, Luk=C3=A1=C5=A1 Doktor wrote:
>> Hello Dor,
>>
>> Thank you for your review. I have few questions about your comments:
>>
>> ----------- snip -----------
>>>> + stat +=3D "Guests memsh =3D {"
>>>> + for vm in lvms:
>>>> + if vm.is_dead():
>>>> + logging.info("Trying to get informations of death VM: %s"
>>>> + % vm.name)
>>>> + continue
>>>
>>> You can fail the entire test. Afterwards it will be hard to find th=
e
>>> issue.
>>>
>>
>> Well if it's what the community wants, we can change it. We just did=
n't
>> want to lose information about the rest of the systems. Perhaps we c=
an
>> set some DIE flag and after collecting all statistics raise an Error=
=2E
>
> I don't think we need to continue testing if some thing as basic as V=
M
> died upon us.
OK, we are going to change this.

>
>>
>> ----------- snip -----------
>>>> + def get_true_pid(vm):
>>>> + pid =3D vm.process.get_pid()
>>>> + for i in range(1,10):
>>>> + pid =3D pid + 1
>>>
>>> What are you trying to do here? It's seems like a nasty hack that m=
ight
>>> fail on load.
>>>
>
>
> qemu has -pifile option. It works fine.
>
Oh my, I haven't thought on this. Of course I'm going to use -pidfile=20
instead of this silly thing...

>>
>> Yes and I'm really sorry for this ugly hack. The qemu command has
>> changed since the first patch was made. Nowadays the "vm.pid" return=
s
>> PID of the command itself, not the actual qemu process.
>> We need to have the PID of the actual qemu process, which is execute=
d by
>> the command with PID "vm.pid". That's why first I try finding the qe=
mu
>> process as the following "vm.pid" PID. I haven't found another solut=
ion
>> yet (in case we don't want to change the qemu command back in the
>> framework).
>> We have tested this solution under heavy process load and either fir=
st
>> or second part always finds the right value.
>>
>> ----------- snip -----------
>>>> + if (params['ksm_test_size'] =3D=3D "paralel") :
>>>> + vmsc =3D 1
>>>> + overcommit =3D 1
>>>> + mem =3D host_mem
>>>> + # 32bit system adjustment
>>>> + if not params['image_name'].endswith("64"):
>>>> + logging.debug("Probably i386 guest architecture, "\
>>>> + "max allocator mem =3D 2G")
>>>
>>> Better not to relay on the guest name. You can test percentage of t=
he
>>> guest mem.
>>>
>>
>> What do you mean by "percentage of the guest mem"? This adjustment i=
s
>> made because the maximum memory for 1 process in 32 bit OS is 2GB.
>> Testing of the 'image_name' showed to be most reliable method we fou=
nd.
>
>
> It's not that important but it should be a convention of kvm autotest=
=2E
> If that's acceptable, fine, otherwise, each VM will define it in the
> config file
>
Yes kvm-autotest definitely need a way to decide whether this is 32 or=20
64 bit guest. I'll send a separate email to KVM-autotest mailing list t=
o=20
let others express their opinions.

>>
>> ----------- snip -----------
>>>> + # Guest can have more than 2G but kvm mem + 1MB (allocator itsel=
f)
>>>> + # can't
>>>> + if (host_mem> 2048):
>>>> + mem =3D 2047
>>>> +
>>>> +
>>>> + if os.popen("uname -i").readline().startswith("i386"):
>>>> + logging.debug("Host is i386 architecture, max guest mem is 2G")
>>>
>>> There are bigger 32 bit guests.
>>>
>> How do you mean this note? We are testing whether the host machine i=
s 32
>> bit. If so, the maximum process allocation is 2GB (similar case to 3=
2
>> bit guest) but this time the whole qemu process (2GB qemu machine + =
64
>> MB qemu overhead) can't exceeded 2GB.
>> Still the maximum memory used in test is the same (as we increase th=
e VM
>> count - host_mem =3D quest_mem * vm_count; quest_mem is decreased,
>> vm_count is increased)
>
> i386 guests with PAE mode (additional 4 bits) can have up to 16G ram =
on
> theory.
>
OK so we should first check whether PAE is on and separate into 3 group=
s=20
(64bit-unlimited, PAE-16G, 32bit-2G).

>>
>> ----------- snip -----------
>>>> +
>>>> + # Copy the allocator.c into guests
>>>
>>> .py
>>>
>> yes indeed.
>>
>> ----------- snip -----------
>>>> + # Let kksmd works (until shared mem rich expected value)
>>>> + shm =3D 0
>>>> + i =3D 0
>>>> + cmd =3D "cat/proc/%d/statm" % get_true_pid(vm)
>>>> + while shm< ksm_size:
>>>> + if i> 64:
>>>> + logging.info(get_stat(lvms))
>>>> + raise error.TestError("SHM didn't merged the memory until "\
>>>> + "the DL on guest: %s"% (vm.name))
>>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>>>> + time.sleep(ksm_size / 200 * perf_ratio)
>>>> + try:
>>>> + shm =3D int(os.popen(cmd).readline().split()[2])
>>>> + shm =3D shm * 4 / 1024
>>>> + i =3D i + 1
>>>
>>> Either you have nice statistic calculation function or not.
>>> I vote for the first case.
>>>
>>
>> Yes, we are using the statistics function for the output. But in thi=
s
>> case we just need to know the shm value, not to log anything.
>> If this is a big problem even for others, we can split the statistic=
s
>> function into 2:
>> int =3D _get_stat(vm) - returns shm value
>> string =3D get_stat(vm) - Uses _get_stats and creates a nice log out=
put
>>
>> ----------- snip -----------
>>>> + """ Check if memory in max loading guest is allright"""
>>>> + logging.info("Starting phase 3b")
>>>> +
>>>> + """ Kill rest of machine"""
>>>
>>> We should have a function for it for all kvm autotest
>>>
>>
>> you think lsessions[i].close() instead of (status,data) =3D
>> lsessions[i].get_command_status_output("exit;",20)?
>> Yes, it would be better.
>>
>>>> + for i in range(last_vm+1, vmsc):
>>>> + (status,data) =3D lsessions[i].get_command_status_output("exit;"=
,20)
>>>> + if i =3D=3D (vmsc-1):
>>>> + logging.info(get_stat([lvms[i]]))
>>>> + lvms[i].destroy(gracefully =3D False)
>>
>> ----------- snip -----------
>>>> + def phase_paralel():
>>>> + """ Paralel page spliting """
>>>> + logging.info("Phase 1: Paralel page spliting")
>>>> + # We have to wait until allocator is finished (it waits 5 second=
s to
>>>> + # clean the socket
>>>> +
>>>
>>> The whole function is very similar to phase_separate_first_guest pl=
ease
>>> refactor them.
>>>
>> Yes, those functions are a bit similar. On the other hand there are =
some
>> differences. Instead of creating more complex function we agreed to
>> split them for better readability of the code.
>
> The trouble is that re-implementing it make the code longer and more
> error prone
>
OK, we'll create only one function.

>>
>> ----------- snip -----------
>>>> + while shm< ksm_size:
>>>> + if i> 64:
>>>> + logging.info(get_stat(lvms))
>>>> + raise error.TestError("SHM didn't merged the memory until DL")
>>>> + logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
>>>> + time.sleep(ksm_size / 200 * perf_ratio)
>>>> + try:
>>>> + shm =3D int(os.popen(cmd).readline().split()[2])
>>>> + shm =3D shm * 4 / 1024
>>>> + except:
>>>> + raise error.TestError("Could not fetch shmem info from proc")
>>>
>>> Didn't you needed to increase i?
>>>
>> yes, you are right. This line somehow disappeard "i =3D i + 1"...
>
> Does it pass all the tests?
>
Not incrementing the "i" could lead to infinite loop but only when KSM=20
fails. That's why we haven't found the mistake by testing... Well fix=20
this altogether with other modifications.

>>
>> ----------- snip -----------
>>>> + def compare_page(self,original,inmem):
>>>> + """
>>>> + compare memory
>>>> +
>>>
>>>
>>> Why do you need it? Is it to really check ksm didn't do damage?
>>> Interesting, I never doubted ksm for that. Actually it is good idea=
 to
>>> test...
>>>
>>
>> We were asked to do so (be paranoid, everything could happened). We =
can
>> make this voluntary in the config.
>
> I guess it just add sanity, better leave it on always, I guess it's n=
ot
> that time consuming.
>
+1
(Actually using python this is kind of time consuming. But compare to=20
the whole test duration it's nothing.)

>>
>>
>> Once again thanks, I'm looking forward to your replay.
>>
>> Best regards,
>> Luk=C3=A1=C5=A1 Doktor
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>