From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bear Yang Subject: Re: [KVM-AUTOTEST][PATCH] timedrift support Date: Tue, 12 May 2009 20:31:36 +0800 Message-ID: <4A096C28.1060900@redhat.com> References: <4A010BCD.8060307@redhat.com> <20090506130247.GA5048@amt.cnet> <4A08008E.8060105@redhat.com> <1242046789.2930.8.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , uril@redhat.com, kvm@vger.kernel.org To: Lucas Meneghel Rodrigues Return-path: Received: from mx2.redhat.com ([66.187.237.31]:60067 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754263AbZELMbH (ORCPT ); Tue, 12 May 2009 08:31:07 -0400 Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26]) by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n4CCV8CU030968 for ; Tue, 12 May 2009 08:31:08 -0400 In-Reply-To: <1242046789.2930.8.camel@localhost.localdomain> Sender: kvm-owner@vger.kernel.org List-ID: Hi Lucas: First, I want to say really thanks for your kindly,carefully words and suggestions. now, I modified my scripts follow your opinions. 1. Add the genload to timedrift, but I am not sure whether it is right or not to add the information CVS relevant. If it is not necessary. I will remove them next time. 2. Replace the API os.system to utils.system 3. Replace the API os.environ.get('HOSTNAME') to socket.gethostname() 4. for the snippet of the code below: + if utils.system(ntp_cmd, ignore_status=True) != 0: + raise error.TestFail, "NTP server has not starting correctly..." Your suggestion is "Instead of the if clause we'd put a try/except block", but I am not clear how to do it. Would you please give me some guides for this. Sorry. Other thing about functional the clauses which to get vm handle below: + # get vm handle + vm = kvm_utils.env_get_vm(env,params.get("main_vm")) + if not vm: + raise error.TestError, "VM object not found in environment" + if not vm.is_alive(): + raise error.TestError, "VM seems to be dead; Test requires a living VM" I agree with you on this point, I remember that somebody to do this before. but seems upstream not accept his modification. Have a good day thanks. Lucas Meneghel Rodrigues wrote: > On Mon, 2009-05-11 at 18:40 +0800, Bear Yang wrote: > >> Hello. >> I have modified my script according Marcelo's suggestion. and resubmit >> my script to you all. :) >> >> Marcelo, Seems except you, no one care my script. I still want to say >> any suggestion on my script would be greatly appreciated. >> >> Thanks. >> >> Bear >> >> > > Hi Bear, sorry, I had some hectic days here so I still haven't reviewed > your patch. > > As a general comment, I realize that in several occasions we are using > os.system() to execute commands on the host, when we would usually > prefer to use the utils.system() or utils.run() API, since it already > throws an exception when exit code != 0 (you can allways set ignore_fail > = True to avoid this behavior if needed) and we are working on doing a > better handling of stdout and stderr upstream. > > My comments follow: > > diff -urN kvm_runtest_2.bak/cpu_stress.c kvm_runtest_2/cpu_stress.c > --- kvm_runtest_2.bak/cpu_stress.c 1969-12-31 19:00:00.000000000 -0500 > +++ kvm_runtest_2/cpu_stress.c 2009-05-05 22:35:34.000000000 -0400 > @@ -0,0 +1,61 @@ > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define MAX_CPUS 256 > +#define BUFFSIZE 1024 > + > + > +void worker_child(int cpu) > +{ > + int cur_freq; > + int min_freq; > + int max_freq; > + int last_freq; > + cpu_set_t mask; > + int i; > + double x; > + int d = 0; > + /* > + * bind this thread to the specified cpu > + */ > + CPU_ZERO(&mask); > + CPU_SET(cpu, &mask); > + sched_setaffinity(0, CPU_SETSIZE, &mask); > + > + while (d++ != 500000) { > + for (i=0; i<100000; i++) > + x = sqrt(x); > + } > + > + _exit(0); > + > +} > + > + > +main() { > + cpu_set_t mask; > + int i; > + int code; > + > + if (sched_getaffinity(0, CPU_SETSIZE, &mask) < 0){ > + perror ("sched_getaffinity"); > + exit(1); > + } > + > + for (i=0; i + if (CPU_ISSET(i, &mask)){ > + printf ("CPU%d\n",i); > + if (fork() == 0) > + worker_child(i); > + } > + > + > + wait(&code); > + exit (WEXITSTATUS(code)); > +} > > I believe we might want to use a more complete stress system, that can do IO stress and put 'memory pressure' on the host system. > When I need to cause stress on a host, what I end up doing is to hack the stress.c program from LTP, because it can do memory and IO stress as well. > I will send you the stress.c program on a separate e-mail. > > diff -urN kvm_runtest_2.bak/kvm_runtest_2.py kvm_runtest_2/kvm_runtest_2.py > --- kvm_runtest_2.bak/kvm_runtest_2.py 2009-04-29 06:17:29.000000000 -0400 > +++ kvm_runtest_2/kvm_runtest_2.py 2009-04-29 08:06:32.000000000 -0400 > @@ -36,6 +36,8 @@ > "autotest": test_routine("kvm_tests", "run_autotest"), > "kvm_install": test_routine("kvm_install", "run_kvm_install"), > "linux_s3": test_routine("kvm_tests", "run_linux_s3"), > + "ntp_server_setup": test_routine("kvm_tests", "run_ntp_server_setup"), > + "timedrift": test_routine("kvm_tests", "run_timedrift"), > } > > # Make it possible to import modules from the test's bindir > diff -urN kvm_runtest_2.bak/kvm_tests.cfg.sample kvm_runtest_2/kvm_tests.cfg.sample > --- kvm_runtest_2.bak/kvm_tests.cfg.sample 2009-04-29 06:17:29.000000000 -0400 > +++ kvm_runtest_2/kvm_tests.cfg.sample 2009-04-29 08:09:36.000000000 -0400 > @@ -81,6 +81,10 @@ > - linux_s3: install setup > type = linux_s3 > > + - ntp_server_setup: > + type = ntp_server_setup > + - timedrift: ntp_server_setup > + type = timedrift > # NICs > variants: > - @rtl8139: > diff -urN kvm_runtest_2.bak/kvm_tests.py kvm_runtest_2/kvm_tests.py > --- kvm_runtest_2.bak/kvm_tests.py 2009-04-29 06:17:29.000000000 -0400 > +++ kvm_runtest_2/kvm_tests.py 2009-05-11 06:00:32.000000000 -0400 > @@ -394,3 +394,247 @@ > kvm_log.info("VM resumed after S3") > > session.close() > + > +def run_ntp_server_setup(test, params, env): > + > + """NTP server configuration and related network file modification > + """ > + > + kvm_log.info("stop the iptables service if it is running for timedrift testing") > + > + if not os.system("/etc/init.d/iptables status"): > + os.system("/etc/init.d/iptables stop") > + > + # prevent dhcp client modify the ntp.conf > + kvm_log.info("prevent dhcp client modify the ntp.conf") > + > + config_file = "/etc/sysconfig/network" > + network_file = open("/etc/sysconfig/network", "a") > + string = "PEERNTP=no" > + > + if os.system("grep %s %s" % (string, config_file)): > + network_file.writelines(str(string)+'\n') > + > + network_file.close() > + > + # stop the ntp service if it is running > + kvm_log.info("stop ntp service if it is running") > + > + if not os.system("/etc/init.d/ntpd status"): > + os.system("/etc/init.d/ntpd stop") > + ntp_running = True > + > + kvm_log.info("start ntp server on host with the custom config file.") > + > + ntp_cmd = ''' > + echo "restrict default kod nomodify notrap nopeer noquery" >> /etc/timedrift.ntp.conf;\ > + echo "restrict 127.0.0.1" >> /etc/timedrift.ntp.conf;\ > + echo "driftfile /var/lib/ntp/drift" >> /etc/timedrift.ntp.conf;\ > + echo "keys /etc/ntp/keys" >> /etc/timedrift.ntp.conf;\ > + echo "server 127.127.1.0" >> /etc/timedrift.ntp.conf;\ > + echo "fudge 127.127.1.0 stratum 1" >> /etc/timedrift.ntp.conf;\ > + ntpd -c /etc/timedrift.ntp.conf; > + ''' > + if os.system(ntp_cmd): > + raise error.TestFail, "NTP server has not starting correct..." > > Here you could have used regular utils.system API instead of os.system since it integrates better with the autotest infrastructure. > Instead of the if clause we'd put a try/except block. Minor nipticking, "NTP server has not started correctly..." > > + #kvm_log.info("sync system clock to BIOS") > + #os.system("/sbin/hwclock --systohc") > + > +def run_timedrift(test, params, env): > + """judge wether the guest clock will encounter timedrift prblem or not. including three stages: > > Typo, "whether" > > + 1: try to sync the clock with host, if the offset value of guest clock is large than 1 sec. > + 2: running the cpu stress testing program on guest > + 3: then run analyze loop totally 20 times to determine if the clock on guest has time drift. > + """ > + # variables using in timedrift testcase > + cpu_stress_program = "cpu_stress.c" > + remote_dir = '/root' > + > + clock_resource_cmd = "cat /sys/devices/system/clocksource/clocksource0/current_clocksource" > + > + pwd = os.path.join(os.environ['AUTODIR'],'tests/kvm_runtest_2') > + cpu_stress_test = os.path.join(pwd, cpu_stress_program) > + cpu_stress_cmdline = 'cd %s;gcc %s -lm;./a.out &' % (remote_dir, os.path.basename(cpu_stress_test)) > + > + cpu_stress_search_cmdline = "ps -ef|grep 'a.out'|grep -v grep" > + > + hostname = os.environ.get("HOSTNAME") > > Can't we use socket.gethostname() here instead of relying on environment variable values? > > + if "localhost.localdomain" == hostname: > + hostname = os.popen('hostname').read().split('\n')[0] > + kvm_log.info("since get wrong hostname from python evnironment, then use the hostname get from system call(hostname).") > + > + kvm_log.info("get host name :%s" % hostname) > + > + # ntpdate info command and ntpdate sync command > + ntpdate_info_cmd = "ntpdate -q %s" % hostname > + ntpdate_sync_cmd = "ntpdate %s" % hostname > + > + # get vm handle > + vm = kvm_utils.env_get_vm(env,params.get("main_vm")) > + if not vm: > + raise error.TestError, "VM object not found in environment" > + if not vm.is_alive(): > + raise error.TestError, "VM seems to be dead; Test requires a living VM" > > I am seeing this piece of code to get the VM handle on several tests, I am starting to think we should factor this on an utility function... > > + kvm_log.info("Waiting for guest to be up...") > + > + pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2) > + if not pxssh: > + raise error.TestFail, "Could not log into guest" > + > + kvm_log.info("Logged into guest IN run_timedrift function.") > + > + # clock resource get from host and guest > + host_clock_resource = os.popen(clock_resource_cmd).read().split('\n')[0] > + kvm_log.info("the clock resource on host is :%s" % host_clock_resource) > + > + pxssh.sendline(clock_resource_cmd) > + s, o = pxssh.read_up_to_prompt() > + guest_clock_resource = o.splitlines()[-2] > + kvm_log.info("the clock resource on guest is :%s" % guest_clock_resource) > + > + if host_clock_resource != guest_clock_resource: > + #raise error.TestFail, "Host and Guest using different clock resource" > + kvm_log.info("Host and Guest using different clock resource,Let's moving on.") > + else: > + kvm_log.info("Host and Guest using same clock resource,Let's moving on.") > > Little mistake here, "Let's move on." > > + # helper function: > + # ntpdate_op: a entire process to get ntpdate command line result from guest. > + # time_drift_or_not: get the numeric handing by regular expression and make timedrift calulation. > + def ntpdate_op(command): > + output = [] > + try: > + pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2) > + if not pxssh: > + raise error.TestFail, "Could not log into guest" > + > + kvm_log.info("Logged in:(ntpdate_op)") > + > + while True: > + pxssh.sendline(command) > + s, output = pxssh.read_up_to_prompt() > + if "time server" in output: > + # output is a string contain the (ntpdate -q) infor on guest > + return True, output > + else: > + continue > + except: > + pxssh.close() > + return False, output > + return False, output > + > + def time_drift_or_not(output): > + date_string = re.findall(r'offset [+-]?(.*) sec', output, re.M) > + num = float(date_string[0]) > + if num >= 1: > + kvm_log.info("guest clock has drifted in this scenario :%s %s" % (date_string, num)) > + return False > + else: > + kvm_log.info("guest clock running veracious in now stage :%s %s" % (date_string, num)) > + return True > + > + # send the command and get the ouput from guest > + # this loop will pick out several conditions need to be process > + # Actually, we want to get the info match "time server", then script can analyzing it to > + # determine if guest's clock need sync with host or not. > + while True: > + pxssh.sendline(ntpdate_info_cmd) > + s, output = pxssh.read_up_to_prompt() > + kvm_log.info("the ntpdate query info get from guest is below: \n%s" %output) > + if ("no server suitable" not in output) and ("time server" not in output): > + kvm_log.info("very creazying output got. let's try again") > + continue > + elif "no server suitable" in output: > + kvm_log.info("seems NTP server is not ready for servicing") > + time.sleep(30) > + continue > + elif "time server" in output: > + # get the ntpdate info from guest > + # kvm_log.info("Got the correct output for analyze. The output is below: \n%s" %output) > + break > + > + kvm_log.info("get the ntpdate infomation from guest successfully :%s" % os.popen('date').read()) > + > + # judge the clock need to sync with host or not > + while True: > + date_string = re.findall(r'offset [+-]?(.*) sec', output, re.M) > + num = float(date_string[0]) > + if num >= 1: > + kvm_log.info("guest need sync with the server: %s" % hostname) > + s, output = ntpdate_op(ntpdate_sync_cmd) > + if s: > + continue > + else: > + #pxssh.sendline("hwclock --systohc") > + #kvm_log.info("guest clock sync prcdure is finished. then sync the guest clock to guest bios.") > + > + #pxssh.sendline("hwclock --show") > + #s, o = pxssh.read_up_to_prompt() > + #kvm_log.info("the date infomation get from guest bios is :\n%s" % o) > + > + pxssh.sendline(ntpdate_info_cmd) > + s, o = pxssh.read_up_to_prompt() > + kvm_log.info("guest clock after sync with host is :\n%s" % o) > + > + break > + > + kvm_log.info("Timedrift Preparation *Finished* at last :%s" % os.popen('date').read()) > + > + if not vm.scp_to_remote(cpu_stress_test, remote_dir): > + raise error.TestError, "Could not copy program to guest." > + > + pxssh.sendline(ntpdate_info_cmd) > + s, o = pxssh.read_up_to_prompt() > + kvm_log.info("the ntpdate query from host *BEFORE* running the cpu stress program.\n%s" % o) > + pxssh.sendline(cpu_stress_cmdline) > + s, o = pxssh.read_up_to_prompt() > + kvm_log.info("running command line on guest and sleeping for 1200 secs.\n%s" % o) > + > + time.sleep(1200) > + > + while True: > + if pxssh.get_command_status(cpu_stress_search_cmdline): > + #(s, o) = pxssh.get_command_status_output(cpu_stress_search_cmdline) > + #print "s is :%s" % s > + #print "o is :%s" % o > + #print "--------------------------------------------" > + #aaa = pxssh.get_command_status(cpu_stress_search_cmdline) > + #print "aaa is :%s" % aaa > + #print "--------------------------------------------" > + > + print "stress testing process has been completed and quit." > + break > + else: > + print "stress testing on CPU has not finished yet.waiting for next detect after sleep 60 secs." > + time.sleep(60) > + continue > + > + pxssh.sendline(ntpdate_info_cmd) > + s, o = pxssh.read_up_to_prompt() > + kvm_log.info("the ntpdate query from host *AFTER* running the cpu stress program.\n%s" % o) > + > + pxssh.close() > + > + # Sleep for analyze... > + kvm_log.info("sleeping(180 secs) Starting... :%s" % os.popen('date').read()) > + time.sleep(180) > + kvm_log.info("wakeup to get the analyzing... :%s" % os.popen('date').read()) > + count = 0 > + for i in range(1, 21): > + kvm_log.info("this is %s time to get clock info from guest." % i) > + s, o = ntpdate_op(ntpdate_info_cmd) > + > + if not s: > + raise error.TestFail, "Guest seems hang or ssh service based on guest has been crash down" > + > + if not time_drift_or_not(o): > + count += 1 > + > + if count == 5: > + raise error.TestFail, "TimeDrift testing Abort because guest's clock has drift too much" > + > + kvm_log.info("*********************** Sleep 30 seconds for next loop *************************") > + time.sleep(60) > + > >