From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yolkfull Chow
Subject: Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
Date: Thu, 11 Jun 2009 11:37:16 +0800
Message-ID: <4A307BEC.8060906@redhat.com>
References: <425001110.1660581244634737690.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Uri Lublin , kvm@vger.kernel.org
To: Michael Goldish
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:49836 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762085AbZFKDh0 (ORCPT ); Wed, 10 Jun 2009 23:37:26 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26]) by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n5B3bSCU016823 for ; Wed, 10 Jun 2009 23:37:28 -0400
In-Reply-To: <425001110.1660581244634737690.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 06/10/2009 07:52 PM, Michael Goldish wrote:
> ----- "Yolkfull Chow" wrote:
>
>> On 06/10/2009 06:03 PM, Michael Goldish wrote:
>>> ----- "Yolkfull Chow" wrote:
>>>
>>>> On 06/09/2009 05:44 PM, Michael Goldish wrote:
>>>>> The test looks pretty nicely written. Comments:
>>>>>
>>>>> 1. Consider making all the cloned VMs use image snapshots:
>>>>>
>>>>> curr_vm = vm1.clone()
>>>>> curr_vm.get_params()["extra_params"] += " -snapshot"
>>>>>
>>>>> I'm not sure it's a good idea to let all VMs use the same disk image.
>>>>> Or maybe you shouldn't add -snapshot yourself, but rather do it in the
>>>>> config file for the first VM, and then all cloned VMs will have
>>>>> -snapshot as well.
>>>>
>>>> Yes I use 'image_snapshot = yes' in config file.
>>>>
>>>>> 2.
>>>>> Consider changing the message
>>>>> " Booting the %dth guest" % num
>>>>> to
>>>>> "Booting guest #%d" % num
>>>>> (because there's no such thing as 2th and 3th)
>>>>>
>>>>> 3. Consider changing the message
>>>>> "Cannot boot vm anylonger"
>>>>> to
>>>>> "Cannot create VM #%d" % num
>>>>>
>>>>> 4. Why not add curr_vm to vms immediately after cloning it?
>>>>> That way you can kill it in the exception handler later, without
>>>>> having to send it a 'quit' if you can't log in
>>>>> ('if not curr_vm_session').
>>>>
>>>> Yes, good idea.
>>>>
>>>>> 5. " %dth guest boots up successfully" % num --> again, 2th and 3th
>>>>> make no sense.
>>>>> Also, I wonder why you add those spaces before every info message.
>>>>>
>>>>> 6. "%dth guest's session is not responsive" --> same
>>>>> (maybe use "Guest session #%d is not responsive" % num)
>>>>>
>>>>> 7. "Shut down the %dth guest" --> same
>>>>> (maybe "Shutting down guest #%d"? or destroying/killing?)
>>>>>
>>>>> 8. Shouldn't we fail the test when we find an unresponsive session?
>>>>> It seems you just display an error message. You can simply replace
>>>>> logging.error( with raise error.TestFail(.
>>>>>
>>>>> 9. Consider using a stricter test than just vm_session.is_responsive().
>>>>> vm_session.is_responsive() just sends ENTER to the session and returns
>>>>> True if it gets anything as a result (usually a prompt, or even just a
>>>>> newline echoed back). If the session passes this test it is indeed
>>>>> responsive, so it's a decent test, but maybe you can send some command
>>>>> (user configurable?) and test for some output.
>>>>> I'm really not sure this is important, because I can't imagine a
>>>>> session would respond to a newline but not to other commands, but who
>>>>> knows. Maybe you can send the first VM a user-specified command when
>>>>> the test begins, remember the output, and then send all other VMs the
>>>>> same command and make sure the output is the same.
>>>>
>>>> maybe use 'info status' and send command 'help' via session to vms and
>>>> compare their output?
>>>
>>> I'm not sure I understand. What does 'info status' do? We're talking
>>> about an SSH shell, not the monitor. You can do whatever you like, like
>>> 'uname -a', and 'ls /', but you should leave it up to the user to decide,
>>> so he/she can specify different commands for different guests. Linux
>>> commands won't work under Windows, so Linux and Windows must have
>>> different commands in the config file. In the Linux section, under
>>> '- @Linux:' you can add something like:
>>>
>>> stress_boot:
>>>     stress_boot_test_command = uname -a
>>>
>>> and under '- @Windows:':
>>>
>>> stress_boot:
>>>     stress_boot_test_command = ver && vol
>>>
>>> These commands are just naive suggestions. I'm sure someone can think of
>>> much more informative commands.
>>
>> Those are really good suggestions. Thanks, Michael. And can I use
>> 'migration_test_command' instead?
>
> Not really. Why would you want to use another test's param?
>
> 1. There's no guarantee that 'migration_test_command' is defined
> for your boot stress test. In fact, it is probably only defined for
> migration tests, so you probably won't be able to access it. Try
> params.get('migration_test_command') in your test and you'll probably
> get None.
>
> 2.
> The user may not want to run migration at all, and then he/she
> will probably not define 'migration_test_command'.
>
> 3. The user might want to use different test commands for migration
> and for the boot stress test.
>
>>>>> 10. I'm not sure you should use the param "kill_vm_gracefully" because
>>>>> that's a postprocessor param (probably not your business). You can just
>>>>> call destroy() in the exception handler with gracefully=False, because
>>>>> if the VMs are non-responsive, I don't expect them to shut down nicely
>>>>> with an SSH command (that's what gracefully does). Also, we're using
>>>>> -snapshot, so there's no reason to shut them down nicely.
>>>>
>>>> Yes, I agree. :)
>>>>
>>>>> 11. "Total number booted successfully: %d" % (num - 1) --> why not
>>>>> just num?
>>>>> We really have num VMs including the first one.
>>>>> Or you can say: "Total number booted successfully in addition to the
>>>>> first one" but that's much longer.
>>>>
>>>> After the first guest has booted, I set num = 1 and then do 'num += 1'
>>>> first thing in the while loop (for the purpose of getting a new VM).
>>>> So curr_vm is vm2 (num is 2) now. If the second VM fails to boot up,
>>>> the number booted successfully should be (num - 1).
>>>> I will use enumerate(vms), as Uri suggested, to make the counting
>>>> easier.
>>>
>>> OK, I didn't notice that.
>>>
>>>>> 12. Consider adding a 'max_vms' (or 'threshold') user param to the
>>>>> test. If num reaches 'max_vms', we stop adding VMs and pass the test.
>>>>> Otherwise the test will always fail (which is depressing).
>>>>> If params.get("threshold") is None or "", or in short --
>>>>> 'if not params.get("threshold")', disable this feature and keep adding
>>>>> VMs forever. The user can enable the feature with:
>>>>>     max_vms = 50
>>>>> or disable it with:
>>>>>     max_vms =
>>>>
>>>> This is a good idea, given the hardware resource limits of the host.
>>>>
>>>>> 13. Why are you catching OSError? If you get OSError it might be a
>>>>> framework bug.
>>>>
>>>> Because sometimes vm.create() succeeds but the SSH login fails, since
>>>> the running Python process cannot allocate physical memory (OSError).
>>>> Adding max_vms could fix this problem, I think.
>>>
>>> Do you remember exactly where OSError was thrown? Do you happen to have
>>> a backtrace? (I just want to be very sure it's not a bug.)
>>
>> The OSError was thrown while checking that all VMs are responsive, and I
>> got many tracebacks about "OSError: [Errno 12] Cannot allocate memory".
>> Maybe the last VM was created successfully by luck, but Python could not
>> get physical memory afterwards when checking all the sessions.
>> So can we now catch the OSError and tell the user that max_vms is too
>> large?
>
> Sure. I was just worried it might be a framework bug. If it's a legitimate
> memory error -- catch it and fail the test.
>
> If you happen to catch that OSError again, and get a backtrace, I'd like
> to see it if that's possible.

Michael, these are the backtrace messages:

...
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Postprocessing on error...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: VM object found in environment
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: send_monitor_cmd: Sending monitor command: screendump /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]/debug/post_vm1.ppm
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Contents of environment: {'vm__vm1': }
post-test sysinfo error:
Traceback (most recent call last):
  File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
    fn(*args, **dargs)
  File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in log_after_each_test
    log.run(test_sysinfodir)
  File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
    shell=True, env=env)
  File "/usr/lib64/python2.4/subprocess.py", line 412, in call
    return Popen(*args, **kwargs).wait()
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
FAIL kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2] kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2] timestamp=1244717402 localtime=Jun 11 06:50:02 Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/kvm-autotest/client/common_lib/test.py", line 304, in _exec
    self.execute(*p_args, **p_dargs)
  File "/kvm-autotest/client/common_lib/test.py", line 187, in execute
    self.run_once(*args, **dargs)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, in run_once
    routine_obj.routine(self, params, env)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in run_boot_vms
    curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in wait_for
    output = func()
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", line 728, in ssh_login
    session = kvm_utils.ssh(address, port, username, password, prompt, timeout)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
    return remote_login(command, password, prompt, "\n", timeout)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in remote_login
    sub = kvm_spawn(command, linesep)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in __init__
    (pid, fd) = pty.fork()
  File "/usr/lib64/python2.4/pty.py", line 108, in fork
    pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
END FAIL kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2] kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2] timestamp=1244717403 localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
    execfile(self.control, global_control_vars, global_control_vars)
  File "/kvm-autotest/client/control", line 1030, in ?
    cfg_to_test("kvm_tests.cfg")
  File "/kvm-autotest/client/control", line 1013, in cfg_to_test
    current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
  File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
    utils.drop_caches()
  File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
    utils.system("sync")
  File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
    stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
  File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
    bg_job = join_bg_jobs(
  File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
    stdin=stdin)
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 0
END ABORT ---- ---- timestamp=1244717418 localtime=Jun 11 06:50:18 Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
    execfile(self.control, global_control_vars, global_control_vars)
  File "/kvm-autotest/client/control", line 1030, in ?
    cfg_to_test("kvm_tests.cfg")
  File "/kvm-autotest/client/control", line 1013, in cfg_to_test
    current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
  File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
    utils.drop_caches()
  File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
    utils.system("sync")
  File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
    stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
  File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
    bg_job = join_bg_jobs(
  File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
    stdin=stdin)
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
[root@dhcp-66-70-9 kvm_runtest_2]#

> Thanks,
> Michael
>
>>>>> 14. At the end of the exception handler you should probably re-raise
>>>>> the exception you caught. Otherwise the user won't see the error
>>>>> message. You can simply replace 'break' with 'raise' (no parameters),
>>>>> and it should work, hopefully.
>>>>
>>>> Yes, I should if I add 'max_vms'.
>>>
>>> I think you should re-raise anyway. Otherwise, what's the point in
>>> writing error messages such as
>>> "raise error.TestFail("Cannot boot vm anylonger")"?
>>> If you don't re-raise, the user won't see the messages.
>>>
>>>>> I know these are quite a few comments, but they're all rather minor
>>>>> and the test is well written in my opinion.
>>>>
>>>> Thank you, I will make the modifications according to your and Uri's
>>>> comments, and will re-submit it here later.
>>>> :)
>>>>
>>>> Thanks and Best Regards,
>>>> Yolkfull
>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Yolkfull Chow"
>>>>> To: kvm@vger.kernel.org
>>>>> Cc: "Uri Lublin"
>>>>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
>>>>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of
>>>>> them becomes unresponsive
>>>>>
>>>>> Hi,
>>>>>
>>>>> This test will boot VMs until one of them becomes unresponsive, and
>>>>> records the maximum number of VMs successfully started.
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Yolkfull
>> Regards,

--
Yolkfull
Regards,
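
For illustration only, the review points discussed in this thread (adding
each clone to the list right after cloning, an optional 'max_vms' limit,
comparing the output of a user-configurable test command, killing guests
without a graceful shutdown, and letting the error propagate) might be
sketched roughly as below. This is not the submitted patch and does not use
the kvm-autotest framework: FakeVM, the local TestFail class, the
stress_boot() function and its 'unhealthy_at' argument are all hypothetical
stand-ins invented so that the sketch runs on its own. Only the parameter
names 'max_vms' and 'stress_boot_test_command' come from the discussion.

```python
class TestFail(Exception):
    """Local stand-in for autotest's error.TestFail (assumption)."""


class FakeVM:
    """Hypothetical stand-in for a VM object; not the framework class."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.alive = False

    def clone(self, name, healthy=True):
        return FakeVM(name, healthy)

    def create(self):
        self.alive = True
        return True

    def run_command(self, command):
        # A real test would run `command` over an SSH session to the guest.
        return "Linux guest" if self.healthy else None

    def destroy(self, gracefully=False):
        self.alive = False


def stress_boot(first_vm, params, unhealthy_at=None):
    """Boot clones until max_vms is reached or a guest stops answering.

    `unhealthy_at` exists only to simulate a failing guest in this sketch.
    Returns the number of guests that booted and answered the test command.
    """
    command = params.get("stress_boot_test_command", "uname -a")
    # An empty or missing max_vms disables the limit, as suggested in item 12.
    max_vms = int(params["max_vms"]) if params.get("max_vms") else None

    first_vm.create()
    expected = first_vm.run_command(command)
    vms = [first_vm]
    try:
        while max_vms is None or len(vms) < max_vms:
            num = len(vms) + 1
            curr_vm = first_vm.clone("vm%d" % num, healthy=(num != unhealthy_at))
            # Track the clone immediately so cleanup can always reach it.
            vms.append(curr_vm)
            if not curr_vm.create():
                raise TestFail("Cannot create VM #%d" % num)
            # Stricter than a bare responsiveness check: compare outputs.
            for i, vm in enumerate(vms):
                if vm.run_command(command) != expected:
                    raise TestFail("Guest session #%d is not responsive" % (i + 1))
        return len(vms)
    finally:
        # No graceful shutdown; any TestFail or OSError still propagates.
        for vm in vms:
            vm.destroy(gracefully=False)


if __name__ == "__main__":
    print(stress_boot(FakeVM("vm1"), {"max_vms": "5"}))
```

The try/finally form is one way to get the "re-raise" behaviour from item 14:
the cleanup always runs and the original exception reaches the caller
unchanged. With the limit disabled and no failing guest, the loop in this
sketch would never end, which mirrors the "keep adding VMs forever"
behaviour described above.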