From: Yolkfull Chow <yzhou@redhat.com>
To: Michael Goldish <mgoldish@redhat.com>
Cc: Uri Lublin <uril@redhat.com>, kvm@vger.kernel.org
Subject: Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
Date: Thu, 11 Jun 2009 11:37:16 +0800
Message-ID: <4A307BEC.8060906@redhat.com>
In-Reply-To: <425001110.1660581244634737690.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
On 06/10/2009 07:52 PM, Michael Goldish wrote:
> ----- "Yolkfull Chow"<yzhou@redhat.com> wrote:
>
>
>> On 06/10/2009 06:03 PM, Michael Goldish wrote:
>>
>>> ----- "Yolkfull Chow"<yzhou@redhat.com> wrote:
>>>
>>>
>>>
>>>> On 06/09/2009 05:44 PM, Michael Goldish wrote:
>>>>
>>>>
>>>>> The test looks pretty nicely written. Comments:
>>>>>
>>>>> 1. Consider making all the cloned VMs use image snapshots:
>>>>>
>>>>> curr_vm = vm1.clone()
>>>>> curr_vm.get_params()["extra_params"] += " -snapshot"
>>>>>
>>>>> I'm not sure it's a good idea to let all VMs use the same disk image.
>>>>> Or maybe you shouldn't add -snapshot yourself, but rather do it in
>>>>> the config file for the first VM, and then all cloned VMs will have
>>>>> -snapshot as well.
>>>>
>>>> Yes, I use 'image_snapshot = yes' in the config file.
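[Editor's illustration] The two approaches from comment 1 can be sketched as follows. Note that `VM`, `clone()` and `get_params()` here are hypothetical stand-ins for the framework's `kvm_vm.VM` API, not the real implementation:

```python
# Hypothetical stand-in for kvm_vm.VM; clone()/get_params() are assumed.
class VM:
    def __init__(self, params):
        self.params = dict(params)

    def clone(self):
        # the real clone() copies the VM's configuration
        return VM(self.params)

    def get_params(self):
        return self.params

# Option A: append -snapshot to each clone by hand.
vm1 = VM({"extra_params": ""})
curr_vm = vm1.clone()
curr_vm.get_params()["extra_params"] += " -snapshot"

# Option B (what the thread settles on): set image_snapshot = yes for the
# first VM in the config file, so every clone inherits it automatically.
```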
>>>>
>>>>
>>>>> 2. Consider changing the message
>>>>> " Booting the %dth guest" % num
>>>>> to
>>>>> "Booting guest #%d" % num
>>>>> (because there's no such thing as 2th and 3th)
>>>>>
>>>>> 3. Consider changing the message
>>>>> "Cannot boot vm anylonger"
>>>>> to
>>>>> "Cannot create VM #%d" % num
>>>>>
>>>>> 4. Why not add curr_vm to vms immediately after cloning it?
>>>>> That way you can kill it in the exception handler later, without
>>>>> having to send it a 'quit' if you can't login ('if not
>>>>> curr_vm_session').
>>>>
>>>> Yes, good idea.
>>>>
>>>>
>>>>> 5. " %dth guest boots up successfully" % num --> again, 2th and 3th
>>>>> make no sense.
>>>>> Also, I wonder why you add those spaces before every info message.
>>
>>>>> 6. "%dth guest's session is not responsive" --> same
>>>>> (maybe use "Guest session #%d is not responsive" % num)
>>>>>
>>>>> 7. "Shut down the %dth guest" --> same
>>>>> (maybe "Shutting down guest #%d"? or destroying/killing?)
>>>>>
>>>>> 8. Shouldn't we fail the test when we find an unresponsive session?
>>>>> It seems you just display an error message. You can simply replace
>>>>> logging.error( with raise error.TestFail(.
>>>>> 9. Consider using a stricter test than just vm_session.is_responsive().
>>>>> vm_session.is_responsive() just sends ENTER to the session and returns
>>>>> True if it gets anything as a result (usually a prompt, or even just a
>>>>> newline echoed back). If the session passes this test it is indeed
>>>>> responsive, so it's a decent test, but maybe you can send some command
>>>>> (user configurable?) and test for some output. I'm really not sure this
>>>>> is important, because I can't imagine a session would respond to a
>>>>> newline but not to other commands, but who knows. Maybe you can send
>>>>> the first VM a user-specified command when the test begins, remember
>>>>> the output, and then send all other VMs the same command and make sure
>>>>> the output is the same.
>>>>
>>>> Maybe use 'info status' and send the command 'help' via session to the
>>>> VMs and compare their output?
>>>>
>>>>
>>> I'm not sure I understand. What does 'info status' do? We're talking
>>> about an SSH shell, not the monitor. You can do whatever you like, like
>>> 'uname -a' and 'ls /', but you should leave it up to the user to decide,
>>> so he/she can specify different commands for different guests. Linux
>>> commands won't work under Windows, so Linux and Windows must have
>>> different commands in the config file. In the Linux section, under
>>> '- @Linux:' you can add something like:
>>>
>>>     stress_boot:
>>>         stress_boot_test_command = uname -a
>>>
>>> and under '- @Windows:':
>>>
>>>     stress_boot:
>>>         stress_boot_test_command = ver && vol
>>>
>>> These commands are just naive suggestions. I'm sure someone can think of
>>> much more informative commands.
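[Editor's illustration] The stricter check suggested in comment 9 can be sketched like this: run the configured command on the first guest, remember the output, then require every other guest session to produce the same output. `Session` and `get_command_output()` are hypothetical stand-ins for the framework's SSH session API:

```python
# Hypothetical stand-in for the framework's SSH session object.
class Session:
    def __init__(self, uname_line):
        self._uname_line = uname_line

    def get_command_output(self, command):
        # a real session would run the command over SSH and return stdout
        return self._uname_line if command == "uname -a" else ""

test_command = "uname -a"  # e.g. stress_boot_test_command from the config

# Remember the first guest's output as the reference...
first = Session("Linux vm1 2.6.18-128.el5 x86_64")
reference = first.get_command_output(test_command)

# ...then every later guest must reproduce it to count as responsive.
others = [Session("Linux vm1 2.6.18-128.el5 x86_64") for _ in range(3)]
all_responsive = all(
    s.get_command_output(test_command) == reference for s in others)
```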
>>>
>>>
>> Those are really good suggestions. Thanks, Michael. And can I use
>> 'migration_test_command' instead?
>>
> Not really. Why would you want to use another test's param?
>
> 1. There's no guarantee that 'migration_test_command' is defined
> for your boot stress test. In fact, it is probably only defined for
> migration tests, so you probably won't be able to access it. Try
> params.get('migration_test_command') in your test and you'll probably
> get None.
>
> 2. The user may not want to run migration at all, and then he/she
> will probably not define 'migration_test_command'.
>
> 3. The user might want to use different test commands for migration
> and for the boot stress test.
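[Editor's illustration] Michael's first point can be shown with a small sketch. The dicts below are hypothetical stand-ins for the per-test config sections, not the real kvm-autotest param objects:

```python
# Hypothetical per-test param dicts built from the config file sections.
migration_params = {"migration_test_command": "help"}
stress_boot_params = {"stress_boot_test_command": "uname -a"}

# From inside the boot stress test, the migration param simply isn't
# defined, so params.get() returns None -- exactly as Michael predicts.
borrowed = stress_boot_params.get("migration_test_command")

# A param defined for this test in the config file is always available.
cmd = stress_boot_params.get("stress_boot_test_command")
```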
>
>
>>>>> 10. I'm not sure you should use the param "kill_vm_gracefully",
>>>>> because that's a postprocessor param (probably not your business). You
>>>>> can just call destroy() in the exception handler with gracefully=False,
>>>>> because if the VMs are non-responsive, I don't expect them to shut down
>>>>> nicely with an SSH command (that's what gracefully does). Also, we're
>>>>> using -snapshot, so there's no reason to shut them down nicely.
>>>>
>>>> Yes, I agree. :)
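[Editor's illustration] A minimal sketch of the cleanup described in comment 10. `VM` and the `destroy(gracefully=...)` signature are assumptions standing in for the framework's `kvm_vm.VM`:

```python
# Hypothetical stand-in for kvm_vm.VM with the assumed destroy() signature.
class VM:
    def __init__(self, name):
        self.name = name
        self.running = True

    def destroy(self, gracefully=True):
        # gracefully=True would first try an SSH shutdown command; with
        # unresponsive guests (and -snapshot images) we skip straight to
        # killing the qemu process.
        self.running = False
        return "ssh shutdown" if gracefully else "killed"

# In the exception handler: no point shutting unresponsive VMs down nicely.
vms = [VM("vm%d" % i) for i in range(1, 4)]
results = [vm.destroy(gracefully=False) for vm in vms]
```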
>>>>
>>>>
>>>>> 11. "Total number booted successfully: %d" % (num - 1) --> why not
>>>>> just num? We really have num VMs including the first one.
>>>>> Or you can say: "Total number booted successfully in addition to the
>>>>> first one", but that's much longer.
>>>>
>>>> After the first guest has booted, I set num = 1 and then do 'num += 1'
>>>> at the top of the while loop (for the purpose of getting a new VM).
>>>> So curr_vm is vm2 (num is 2) at that point. If the second VM fails to
>>>> boot up, the number booted successfully should be (num - 1).
>>>> I will use the enumerate(vms) that Uri suggested to make the counting
>>>> easier.
>>>
>>> OK, I didn't notice that.
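[Editor's illustration] The enumerate()-based counting Uri suggested avoids the (num - 1) bookkeeping entirely. Plain strings stand in for the VM objects here:

```python
# Number the booted VMs as we iterate; no manual counter to correct.
vms = ["vm1", "vm2", "vm3"]
messages = ["Guest #%d (%s) boots up successfully" % (num, vm)
            for num, vm in enumerate(vms, 1)]

total = len(vms)  # total booted successfully, including the first VM
```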
>>>
>>>
>>>
>>>>> 12. Consider adding a 'max_vms' (or 'threshold') user param to the
>>>>> test. If num reaches 'max_vms', we stop adding VMs and pass the test.
>>>>> Otherwise the test will always fail (which is depressing). If
>>>>> params.get("threshold") is None or "", or in short -- 'if not
>>>>> params.get("threshold")', disable this feature and keep adding VMs
>>>>> forever. The user can enable the feature with:
>>>>>
>>>>>     max_vms = 50
>>>>>
>>>>> or disable it with:
>>>>>
>>>>>     max_vms =
>>>>
>>>> This is a good idea, given the hardware resource limits of the host.
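[Editor's illustration] The 'max_vms' threshold logic from comment 12, sketched with a plain dict and strings standing in for the real params object and VMs. An empty or missing value disables the limit; a number stops the loop and passes the test:

```python
# Hypothetical params dict; in the real test this comes from the config.
params = {"max_vms": "5"}

max_vms_str = params.get("max_vms")
max_vms = int(max_vms_str) if max_vms_str else 0  # '' or None -> disabled

vms = ["vm1"]  # the first VM is already booted
num = 1
while True:
    num += 1
    vms.append("vm%d" % num)       # clone + boot + login in the real test
    if max_vms and num >= max_vms:
        break                      # reached the threshold: pass the test
```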
>>>>
>>>>
>>>>> 13. Why are you catching OSError? If you get OSError it might be a
>>>>> framework bug.
>>>>
>>>> Sometimes vm.create() succeeds but the ssh-login fails because the
>>>> running python process cannot allocate physical memory (OSError).
>>>> Adding max_vms could fix this problem, I think.
>>>
>>> Do you remember exactly where OSError was thrown? Do you happen to have
>>> a backtrace? (I just want to be very sure it's not a bug.)
>>
>> The OSError was thrown when checking that all VMs are responsive, and I
>> got many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
>> Maybe the last VM was created successfully by luck, and python could not
>> allocate physical memory after that when checking all the sessions.
>> So can we now catch the OSError and tell the user that max_vms is too
>> large?
>>
> Sure. I was just worried it might be a framework bug. If it's a legitimate
> memory error -- catch it and fail the test.
>
> If you happen to catch that OSError again, and get a backtrace, I'd like
> to see it if that's possible.
>
Michael, these are the backtrace messages:
...
20090611-064959
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
DEBUG: run_once: Postprocessing on error...
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
DEBUG: postprocess_vm: VM object found in environment
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
DEBUG: send_monitor_cmd: Sending monitor command: screendump
/kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
DEBUG: run_once: Contents of environment: {'vm__vm1': <kvm_vm.VM
instance at 0x92999a28>}
post-test sysinfo error:
Traceback (most recent call last):
File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
fn(*args, **dargs)
File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in
log_after_each_test
log.run(test_sysinfodir)
File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
shell=True, env=env)
File "/usr/lib64/python2.4/subprocess.py", line 412, in call
return Popen(*args, **kwargs).wait()
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
timestamp=1244717402 localtime=Jun 11 06:50:02 Unhandled OSError:
[Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/common_lib/test.py", line 304,
in _exec
self.execute(*p_args, **p_dargs)
File "/kvm-autotest/client/common_lib/test.py", line 187,
in execute
self.run_once(*args, **dargs)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145,
in run_once
routine_obj.routine(self, params, env)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in
run_boot_vms
curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login,
240, 0, 2)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in
wait_for
output = func()
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py",
line 728, in ssh_login
session = kvm_utils.ssh(address, port, username,
password, prompt, timeout)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
return remote_login(command, password, prompt, "\n", timeout)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in
remote_login
sub = kvm_spawn(command, linesep)
File
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in
__init__
(pid, fd) = pty.fork()
File "/usr/lib64/python2.4/pty.py", line 108, in fork
pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
END FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
timestamp=1244717403 localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
execfile(self.control, global_control_vars, global_control_vars)
File "/kvm-autotest/client/control", line 1030, in ?
cfg_to_test("kvm_tests.cfg")
File "/kvm-autotest/client/control", line 1013, in cfg_to_test
current_status = job.run_test("kvm_runtest_2", params=dict,
tag=tagname)
File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
utils.drop_caches()
File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
utils.system("sync")
File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
bg_job = join_bg_jobs(
File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
stdin=stdin)
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 0
END ABORT ---- ---- timestamp=1244717418 localtime=Jun 11
06:50:18 Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
execfile(self.control, global_control_vars, global_control_vars)
File "/kvm-autotest/client/control", line 1030, in ?
cfg_to_test("kvm_tests.cfg")
File "/kvm-autotest/client/control", line 1013, in cfg_to_test
current_status = job.run_test("kvm_runtest_2", params=dict,
tag=tagname)
File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
utils.drop_caches()
File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
utils.system("sync")
File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
bg_job = join_bg_jobs(
File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
stdin=stdin)
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
[root@dhcp-66-70-9 kvm_runtest_2]#
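[Editor's illustration] The ENOMEM failure in the backtrace above could be turned into a test failure with a helpful message rather than an unhandled crash. `TestFail` below is a stand-in for autotest's `error.TestFail`, and the check itself is a sketch, not the patch's actual code:

```python
import errno

class TestFail(Exception):  # stand-in for autotest's error.TestFail
    pass

def check_all_sessions():
    # In the failing run, pty.fork()/os.fork() raised this during ssh_login.
    raise OSError(errno.ENOMEM, "Cannot allocate memory")

message = None
try:
    check_all_sessions()
except OSError as e:
    if e.errno != errno.ENOMEM:
        raise  # anything else may be a framework bug -- let it propagate
    # A legitimate memory error: fail the test with an actionable message.
    message = ("Cannot allocate memory for more guests; "
               "max_vms is probably too large for this host")
```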
> Thanks,
> Michael
>
>
>>>>> 14. At the end of the exception handler you should probably re-raise
>>>>> the exception you caught. Otherwise the user won't see the error
>>>>> message. You can simply replace 'break' with 'raise' (no parameters),
>>>>> and it should work, hopefully.
>>>>
>>>> Yes, I should if I add a 'max_vms'.
>>>
>>> I think you should re-raise anyway. Otherwise, what's the point in
>>> writing error messages such as "raise error.TestFail("Cannot boot vm
>>> anylonger")"? If you don't re-raise, the user won't see the messages.
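[Editor's illustration] Why the bare 'raise' in comment 14 matters: ending the handler with 'break' swallows the TestFail and its message never reaches the user. `TestFail` is again a stand-in for autotest's `error.TestFail`:

```python
class TestFail(Exception):  # stand-in for autotest's error.TestFail
    pass

def boot_loop(re_raise):
    try:
        raise TestFail("Cannot create VM #2")
    except TestFail:
        if re_raise:
            raise  # a bare raise re-raises the exception being handled
        # with 'break' the real loop would just end here, hiding the failure

seen = None
try:
    boot_loop(re_raise=True)
except TestFail as e:
    seen = str(e)

# Without the re-raise, the failure is silent:
silent = boot_loop(re_raise=False)
```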
>>>
>>>
>>>
>>>>> I know these are quite a few comments, but they're all rather minor,
>>>>> and the test is well written in my opinion.
>>>>
>>>> Thank you, I will make modifications according to your and Uri's
>>>> comments, and will re-submit it here later. :)
>>>>
>>>> Thanks and Best Regards,
>>>> Yolkfull
>>>>
>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Yolkfull Chow"<yzhou@redhat.com>
>>>>> To:kvm@vger.kernel.org
>>>>> Cc: "Uri Lublin"<uril@redhat.com>
>>>>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
>>>>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of
>>>>> them becomes unresponsive
>>>>>
>>>>> Hi,
>>>>>
>>>>> This test will boot VMs until one of them becomes unresponsive, and
>>>>> records the maximum number of VMs successfully started.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>
>> --
>> Yolkfull
>> Regards,
>>
--
Yolkfull
Regards,