[KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

* [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-08  4:01 [KVM-AUTOTEST PATCH 0/8] Re-submitting some of the patches on the patch queue Lucas Meneghel Rodrigues
@ 2009-06-09  8:41 ` Yolkfull Chow
  2009-06-09  9:37   ` Yaniv Kaul
  2009-06-09 12:45   ` Uri Lublin
  0 siblings, 2 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-09  8:41 UTC (permalink / raw)
  To: kvm; +Cc: Uri Lublin

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]


Hi,

This test will boot VMs until one of them becomes unresponsive, and 
records the maximum number of VMs successfully started.


-- 
Yolkfull
Regards,


[-- Attachment #2: kvm_tests.py.patch --]
[-- Type: text/plain, Size: 2892 bytes --]

diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index cccc48e..7d00277 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -466,3 +466,70 @@ def run_linux_s3(test, params, env):
     logging.info("VM resumed after S3")
 
     session.close()
+
+def run_boot_vms(test, params, env):
+    """
+    Boots VMs until one of them becomes unresponsive, and records the maximum
+    number of VMs successfully started:
+    1) boot the first vm
+    2) boot the second vm cloned from the first vm, check whether it boots up
+       and all booted vms can ssh-login
+    3) go on until cannot create VM anymore or cannot allocate memory for VM
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    # boot the first vm
+    vm1 = kvm_utils.env_get_vm(env, params.get("main_vm"))
+
+    if not vm1:
+        raise error.TestError("VM object not found in environment")
+    if not vm1.is_alive():
+        raise error.TestError("VM seems to be dead; Test requires a living VM")
+
+    logging.info("Waiting for first guest to be up...")
+
+    vm1_session = kvm_utils.wait_for(vm1.ssh_login, 240, 0, 2)
+    if not vm1_session:
+        raise error.TestFail("Could not log into first guest")
+
+    num = 1
+    vms = [vm1]
+    sessions = [vm1_session]
+
+    # boot the VMs
+    while True:
+        try:
+            num += 1
+            vm_name = "vm" + str(num)
+
+            # clone vm according to the first one
+            curr_vm = vm1.clone(vm_name)
+            logging.info(" Booting the %dth guest" % num)
+            if not curr_vm.create():
+                raise error.TestFail("Cannot boot vm anylonger")
+
+            curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
+
+            if not curr_vm_session:
+                curr_vm.send_monitor_cmd("quit")
+                raise error.TestFail("Could not log into %dth guest" % num)
+
+            logging.info(" %dth guest boots up successfully" % num)
+            sessions.append(curr_vm_session)
+            vms.append(curr_vm)
+
+            # check whether all previous ssh sessions are responsive
+            for vm_session in sessions:
+                if not vm_session.is_responsive():
+                    logging.error("%dth guest's session is not responsive" \
+                                       % (sessions.index(vm_session) + 1))
+
+        except (error.TestFail, OSError):
+            for vm in vms:
+                logging.info("Shut down the %dth guest" % (vms.index(vm) + 1))
+                vm.destroy(gracefully = params.get("kill_vm_gracefully") \
+                                                               == "yes")
+            logging.info("Total number booted successfully: %d" % (num - 1))
+            break

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-09  8:41 ` [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Yolkfull Chow
@ 2009-06-09  9:37   ` Yaniv Kaul
  2009-06-09  9:57     ` Michael Goldish
  2009-06-09 12:45   ` Uri Lublin
  1 sibling, 1 reply; 13+ messages in thread
From: Yaniv Kaul @ 2009-06-09  9:37 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: kvm, Uri Lublin

>
> Hi,
>
> This test will boot VMs until one of them becomes unresponsive, and 
> records the maximum number of VMs successfully started.
>
>
Can you clarify what this test is exactly testing? Is it any of the 
tests on http://kvm.et.redhat.com/page/KVM-Autotest/TODO (if not, please 
add it).
Are you expecting OOM? Or some VMs to go into swap? Are the VMs 
completely idle, except for responding to SSH?
Are you going to integrate KSM into this?


TIA,
Y.


* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
       [not found] <2021156332.1536421244540393444.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-06-09  9:44 ` Michael Goldish
  2009-06-10  8:10   ` Yolkfull Chow
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Goldish @ 2009-06-09  9:44 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: Uri Lublin, kvm

The test looks pretty nicely written. Comments:

1. Consider making all the cloned VMs use image snapshots:

curr_vm = vm1.clone()
curr_vm.get_params()["extra_params"] += " -snapshot"

I'm not sure it's a good idea to let all VMs use the same disk image.
Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
file for the first VM, and then all cloned VMs will have -snapshot as well.

2. Consider changing the message
" Booting the %dth guest" % num
to
"Booting guest #%d" % num
(because there's no such thing as 2th and 3th)

3. Consider changing the message
"Cannot boot vm anylonger"
to
"Cannot create VM #%d" % num

4. Why not add curr_vm to vms immediately after cloning it?
That way you can kill it in the exception handler later, without having
to send it a 'quit' if you can't login ('if not curr_vm_session').

5. " %dth guest boots up successfully" % num --> again, 2th and 3th make no sense.
Also, I wonder why you add those spaces before every info message.

6. "%dth guest's session is not responsive" --> same
(maybe use "Guest session #%d is not responsive" % num)

7. "Shut down the %dth guest" --> same
(maybe "Shutting down guest #%d"? or destroying/killing?)

8. Shouldn't we fail the test when we find an unresponsive session?
It seems you just display an error message. You can simply replace
logging.error( with raise error.TestFail(.

9. Consider using a stricter test than just vm_session.is_responsive().
vm_session.is_responsive() just sends ENTER to the sessions and returns
True if it gets anything as a result (usually a prompt, or even just a
newline echoed back). If the session passes this test it is indeed
responsive, so it's a decent test, but maybe you can send some command
(user configurable?) and test for some output. I'm really not sure this
is important, because I can't imagine a session would respond to a newline
but not to other commands, but who knows. Maybe you can send the first VM
a user-specified command when the test begins, remember the output, and
then send all other VMs the same command and make sure the output is the
same.

10. I'm not sure you should use the param "kill_vm_gracefully" because that's
a postprocessor param (probably not your business). You can just call
destroy() in the exception handler with gracefully=False, because if the VMs
are non-responsive, I don't expect them to shutdown nicely with an SSH
command (that's what gracefully does). Also, we're using -snapshot, so
there's no reason to shut them down nicely.

11. "Total number booted successfully: %d" % (num - 1) --> why not just num?
We really have num VMs including the first one.
Or you can say: "Total number booted successfully in addition to the first one"
but that's much longer.

12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
test will always fail (which is depressing). If params.get("threshold") is
None or "", or in short -- 'if not params.get("threshold")', disable this
feature and keep adding VMs forever. The user can enable the feature with:
max_vms = 50
or disable it with:
max_vms =

13. Why are you catching OSError? If you get OSError it might be a framework bug.

14. At the end of the exception handler you should probably re-raise the exception
you caught. Otherwise the user won't see the error message. You can simply replace
'break' with 'raise' (no parameters), and it should work, hopefully.
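
To make comments 4, 10, 12 and 14 a bit more concrete, here is a minimal, self-contained sketch of how the loop could look. It is only an illustration under stated assumptions: FakeVM and StubTestFail are hypothetical stand-ins (not the real kvm-autotest VM class or error.TestFail), host_capacity is an invented number, and leaving the first VM running is just a choice made for this sketch.

```python
class StubTestFail(Exception):
    """Hypothetical stand-in for autotest's error.TestFail."""


class FakeVM:
    """Hypothetical stand-in for a VM object; not the real kvm-autotest class."""
    host_capacity = 4   # invented: how many guests the pretend host can run
    running = 0

    def __init__(self, name):
        self.name = name
        self.alive = False

    def clone(self, name):
        return FakeVM(name)

    def create(self):
        if FakeVM.running >= FakeVM.host_capacity:
            return False
        FakeVM.running += 1
        self.alive = True
        return True

    def destroy(self, gracefully=False):
        if self.alive:
            FakeVM.running -= 1
            self.alive = False


def boot_until_limit(first_vm, params):
    # Comment 12: an empty or missing max_vms disables the limit.
    max_vms = int(params.get("max_vms") or 0)
    vms = [first_vm]
    try:
        while not max_vms or len(vms) < max_vms:
            num = len(vms) + 1
            curr_vm = first_vm.clone("vm%d" % num)
            vms.append(curr_vm)     # comment 4: track the clone right away
            if not curr_vm.create():
                raise StubTestFail("Cannot create VM #%d" % num)
        return len(vms)
    finally:
        # Comment 10: no graceful shutdown. Comment 14: because this is a
        # 'finally' block, a failure still propagates to the caller.
        for vm in vms[1:]:
            vm.destroy(gracefully=False)
```

With max_vms = 3 the function returns 3 and the test passes; with an empty max_vms it keeps cloning until create() fails and the failure reaches the caller.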

I know these are quite a few comments, but they're all rather minor and the test
is well written in my opinion.

Thanks,
Michael

----- Original Message -----
From: "Yolkfull Chow" <yzhou@redhat.com>
To: kvm@vger.kernel.org
Cc: "Uri Lublin" <uril@redhat.com>
Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

Hi,

This test will boot VMs until one of them becomes unresponsive, and 
records the maximum number of VMs successfully started.

-- 
Yolkfull
Regards,

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-09  9:37   ` Yaniv Kaul
@ 2009-06-09  9:57     ` Michael Goldish
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Goldish @ 2009-06-09  9:57 UTC (permalink / raw)
  To: Yaniv Kaul; +Cc: kvm, Uri Lublin, Yolkfull Chow

----- "Yaniv Kaul" <ykaul@redhat.com> wrote:

> >
> > Hi,
> >
> > This test will boot VMs until one of them becomes unresponsive, and
> > records the maximum number of VMs successfully started.
> >
> >
> Can you clarify what this test is exactly testing? Is it any of the 
> tests on http://kvm.et.redhat.com/page/KVM-Autotest/TODO (if not,
> please add it).

The test is in the wiki -- I added it months ago but didn't write it:
'Write a test which adds VMs until one of them becomes unresponsive, and records the maximum number of VMs successfully started. [jasowang]'

> Are you expecting OOM? Or some VMs to go into swap? Are the VMs 
> completely idle, except for responding to SSH?
> Are you going to integrate KSM into this?

In my review of the patch I forgot to mention running load on the VMs.
This can be done easily by using 2 sessions per guest (or running in the background of a single session, but I prefer the former), and should be made user configurable via the config file.

I'm not sure about the other things you mentioned -- what should we do about OOM and swap usage? Fail the test? Limit the number of VMs?

And KSM sounds like a good idea, but I'm not sure it should be set up by the framework. Maybe it should be pre-setup on some of the hosts, so eventually some hosts will test with KSM and some without, and the framework can be unaware of that. We can find a way to add that information to the results database (like we currently add the KVM version).
Another option is to write a KSM setup test, like kvm_install, that will either run or not run before all other tests, depending on the control file.

> 
> 
> TIA,
> Y.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-09  8:41 ` [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Yolkfull Chow
  2009-06-09  9:37   ` Yaniv Kaul
@ 2009-06-09 12:45   ` Uri Lublin
  2009-06-10  8:12     ` Yolkfull Chow
  1 sibling, 1 reply; 13+ messages in thread
From: Uri Lublin @ 2009-06-09 12:45 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: kvm

On 06/09/2009 11:41 AM, Yolkfull Chow wrote:
>
> Hi,
>
> This test will boot VMs until one of them becomes unresponsive, and
> records the maximum number of VMs successfully started.
>
>

Hello,

Some more comments (in addition to previous comments by others)
1. Do not just send monitor command "quit" but use vm.destroy
    * This was mentioned by Michael, but in a different context.
2. Do not destroy main_vm (or vm1). We may want to run other tests on it.
3. You can use enumerate(vms) instead of looking for vm with index.
4. It would be nice to close all ssh sessions too.
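
Points 2 to 4 could be sketched roughly as follows. This is only an illustration: StubVM and StubSession are hypothetical stand-ins for the real VM and session objects, and cleanup() is a helper name invented for this example.

```python
class StubSession:
    """Hypothetical stand-in for an SSH session object."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class StubVM:
    """Hypothetical stand-in for a VM object."""
    def __init__(self):
        self.destroyed = False

    def destroy(self, gracefully=False):
        self.destroyed = True


def cleanup(vms, sessions):
    """Close every session, then destroy every VM except the first one."""
    for session in sessions:
        session.close()
    destroyed = []
    # enumerate() yields the guest number directly, so no vms.index() lookup.
    for num, vm in enumerate(vms, 1):
        if num == 1:
            continue                # keep main_vm alive for other tests
        vm.destroy(gracefully=False)
        destroyed.append(num)
    return destroyed
```

Calling cleanup() on three VMs and three sessions closes all three sessions and destroys guests #2 and #3 only.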

Regards,
     Uri.

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-09  9:44 ` Michael Goldish
@ 2009-06-10  8:10   ` Yolkfull Chow
  0 siblings, 0 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-10  8:10 UTC (permalink / raw)
  To: Michael Goldish; +Cc: Uri Lublin, kvm

On 06/09/2009 05:44 PM, Michael Goldish wrote:
> The test looks pretty nicely written. Comments:
>
> 1. Consider making all the cloned VMs use image snapshots:
>
> curr_vm = vm1.clone()
> curr_vm.get_params()["extra_params"] += " -snapshot"
>    
> I'm not sure it's a good idea to let all VMs use the same disk image.
> Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
> file for the first VM, and then all cloned VMs will have -snapshot as well.
>    
Yes I use 'image_snapshot = yes' in config file.
> 2. Consider changing the message
> " Booting the %dth guest" % num
> to
> "Booting guest #%d" % num
> (because there's no such thing as 2th and 3th)
>    
> 3. Consider changing the message
> "Cannot boot vm anylonger"
> to
> "Cannot create VM #%d" % num
>
> 4. Why not add curr_vm to vms immediately after cloning it?
> That way you can kill it in the exception handler later, without having
> to send it a 'quit' if you can't login ('if not curr_vm_session').
>    
Yes, good idea.
> 5. " %dth guest boots up successfully" % num -->  again, 2th and 3th make no sense.
> Also, I wonder why you add those spaces before every info message.
>
> 6. "%dth guest's session is not responsive" -->  same
> (maybe use "Guest session #%d is not responsive" % num)
>
> 7. "Shut down the %dth guest" -->  same
> (maybe "Shutting down guest #%d"? or destroying/killing?)
>
> 8. Shouldn't we fail the test when we find an unresponsive session?
> It seems you just display an error message. You can simply replace
> logging.error( with raise error.TestFail(.
>    

> 9. Consider using a stricter test than just vm_session.is_responsive().
> vm_session.is_responsive() just sends ENTER to the sessions and returns
> True if it gets anything as a result (usually a prompt, or even just a
> newline echoed back). If the session passes this test it is indeed
> responsive, so it's a decent test, but maybe you can send some command
> (user configurable?) and test for some output. I'm really not sure this
> is important, because I can't imagine a session would respond to a newline
> but not to other commands, but who knows. Maybe you can send the first VM
> a user-specified command when the test begins, remember the output, and
> then send all other VMs the same command and make sure the output is the
> same.
>    
maybe use 'info status' and send command 'help' via session to vms and 
compare their output?
> 10. I'm not sure you should use the param "kill_vm_gracefully" because that's
> a postprocessor param (probably not your business). You can just call
> destroy() in the exception handler with gracefully=False, because if the VMs
> are non-responsive, I don't expect them to shutdown nicely with an SSH
> command (that's what gracefully does). Also, we're using -snapshot, so
> there's no reason to shut them down nicely.
>    
Yes,  I agree. :)
> 11. "Total number booted successfully: %d" % (num - 1) -->  why not just num?
> We really have num VMs including the first one.
> Or you can say: "Total number booted successfully in addition to the first one"
> but that's much longer.
>    
After the first guest boots I set num = 1, and then do 'num += 1'
at the top of the while loop (to get a new vm).
So curr_vm is vm2 (num is 2) at that point. If the second vm fails to boot,
the number booted successfully should be (num - 1).
I will use enumerate(vms), as Uri suggested, to make the numbers easier to
count.
> 12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
> num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
> test will always fail (which is depressing). If params.get("threshold") is
> None or "", or in short -- 'if not params.get("threshold")', disable this
> feature and keep adding VMs forever. The user can enable the feature with:
> max_vms = 50
> or disable it with:
> max_vms =
>    
This is a good idea for hardware resource limit of host.
> 13. Why are you catching OSError? If you get OSError it might be a framework bug.
>    
Because sometimes vm.create() succeeds but the ssh login then fails, since
the running python cannot allocate physical memory (OSError).
Adding max_vms could fix this problem, I think.
> 14. At the end of the exception handler you should probably re-raise the exception
> you caught. Otherwise the user won't see the error message. You can simply replace
> 'break' with 'raise' (no parameters), and it should work, hopefully.
>    
Yes, I should if I add a 'max_vms'.
> I know these are quite a few comments, but they're all rather minor and the test
> is well written in my opinion.
>    
Thank you, I will make the changes according to your and Uri's comments,
and will re-submit it here later. :)

Thanks and Best Regards,
Yolkfull
> Thanks,
> Michael
>
> ----- Original Message -----
> From: "Yolkfull Chow"<yzhou@redhat.com>
> To:kvm@vger.kernel.org
> Cc: "Uri Lublin"<uril@redhat.com>
> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
>
>
> Hi,
>
> This test will boot VMs until one of them becomes unresponsive, and
> records the maximum number of VMs successfully started.
>
>
>    

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-09 12:45   ` Uri Lublin
@ 2009-06-10  8:12     ` Yolkfull Chow
  0 siblings, 0 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-10  8:12 UTC (permalink / raw)
  To: Uri Lublin; +Cc: kvm

On 06/09/2009 08:45 PM, Uri Lublin wrote:
> On 06/09/2009 11:41 AM, Yolkfull Chow wrote:
>>
>> Hi,
>>
>> This test will boot VMs until one of them becomes unresponsive, and
>> records the maximum number of VMs successfully started.
>>
>>
>
> Hello,
>
> Some more comments (in addition to previous comments by others)
> 1. Do not just send monitor command "quit" but use vm.destroy
>    * This was mentioned by Michael, but in a different context.
> 2. Do not destroy main_vm (or vm1). We may want to run other tests on it.
> 3. You can use enumerate(vms) instead of looking for vm with index.
> 4. It would be nice to close all ssh sessions too.
OK, I will make the changes according to your comments. Thank you so much. :)

Best Regards,
Yolkfull

>
> Regards,
>     Uri.


* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
       [not found] <219655199.1650051244627445364.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-06-10 10:03 ` Michael Goldish
  2009-06-10 10:31   ` Yolkfull Chow
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Goldish @ 2009-06-10 10:03 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: Uri Lublin, kvm


----- "Yolkfull Chow" <yzhou@redhat.com> wrote:

> On 06/09/2009 05:44 PM, Michael Goldish wrote:
> > The test looks pretty nicely written. Comments:
> >
> > 1. Consider making all the cloned VMs use image snapshots:
> >
> > curr_vm = vm1.clone()
> > curr_vm.get_params()["extra_params"] += " -snapshot"
> >    
> > I'm not sure it's a good idea to let all VMs use the same disk image.
> > Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
> > file for the first VM, and then all cloned VMs will have -snapshot as well.
> >    
> Yes I use 'image_snapshot = yes' in config file.
> > 2. Consider changing the message
> > " Booting the %dth guest" % num
> > to
> > "Booting guest #%d" % num
> > (because there's no such thing as 2th and 3th)
> >    
> > 3. Consider changing the message
> > "Cannot boot vm anylonger"
> > to
> > "Cannot create VM #%d" % num
> >
> > 4. Why not add curr_vm to vms immediately after cloning it?
> > That way you can kill it in the exception handler later, without having
> > to send it a 'quit' if you can't login ('if not curr_vm_session').
> >    
> Yes, good idea.
> > 5. " %dth guest boots up successfully" % num --> again, 2th and 3th make no sense.
> > Also, I wonder why you add those spaces before every info message.
> >
> > 6. "%dth guest's session is not responsive" -->  same
> > (maybe use "Guest session #%d is not responsive" % num)
> >
> > 7. "Shut down the %dth guest" -->  same
> > (maybe "Shutting down guest #%d"? or destroying/killing?)
> >
> > 8. Shouldn't we fail the test when we find an unresponsive session?
> > It seems you just display an error message. You can simply replace
> > logging.error( with raise error.TestFail(.
> >    
> 
> > 9. Consider using a stricter test than just vm_session.is_responsive().
> > vm_session.is_responsive() just sends ENTER to the sessions and returns
> > True if it gets anything as a result (usually a prompt, or even just a
> > newline echoed back). If the session passes this test it is indeed
> > responsive, so it's a decent test, but maybe you can send some command
> > (user configurable?) and test for some output. I'm really not sure this
> > is important, because I can't imagine a session would respond to a newline
> > but not to other commands, but who knows. Maybe you can send the first VM
> > a user-specified command when the test begins, remember the output, and
> > then send all other VMs the same command and make sure the output is the
> > same.
> >    
> maybe use 'info status' and send command 'help' via session to vms and
> compare their output?

I'm not sure I understand. What does 'info status' do? We're talking about
an SSH shell, not the monitor. You can do whatever you like, like 'uname -a',
and 'ls /', but you should leave it up to the user to decide, so he/she
can specify different commands for different guests. Linux commands won't
work under Windows, so Linux and Windows must have different commands in
the config file. In the Linux section, under '- @Linux:' you can add
something like:

stress_boot:
    stress_boot_test_command = uname -a

and under '- @Windows:':

stress_boot:
    stress_boot_test_command = ver && vol

These commands are just naive suggestions. I'm sure someone can think of
much more informative commands.
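
A rough sketch of that check, assuming the command is read from the 'stress_boot_test_command' param suggested above. StubSession and its run() method are hypothetical stand-ins invented for this example (the real session object has its own way of running a command and returning the output), and find_mismatched_guests() is an invented helper name.

```python
class StubSession:
    """Hypothetical stand-in for an SSH session; run() is an invented method."""

    def __init__(self, outputs):
        self.outputs = outputs      # maps a command string to canned output

    def run(self, command):
        return self.outputs.get(command, "")


def find_mismatched_guests(sessions, command):
    """Return 1-based numbers of guests whose output differs from guest #1."""
    reference = sessions[0].run(command).strip()
    return [num for num, session in enumerate(sessions, 1)
            if session.run(command).strip() != reference]


# The command would come from the config file, e.g. under '- @Linux:'.
params = {"stress_boot_test_command": "uname -a"}
command = params.get("stress_boot_test_command")
sessions = [
    StubSession({"uname -a": "Linux guest 2.6.29"}),
    StubSession({"uname -a": "Linux guest 2.6.29\n"}),
    StubSession({}),                # simulates a guest that returns nothing
]
print(find_mismatched_guests(sessions, command))
```

Here only the third guest is reported, because trailing whitespace is stripped before the outputs are compared.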

> > 10. I'm not sure you should use the param "kill_vm_gracefully" because that's
> > a postprocessor param (probably not your business). You can just call
> > destroy() in the exception handler with gracefully=False, because if the VMs
> > are non-responsive, I don't expect them to shutdown nicely with an SSH
> > command (that's what gracefully does). Also, we're using -snapshot, so
> > there's no reason to shut them down nicely.
> >    
> Yes,  I agree. :)
> > 11. "Total number booted successfully: %d" % (num - 1) --> why not just num?
> > We really have num VMs including the first one.
> > Or you can say: "Total number booted successfully in addition to the first one"
> > but that's much longer.
> >
> After the first guest boots I set num = 1, and then do 'num += 1'
> at the top of the while loop (to get a new vm).
> So curr_vm is vm2 (num is 2) at that point. If the second vm fails to boot,
> the number booted successfully should be (num - 1).
> I will use enumerate(vms), as Uri suggested, to make the numbers easier to
> count.

OK, I didn't notice that.

> > 12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
> > num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
> > test will always fail (which is depressing). If params.get("threshold") is
> > None or "", or in short -- 'if not params.get("threshold")', disable this
> > feature and keep adding VMs forever. The user can enable the feature with:
> > max_vms = 50
> > or disable it with:
> > max_vms =
> >    
> This is a good idea for hardware resource limit of host.
> > 13. Why are you catching OSError? If you get OSError it might be a framework bug.
> >
> Because sometimes vm.create() succeeds but the ssh login then fails, since
> the running python cannot allocate physical memory (OSError).
> Adding max_vms could fix this problem, I think.

Do you remember exactly where OSError was thrown? Do you happen to have
a backtrace? (I just want to be very sure it's not a bug.)

> > 14. At the end of the exception handler you should probably re-raise the exception
> > you caught. Otherwise the user won't see the error message. You can simply replace
> > 'break' with 'raise' (no parameters), and it should work, hopefully.
> >
> Yes, I should if I add a 'max_vms'.

I think you should re-raise anyway. Otherwise, what's the point in writing
error messages such as "raise error.TestFail("Cannot boot vm anylonger")"?
If you don't re-raise, the user won't see the messages.
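
For illustration, this is the pattern I mean: clean up in the handler, then use a bare 'raise' so the original exception and its message reach the caller. StubTestFail and run_with_cleanup() are hypothetical names used only for this sketch, not framework code.

```python
class StubTestFail(Exception):
    """Hypothetical stand-in for autotest's error.TestFail."""


def run_with_cleanup(action, cleanup):
    """Run action(); on failure, clean up and let the failure propagate."""
    try:
        action()
    except StubTestFail:
        cleanup()
        # A bare 'raise' re-raises the exception currently being handled,
        # so the caller still sees the original message.
        raise
```

With 'break' instead of 'raise', the handler would swallow the exception and the test would appear to finish normally.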

> > I know these are quite a few comments, but they're all rather minor
> and the test
> > is well written in my opinion.
> >    
> Thank you, I will make the changes according to your and Uri's comments,
> and will re-submit it here later. :)
> 
> Thanks and Best Regards,
> Yolkfull
> > Thanks,
> > Michael
> >
> > ----- Original Message -----
> > From: "Yolkfull Chow"<yzhou@redhat.com>
> > To:kvm@vger.kernel.org
> > Cc: "Uri Lublin"<uril@redhat.com>
> > Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
> > Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
> >
> >
> > Hi,
> >
> > This test will boot VMs until one of them becomes unresponsive, and
> > records the maximum number of VMs successfully started.
> >
> >
> >    

* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-10 10:03 ` Michael Goldish
@ 2009-06-10 10:31   ` Yolkfull Chow
  0 siblings, 0 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-10 10:31 UTC (permalink / raw)
  To: Michael Goldish; +Cc: Uri Lublin, kvm

On 06/10/2009 06:03 PM, Michael Goldish wrote:
> ----- "Yolkfull Chow" <yzhou@redhat.com> wrote:
>
>> On 06/09/2009 05:44 PM, Michael Goldish wrote:
>>> The test looks pretty nicely written. Comments:
>>>
>>> 1. Consider making all the cloned VMs use image snapshots:
>>>
>>> curr_vm = vm1.clone()
>>> curr_vm.get_params()["extra_params"] += " -snapshot"
>>>
>>> I'm not sure it's a good idea to let all VMs use the same disk image.
>>> Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
>>> file for the first VM, and then all cloned VMs will have -snapshot as well.
>>>
>> Yes I use 'image_snapshot = yes' in config file.
>>> 2. Consider changing the message
>>> " Booting the %dth guest" % num
>>> to
>>> "Booting guest #%d" % num
>>> (because there's no such thing as 2th and 3th)
>>>
>>> 3. Consider changing the message
>>> "Cannot boot vm anylonger"
>>> to
>>> "Cannot create VM #%d" % num
>>>
>>> 4. Why not add curr_vm to vms immediately after cloning it?
>>> That way you can kill it in the exception handler later, without having
>>> to send it a 'quit' if you can't login ('if not curr_vm_session').
>>>
>> Yes, good idea.
>>> 5. " %dth guest boots up successfully" % num --> again, 2th and 3th make no sense.
>>> Also, I wonder why you add those spaces before every info message.
>>>
>>> 6. "%dth guest's session is not responsive" --> same
>>> (maybe use "Guest session #%d is not responsive" % num)
>>>
>>> 7. "Shut down the %dth guest" --> same
>>> (maybe "Shutting down guest #%d"? or destroying/killing?)
>>>
>>> 8. Shouldn't we fail the test when we find an unresponsive session?
>>> It seems you just display an error message. You can simply replace
>>> logging.error( with raise error.TestFail(.
>>>
>>> 9. Consider using a stricter test than just vm_session.is_responsive().
>>> vm_session.is_responsive() just sends ENTER to the sessions and returns
>>> True if it gets anything as a result (usually a prompt, or even just a
>>> newline echoed back). If the session passes this test it is indeed
>>> responsive, so it's a decent test, but maybe you can send some command
>>> (user configurable?) and test for some output. I'm really not sure this
>>> is important, because I can't imagine a session would respond to a newline
>>> but not to other commands, but who knows. Maybe you can send the first VM
>>> a user-specified command when the test begins, remember the output, and
>>> then send all other VMs the same command and make sure the output is the
>>> same.
>>>
>> maybe use 'info status' and send command 'help' via session to vms and
>> compare their output?
> I'm not sure I understand. What does 'info status' do? We're talking about
> an SSH shell, not the monitor. You can do whatever you like, like 'uname -a',
> and 'ls /', but you should leave it up to the user to decide, so he/she
> can specify different commands for different guests. Linux commands won't
> work under Windows, so Linux and Windows must have different commands in
> the config file. In the Linux section, under '- @Linux:' you can add
> something like:
>
> stress_boot:
>      stress_boot_test_command = uname -a
>
> and under '- @Windows:':
>
> stress_boot:
>      stress_boot_test_command = ver && vol
>
> These commands are just naive suggestions. I'm sure someone can think of
> much more informative commands.
>    
Those are really good suggestions. Thanks, Michael. And can I use
'migration_test_command' instead?
>    
>>> 10. I'm not sure you should use the param "kill_vm_gracefully"
>>>        
>> because that's
>>      
>>> a postprocessor param (probably not your business). You can just
>>>        
>> call
>>      
>>> destroy() in the exception handler with gracefully=False, because if
>>>        
>> the VMs
>>      
>>> are non- responsive, I don't expect them to shutdown nicely with an
>>>        
>> SSH
>>      
>>> command (that's what gracefully does). Also, we're using -snapshot,
>>>        
>> so
>>      
>>> there's no reason to shut them down nicely.
>>>
>>>        
>> Yes,  I agree. :)
>>      
>>> 11. "Total number booted successfully: %d" % (num - 1) -->   why not
>>>        
>> just num?
>>      
>>> We really have num VMs including the first one.
>>> Or you can say: "Total number booted successfully in addition to the
>>>        
>> first one"
>>      
>>> but that's much longer.
>>>
>>>        
>> Since after the first guest booted, I set num = 1 and then  'num += 1'
>>
>> at first in while loop ( for the purpose of getting a new vm ).
>> So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot up,
>> the num booted successfully should be (num - 1).
>> I would use enumerate(vms) that Uri suggested to make number easier to
>> count.
>>      
> OK, I didn't notice that.
>
>    
>>> 12. Consider adding a 'max_vms' (or 'threshold') user param to the
>>>        
>> test. If
>>      
>>> num reaches 'max_vms', we stop adding VMs and pass the test.
>>>        
>> Otherwise the
>>      
>>> test will always fail (which is depressing). If
>>>        
>> params.get("threshold") is
>>      
>>> None or "", or in short -- 'if not params.get("threshold")', disable
>>>        
>> this
>>      
>>> feature and keep adding VMs forever. The user can enable the feature
>>>        
>> with:
>>      
>>> max_vms = 50
>>> or disable it with:
>>> max_vms =
>>>
>>>        
>> This is a good idea for hardware resource limit of host.
>>      
>>> 13. Why are you catching OSError? If you get OSError it might be a
>>>        
>> framework bug.
>>      
>>>
>>>        
>> Because sometimes vm.create() succeeds but the ssh login then fails,
>> since the running python cannot allocate physical memory (OSError).
>> Adding max_vms could fix this problem, I think.
>>      
> Do you remember exactly where OSError was thrown? Do you happen to have
> a backtrace? (I just want to be very sure it's not a bug.)
>    
The OSError was thrown while checking that all VMs are responsive, and I
got many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
Maybe the last VM was created successfully by luck, but Python could not
get physical memory after that, while checking all the sessions.
So can we now catch the OSError and tell the user that max_vms is too
large?
>>> 14. At the end of the exception handler you should probably re-raise
>>>        
>> the exception
>>      
>>> you caught. Otherwise the user won't see the error message. You can
>>>        
>> simply replace
>>      
>>> 'break' with 'raise' (no parameters), and it should work,
>>>        
>> hopefully.
>>      
>>>
>>>        
>> Yes I should if add a 'max_vms'.
>>      
> I think you should re-raise anyway. Otherwise, what's the point in writing
> error messages such as "raise error.TestFail("Cannot boot vm anylonger")"?
> If you don't re-raise, the user won't see the messages.
>
>    
>>> I know these are quite a few comments, but they're all rather minor
>>>        
>> and the test
>>      
>>> is well written in my opinion.
>>>
>>>        
>> Thank you,  I will do modification according to your and Uri's
>> comments,
>> and will re-submit it here later. :)
>>
>> Thanks and Best Regards,
>> Yolkfull
>>      
>>> Thanks,
>>> Michael
>>>
>>> ----- Original Message -----
>>> From: "Yolkfull Chow"<yzhou@redhat.com>
>>> To:kvm@vger.kernel.org
>>> Cc: "Uri Lublin"<uril@redhat.com>
>>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
>>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of
>>>        
>> them becomes unresponsive
>>      
>>>
>>> Hi,
>>>
>>> This test will boot VMs until one of them becomes unresponsive, and
>>> records the maximum number of VMs successfully started.
>>>
>>>
>>>
>>>        
>>      


-- 
Yolkfull
Regards,



* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
       [not found] <443392010.1660281244634434026.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-06-10 11:52 ` Michael Goldish
  2009-06-11  3:37   ` Yolkfull Chow
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Goldish @ 2009-06-10 11:52 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: Uri Lublin, kvm


----- "Yolkfull Chow" <yzhou@redhat.com> wrote:

> On 06/10/2009 06:03 PM, Michael Goldish wrote:
> > ----- "Yolkfull Chow"<yzhou@redhat.com>  wrote:
> >
> >    
> >> On 06/09/2009 05:44 PM, Michael Goldish wrote:
> >>      
> >>> The test looks pretty nicely written. Comments:
> >>>
> >>> 1. Consider making all the cloned VMs use image snapshots:
> >>>
> >>> curr_vm = vm1.clone()
> >>> curr_vm.get_params()["extra_params"] += " -snapshot"
> >>>
> >>> I'm not sure it's a good idea to let all VMs use the same disk
> >>>        
> >> image.
> >>      
> >>> Or maybe you shouldn't add -snapshot yourself, but rather do it
> in
> >>>        
> >> the config
> >>      
> >>> file for the first VM, and then all cloned VMs will have
> -snapshot
> >>>        
> >> as well.
> >>      
> >>>
> >>>        
> >> Yes I use 'image_snapshot = yes' in config file.
> >>      
> >>> 2. Consider changing the message
> >>> " Booting the %dth guest" % num
> >>> to
> >>> "Booting guest #%d" % num
> >>> (because there's no such thing as 2th and 3th)
> >>>
> >>> 3. Consider changing the message
> >>> "Cannot boot vm anylonger"
> >>> to
> >>> "Cannot create VM #%d" % num
> >>>
> >>> 4. Why not add curr_vm to vms immediately after cloning it?
> >>> That way you can kill it in the exception handler later, without
> >>>        
> >> having
> >>      
> >>> to send it a 'quit' if you can't login ('if not
> curr_vm_session').
> >>>
> >>>        
> >> Yes, good idea.
> >>      
> >>> 5. " %dth guest boots up successfully" % num -->   again, 2th and
> 3th
> >>>        
> >> make no sense.
> >>      
> >>> Also, I wonder why you add those spaces before every info
> message.
> >>>
> >>> 6. "%dth guest's session is not responsive" -->   same
> >>> (maybe use "Guest session #%d is not responsive" % num)
> >>>
> >>> 7. "Shut down the %dth guest" -->   same
> >>> (maybe "Shutting down guest #%d"? or destroying/killing?)
> >>>
> >>> 8. Shouldn't we fail the test when we find an unresponsive
> session?
> >>> It seems you just display an error message. You can simply
> replace
> >>> logging.error( with raise error.TestFail(.
> >>>
> >>>        
> >>      
> >>> 9. Consider using a stricter test than just
> >>>        
> >> vm_session.is_responsive().
> >>      
> >>> vm_session.is_responsive() just sends ENTER to the sessions and
> >>>        
> >> returns
> >>      
> >>> True if it gets anything as a result (usually a prompt, or even
> just
> >>>        
> >> a
> >>      
> >>> newline echoed back). If the session passes this test it is
> indeed
> >>> responsive, so it's a decent test, but maybe you can send some
> >>>        
> >> command
> >>      
> >>> (user configurable?) and test for some output. I'm really not
> sure
> >>>        
> >> this
> >>      
> >>> is important, because I can't imagine a session would respond to
> a
> >>>        
> >> newline
> >>      
> >>> but not to other commands, but who knows. Maybe you can send the
> >>>        
> >> first VM
> >>      
> >>> a user-specified command when the test begins, remember the
> output,
> >>>        
> >> and
> >>      
> >>> then send all other VMs the same command and make sure the output
> is
> >>>        
> >> the
> >>      
> >>> same.
> >>>
> >>>        
> >> maybe use 'info status' and send command 'help' via session to vms
> and
> >> compare their output?
> >>      
> > I'm not sure I understand. What does 'info status' do? We're talking
> about
> > an SSH shell, not the monitor. You can do whatever you like, like
> 'uname -a',
> > and 'ls /', but you should leave it up to the user to decide, so
> he/she
> > can specify different commands for different guests. Linux commands
> won't
> > work under Windows, so Linux and Windows must have different
> commands in
> > the config file. In the Linux section, under '- @Linux:' you can
> add
> > something like:
> >
> > stress_boot:
> >      stress_boot_test_command = uname -a
> >
> > and under '- @Windows:':
> >
> > stress_boot:
> >      stress_boot_test_command = ver && vol
> >
> > These commands are just naive suggestions. I'm sure someone can
> think of
> > much more informative commands.
> >    
> Those are really good suggestions. Thanks, Michael. Can I use
> 'migration_test_command' instead?

Not really. Why would you want to use another test's param?

1. There's no guarantee that 'migration_test_command' is defined
for your boot stress test. In fact, it is probably only defined for
migration tests, so you probably won't be able to access it. Try
params.get('migration_test_command') in your test and you'll probably
get None.

2. The user may not want to run migration at all, and then he/she
will probably not define 'migration_test_command'.

3. The user might want to use different test commands for migration
and for the boot stress test.
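
A minimal sketch of the lookup this implies, assuming params behaves like
a plain dictionary. The helper name get_test_command is hypothetical and
not part of kvm_autotest; only the key 'stress_boot_test_command' comes
from the suggestion above.

```python
def get_test_command(params, key="stress_boot_test_command"):
    """Return the command configured for this test, or None if unset.

    Hypothetical helper: 'params' is assumed to be dict-like.
    """
    command = params.get(key)
    # An empty value ("stress_boot_test_command =") counts as unset too.
    if not command:
        return None
    return command


# A Linux guest section that defines the boot stress command.
linux_params = {"stress_boot_test_command": "uname -a"}
# A config that only defines the migration test's own parameter.
migration_only = {"migration_test_command": "help"}

print(get_test_command(linux_params))
print(get_test_command(migration_only))
```

With a config that only defines 'migration_test_command', the lookup
comes back empty, which is the situation described in point 1.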

> >>> 10. I'm not sure you should use the param "kill_vm_gracefully"
> >>>        
> >> because that's
> >>      
> >>> a postprocessor param (probably not your business). You can just
> >>>        
> >> call
> >>      
> >>> destroy() in the exception handler with gracefully=False, because
> if
> >>>        
> >> the VMs
> >>      
> >>> are non-responsive, I don't expect them to shut down nicely with
> an
> >>>        
> >> SSH
> >>      
> >>> command (that's what gracefully does). Also, we're using
> -snapshot,
> >>>        
> >> so
> >>      
> >>> there's no reason to shut them down nicely.
> >>>
> >>>        
> >> Yes,  I agree. :)
> >>      
> >>> 11. "Total number booted successfully: %d" % (num - 1) -->   why
> not
> >>>        
> >> just num?
> >>      
> >>> We really have num VMs including the first one.
> >>> Or you can say: "Total number booted successfully in addition to
> the
> >>>        
> >> first one"
> >>      
> >>> but that's much longer.
> >>>
> >>>        
> >> Since after the first guest booted, I set num = 1 and then  'num +=
> 1'
> >>
> >> at first in while loop ( for the purpose of getting a new vm ).
> >> So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot
> up,
> >> the num booted successfully should be (num - 1).
> >> I would use enumerate(vms) that Uri suggested to make number easier
> to
> >> count.
> >>      
> > OK, I didn't notice that.
> >
> >    
> >>> 12. Consider adding a 'max_vms' (or 'threshold') user param to
> the
> >>>        
> >> test. If
> >>      
> >>> num reaches 'max_vms', we stop adding VMs and pass the test.
> >>>        
> >> Otherwise the
> >>      
> >>> test will always fail (which is depressing). If
> >>>        
> >> params.get("threshold") is
> >>      
> >>> None or "", or in short -- 'if not params.get("threshold")',
> disable
> >>>        
> >> this
> >>      
> >>> feature and keep adding VMs forever. The user can enable the
> feature
> >>>        
> >> with:
> >>      
> >>> max_vms = 50
> >>> or disable it with:
> >>> max_vms =
> >>>
> >>>        
> >> This is a good idea for hardware resource limit of host.
> >>      
> >>> 13. Why are you catching OSError? If you get OSError it might be
> a
> >>>        
> >> framework bug.
> >>      
> >>>
> >>>        
> >> Because sometimes vm.create() succeeds but the ssh login then fails,
> >> since the running python cannot allocate physical memory (OSError).
> >> Adding max_vms could fix this problem, I think.
> >>      
> > Do you remember exactly where OSError was thrown? Do you happen to
> have
> > a backtrace? (I just want to be very sure it's not a bug.)
> >    
> The OSError was thrown while checking that all VMs are responsive, and I
> got many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
> Maybe the last VM was created successfully by luck, but Python could not
> get physical memory after that, while checking all the sessions.
> So can we now catch the OSError and tell the user that max_vms is too
> large?

Sure. I was just worried it might be a framework bug. If it's a legitimate
memory error -- catch it and fail the test.

If you happen to catch that OSError again, and get a backtrace, I'd like
to see it if that's possible.
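
A minimal sketch of "catch it and fail the test", under stated
assumptions: TestFail below is a local stand-in for autotest's
error.TestFail so the example runs on its own, and login_or_fail is a
hypothetical helper, not an existing kvm_autotest function. It is written
for current Python; on an interpreter as old as Python 2.4 the handler
would be spelled `except OSError, e`.

```python
import errno


class TestFail(Exception):
    """Stand-in for autotest's error.TestFail, so the sketch runs alone."""


def login_or_fail(login, vm_index):
    """Call login(); turn an out-of-memory OSError into a test failure."""
    try:
        return login()
    except OSError as e:
        if e.errno == errno.ENOMEM:
            raise TestFail("Host ran out of memory while logging into "
                           "guest #%d: %s" % (vm_index, e))
        # Anything else may be a framework bug; let it propagate untouched.
        raise


def failing_login():
    # Simulates the failure mode discussed here: fork() cannot get memory.
    raise OSError(errno.ENOMEM, "Cannot allocate memory")
```

Only ENOMEM is converted; any other OSError is re-raised unchanged, so a
real framework bug would still show up with its original backtrace.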

Thanks,
Michael

> >>> 14. At the end of the exception handler you should probably
> re-raise
> >>>        
> >> the exception
> >>      
> >>> you caught. Otherwise the user won't see the error message. You
> can
> >>>        
> >> simply replace
> >>      
> >>> 'break' with 'raise' (no parameters), and it should work,
> >>>        
> >> hopefully.
> >>      
> >>>
> >>>        
> >> Yes I should if add a 'max_vms'.
> >>      
> > I think you should re-raise anyway. Otherwise, what's the point in
> writing
> > error messages such as "raise error.TestFail("Cannot boot vm
> anylonger")"?
> > If you don't re-raise, the user won't see the messages.
> >
> >    
> >>> I know these are quite a few comments, but they're all rather
> minor
> >>>        
> >> and the test
> >>      
> >>> is well written in my opinion.
> >>>
> >>>        
> >> Thank you,  I will do modification according to your and Uri's
> >> comments,
> >> and will re-submit it here later. :)
> >>
> >> Thanks and Best Regards,
> >> Yolkfull
> >>      
> >>> Thanks,
> >>> Michael
> >>>
> >>> ----- Original Message -----
> >>> From: "Yolkfull Chow"<yzhou@redhat.com>
> >>> To:kvm@vger.kernel.org
> >>> Cc: "Uri Lublin"<uril@redhat.com>
> >>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
> >>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one
> of
> >>>        
> >> them becomes unresponsive
> >>      
> >>>
> >>> Hi,
> >>>
> >>> This test will boot VMs until one of them becomes unresponsive,
> and
> >>> records the maximum number of VMs successfully started.
> >>>
> >>>
> >>>
> >>>        
> >>      
> 
> 
> -- 
> Yolkfull
> Regards,


* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-10 11:52 ` [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Michael Goldish
@ 2009-06-11  3:37   ` Yolkfull Chow
  0 siblings, 0 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-11  3:37 UTC (permalink / raw)
  To: Michael Goldish; +Cc: Uri Lublin, kvm

On 06/10/2009 07:52 PM, Michael Goldish wrote:
> ----- "Yolkfull Chow"<yzhou@redhat.com>  wrote:
>
>    
>> On 06/10/2009 06:03 PM, Michael Goldish wrote:
>>      
>>> ----- "Yolkfull Chow"<yzhou@redhat.com>   wrote:
>>>
>>>
>>>        
>>>> On 06/09/2009 05:44 PM, Michael Goldish wrote:
>>>>
>>>>          
>>>>> The test looks pretty nicely written. Comments:
>>>>>
>>>>> 1. Consider making all the cloned VMs use image snapshots:
>>>>>
>>>>> curr_vm = vm1.clone()
>>>>> curr_vm.get_params()["extra_params"] += " -snapshot"
>>>>>
>>>>> I'm not sure it's a good idea to let all VMs use the same disk
>>>>>
>>>>>            
>>>> image.
>>>>
>>>>          
>>>>> Or maybe you shouldn't add -snapshot yourself, but rather do it
>>>>>            
>> in
>>      
>>>>>
>>>>>            
>>>> the config
>>>>
>>>>          
>>>>> file for the first VM, and then all cloned VMs will have
>>>>>            
>> -snapshot
>>      
>>>>>
>>>>>            
>>>> as well.
>>>>
>>>>          
>>>>>
>>>>>            
>>>> Yes I use 'image_snapshot = yes' in config file.
>>>>
>>>>          
>>>>> 2. Consider changing the message
>>>>> " Booting the %dth guest" % num
>>>>> to
>>>>> "Booting guest #%d" % num
>>>>> (because there's no such thing as 2th and 3th)
>>>>>
>>>>> 3. Consider changing the message
>>>>> "Cannot boot vm anylonger"
>>>>> to
>>>>> "Cannot create VM #%d" % num
>>>>>
>>>>> 4. Why not add curr_vm to vms immediately after cloning it?
>>>>> That way you can kill it in the exception handler later, without
>>>>>
>>>>>            
>>>> having
>>>>
>>>>          
>>>>> to send it a 'quit' if you can't login ('if not
>>>>>            
>> curr_vm_session').
>>      
>>>>>
>>>>>            
>>>> Yes, good idea.
>>>>
>>>>          
>>>>> 5. " %dth guest boots up successfully" % num -->    again, 2th and
>>>>>            
>> 3th
>>      
>>>>>
>>>>>            
>>>> make no sense.
>>>>
>>>>          
>>>>> Also, I wonder why you add those spaces before every info
>>>>>            
>> message.
>>      
>>>>> 6. "%dth guest's session is not responsive" -->    same
>>>>> (maybe use "Guest session #%d is not responsive" % num)
>>>>>
>>>>> 7. "Shut down the %dth guest" -->    same
>>>>> (maybe "Shutting down guest #%d"? or destroying/killing?)
>>>>>
>>>>> 8. Shouldn't we fail the test when we find an unresponsive
>>>>>            
>> session?
>>      
>>>>> It seems you just display an error message. You can simply
>>>>>            
>> replace
>>      
>>>>> logging.error( with raise error.TestFail(.
>>>>>
>>>>>
>>>>>            
>>>>
>>>>          
>>>>> 9. Consider using a stricter test than just
>>>>>
>>>>>            
>>>> vm_session.is_responsive().
>>>>
>>>>          
>>>>> vm_session.is_responsive() just sends ENTER to the sessions and
>>>>>
>>>>>            
>>>> returns
>>>>
>>>>          
>>>>> True if it gets anything as a result (usually a prompt, or even
>>>>>            
>> just
>>      
>>>>>
>>>>>            
>>>> a
>>>>
>>>>          
>>>>> newline echoed back). If the session passes this test it is
>>>>>            
>> indeed
>>      
>>>>> responsive, so it's a decent test, but maybe you can send some
>>>>>
>>>>>            
>>>> command
>>>>
>>>>          
>>>>> (user configurable?) and test for some output. I'm really not
>>>>>            
>> sure
>>      
>>>>>
>>>>>            
>>>> this
>>>>
>>>>          
>>>>> is important, because I can't imagine a session would respond to
>>>>>            
>> a
>>      
>>>>>
>>>>>            
>>>> newline
>>>>
>>>>          
>>>>> but not to other commands, but who knows. Maybe you can send the
>>>>>
>>>>>            
>>>> first VM
>>>>
>>>>          
>>>>> a user-specified command when the test begins, remember the
>>>>>            
>> output,
>>      
>>>>>
>>>>>            
>>>> and
>>>>
>>>>          
>>>>> then send all other VMs the same command and make sure the output
>>>>>            
>> is
>>      
>>>>>
>>>>>            
>>>> the
>>>>
>>>>          
>>>>> same.
>>>>>
>>>>>
>>>>>            
>>>> maybe use 'info status' and send command 'help' via session to vms
>>>>          
>> and
>>      
>>>> compare their output?
>>>>
>>>>          
>>> I'm not sure I understand. What does 'info status' do? We're talking
>>>        
>> about
>>      
>>> an SSH shell, not the monitor. You can do whatever you like, like
>>>        
>> 'uname -a',
>>      
>>> and 'ls /', but you should leave it up to the user to decide, so
>>>        
>> he/she
>>      
>>> can specify different commands for different guests. Linux commands
>>>        
>> won't
>>      
>>> work under Windows, so Linux and Windows must have different
>>>        
>> commands in
>>      
>>> the config file. In the Linux section, under '- @Linux:' you can
>>>        
>> add
>>      
>>> something like:
>>>
>>> stress_boot:
>>>       stress_boot_test_command = uname -a
>>>
>>> and under '- @Windows:':
>>>
>>> stress_boot:
>>>       stress_boot_test_command = ver && vol
>>>
>>> These commands are just naive suggestions. I'm sure someone can
>>>        
>> think of
>>      
>>> much more informative commands.
>>>
>>>        
>> Those are really good suggestions. Thanks, Michael. Can I use
>> 'migration_test_command' instead?
>>      
> Not really. Why would you want to use another test's param?
>
> 1. There's no guarantee that 'migration_test_command' is defined
> for your boot stress test. In fact, it is probably only defined for
> migration tests, so you probably won't be able to access it. Try
> params.get('migration_test_command') in your test and you'll probably
> get None.
>
> 2. The user may not want to run migration at all, and then he/she
> will probably not define 'migration_test_command'.
>
> 3. The user might want to use different test commands for migration
> and for the boot stress test.
>
>    
>>>>> 10. I'm not sure you should use the param "kill_vm_gracefully"
>>>>>
>>>>>            
>>>> because that's
>>>>
>>>>          
>>>>> a postprocessor param (probably not your business). You can just
>>>>>
>>>>>            
>>>> call
>>>>
>>>>          
>>>>> destroy() in the exception handler with gracefully=False, because
>>>>>            
>> if
>>      
>>>>>
>>>>>            
>>>> the VMs
>>>>
>>>>          
>>>>> are non-responsive, I don't expect them to shut down nicely with
>>>>>            
>> an
>>      
>>>>>
>>>>>            
>>>> SSH
>>>>
>>>>          
>>>>> command (that's what gracefully does). Also, we're using
>>>>>            
>> -snapshot,
>>      
>>>>>
>>>>>            
>>>> so
>>>>
>>>>          
>>>>> there's no reason to shut them down nicely.
>>>>>
>>>>>
>>>>>            
>>>> Yes,  I agree. :)
>>>>
>>>>          
>>>>> 11. "Total number booted successfully: %d" % (num - 1) -->    why
>>>>>            
>> not
>>      
>>>>>
>>>>>            
>>>> just num?
>>>>
>>>>          
>>>>> We really have num VMs including the first one.
>>>>> Or you can say: "Total number booted successfully in addition to
>>>>>            
>> the
>>      
>>>>>
>>>>>            
>>>> first one"
>>>>
>>>>          
>>>>> but that's much longer.
>>>>>
>>>>>
>>>>>            
>>>> Since after the first guest booted, I set num = 1 and then  'num +=
>>>>          
>> 1'
>>      
>>>> at first in while loop ( for the purpose of getting a new vm ).
>>>> So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot
>>>>          
>> up,
>>      
>>>> the num booted successfully should be (num - 1).
>>>> I would use enumerate(vms) that Uri suggested to make number easier
>>>>          
>> to
>>      
>>>> count.
>>>>
>>>>          
>>> OK, I didn't notice that.
>>>
>>>
>>>        
>>>>> 12. Consider adding a 'max_vms' (or 'threshold') user param to
>>>>>            
>> the
>>      
>>>>>
>>>>>            
>>>> test. If
>>>>
>>>>          
>>>>> num reaches 'max_vms', we stop adding VMs and pass the test.
>>>>>
>>>>>            
>>>> Otherwise the
>>>>
>>>>          
>>>>> test will always fail (which is depressing). If
>>>>>
>>>>>            
>>>> params.get("threshold") is
>>>>
>>>>          
>>>>> None or "", or in short -- 'if not params.get("threshold")',
>>>>>            
>> disable
>>      
>>>>>
>>>>>            
>>>> this
>>>>
>>>>          
>>>>> feature and keep adding VMs forever. The user can enable the
>>>>>            
>> feature
>>      
>>>>>
>>>>>            
>>>> with:
>>>>
>>>>          
>>>>> max_vms = 50
>>>>> or disable it with:
>>>>> max_vms =
>>>>>
>>>>>
>>>>>            
>>>> This is a good idea for hardware resource limit of host.
>>>>
>>>>          
>>>>> 13. Why are you catching OSError? If you get OSError it might be
>>>>>            
>> a
>>      
>>>>>
>>>>>            
>>>> framework bug.
>>>>
>>>>          
>>>>>
>>>>>            
>>>> Because sometimes vm.create() succeeds but the ssh login then fails,
>>>> since the running python cannot allocate physical memory (OSError).
>>>> Adding max_vms could fix this problem, I think.
>>>>
>>>>          
>>> Do you remember exactly where OSError was thrown? Do you happen to
>>>        
>> have
>>      
>>> a backtrace? (I just want to be very sure it's not a bug.)
>>>
>>>        
>> The OSError was thrown while checking that all VMs are responsive, and I
>> got many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
>> Maybe the last VM was created successfully by luck, but Python could not
>> get physical memory after that, while checking all the sessions.
>> So can we now catch the OSError and tell the user that max_vms is too
>> large?
>>      
> Sure. I was just worried it might be a framework bug. If it's a legitimate
> memory error -- catch it and fail the test.
>
> If you happen to catch that OSError again, and get a backtrace, I'd like
> to see it if that's possible.
>    
Michael, these are the backtrace messages:

...
20090611-064959 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
DEBUG: run_once: Postprocessing on error...
20090611-065000 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
DEBUG: postprocess_vm: VM object found in environment
20090611-065000 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
DEBUG: send_monitor_cmd: Sending monitor command: screendump 
/kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
20090611-065000 
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: 
DEBUG: run_once: Contents of environment: {'vm__vm1': <kvm_vm.VM 
instance at 0x92999a28>}
post-test sysinfo error:
Traceback (most recent call last):
   File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
     fn(*args, **dargs)
   File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in 
log_after_each_test
     log.run(test_sysinfodir)
   File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
     shell=True, env=env)
   File "/usr/lib64/python2.4/subprocess.py", line 412, in call
     return Popen(*args, **kwargs).wait()
   File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
     errread, errwrite)
   File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
     self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
         FAIL    
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>    
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>    
timestamp=1244717402    localtime=Jun 11 06:50:02    Unhandled OSError: 
[Errno 12] Cannot allocate memory
           Traceback (most recent call last):
             File "/kvm-autotest/client/common_lib/test.py", line 304, 
in _exec
               self.execute(*p_args, **p_dargs)
             File "/kvm-autotest/client/common_lib/test.py", line 187, 
in execute
               self.run_once(*args, **dargs)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, 
in run_once
               routine_obj.routine(self, params, env)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in 
run_boot_vms
               curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 
240, 0, 2)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in 
wait_for
               output = func()
             File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", 
line 728, in ssh_login
               session = kvm_utils.ssh(address, port, username, 
password, prompt, timeout)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
               return remote_login(command, password, prompt, "\n", timeout)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in 
remote_login
               sub = kvm_spawn(command, linesep)
             File 
"/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in 
__init__
               (pid, fd) = pty.fork()
             File "/usr/lib64/python2.4/pty.py", line 108, in fork
               pid = os.fork()
           OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
     END FAIL    
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>    
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>    
timestamp=1244717403    localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
   File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
     execfile(self.control, global_control_vars, global_control_vars)
   File "/kvm-autotest/client/control", line 1030, in ?
     cfg_to_test("kvm_tests.cfg")
   File "/kvm-autotest/client/control", line 1013, in cfg_to_test
     current_status = job.run_test("kvm_runtest_2", params=dict, 
tag=tagname)
   File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
     utils.drop_caches()
   File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
     utils.system("sync")
   File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
     stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
   File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
     bg_job = join_bg_jobs(
   File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
     stdin=stdin)
   File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
     errread, errwrite)
   File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
     self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Persistent state variable __group_level now set to 0
END ABORT    ----    ----    timestamp=1244717418    localtime=Jun 11 
06:50:18    Unhandled OSError: [Errno 12] Cannot allocate memory
   Traceback (most recent call last):
     File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
       execfile(self.control, global_control_vars, global_control_vars)
     File "/kvm-autotest/client/control", line 1030, in ?
       cfg_to_test("kvm_tests.cfg")
     File "/kvm-autotest/client/control", line 1013, in cfg_to_test
       current_status = job.run_test("kvm_runtest_2", params=dict, 
tag=tagname)
     File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
       utils.drop_caches()
     File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
       utils.system("sync")
     File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
       stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
     File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
       bg_job = join_bg_jobs(
     File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
       stdin=stdin)
     File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
       errread, errwrite)
     File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
       self.pid = os.fork()
   OSError: [Errno 12] Cannot allocate memory
[root@dhcp-66-70-9 kvm_runtest_2]#
> Thanks,
> Michael
>
>    
>>>>> 14. At the end of the exception handler you should probably
>>>>>            
>> re-raise
>>      
>>>>>
>>>>>            
>>>> the exception
>>>>
>>>>          
>>>>> you caught. Otherwise the user won't see the error message. You
>>>>>            
>> can
>>      
>>>>>
>>>>>            
>>>> simply replace
>>>>
>>>>          
>>>>> 'break' with 'raise' (no parameters), and it should work,
>>>>>
>>>>>            
>>>> hopefully.
>>>>
>>>>          
>>>>>
>>>>>            
>>>> Yes I should if add a 'max_vms'.
>>>>
>>>>          
>>> I think you should re-raise anyway. Otherwise, what's the point in
>>>        
>> writing
>>      
>>> error messages such as "raise error.TestFail("Cannot boot vm
>>>        
>> anylonger")"?
>>      
>>> I you don't re-raise, the user won't see the messages.
>>>
>>>
>>>        
>>>>> I know these are quite a few comments, but they're all rather minor
>>>>> and the test is well written in my opinion.
>>>>>
>>>> Thank you, I will make modifications according to your and Uri's
>>>> comments, and will re-submit it here later. :)
>>>>
>>>> Thanks and Best Regards,
>>>> Yolkfull
>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Yolkfull Chow" <yzhou@redhat.com>
>>>>> To: kvm@vger.kernel.org
>>>>> Cc: "Uri Lublin" <uril@redhat.com>
>>>>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
>>>>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of
>>>>> them becomes unresponsive
>>>>>
>>>>> Hi,
>>>>>
>>>>> This test will boot VMs until one of them becomes unresponsive, and
>>>>> records the maximum number of VMs successfully started.
>>>>>
>>
>> -- 
>> Yolkfull
>> Regards,
>>      
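
A minimal sketch of the re-raise point quoted above, written in Python 3
syntax rather than the Python 2.4 used in this thread. TestFail is a local
stand-in for autotest's error.TestFail, and boot_one_vm and
boot_until_failure are hypothetical names, not the code from the patch:

```python
import logging


class TestFail(Exception):
    """Local stand-in for autotest's error.TestFail (assumption)."""


def boot_until_failure(boot_one_vm, max_vms):
    """Boot VMs one at a time and return how many were booted.

    boot_one_vm is a hypothetical callable taking the VM index; it is
    expected to raise TestFail when a VM cannot be booted.
    """
    booted = 0
    try:
        while booted < max_vms:
            boot_one_vm(booted + 1)
            booted += 1
    except TestFail:
        # Record the count first, then let the original error propagate.
        logging.info("Booted %d VMs before failure", booted)
        raise  # bare 'raise' re-raises the exception that was caught
    return booted
```

With 'break' in place of 'raise', the function would return normally and
the TestFail message would never reach the user.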


-- 
Yolkfull
Regards,



* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
       [not found] <120253480.1747631244710010660.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-06-11  8:53 ` Michael Goldish
  2009-06-11  9:46   ` Yolkfull Chow
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Goldish @ 2009-06-11  8:53 UTC (permalink / raw)
  To: Yolkfull Chow; +Cc: Uri Lublin, kvm


----- "Yolkfull Chow" <yzhou@redhat.com> wrote:

> Michael, these are the backtrace messages:
> 
> ...
> 20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
> 20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Postprocessing on error...
> 20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
> 20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: VM object found in environment
> 20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: send_monitor_cmd: Sending monitor command: screendump /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
> 20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Contents of environment: {'vm__vm1': <kvm_vm.VM instance at 0x92999a28>}
> post-test sysinfo error:
> Traceback (most recent call last):
>    File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
>      fn(*args, **dargs)
>    File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in log_after_each_test
>      log.run(test_sysinfodir)
>    File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
>      shell=True, env=env)
>    File "/usr/lib64/python2.4/subprocess.py", line 412, in call
>      return Popen(*args, **kwargs).wait()
>    File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
>      errread, errwrite)
>    File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
>      self.pid = os.fork()
> OSError: [Errno 12] Cannot allocate memory
> 2009-06-11 06:50:02,859 Configuring logger for client level
>          FAIL    
> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
>    
> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
>    
> timestamp=1244717402    localtime=Jun 11 06:50:02    Unhandled
> OSError: 
> [Errno 12] Cannot allocate memory
>            Traceback (most recent call last):
>              File "/kvm-autotest/client/common_lib/test.py", line 304, in _exec
>                self.execute(*p_args, **p_dargs)
>              File "/kvm-autotest/client/common_lib/test.py", line 187, in execute
>                self.run_once(*args, **dargs)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, in run_once
>                routine_obj.routine(self, params, env)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in run_boot_vms
>                curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in wait_for
>                output = func()
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", line 728, in ssh_login
>                session = kvm_utils.ssh(address, port, username, password, prompt, timeout)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
>                return remote_login(command, password, prompt, "\n", timeout)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in remote_login
>                sub = kvm_spawn(command, linesep)
>              File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in __init__
>                (pid, fd) = pty.fork()
>              File "/usr/lib64/python2.4/pty.py", line 108, in fork
>                pid = os.fork()
>            OSError: [Errno 12] Cannot allocate memory
> Persistent state variable __group_level now set to 1
>      END FAIL    
> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
>    
> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
>    
> timestamp=1244717403    localtime=Jun 11 06:50:03
> Dropping caches
> 2009-06-11 06:50:03,409 running: sync
> JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
> Traceback (most recent call last):
>    File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
>      execfile(self.control, global_control_vars, global_control_vars)
>    File "/kvm-autotest/client/control", line 1030, in ?
>      cfg_to_test("kvm_tests.cfg")
>    File "/kvm-autotest/client/control", line 1013, in cfg_to_test
>      current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
>    File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
>      utils.drop_caches()
>    File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
>      utils.system("sync")
>    File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
>      stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
>    File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
>      bg_job = join_bg_jobs(
>    File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
>      stdin=stdin)
>    File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
>      errread, errwrite)
>    File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
>      self.pid = os.fork()
> OSError: [Errno 12] Cannot allocate memory
> 
> Persistent state variable __group_level now set to 0
> END ABORT    ----    ----    timestamp=1244717418    localtime=Jun 11 06:50:18    Unhandled OSError: [Errno 12] Cannot allocate memory
>    Traceback (most recent call last):
>      File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
>        execfile(self.control, global_control_vars, global_control_vars)
>      File "/kvm-autotest/client/control", line 1030, in ?
>        cfg_to_test("kvm_tests.cfg")
>      File "/kvm-autotest/client/control", line 1013, in cfg_to_test
>        current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
>      File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
>        utils.drop_caches()
>      File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
>        utils.system("sync")
>      File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
>        stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
>      File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
>        bg_job = join_bg_jobs(
>      File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
>        stdin=stdin)
>      File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
>        errread, errwrite)
>      File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
>        self.pid = os.fork()
>    OSError: [Errno 12] Cannot allocate memory
> [root@dhcp-66-70-9 kvm_runtest_2]#

Thanks. It does indeed look like a legitimate OSError in os.fork().
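
For illustration only, a small sketch of how a caller could tell this
particular failure apart from other OSErrors. It is written in Python 3
syntax (the thread uses Python 2.4), and is_out_of_memory and
run_with_memory_check are hypothetical helper names, not part of
kvm-autotest:

```python
import errno


def is_out_of_memory(exc):
    """Return True if exc is an OSError carrying ENOMEM ([Errno 12] on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


def run_with_memory_check(spawn):
    """Call spawn() and return (result, None) on success.

    If spawn() fails with ENOMEM, as os.fork() does in the traceback,
    return (None, reason) instead. Any other OSError is re-raised.
    """
    try:
        return spawn(), None
    except OSError as exc:
        if is_out_of_memory(exc):
            return None, "host out of memory"
        raise
```

A test like boot_vms could use such a check to treat "cannot fork" as the
expected stop condition while still surfacing unrelated errors.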

BTW, do you have any idea why the result dir has such a weird name?
/kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm

And why does a normal-looking tag sometimes appear (in the log messages):
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

Why all the [] and <> in the weird version? Did you somehow do that intentionally, or is it some sort of bug?
And why is 'None' there? The tag is supposed to be the test's 'shortname', which is determined by kvm_config.py
as it parses kvm_tests.cfg (or the config file you're using).

Normally the result dir should just be kvm_runtest_2.shortname, and in this case:
kvm_runtest_2.no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024
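
To make the expected naming concrete, here is a rough sketch. The helper
names (is_plain_shortname, result_dir_name) and the allowed character set
are assumptions for illustration; this is not how autotest or
kvm_config.py actually build or validate the name:

```python
import re

# Assumption: a plain shortname uses only letters, digits, '.', '_' and '-'.
_PLAIN_TAG = re.compile(r"^[A-Za-z0-9._-]+$")


def is_plain_shortname(tag):
    """Return True if tag looks like a plain dotted shortname."""
    return bool(tag) and _PLAIN_TAG.match(tag) is not None


def result_dir_name(test_name, tag):
    """Build '<test_name>.<tag>', rejecting tags with unexpected characters."""
    if not is_plain_shortname(tag):
        raise ValueError("unexpected characters in tag: %r" % (tag,))
    return "%s.%s" % (test_name, tag)
```

With the tag from the log messages this yields the directory name shown
above, while the bracketed variant is rejected.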


* Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
  2009-06-11  8:53 ` Michael Goldish
@ 2009-06-11  9:46   ` Yolkfull Chow
  0 siblings, 0 replies; 13+ messages in thread
From: Yolkfull Chow @ 2009-06-11  9:46 UTC (permalink / raw)
  To: Michael Goldish; +Cc: Uri Lublin, kvm

On 06/11/2009 04:53 PM, Michael Goldish wrote:
> ----- "Yolkfull Chow"<yzhou@redhat.com>  wrote:
>
>    
>> Michael, these are the backtrace messages:
>>
>> ...
> Thanks. It does indeed look like a legitimate OSError in os.fork().
>
> BTW, do you have any idea why the result dir has such a weird name?
> /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
>
> And why sometimes a normal looking tag appears (in the log messages):
> no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024
>
> Why all the [] and<>  in the weird version? Did you somehow do that intentionally, or is it some sort of bug?
> And why is 'None' there? The tag is supposed to be the test's 'shortname', which is determined by kvm_config.py
> as it parses kvm_tests.cfg (or the config file you're using).
>
> Normally the result dir should just be kvm_runtest_2.shortname, and in this case:
> kvm_runtest_2.no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024
>    
Hi Michael, it's not a defect or a problem; we did that intentionally for 
some purpose. We have now unified it with autotest's style. Thank you so 
much for the kind reminder. :)

-- 
Yolkfull
Regards,



end of thread, other threads:[~2009-06-11  9:46 UTC | newest]

Thread overview: 13+ messages
     [not found] <443392010.1660281244634434026.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-06-10 11:52 ` [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Michael Goldish
2009-06-11  3:37   ` Yolkfull Chow
     [not found] <120253480.1747631244710010660.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-06-11  8:53 ` Michael Goldish
2009-06-11  9:46   ` Yolkfull Chow
     [not found] <219655199.1650051244627445364.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-06-10 10:03 ` Michael Goldish
2009-06-10 10:31   ` Yolkfull Chow
     [not found] <2021156332.1536421244540393444.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-06-09  9:44 ` Michael Goldish
2009-06-10  8:10   ` Yolkfull Chow
2009-06-08  4:01 [KVM-AUTOTEST PATCH 0/8] Re-submitting some of the patches on the patch queue Lucas Meneghel Rodrigues
2009-06-09  8:41 ` [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive Yolkfull Chow
2009-06-09  9:37   ` Yaniv Kaul
2009-06-09  9:57     ` Michael Goldish
2009-06-09 12:45   ` Uri Lublin
2009-06-10  8:12     ` Yolkfull Chow
