* [PATCH] Introduce an s3 test
From: Ben Guthro @ 2013-04-26 17:26 UTC
To: Ian Jackson, xen-devel, Ian.Campbell
From: root <root@bguthro-desktop.(none)>
This test attempts to have an initial pass at introducing a test to catch regressions in S3.
It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
when S3 is complete.
---
ts-host-suspend | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
create mode 100755 ts-host-suspend
diff --git a/ts-host-suspend b/ts-host-suspend
new file mode 100755
index 0000000..9fe38d5
--- /dev/null
+++ b/ts-host-suspend
@@ -0,0 +1,51 @@
+#!/usr/bin/perl -w
+
+use strict qw(vars);
+
+use Osstest;
+use Osstest::TestSupport;
+
+tsreadconfig();
+
+my $timeout = 30;
+if (@ARGV && $ARGV[0] =~ m/^--timeout=([0-9]+)$/) {
+    $timeout = $1;
+    shift @ARGV;
+}
+
+our ($whhost) = @ARGV;
+$whhost ||= 'host';
+our $ho = selecthost($whhost);
+
+my $RTC = "/sys/class/rtc/rtc0";
+
+# get RTC NOW
+my $epoch = target_cmd_output_root($ho, "cat " . $RTC . "/since_epoch");
+
+# Clear the wake alarm
+target_cmd_root($ho, "echo 0 > " . $RTC . "/wakealarm");
+
+# Set the wake alarm to NOW + time
+my $t2 = $epoch + $timeout;
+target_cmd_root($ho, "echo " . $t2 . " > " . $RTC . "/wakealarm");
+
+# Put the machine to sleep
+target_cmd_root($ho, "pm-suspend");
+
+# Give the machine some time to go to sleep.
+sleep (5 + $timeout);
+
+# check log for resume message
+poll_loop(4*$timeout, 2, 's3-confirm-resumed',
+ target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
+ "grep -n 'Finishing wakeup from S3 state'"));
+
+# TODO:
+# - Check pcpu state
+# - Affinity has been restored
+# - C-states are not lost
+# - CPU pools are all correct
+# - Check timer queues are correct
+# - vcpu_singleshot_timer on every pcpu
+# - Check for kernel Oops
+# - Check for Xen WARN
--
1.7.9.5
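
For readers who want to try the target-side steps by hand, the following is a minimal shell sketch of what the script asks the host to do. The function names `arm_wakealarm` and `last_acpi_line_matches` are hypothetical and are not part of osstest. The RTC directory is passed as a parameter so the logic can be exercised against a scratch directory, and the message text to look for is also a parameter, since the exact wording printed by Xen should be checked against the version under test.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not osstest APIs.

# arm_wakealarm RTC_DIR SECONDS
# Clear any pending alarm, then set one SECONDS after the RTC's current time.
arm_wakealarm() {
    rtc_dir=$1
    delay=$2
    now=$(cat "$rtc_dir/since_epoch") || return 1
    echo 0 > "$rtc_dir/wakealarm" || return 1
    echo $((now + delay)) > "$rtc_dir/wakealarm"
}

# last_acpi_line_matches PATTERN < LOG
# Succeeds only if the last "ACPI S" line on stdin contains PATTERN.
last_acpi_line_matches() {
    grep 'ACPI S' | tail -n 1 | grep -q -- "$1"
}
```

On a real host this would be used roughly as `arm_wakealarm /sys/class/rtc/rtc0 30 && pm-suspend`, followed after resume by `xl dmesg | last_acpi_line_matches 'Finishing wakeup from S3 state'`, both run as root.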
* Re: [PATCH] Introduce an s3 test
From: Ian Jackson @ 2013-05-01 10:56 UTC
To: Ben Guthro; +Cc: Ian Campbell, xen-devel@lists.xen.org
Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> From: root <root@bguthro-desktop.(none)>
>
> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> when S3 is complete.
Thanks. Most of this looks plausible. I have some comments:
> +# Put the machine to sleep
> +target_cmd_root($ho, "pm-suspend");
> +
> +# Give the machine some time to go to sleep.
> +sleep (5 + $timeout);
> +
> +# check log for resume message
> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> + target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> + "grep -n 'Finishing wakeup from S3 state'"));
Why does this need a poll loop ? Surely after the machine comes out
of suspend it should be up right away ?
> +# TODO:
> +# - Check pcpu state
> +# - Affinity has been restored
> +# - C-states are not lost
> +# - CPU pools are all correct
We don't do any cpu affinity testing at all right now. Leaving
this as a TODO here is fine.
> +# - Check timer queues are correct
> +# - vcpu_singleshot_timer on every pcpu
I'm not sure I follow this. Wouldn't messed up timer queues cause
other trouble in the guest ?
> +# - Check for kernel Oops
> +# - Check for Xen WARN
These are a good idea but should perhaps be a separate test step.
Ian.
* Re: [PATCH] Introduce an s3 test
From: Ian Campbell @ 2013-05-01 11:23 UTC
To: Ian Jackson; +Cc: xen-devel@lists.xen.org, Ben Guthro
On Wed, 2013-05-01 at 11:56 +0100, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> > From: root <root@bguthro-desktop.(none)>
> >
> > This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> > It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> > when S3 is complete.
>
> Thanks. Most of this looks plausible. I have some comments:
>
> > +# Put the machine to sleep
> > +target_cmd_root($ho, "pm-suspend");
> > +
> > +# Give the machine some time to go to sleep.
> > +sleep (5 + $timeout);
> > +
> > +# check log for resume message
> > +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> > + target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> > + "grep -n 'Finishing wakeup from S3 state'"));
>
> Why does this need a poll loop ? Surely after the machine comes out
> of suspend it should be up right away ?
Not immediately, I expect, but what happens if the ssh fails -- are there
retries at that level which make the poll loop redundant?
Do we need to handle the case where the s3 resume fails such that
subsequent tests can work? i.e. finish up with an explicit
reboot/powercycle?
Ian.
* Re: [PATCH] Introduce an s3 test
From: Ben Guthro @ 2013-05-01 11:58 UTC
To: Ian Jackson; +Cc: xen-devel@lists.xen.org, Ian Campbell, Ben Guthro
On 05/01/2013 06:56 AM, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
>> From: root <root@bguthro-desktop.(none)>
>>
>> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
>> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
>> when S3 is complete.
>
> Thanks. Most of this looks plausible. I have some comments:
>
>> +# Put the machine to sleep
>> +target_cmd_root($ho, "pm-suspend");
>> +
>> +# Give the machine some time to go to sleep.
>> +sleep (5 + $timeout);
>> +
>> +# check log for resume message
>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>> + target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>> + "grep -n 'Finishing wakeup from S3 state'"));
>
> Why does this need a poll loop ? Surely after the machine comes out
> of suspend it should be up right away ?
This is a bit of a "first pass" in a test environment I've never used
before. I modeled this after other tests I found in the same dir. If
this is inappropriate, then I suspect you are correct.
I put it in the loop for the case of networking taking some time to come
back online, so if the ssh command failed it would be retried.
Additionally, I have found that the RTC wakeup mechanism is not very
accurate in its timing.
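
The retry rationale above (ssh failing while networking comes back, plus an imprecise wake time) amounts to re-running a check until it succeeds or a deadline passes. A minimal shell sketch of that idea follows. `poll_until` is a hypothetical name and is not osstest's `poll_loop`; it only illustrates the retry-with-deadline pattern.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not an osstest API.

# poll_until MAXWAIT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds
# or at least MAXWAIT seconds have elapsed.
poll_until() {
    maxwait=$1
    interval=$2
    shift 2
    start=$(date +%s)
    while :; do
        if "$@"; then
            return 0
        fi
        now=$(date +%s)
        if [ $((now - start)) -ge "$maxwait" ]; then
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `poll_until 120 2 ssh root@testhost true` would keep trying an ssh connection every 2 seconds for up to 2 minutes; `testhost` is a placeholder host name.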
>
>> +# TODO:
>> +# - Check pcpu state
>> +# - Affinity has been restored
>> +# - C-states are not lost
>> +# - CPU pools are all correct
>
> We don't do any cpu affinity testing at all right now. Leaving
> this as a TODO here is fine.
>
>> +# - Check timer queues are correct
>> +# - vcpu_singleshot_timer on every pcpu
>
> I'm not sure I follow this. Wouldn't messed up timer queues cause
> other trouble in the guest ?
Yes, but it has been a common point of failure / problems after S3. I
put this here as a placeholder to verify that everything is still as it
should be.
>
>> +# - Check for kernel Oops
>> +# - Check for Xen WARN
>
> These are a good idea but should perhaps be a separate test step.
Wouldn't you want a warning/oops that was provoked by S3 to be
associated with that test?
>
> Ian.
>
* Re: [PATCH] Introduce an s3 test
From: Ian Jackson @ 2013-05-02 15:06 UTC
To: Ben Guthro; +Cc: Ian Campbell, xen-devel@lists.xen.org
Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
> On 05/01/2013 06:56 AM, Ian Jackson wrote:
> >> +# check log for resume message
> >> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> >> + target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> >> + "grep -n 'Finishing wakeup from S3 state'"));
> >
> > Why does this need a poll loop ? Surely after the machine comes out
> > of suspend it should be up right away ?
>
> This is a bit of a "first pass" in a test environment I've never used
> before. I modeled this after other tests I found in the same dir. If
> this is inappropriate, then I suspect you are correct.
Maybe you should be using guest_check_up ?
> I put it in the loop for the case of networking taking some time to come
> back online, so if the ssh command failed it would be retried.
How long is it supposed to take to come back online ? "4*$timeout"
seems (a) a bit arbitrary (b) rather long with your existing value of
$timeout.
> Additionally, I have found that the RTC wakeup mechanism is not very
> accurate in its timing.
How unfortunate.
> > I'm not sure I follow this. Wouldn't messed up timer queues cause
> > other trouble in the guest ?
>
> Yes, but it has been a common point of failure / problems after S3. I
> put this here as a placeholder to verify that everything is still as it
> should be.
Err, OK.
> >> +# - Check for kernel Oops
> >> +# - Check for Xen WARN
> >
> > These are a good idea but should perhaps be a separate test step.
>
> Wouldn't you want a warning/oops that was provoked by S3 to be
> associated with that test?
Hrm. Well in principle this is surely true of any test.
Can we make warnings/oopses fatal ?
Ian.
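
As an illustration of what "making warnings/oopses fatal" could look like on the target side, here is a minimal shell sketch that fails when a log contains a suspicious line. The function name `log_is_clean` is hypothetical, and the patterns are assumptions about typical kernel and Xen message prefixes; they would need checking against real logs before being relied on.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not an osstest API.

# log_is_clean < LOG
# Returns non-zero if stdin contains a line matching one of the
# (assumed) kernel or Xen failure patterns. Offending lines are
# echoed to stderr so they end up in the test report.
log_is_clean() {
    if grep -E 'Oops|WARNING:|Xen WARN|Xen BUG' >&2; then
        return 1
    fi
    return 0
}
```

A test step could then run something like `dmesg | log_is_clean && xl dmesg | log_is_clean` after resume and treat a non-zero exit status as a failure.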
* Re: [PATCH] Introduce an s3 test
From: Ben Guthro @ 2013-05-02 20:28 UTC
To: Ian Jackson; +Cc: xen-devel@lists.xen.org, Ian Campbell, Ben Guthro
On May 2, 2013, at 11:06 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
wrote:
> Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
>> On 05/01/2013 06:56 AM, Ian Jackson wrote:
>>>> +# check log for resume message
>>>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>>>> + target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>>>> + "grep -n 'Finishing wakeup from S3 state'"));
>>>
>>> Why does this need a poll loop ? Surely after the machine comes out
>>> of suspend it should be up right away ?
>>
>> This is a bit of a "first pass" in a test environment I've never used
>> before. I modeled this after other tests I found in the same dir. If
>> this is inappropriate, then I suspect you are correct.
>
> Maybe you should be using guest_check_up ?
I'll own up to the fact that I wasn't really able to test the infrastructure portions of this script.
I was unsuccessful in getting them to run, even using the "standalone" branch.
It would really help if someone who has access to the test infrastructure could take my script as a starting point, and adapt it to whatever is necessary for that test environment.
>
>> I put it in the loop for the case of networking taking some time to come
>> back online, so if the ssh command failed it would be retried.
>
> How long is it supposed to take to come back online ? "4*$timeout"
> seems (a) a bit arbitrary (b) rather long with your existing value of
> $timeout.
For all devices to come back online, it can sometimes take up to 20s.
This value was arbitrary, but was chosen to allow for the RTC variance plus the time devices take to come back online.
This should probably be a tunable value.
>
>> Additionally, I have found that the RTC wakeup mechanism is not very
>> accurate in its timing.
>
> How unfortunate.
Indeed. We frequently see that sleeping machines for 1m sometimes results in machines waking up 30s later, and others 3m later.
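
One way to make the wait tunable, sketched in shell under stated assumptions: the poll budget is the sum of an allowance for RTC lateness and an allowance for devices to settle. The function name and the environment variables `S3_RTC_SLACK` and `S3_DEVICE_SETTLE` are hypothetical. The 20-second default comes from the device figure mentioned above; the 120-second default is only an illustrative guess at RTC lateness.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not an osstest setting.

# resume_wait_budget
# Print how many seconds to keep polling after the alarm was due.
# Both allowances can be overridden from the environment.
resume_wait_budget() {
    rtc_slack=${S3_RTC_SLACK:-120}          # assumed allowance for a late RTC wakeup
    device_settle=${S3_DEVICE_SETTLE:-20}   # devices can take up to ~20s to return
    echo $((rtc_slack + device_settle))
}
```

Keeping the two allowances separate makes it clearer which part of the wait is being adjusted when a particular host turns out to be slow.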
>
>>> I'm not sure I follow this. Wouldn't messed up timer queues cause
>>> other trouble in the guest ?
>>
>> Yes, but it has been a common point of failure / problems after S3. I
>> put this here as a placeholder to verify that everything is still as it
>> should be.
>
> Err, OK.
>
I see automated testing as a resource to be able to confirm that problems that occurred in the past do not re-emerge from new development, rather than strictly functional testing.
If you disagree with this, feel free to remove it. I don't feel strongly about this particular point.
>>>> +# - Check for kernel Oops
>>>> +# - Check for Xen WARN
>>>
>>> These are a good idea but should perhaps be a separate test step.
>>
>> Wouldn't you want a warning/oops that was provoked by S3 to be
>> associated with that test?
>
> Hrm. Well in principle this is surely true of any test.
>
> Can we make warnings/oopses fatal ?
>
That seems like it would be prudent, if possible.
As I mentioned above, I had difficulty configuring this test environment, so it may be trivial, and I am just not familiar enough with this environment.
Ben