* S3 is broken again in xen-unstable
@ 2013-04-25 12:00 Ben Guthro
2013-04-25 17:02 ` Ben Guthro
2013-04-29 8:45 ` Jan Beulich
0 siblings, 2 replies; 22+ messages in thread
From: Ben Guthro @ 2013-04-25 12:00 UTC (permalink / raw)
To: xen-devel
I don't have time to bisect this, currently - but just thought I'd let
the list know that, while xen-4.2 works (with the recent S3 changes
I've submitted) - 4.3 is broken again.
I'm not sure if it is the hypervisor, or the kernel, since I upgraded
both in my "unstable" build environment.
Since this is something that XenClient really relies on working, it
has been a pain point with every upgrade of Xen for us.
It is enormously time consuming to debug on every upgrade, and has a
long tail in discovering problems (I started debugging S3 last Aug on
xen-unstable, prior to 4.2 being cut)
How can we work with the community to try to get some sort of
regression testing for this feature that we rely on in our product?
Ben
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-25 12:00 S3 is broken again in xen-unstable Ben Guthro
@ 2013-04-25 17:02 ` Ben Guthro
2013-04-26 8:10 ` Ian Campbell
2013-04-26 20:47 ` Pasi Kärkkäinen
2013-04-29 8:45 ` Jan Beulich
1 sibling, 2 replies; 22+ messages in thread
From: Ben Guthro @ 2013-04-25 17:02 UTC (permalink / raw)
To: xen-devel
On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
> I don't have time to bisect this, currently - but just thought I'd let
> the list know that, while xen-4.2 works (with the recent S3 changes
> I've submitted) - 4.3 is broken again.
>
> I'm not sure if it is the hypervisor, or the kernel, since I upgraded
> both in my "unstable" build environment.
>
This appears to have been a transient issue. My xen tree was a few days old;
updating to the tip seems to have resolved this particular issue.
> Since this is something that XenClient really relies on working, it
> has been a pain point with every upgrade of Xen for us.
> It is enormously time consuming to debug on every upgrade, and has a
> long tail in discovering problems (I started debugging S3 last Aug on
> xen-unstable, prior to 4.2 being cut)
>
> How can we work with the community to try to get some sort of
> regression testing for this feature that we rely on in our product?
I am still interested in ideas for getting this into automated
testing, and any ideas people may have for this.
Would it be helpful to maintain a branch in my xenbits repo that could
be a rebased version of konrad's acpi-s3 patches against Linus' latest
kernel?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-25 17:02 ` Ben Guthro
@ 2013-04-26 8:10 ` Ian Campbell
2013-04-26 12:19 ` Ben Guthro
2013-04-29 11:03 ` George Dunlap
2013-04-26 20:47 ` Pasi Kärkkäinen
1 sibling, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2013-04-26 8:10 UTC (permalink / raw)
To: Ben Guthro; +Cc: Ian Jackson, xen-devel
On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
> > Since this is something that XenClient really relies on working, it
> > has been a pain point with every upgrade of Xen for us.
> > It is enormously time consuming to debug on every upgrade, and has a
> > long tail in discovering problems (I started debugging S3 last Aug on
> > xen-unstable, prior to 4.2 being cut)
> >
> > How can we work with the community to try to get some sort of
> > regression testing for this feature that we rely on in our product?
>
> I am still interested in ideas for getting this into automated
> testing, and any ideas people may have for this.
CCing Ian Jackson who runs the test infrastructure.
Contributing new tests is now less onerous than it once was (i.e. it
might even be possible at all). There is some info at
http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
although the branch may be out of date -- Ian was working on merging the
standalone branch at one point.
Some questions:
* How automatable is s3?
* In particular can we automate the wakeup? s3 is save to RAM
IIRC, and most power control in the test system is done with PDU
power cycling.
* Would s3 ever be expected to work on the sorts of whitebox
server systems which form the osstest pool or do we need to
investigate additional hardware?
* How hardware specific are the s3 failures -- we obviously can't
have one of every laptop ever ;-)
So assuming the answers to the above are positive then contributing a
test case for s3 to the relevant flights seems like a reasonable first
step, even if the expectation is that it would always fail with the
current mainline Xen + mainline Linux. The test system only tracks
regressions, so always failing test cases are OK (you can think of this
in the test-driven development kind of way ;-)).
> Would it be helpful to maintain a branch in my xenbits repo that could
> be a rebased version of konrad's acpi-s3 patches against Linus' latest
> kernel?
What is keeping those out of Linus' tree?
Once we have a test case in the standard flights then we can consider
the options around new flights testing other trees.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 8:10 ` Ian Campbell
@ 2013-04-26 12:19 ` Ben Guthro
2013-04-26 13:17 ` Ian Campbell
2013-05-01 11:01 ` Ian Jackson
2013-04-29 11:03 ` George Dunlap
1 sibling, 2 replies; 22+ messages in thread
From: Ben Guthro @ 2013-04-26 12:19 UTC (permalink / raw)
To: Ian Campbell
Cc: George Dunlap, Konrad Rzeszutek Wilk, Ian Jackson,
Marek Marczykowski, xen-devel
On Fri, Apr 26, 2013 at 4:10 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
>> > Since this is something that XenClient really relies on working, it
>> > has been a pain point with every upgrade of Xen for us.
>> > It is enormously time consuming to debug on every upgrade, and has a
>> > long tail in discovering problems (I started debugging S3 last Aug on
>> > xen-unstable, prior to 4.2 being cut)
>> >
>> > How can we work with the community to try to get some sort of
>> > regression testing for this feature that we rely on in our product?
>>
>> I am still interested in ideas for getting this into automated
>> testing, and any ideas people may have for this.
>
> CCing Ian Jackson who runs the test infrastructure.
I've also CC'ed a few people here, who I mention in my reply below.
>
> Contributing new tests is now less onerous than it once was (i.e. it
> might even be possible at all). There is some info at
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
> although the branch may be out of date -- Ian was working on merging the
> standalone branch at one point.
I'll read up on this
>
> Some questions:
> * How automatable is s3?
> * In particular can we automate the wakeup? s3 is save to RAM
> IIRC, and most power control in the test system is done with PDU
> power cycling.
I spoke with George Dunlap a bit about this while I was over in the
UK a few weeks ago, and drew up an example shell script for this:
http://xen.markmail.org/thread/ghj2ffngemccq6p4
Marek also weighed in, and included some of his own tests, and experiences.
In my experience, this mechanism is about as reliable as your RTC. On
some systems you might tell it to sleep for 30s, and it will wake in
10s.
That said, when things go wrong, the machine does need to be power
cycled...so if you are not physically located near the machine under
test, you would need a PDU as a recovery mechanism, I suppose.
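
A minimal sketch of the RTC wake-alarm mechanism described above, for readers who do not want to follow the link. It is not the script from the linked thread: the function names and the RTC_DIR / POWER_STATE overrides are assumptions made here so the logic can be dry-run against ordinary files, and only the sysfs paths themselves come from the standard Linux interfaces.

```shell
#!/bin/sh
# Sketch only: arm an RTC wake alarm N seconds ahead, then request S3.
# RTC_DIR and POWER_STATE are hypothetical overrides (not part of any
# existing tool) so the functions can be exercised without real sysfs.
RTC_DIR="${RTC_DIR:-/sys/class/rtc/rtc0}"
POWER_STATE="${POWER_STATE:-/sys/power/state}"

arm_wakealarm() {
    delay="$1"
    # Read the RTC's own idea of "now"; the alarm is relative to the RTC,
    # not to the system clock.
    now=$(cat "$RTC_DIR/since_epoch") || return 1
    # Clear any previously programmed alarm before setting a new one.
    echo 0 > "$RTC_DIR/wakealarm" || return 1
    echo $((now + delay)) > "$RTC_DIR/wakealarm"
}

suspend_to_ram() {
    # Writing "mem" asks the kernel to enter suspend-to-RAM (S3).
    echo mem > "$POWER_STATE"
}
```

On a real host this would be used as `arm_wakealarm 30 && suspend_to_ram`, run as root. As noted above, the wake time is only as trustworthy as the RTC, so a harness would still want a PDU power cycle as a fallback.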
> * Would s3 ever be expected to work on the sorts of whitebox
> server systems which form the osstest pool or do we need to
> investigate additional hardware?
I don't see why it wouldn't work, though admittedly I haven't dealt
with xen on servers since 2009.
> * How hardware specific are the s3 failures -- we obviously can't
> have one of every laptop ever ;-)
Clearly. I'm just looking to get a foot in the door here, so there is
a chance of catching gross regressions.
The hardware differences seem to be more timing related, due to
speed... ie, you are likely to uncover new failures when new, faster
hardware comes out for laptops.
Since typically server hardware is faster than laptop hardware, that
would theoretically catch problems at a higher frequency.
>
> So assuming the answers to the above are positive then contributing a
> test case for s3 to the relevant flights seems like a reasonable first
> step, even if the expectation is that it would always fail with the
> current mainline Xen + mainline Linux. The test system only tracks
> regressions, so always failing test cases are OK (you can think of this
> in the test-driven development kind of way ;-)).
I'll take a look at the test infrastructure, and see if I can make
heads/tails of it, and come up with a simplistic test.
>
>> Would it be helpful to maintain a branch in my xenbits repo that could
>> be a rebased version of konrad's acpi-s3 patches against Linus' latest
>> kernel?
>
> What is keeping those out of Linus' tree?
Added Konrad here, but I believe he is on vacation this week.
This has been a bullet point on his OSS presentation, as outstanding
pvops work for at least 3 years now.
IIRC, the x86 guys NACK'ed the change as being too invasive.
I googled around a bit, but can't seem to find the thread about it.
>
> Once we have a test case in the standard flights then we can consider
> the options around new flights testing other trees.
I'm not sure I understand this point.
Are you saying you want to see a test that fails in the standard test
flight first...because without Konrad's patches, it will be guaranteed
not to work.
...and without other changesets queued up for the 3.10 merge window,
non-boot CPUs will always have incorrect C-states.
Thanks
Ben
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 12:19 ` Ben Guthro
@ 2013-04-26 13:17 ` Ian Campbell
2013-04-26 14:10 ` Ben Guthro
2013-05-21 18:29 ` Ben Guthro
2013-05-01 11:01 ` Ian Jackson
1 sibling, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2013-04-26 13:17 UTC (permalink / raw)
To: Ben Guthro
Cc: George Dunlap, Konrad Rzeszutek Wilk, Ian Jackson,
Marek Marczykowski, xen-devel
On Fri, 2013-04-26 at 13:19 +0100, Ben Guthro wrote:
> On Fri, Apr 26, 2013 at 4:10 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > Some questions:
> > * How automatable is s3?
> > * In particular can we automate the wakeup? s3 is save to RAM
> > IIRC, and most power control in the test system is done with PDU
> > power cycling.
>
> I spoke with George Dunlap a bit about this while I was over in the
> UK a few weeks ago, and drew up an example shell script for this:
> http://xen.markmail.org/thread/ghj2ffngemccq6p4
> Marek also weighed in, and included some of his own tests, and experiences.
>
> In my experience, this mechanism is about as reliable as your RTC. On
> some systems you might tell it to sleep for 30s, and it will wake in
> 10s.
>
> That said, when things go wrong, the machine does need to be power
> cycled...so if you are not physically located near the machine under
> test, you would need a PDU as a recovery mechanism, I suppose.
That's OK, all the systems in the test harness would have to have PDU
for the other test cases (initial install etc) anyway.
> >> Would it be helpful to maintain a branch in my xenbits repo that could
> >> be a rebased version of konrad's acpi-s3 patches against Linus' latest
> >> kernel?
> >
> > What is keeping those out of Linus' tree?
>
> Added Konrad here, but I believe he is on vacation this week.
> This has been a bullet point on his OSS presentation, as outstanding
> pvops work for at least 3 years now.
>
> IIRC, the x86 guys NACK'ed the change as being too invasive.
> I googled around a bit, but can't seem to find the thread about it.
I wonder if it might be something like that :-/
> > Once we have a test case in the standard flights then we can consider
> > the options around new flights testing other trees.
>
> I'm not sure I understand this point.
> Are you saying you want to see a test that fails in the standard test
> flight first...because without Konrad's patches, it will be guaranteed
> not to work.
Right. AIUI the flights (and I may be using the wrong term here) are
somewhat uniform and few in number and get run with various
combinations of inputs (Xen tree, Linux tree, Qemu tree), so there is
effectively one "test Linux PV kernel flight" and one "test Xen PV
guests flight" etc, so we want to get S3 into those flights, with the
existing set of "* tree" inputs.
IOW we should add a new row to the grid
http://www.chiark.greenend.org.uk/~xensrcts/logs/17816/ for s3 testing
and then we can consider adding a new column with a different set of
trees as input.
Ian J may have a different opinion on how to approach, but he's away
until mid next week.
> ...and without other changesets queued up for the 3.10 merge window,
> non-boot CPUs will always have incorrect C-states.
It's OK to add the tests before things work.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 13:17 ` Ian Campbell
@ 2013-04-26 14:10 ` Ben Guthro
2013-04-26 14:32 ` Ian Campbell
2013-05-21 18:29 ` Ben Guthro
1 sibling, 1 reply; 22+ messages in thread
From: Ben Guthro @ 2013-04-26 14:10 UTC (permalink / raw)
To: Ian Campbell
Cc: George Dunlap, Konrad Rzeszutek Wilk, Ian Jackson,
Marek Marczykowski, xen-devel
[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]
On Fri, Apr 26, 2013 at 9:17 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
<snip>
>> I'm not sure I understand this point.
>> Are you saying you want to see a test that fails in the standard test
>> flight first...because without Konrad's patches, it will be guaranteed
>> not to work.
>
> Right. AIUI the flights (and I may be using the wrong term here) are
> somewhat uniform and few in number and get run with various
> combinations of inputs (Xen tree, Linux tree, Qemu tree), so there is
> effectively one "test Linux PV kernel flight" and one "test Xen PV
> guests flight" etc, so we want to get S3 into those flights, with the
> existing set of "* tree" inputs.
>
> IOW we should add a new row to the grid
> http://www.chiark.greenend.org.uk/~xensrcts/logs/17816/ for s3 testing
> and then we can consider adding a new column with a different set of
> trees as input.
>
> Ian J may have a different opinion on how to approach, but he's away
> until mid next week.
>
>> ...and without other changesets queued up for the 3.10 merge window,
>> non-boot CPUs will always have incorrect C-states.
>
> It's OK to add the tests before things work.
I've attached a patch to osstest that would be the beginnings of this
sort of test, if I'm reading the code correctly.
However, I don't really have a setup that I can smoke test this.
I also left a number of TODOs, since I really just want an opinion to
see if I'm on the right track.
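
One of the TODOs (checking processor state after resume) could start from something like the following. This is only a sketch under assumptions: the helper names and the CPU_ONLINE override are made up for illustration, and in dom0 the sysfs file reflects dom0's vCPUs rather than physical CPUs, so a real check would probably also need to ask the hypervisor.

```shell
#!/bin/sh
# Sketch only: detect CPUs that did not come back after a suspend/resume.
# CPU_ONLINE is a hypothetical override so the check can be dry-run.
CPU_ONLINE="${CPU_ONLINE:-/sys/devices/system/cpu/online}"

snapshot_cpus() {
    # Prints the kernel's online CPU list, e.g. "0-3".
    cat "$CPU_ONLINE"
}

check_cpus_restored() {
    before="$1"
    after=$(snapshot_cpus) || return 1
    if [ "$before" != "$after" ]; then
        echo "online CPUs changed across suspend: $before -> $after" >&2
        return 1
    fi
}
```

The intended use is to call `snapshot_cpus` before suspending, keep the result, and pass it to `check_cpus_restored` once the host answers again.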
Thanks
Ben
[-- Attachment #2: s3-test.patch --]
[-- Type: application/octet-stream, Size: 1936 bytes --]
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 141824a..71e4e82 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -53,7 +53,7 @@ BEGIN {
sshopts authorized_keys
remote_perl_script_open remote_perl_script_done
- host_reboot target_reboot target_reboot_hard
+ host_reboot host_suspend target_reboot target_reboot_hard
target_choose_vg target_umount_lv target_await_down
target_ping_check_down target_ping_check_up
@@ -777,6 +777,29 @@ sub get_stashed ($$) {
#---------- other stuff ----------
+sub host_suspend ($$) {
+ my ($ho, $time) = @_;
+ my $RTC = "/sys/class/rtc/rtc0";
+ my $epoch = target_cmd_output_root($ho, "cat " . $RTC . "/since_epoch");
+
+ # Clear the wake alarm
+ target_cmd_root($ho, "echo 0 > " . $RTC . "/wakealarm");
+ # Set the wake alarm to NOW + time
+ target_cmd_root($ho, "echo " . ($epoch + $time) . " > " . $RTC . "/wakealarm");
+
+ # Put the machine to sleep
+ # TODO: use pm-utils, or whatever is appropriate for OSS distro
+ # target_cmd_root($ho, "pm-suspend");
+ target_cmd_root($ho, "echo mem > /sys/power/state");
+
+ # TODO: - Wait until system goes into S3
+ # - Determine some way to tell if it failed to go into S3
+ # - Check processor state
+ # - Affinity
+ # - C-states
+ # - CPU pools
+}
+
sub host_reboot ($) {
my ($ho) = @_;
target_reboot($ho);
diff --git a/ts-host-suspend b/ts-host-suspend
new file mode 100755
index 0000000..d0ebe27
--- /dev/null
+++ b/ts-host-suspend
@@ -0,0 +1,14 @@
+#!/usr/bin/perl -w
+
+use strict qw(vars);
+use Osstest;
+use DBI;
+use Osstest::TestSupport;
+
+tsreadconfig();
+
+our ($whhost) = @ARGV;
+$whhost ||= 'host';
+our $ho= selecthost($whhost);
+
+host_suspend($ho, 30);
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 14:10 ` Ben Guthro
@ 2013-04-26 14:32 ` Ian Campbell
0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2013-04-26 14:32 UTC (permalink / raw)
To: Ben Guthro
Cc: George Dunlap, Konrad Rzeszutek Wilk, Ian Jackson,
Marek Marczykowski, xen-devel
On Fri, 2013-04-26 at 15:10 +0100, Ben Guthro wrote:
> On Fri, Apr 26, 2013 at 9:17 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> <snip>
> >> I'm not sure I understand this point.
> >> Are you saying you want to see a test that fails in the standard test
> >> flight first...because without Konrad's patches, it will be guaranteed
> >> not to work.
> >
> > Right. AIUI the flights (and I may be using the wrong term here) are
> > somewhat uniform and few in number and get run with various
> > combinations of inputs (Xen tree, Linux tree, Qemu tree), so there is
> > effectively one "test Linux PV kernel flight" and one "test Xen PV
> > guests flight" etc, so we want to get S3 into those flights, with the
> > existing set of "* tree" inputs.
> >
> > IOW we should add a new row to the grid
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/17816/ for s3 testing
> > and then we can consider adding a new column with a different set of
> > trees as input.
> >
> > Ian J may have a different opinion on how to approach, but he's away
> > until mid next week.
> >
> >> ...and without other changesets queued up for the 3.10 merge window,
> >> non-boot CPUs will always have incorrect C-states.
> >
> > It's OK to add the tests before things work.
>
> I've attached a patch to osstest that would be the beginnings of this
> sort of test, if I'm reading the code correctly.
> However, I don't really have a setup that I can smoke test this.
>
> I also left a number of TODOs, since I really just want an opinion to
> see if I'm on the right track.
Right track, as far as it goes, but I think most of what you have put in
TestSupport.pm should actually be in the ts-host-suspend test case
itself.
You probably need IanJ's input for anything more concrete.
Ian.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-25 17:02 ` Ben Guthro
2013-04-26 8:10 ` Ian Campbell
@ 2013-04-26 20:47 ` Pasi Kärkkäinen
2013-04-26 23:41 ` Ben Guthro
1 sibling, 1 reply; 22+ messages in thread
From: Pasi Kärkkäinen @ 2013-04-26 20:47 UTC (permalink / raw)
To: Ben Guthro; +Cc: xen-devel
On Thu, Apr 25, 2013 at 01:02:35PM -0400, Ben Guthro wrote:
> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
> > I don't have time to bisect this, currently - but just thought I'd let
> > the list know that, while xen-4.2 works (with the recent S3 changes
> > I've submitted) - 4.3 is broken again.
> >
> > I'm not sure if it is the hypervisor, or the kernel, since I upgraded
> > both in my "unstable" build environment.
> >
>
> This appears to have been a transient issue. My xen tree was a few days old;
> updating to the tip seems to have resolved this particular issue.
>
Ok, so master (xen-unstable) works OK regarding ACPI S3. Good.
What hypervisor-side patches are still missing from stable-4.2 branch?
-- Pasi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 20:47 ` Pasi Kärkkäinen
@ 2013-04-26 23:41 ` Ben Guthro
2013-05-07 8:34 ` Pasi Kärkkäinen
0 siblings, 1 reply; 22+ messages in thread
From: Ben Guthro @ 2013-04-26 23:41 UTC (permalink / raw)
To: Pasi Kärkkäinen; +Cc: xen-devel
On Fri, Apr 26, 2013 at 4:47 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> On Thu, Apr 25, 2013 at 01:02:35PM -0400, Ben Guthro wrote:
>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
>> > I don't have time to bisect this, currently - but just thought I'd let
>> > the list know that, while xen-4.2 works (with the recent S3 changes
>> > I've submitted) - 4.3 is broken again.
>> >
>> > I'm not sure if it is the hypervisor, or the kernel, since I upgraded
>> > both in my "unstable" build environment.
>> >
>>
>> This appears to have been a transient issue. My xen tree was a few days old;
>> updating to the tip seems to have resolved this particular issue.
>>
>
> Ok, so master (xen-unstable) works OK regarding ACPI S3. Good.
>
> What hypervisor-side patches are still missing from stable-4.2 branch?
The final one that actually makes it work is
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9aa356bc9f7533c3cb7f02c823f532532876d444
Jan Beulich had already indicated that this would be picked up in the
4.2 release cycle, but it was too late to get it into 4.2.2
Then, also the ns16550 change.
While strictly not necessary to fix S3 in the normal path, it does fix
a bug that can lead to S3 not working if you
a. have one of these SuperIO controllers on the LPC bus.
b. have serial enabled.
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=6e96c186d23873597896051b043cfeb119c4a7d5
On the Linux side of things, the following are necessary:
One of the acpi-s3.vX branches. I use v9, but v10 is also available. I
don't think one has an advantage over the other.
acpi-s3.v10:
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=c268cd657314354f910b773a17a9de0299e1cc21
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=864848221b056aaf25416999c29cb0e14d3c3197
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=aa7eb7bbb3f2a39435a07c082f99893386ae83ec
stable/for-linus-3.10:
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=stable/for-linus-3.10&id=3fac10145b766a2244422788f62dc35978613fd8
Ben
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-25 12:00 S3 is broken again in xen-unstable Ben Guthro
2013-04-25 17:02 ` Ben Guthro
@ 2013-04-29 8:45 ` Jan Beulich
2013-04-29 10:24 ` Ben Guthro
2013-04-29 10:55 ` George Dunlap
1 sibling, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2013-04-29 8:45 UTC (permalink / raw)
To: Ben Guthro; +Cc: xen-devel
>>> On 25.04.13 at 14:00, Ben Guthro <ben@guthro.net> wrote:
> I don't have time to bisect this, currently - but just thought I'd let
> the list know that, while xen-4.2 works (with the recent S3 changes
> I've submitted) - 4.3 is broken again.
>
> I'm not sure if it is the hypervisor, or the kernel, since I upgraded
> both in my "unstable" build environment.
>
> Since this is something that XenClient really relies on working, it
> has been a pain point with every upgrade of Xen for us.
Perhaps one point here also is that you upgrade in too big steps?
More regular participation in development and patch review
would very likely also help keeping down the number of
regressions here.
Jan
> It is enormously time consuming to debug on every upgrade, and has a
> long tail in discovering problems (I started debugging S3 last Aug on
> xen-unstable, prior to 4.2 being cut)
>
> How can we work with the community to try to get some sort of
> regression testing for this feature that we rely on in our product?
>
> Ben
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-29 8:45 ` Jan Beulich
@ 2013-04-29 10:24 ` Ben Guthro
2013-04-29 10:55 ` George Dunlap
1 sibling, 0 replies; 22+ messages in thread
From: Ben Guthro @ 2013-04-29 10:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On Mon, Apr 29, 2013 at 4:45 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 25.04.13 at 14:00, Ben Guthro <ben@guthro.net> wrote:
>> I don't have time to bisect this, currently - but just thought I'd let
>> the list know that, while xen-4.2 works (with the recent S3 changes
>> I've submitted) - 4.3 is broken again.
>>
>> I'm not sure if it is the hypervisor, or the kernel, since I upgraded
>> both in my "unstable" build environment.
>>
>> Since this is something that XenClient really relies on working, it
>> has been a pain point with every upgrade of Xen for us.
>
> Perhaps one point here also is that you upgrade in too big steps?
Indeed. Unfortunately, the realities of shipping a product, and of
staffing this effort, also need to be considered, and those are out of
an individual engineer's (my) control.
Since I am not on the open source platform team in Citrix, I am unable
to dedicate my time strictly to the open source development on this,
as much as I would like to.
Also - when a product is stable, and the newer versions are focused on
features we tend not to make use of, it is a tough sell to management
to come up with a reason to upgrade. We stayed on the 4.0.y release
train because it was stable.
>
> More regular participation in development and patch review
> would very likely also help keeping down the number of
> regressions here.
The breakage tends to be out of my expertise, until I have to debug it.
Consequently, I've been learning a lot about schedulers, lately.
That said, these breakages have happened in paths that went through
reviews, and were not caught.
In development of XenClient, test automation is where we tend to
uncover S3 related bugs that are not caught in the review process,
because these problems are insidious, breaking in unexpected ways.
I'll try to participate in reviews in the future, where I can,
since I do appreciate the value in doing so.
Ben
>
> Jan
>
>> It is enormously time consuming to debug on every upgrade, and has a
>> long tail in discovering problems (I started debugging S3 last Aug on
>> xen-unstable, prior to 4.2 being cut)
>>
>> How can we work with the community to try to get some sort of
>> regression testing for this feature that we rely on in our product?
>>
>> Ben
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-29 8:45 ` Jan Beulich
2013-04-29 10:24 ` Ben Guthro
@ 2013-04-29 10:55 ` George Dunlap
2013-04-29 11:07 ` Jan Beulich
1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2013-04-29 10:55 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ben Guthro, xen-devel
On Mon, Apr 29, 2013 at 9:45 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 25.04.13 at 14:00, Ben Guthro <ben@guthro.net> wrote:
>> I don't have time to bisect this, currently - but just thought I'd let
>> the list know that, while xen-4.2 works (with the recent S3 changes
>> I've submitted) - 4.3 is broken again.
>>
>> I'm not sure if it is the hypervisor, or the kernel, since I upgraded
>> both in my "unstable" build environment.
>>
>> Since this is something that XenClient really relies on working, it
>> has been a pain point with every upgrade of Xen for us.
>
> Perhaps one point here also is that you upgrade in too big steps?
>
> More regular participation in development and patch review
> would very likely also help keeping down the number of
> regressions here.
Perhaps, but given the incredible amounts of traffic on the list, how
is he supposed to know which patches might break suspend or not? And
even if he did, he would have to take the time to understand every
single hypervisor patch and predict how it would act on suspend, which
is just not reasonable. Remember we're on the other side of this
equation wrt Linux -- it's all too easy for someone to move something
apparently innocuous around and have it break dom0 pvops in a way
that's not noticed until 6 months later. That's why we do regular
testing of Linus' tree, as well as Ingo's x86 tree.
The right thing to do is to put at least a basic suspend test into the
testing push-gate, so that when someone submits a change that breaks
suspend, *they* are the ones that have to figure out what went wrong
and fix it.
-George
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-26 8:10 ` Ian Campbell
2013-04-26 12:19 ` Ben Guthro
@ 2013-04-29 11:03 ` George Dunlap
[not found] ` <CAOvdn6VXNDKjxyJmMQNdTSaXu-f3_FYgwg_2LunG6fYpMw+ywQ@mail.gmail.com>
1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2013-04-29 11:03 UTC (permalink / raw)
To: Ian Campbell; +Cc: Ian Jackson, Ben Guthro, xen-devel
On Fri, Apr 26, 2013 at 9:10 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
>> > Since this is something that XenClient really relies on working, it
>> > has been a pain point with every upgrade of Xen for us.
>> > It is enormously time consuming to debug on every upgrade, and has a
>> > long tail in discovering problems (I started debugging S3 last Aug on
>> > xen-unstable, prior to 4.2 being cut)
>> >
>> > How can we work with the community to try to get some sort of
>> > regression testing for this feature that we rely on in our product?
>>
>> I am still interested in ideas for getting this into automated
>> testing, and any ideas people may have for this.
>
> CCing Ian Jackson who runs the test infrastructure.
>
> Contributing new tests is now less onerous than it once was (i.e. it
> might even be possible at all). There is some info at
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
> although the branch may be out of date -- Ian was working on merging the
> standalone branch at one point.
>
> Some questions:
> * How automatable is s3?
> * In particular can we automate the wakeup? s3 is save to RAM
> IIRC, and most power control in the test system is done with PDU
> power cycling.
> * Would s3 ever be expected to work on the sorts of whitebox
> server systems which form the osstest pool or do we need to
> investigate additional hardware?
> * How hardware specific are the s3 failures -- we obviously can't
> have one of every laptop ever ;-)
When I discussed this with Ben before, it seemed that almost any
testing would be a very large improvement. Namely, it seemed to me
that even if the test just did a "null suspend" -- i.e., shut the
entire system down as though about to do a suspend but then just
resume without pulling the trigger -- would shake out a lot of bugs
(as well as make it very easy for devs that don't normally care about
suspending their machine to fix things). Having an RTC wake-up would
be the next thing to do after that.
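
For what it's worth, mainline Linux has something close to a "null suspend" in /sys/power/pm_test (available when the kernel is built with PM debugging): with a test level set, writing "mem" runs the suspend sequence up to that level, waits a few seconds, and resumes without entering the real sleep state. A rough sketch follows; whether this path exercises the Xen side in a useful way is an assumption that has not been checked here, and the function name and the PM_TEST / POWER_STATE overrides are illustrative only.

```shell
#!/bin/sh
# Sketch only: run a suspend cycle in pm_test mode so the machine never
# actually enters S3. PM_TEST and POWER_STATE are hypothetical overrides.
PM_TEST="${PM_TEST:-/sys/power/pm_test}"
POWER_STATE="${POWER_STATE:-/sys/power/state}"

null_suspend() {
    # Levels accepted by the kernel include: freezer, devices, platform,
    # processors, core ("core" goes furthest before backing out).
    level="${1:-core}"
    if [ ! -w "$PM_TEST" ]; then
        echo "pm_test not available (kernel built without PM debugging?)" >&2
        return 1
    fi
    echo "$level" > "$PM_TEST" || return 1
    echo mem > "$POWER_STATE"
    rc=$?
    # Always restore normal behaviour so a later real suspend is not a test.
    echo none > "$PM_TEST"
    return $rc
}
```

Calling `null_suspend core` as root would be the closest equivalent to shutting everything down and resuming without pulling the trigger; an RTC-driven real suspend would then be the follow-up step.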
-George
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: S3 is broken again in xen-unstable
2013-04-29 10:55 ` George Dunlap
@ 2013-04-29 11:07 ` Jan Beulich
0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2013-04-29 11:07 UTC (permalink / raw)
To: George Dunlap; +Cc: Ben Guthro, xen-devel
>>> On 29.04.13 at 12:55, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Mon, Apr 29, 2013 at 9:45 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 25.04.13 at 14:00, Ben Guthro <ben@guthro.net> wrote:
>>> I don't have time to bisect this, currently - but just thought I'd let
>>> the list know that, while xen-4.2 works (with the recent S3 changes
>>> I've submitted) - 4.3 is broken again.
>>>
>>> I'm not sure if it is the hypervisor, or the kernel, since I upgraded
>>> both in my "unstable" build environment.
>>>
>>> Since this is something that XenClient really relies on working, it
>>> has been a pain point with every upgrade of Xen for us.
>>
>> Perhaps one point here also is that you upgrade in too big steps?
>>
>> More regular participation in development and patch review
>> would very likely also help keeping down the number of
>> regressions here.
>
> Perhaps, but given the incredible amounts of traffic on the list, how
> is he supposed to know which patches might break suspend or not? And
> even if he did, he would have to take the time to understand every
> single hypervisor patch and predict how it would act on suspend, which
> is just not reasonable. Remember we're on the other side of this
> equation wrt Linux -- it's all too easy for someone to move something
> apparently innocuous around and have it break dom0 pvops in a way
> that's not noticed until 6 months later. That's why we do regular
> testing of Linus' tree, as well as Ingo's x86 tree.
>
> The right thing to do is to put at least a basic suspend test into the
> testing push-gate, so that when someone submits a change that breaks
> suspend, *they* are the ones that have to figure out what went wrong
> and fix it.
I was in no way suggesting this to be a bad idea. What I was
trying to point out is that testing a certain feature only every
couple of major releases is very likely not to help nearly as much
as being involved regularly. And no, I also didn't mean to suggest
for _anyone_ to review each and every individual patch. But
looking at some key ones before they go in would certainly help.
Jan
* Re: S3 is broken again in xen-unstable
[not found] ` <CAOvdn6VXNDKjxyJmMQNdTSaXu-f3_FYgwg_2LunG6fYpMw+ywQ@mail.gmail.com>
@ 2013-04-30 9:00 ` George Dunlap
0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2013-04-30 9:00 UTC (permalink / raw)
To: Ben Guthro, xen-devel
On 04/29/2013 03:00 PM, Ben Guthro wrote:
> On Mon, Apr 29, 2013 at 7:03 AM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>> On Fri, Apr 26, 2013 at 9:10 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>> On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
>>>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@guthro.net> wrote:
>>>>> Since this is something that XenClient really relies on working, it
>>>>> has been a pain point with every upgrade of Xen for us.
>>>>> It is enormously time consuming to debug on every upgrade, and has a
>>>>> long tail in discovering problems (I started debugging S3 last Aug on
>>>>> xen-unstable, prior to 4.2 being cut)
>>>>>
>>>>> How can we work with the community to try to get some sort of
>>>>> regression testing for this feature that we rely on in our product?
>>>>
>>>> I am still interested in ideas for getting this into automated
>>>> testing, and any ideas people may have for this.
>>>
>>> CCing Ian Jackson who runs the test infrastructure.
>>>
>>> Contributing new tests is now less onerous than it once was (i.e. it
>>> might even be possible at all). There is some info at
>>> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
>>> although the branch may be out of date -- Ian was working on merging the
>>> standalone branch at one point.
>>>
>>> Some questions:
>>> * How automatable is s3?
>>> * In particular can we automate the wakeup? s3 is save to RAM
>>> IIRC, and most power control in the test system is done with PDU
>>> power cycling.
>>> * Would s3 ever be expected to work on the sorts of whitebox
>>> server systems which form the osstest pool or do we need to
>>> investigate additional hardware?
>>> * How hardware specific are the s3 failures -- we obviously can't
>>> have one of every laptop ever ;-)
>>
>> When I discussed this with Ben before, it seemed that almost any
>> testing would be a very large improvement. Namely, it seemed to me
>> that even a test that just did a "null suspend" -- i.e., shut the
>> entire system down as though about to do a suspend but then just
>> resumed without pulling the trigger -- would shake out a lot of bugs
>> (as well as make it very easy for devs that don't normally care about
>> suspending their machine to fix things). Having an RTC wake-up would
>> be the next thing to do after that.
>>
>> -George
>
> FWIW, the patch that implements this "fake s3" functionality is
> implemented here, should it be considered for inclusion:
> http://markmail.org/message/ghj2ffngemccq6p4
[Adding xen-devel back in to the cc]
FYI playing around with this is on my to-do list, but it will probably
not be until after the 4.3 release.
-George
* Re: S3 is broken again in xen-unstable
2013-04-26 12:19 ` Ben Guthro
2013-04-26 13:17 ` Ian Campbell
@ 2013-05-01 11:01 ` Ian Jackson
2013-05-01 12:03 ` Ben Guthro
1 sibling, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2013-05-01 11:01 UTC (permalink / raw)
To: Ben Guthro
Cc: George Dunlap, Marek Marczykowski, Konrad Rzeszutek Wilk,
Ian Campbell, xen-devel
Ben Guthro writes ("Re: [Xen-devel] S3 is broken again in xen-unstable"):
...
> That said, when things go wrong, the machine does need to be power
> cycled...so if you are not physically located near the machine under
> test, you would need a PDU as a recovery mechanism, I suppose.
Ah this makes matters a bit more complicated. The code which
implements the test schedule would need to know to power cycle the
host after a failure. Could we be confident that after a failed test
of this kind we wouldn't see filesystem corruption ?
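
One way the recovery step could be structured, sketched under
assumptions: host_is_up and power_cycle are hypothetical callables
supplied by the test harness (for example a ping check and a PDU
toggle). Nothing here is existing osstest code.

```python
import time


def wait_for_resume(host_is_up, deadline_s, poll_s=5,
                    sleep=time.sleep, now=time.monotonic):
    """Poll host_is_up() until it returns True or deadline_s has passed."""
    end = now() + deadline_s
    while now() < end:
        if host_is_up():
            return True
        sleep(poll_s)
    return False


def check_or_power_cycle(host_is_up, power_cycle, deadline_s=120, **kw):
    """Return "pass" if the host came back, else power cycle and "fail"."""
    if wait_for_resume(host_is_up, deadline_s, **kw):
        return "pass"
    power_cycle()   # hypothetical hook, e.g. toggle the host's PDU outlet
    return "fail"
```

The sleep and now parameters exist only so the polling loop can be
exercised without real delays. A filesystem check after the power
cycle would still be a separate step.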
Also, looking at your test script, you seem to be testing using dom0
only. We're ignoring guests then. Perhaps this should be a separate
test column. (That might be a way to fudge the recovery question
too.)
> > * How hardware specific are the s3 failures -- we obviously can't
> > have one of every laptop ever ;-)
>
> Clearly. I'm just looking to get a foot in the door here, so there is
> a chance of catching gross regressions.
> The hardware differences seem to be more timing related, due to
> speed... ie, you are likely to uncover new failures when new, faster
> hardware comes out for laptops.
> Since typically server hardware is faster than laptop hardware, that
> would theoretically catch problems at a higher frequency.
If the hardware/BIOS is likely to be buggy, that's a bit of a pain.
We'd have to at least figure out which machines worked and flag them
so that the test was only run on those.
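
A sketch of the flagging idea, assuming a simple mapping from host
name to a set of flags. The flag name "s3-ok" and the host names are
made up for this example and are not real osstest identifiers.

```python
def hosts_for_test(hosts, required_flag="s3-ok"):
    """Return the sorted names of hosts that carry required_flag."""
    return sorted(name for name, flags in hosts.items()
                  if required_flag in flags)


# Hypothetical pool: only hosts known to suspend and resume get the flag.
pool = {
    "alpha": {"s3-ok", "pdu"},
    "beta": {"pdu"},
    "gamma": {"s3-ok"},
}
s3_hosts = hosts_for_test(pool)   # hosts the S3 job may be scheduled on
```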
> > Once we have a test case in the standard flights then we can consider
> > the options around new flights testing other trees.
>
> I'm not sure I understand this point.
> Are you saying you want to see a test that fails in the standard test
> flight first...because without Konrad's patches, it will be guaranteed
> not to work.
As Ian says, there is no problem with deploying the test first and
fixing the actual code later...
Ian.
* Re: S3 is broken again in xen-unstable
2013-05-01 11:01 ` Ian Jackson
@ 2013-05-01 12:03 ` Ben Guthro
0 siblings, 0 replies; 22+ messages in thread
From: Ben Guthro @ 2013-05-01 12:03 UTC (permalink / raw)
To: Ian Jackson
Cc: George Dunlap, Marek Marczykowski, Konrad Rzeszutek Wilk,
Ian Campbell, xen-devel
On Wed, May 1, 2013 at 7:01 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Ben Guthro writes ("Re: [Xen-devel] S3 is broken again in xen-unstable"):
> ...
>> That said, when things go wrong, the machine does need to be power
>> cycled...so if you are not physically located near the machine under
>> test, you would need a PDU as a recovery mechanism, I suppose.
>
> Ah this makes matters a bit more complicated. The code which
> implements the test schedule would need to know to power cycle the
> host after a failure. Could we be confident that after a failed test
> of this kind we wouldn't see filesystem corruption ?
If you are using a journaled filesystem, I think the confidence level
is raised...but there are no guarantees, when you just yank a power
cord.
>
> Also, looking at your test script, you seem to be testing using dom0
> only. We're ignoring guests then. Perhaps this should be a separate
> test column. (That might be a way to fudge the recovery question
> too.)
I'm going for baby-steps here.
The vast majority of the S3 failures we have encountered have been
dom0 related, so I thought that would be a decent starting place.
>
>> > * How hardware specific are the s3 failures -- we obviously can't
>> > have one of every laptop ever ;-)
>>
>> Clearly. I'm just looking to get a foot in the door here, so there is
>> a chance of catching gross regressions.
>> The hardware differences seem to be more timing related, due to
>> speed... ie, you are likely to uncover new failures when new, faster
>> hardware comes out for laptops.
>> Since typically server hardware is faster than laptop hardware, that
>> would theoretically catch problems at a higher frequency.
>
> If the hardware/BIOS is likely to be buggy, that's a bit of a pain.
> We'd have to at least figure out which machines worked and flag them
> so that the test was only run on those.
I think testing a known good configuration for regression seems
appropriate, yes.
They all *should* work...but I'm just being conservative here.
>
>> > Once we have a test case in the standard flights then we can consider
>> > the options around new flights testing other trees.
>>
>> I'm not sure I understand this point.
>> Are you saying you want to see a test that fails in the standard test
>> flight first...because without Konrad's patches, it will be guaranteed
>> not to work.
>
> As Ian says, there is no problem with deploying the test first and
> fixing the actual code later...
>
> Ian.
* Re: S3 is broken again in xen-unstable
2013-04-26 23:41 ` Ben Guthro
@ 2013-05-07 8:34 ` Pasi Kärkkäinen
2013-05-07 8:40 ` Jan Beulich
[not found] ` <26100746.41126.1367916036066.JavaMail.mobile-sync@vcin11>
0 siblings, 2 replies; 22+ messages in thread
From: Pasi Kärkkäinen @ 2013-05-07 8:34 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ben Guthro, xen-devel
On Fri, Apr 26, 2013 at 07:41:07PM -0400, Ben Guthro wrote:
> >
> > Ok, so master (xen-unstable) works OK regarding ACPI S3. Good.
> >
> > What hypervisor-side patches are still missing from stable-4.2 branch?
>
> The final one that actually makes it work is
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9aa356bc9f7533c3cb7f02c823f532532876d444
> Jan Beulich had already indicated that this would be picked up in the
> 4.2 release cycle, but it was too late to get it into 4.2.2
>
Yep, I can see this already in 4.2 branch for 4.2.3.
>
> Then, also the ns16550 change.
> While strictly not necessary to fix S3 in the normal path, it does fix
> a bug that can lead to S3 not working if you
> a. have one of these SuperIO controllers on the LPC bus.
> b. have serial enabled.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=6e96c186d23873597896051b043cfeb119c4a7d5
>
Jan: I think this ns16550 patch should be backported to the 4.2 branch as well.
Thanks,
-- Pasi
>
> On the linux side of things, The following are necessary:
> One of the acpi-s3.vX branches. I use v9, but v10 is also available. I
> don't think one has an advantage over the other.
>
> acpi-s3.v10:
> http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=c268cd657314354f910b773a17a9de0299e1cc21
> http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=864848221b056aaf25416999c29cb0e14d3c3197
> http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=devel/acpi-s3.v10&id=aa7eb7bbb3f2a39435a07c082f99893386ae83ec
>
> stable/for-linus-3.10:
> http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/commit/?h=stable/for-linus-3.10&id=3fac10145b766a2244422788f62dc35978613fd8
>
>
> Ben
* Re: S3 is broken again in xen-unstable
2013-05-07 8:34 ` Pasi Kärkkäinen
@ 2013-05-07 8:40 ` Jan Beulich
[not found] ` <26100746.41126.1367916036066.JavaMail.mobile-sync@vcin11>
1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2013-05-07 8:40 UTC (permalink / raw)
To: Ben Guthro, Pasi Kärkkäinen; +Cc: xen-devel
>>> On 07.05.13 at 10:34, Pasi Kärkkäinen<pasik@iki.fi> wrote:
> On Fri, Apr 26, 2013 at 07:41:07PM -0400, Ben Guthro wrote:
>> Then, also the ns16550 change.
>> While strictly not necessary to fix S3 in the normal path, it does fix
>> a bug that can lead to S3 not working if you
>> a. have one of these SuperIO controllers on the LPC bus.
>> b. have serial enabled.
>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=6e96c186d23873597896051b043cfeb119c4a7d5
>>
>
> Jan: I think this ns16550 patch should be backported to the 4.2 branch as well.
Yeah, with this being secondary, I left it off until we know that this
really is the only thing known to break resume (i.e. I saw no point
in backporting this when in the end S3 still wouldn't work anyway).
Ben - am I right in understanding your earlier summary in this
thread to mean that the 4.2 branch, according to your testing,
is now in such a state?
Jan
* Re: S3 is broken again in xen-unstable
[not found] ` <26100746.41126.1367916036066.JavaMail.mobile-sync@vcin11>
@ 2013-05-07 9:18 ` Ben Guthro
0 siblings, 0 replies; 22+ messages in thread
From: Ben Guthro @ 2013-05-07 9:18 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ben Guthro, xen-devel
On May 7, 2013, at 4:40 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 07.05.13 at 10:34, Pasi Kärkkäinen<pasik@iki.fi> wrote:
>> On Fri, Apr 26, 2013 at 07:41:07PM -0400, Ben Guthro wrote:
>>> Then, also the ns16550 change.
>>> While strictly not necessary to fix S3 in the normal path, it does fix
>>> a bug that can lead to S3 not working if you
>>> a. have one of these SuperIO controllers on the LPC bus.
>>> b. have serial enabled.
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=6e96c186d23873597896051b043cfeb119c4a7d5
>>
>> Jan: I think this ns16550 patch should be backported to the 4.2 branch as well.
>
> Yeah, as being secondary I left this off until we know that this
> really is the only thing known to break resume (i.e. I saw no point
> in backporting this when in the end S3 still wouldn't work anyway).
>
> Ben - am I right in understanding your earlier summary in this
> thread to mean that the 4.2 branch, according to your testing,
> is now in such a state?
>
Yea, S3 works on the 4.2.3 branch without this patch. This fixes a
specific corner case on some machines with the SuperIO hardware.
* Re: S3 is broken again in xen-unstable
2013-04-26 13:17 ` Ian Campbell
2013-04-26 14:10 ` Ben Guthro
@ 2013-05-21 18:29 ` Ben Guthro
2013-05-21 18:52 ` Pasi Kärkkäinen
1 sibling, 1 reply; 22+ messages in thread
From: Ben Guthro @ 2013-05-21 18:29 UTC (permalink / raw)
To: Ian Campbell
Cc: George Dunlap, Konrad Rzeszutek Wilk, Ian Jackson,
Marek Marczykowski, xen-devel
On Fri, Apr 26, 2013 at 9:17 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Fri, 2013-04-26 at 13:19 +0100, Ben Guthro wrote:
>> On Fri, Apr 26, 2013 at 4:10 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
>> > Some questions:
>> > * How automatable is s3?
>> > * In particular can we automate the wakeup? s3 is save to RAM
>> > IIRC, and most power control in the test system is done with PDU
>> > power cycling.
>>
>> I spoke with George Dunlap a bit about this while I was over in the
>> UK a few weeks ago, and drew up an example shell script for this:
>> http://xen.markmail.org/thread/ghj2ffngemccq6p4
>> Marek also weighed in, and included some of his own tests, and experiences.
>>
>> In my experience, this mechanism is about as reliable as your RTC. On
>> some systems you might tell it to sleep for 30s, and it will wake in
>> 10s.
>>
>> That said, when things go wrong, the machine does need to be power
>> cycled...so if you are not physically located near the machine under
>> test, you would need a PDU as a recovery mechanism, I suppose.
>
> That's OK, all the systems in the test harness would have to have PDU
> for the other test cases (initial install etc) anyway.
>
>> >> Would it be helpful to maintain a branch in my xenbits repo that could
>> >> be a rebased version of konrad's acpi-s3 patches against Linus' latest
>> >> kernel?
>> >
>> > What is keeping those out of Linus' tree?
>>
>> Added Konrad here, but I believe he is on vacation this week.
>> This has been a bullet point on his OSS presentation, as outstanding
>> pvops work for at least 3 years now.
>>
>> IIRC, the x86 guys NACK'ed the change as being too invasive.
>> I googled around a bit, but can't seem to find the thread about it.
>
> I wonder if it might be something like that :-/
FWIW, I believe we are over another hurdle here.
I have commitment from the acpi maintainer (Rafael Wysocki) that the
following patches will be included in the linux-3.11 merge window:
https://lkml.org/lkml/2013/5/14/465
When this is accepted, this should give "out of the box" S3
functionality with Xen
Once Xen-4.3 is released, I would like to revisit trying to see if
there would be some way to get something into the automated test
system.
Ben
>
>> > Once we have a test case in the standard flights then we can consider
>> > the options around new flights testing other trees.
>>
>> I'm not sure I understand this point.
>> Are you saying you want to see a test that fails in the standard test
>> flight first...because without Konrad's patches, it will be guaranteed
>> not to work.
>
> Right. AIUI the flights (and I may be using the wrong term here) are
> somewhat uniform and few in number and get run with various
> combinations of inputs (Xen tree, Linux tree, Qemu tree), so there is
> effectively one "test Linux PV kernel flight" and one "test Xen PV
> guests flight" etc, so we want to get S3 into those flights, with the
> existing set of "* tree" inputs.
>
> IOW we should add a new row to the grid
> http://www.chiark.greenend.org.uk/~xensrcts/logs/17816/ for s3 testing
> and then we can consider adding a new column with a different set of
> trees as input.
>
> Ian J may have a different opinion on how to approach, but he's away
> until mid next week.
>
>> ...and without other changesets queued up for the 3.10 merge window,
>> non-boot CPUs will always have incorrect C-states.
>
> It's OK to add the tests before things work.
>
> Ian.
>
>
* Re: S3 is broken again in xen-unstable
2013-05-21 18:29 ` Ben Guthro
@ 2013-05-21 18:52 ` Pasi Kärkkäinen
0 siblings, 0 replies; 22+ messages in thread
From: Pasi Kärkkäinen @ 2013-05-21 18:52 UTC (permalink / raw)
To: Ben Guthro
Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
Marek Marczykowski, xen-devel
On Tue, May 21, 2013 at 02:29:23PM -0400, Ben Guthro wrote:
> >> >
> >> > What is keeping those out of Linus' tree?
> >>
> >> Added Konrad here, but I believe he is on vacation this week.
> >> This has been a bullet point on his OSS presentation, as outstanding
> >> pvops work for at least 3 years now.
> >>
> >> IIRC, the x86 guys NACK'ed the change as being too invasive.
> >> I googled around a bit, but can't seem to find the thread about it.
> >
> > I wonder if it might be something like that :-/
>
> FWIW, I believe we are over another hurdle here.
> I have commitment from the acpi maintainer (Rafael Wysocki) that the
> following patches will be included in the linux-3.11 merge window:
>
> https://lkml.org/lkml/2013/5/14/465
>
> When this is accepted, this should give "out of the box" S3
> functionality with Xen
>
This is great, thanks a lot!
> Once Xen-4.3 is released, I would like to revisit trying to see if
> there would be some way to get something into the automated test
> system.
>
-- Pasi
Thread overview: 22+ messages
2013-04-25 12:00 S3 is broken again in xen-unstable Ben Guthro
2013-04-25 17:02 ` Ben Guthro
2013-04-26 8:10 ` Ian Campbell
2013-04-26 12:19 ` Ben Guthro
2013-04-26 13:17 ` Ian Campbell
2013-04-26 14:10 ` Ben Guthro
2013-04-26 14:32 ` Ian Campbell
2013-05-21 18:29 ` Ben Guthro
2013-05-21 18:52 ` Pasi Kärkkäinen
2013-05-01 11:01 ` Ian Jackson
2013-05-01 12:03 ` Ben Guthro
2013-04-29 11:03 ` George Dunlap
[not found] ` <CAOvdn6VXNDKjxyJmMQNdTSaXu-f3_FYgwg_2LunG6fYpMw+ywQ@mail.gmail.com>
2013-04-30 9:00 ` George Dunlap
2013-04-26 20:47 ` Pasi Kärkkäinen
2013-04-26 23:41 ` Ben Guthro
2013-05-07 8:34 ` Pasi Kärkkäinen
2013-05-07 8:40 ` Jan Beulich
[not found] ` <26100746.41126.1367916036066.JavaMail.mobile-sync@vcin11>
2013-05-07 9:18 ` Ben Guthro
2013-04-29 8:45 ` Jan Beulich
2013-04-29 10:24 ` Ben Guthro
2013-04-29 10:55 ` George Dunlap
2013-04-29 11:07 ` Jan Beulich