* Results from the Xen 4.4-rc2 test day
From: Andrew Cooper @ 2014-01-21 10:28 UTC (permalink / raw)
To: xen-devel@lists.xen.org; +Cc: George Dunlap, Ian Campbell
Hello,
I participated in (a rather extended version of) the 4.4-rc2 test day,
and rc2 got a full XenRT nightly run in the XenServer testing system.
For the setup, the comparison is against XenServer trunk, which is
currently based on Xen 4.3-staging (plus patch queue), with a Linux
3.10.y dom0 kernel and a CentOS 6.4-based dom0 userspace.
The tested version had Xen 4.4 (staging, as I needed the ABI fix) in
place of Xen 4.3, but an identical dom0 kernel, dom0 userspace, qemu,
toolstack and Windows PV drivers.
The major issue identified is with Windows 8/8.1 and Server 2012/2012 R2,
which have problems on live migration. Some source of time unexpectedly
jumps forward by two days. As a result, the guest loses its DHCP lease,
drops its IP address and networking ceases to work (it appears that
Windows will not attempt to renew the lease itself).
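A forward step of the guest clock is enough on its own to invalidate a lease, because a client that trusts its local clock compares "now" against the time the lease was granted plus its duration. The C sketch below is a hypothetical model of that check, not Windows' actual DHCP client logic; the struct, function and constant names are invented for illustration, and the one-day and eight-day lease lengths are assumptions, not values from the test setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECONDS_PER_DAY 86400ULL

/* Hypothetical DHCP client lease record; all times are seconds of
 * guest wall-clock time as seen by the client. */
struct toy_lease {
    uint64_t obtained_at; /* guest time when the lease was granted */
    uint64_t duration;    /* lease length granted by the server */
};

/* A client that trusts its local clock treats the lease as expired once
 * "now" reaches obtained_at + duration, regardless of how much real time
 * has actually elapsed. */
static bool toy_lease_expired(const struct toy_lease *l, uint64_t now)
{
    assert(l != NULL);
    return now >= l->obtained_at + l->duration;
}
```

Under this model a one-day lease obtained an hour earlier is reported as expired as soon as the clock jumps forward by two days, while an eight-day lease would survive the same jump, so whether networking is lost plausibly depends on the lease length in use.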
I am currently investigating which source of time is jumping forwards,
but this does appear to be a regression directly attributable to Xen 4.4.
~Andrew
* Re: Results from the Xen 4.4-rc2 test day
From: Andrew Cooper @ 2014-01-23 11:54 UTC (permalink / raw)
To: xen-devel@lists.xen.org; +Cc: George Dunlap, Jan Beulich
On 21/01/14 10:28, Andrew Cooper wrote:
> Hello,
>
> I participated in (a rather extended version of) the 4.4-rc2 test day,
> and rc2 got a full XenRT nightly run in the XenServer testing system.
>
> For the setup, the comparison is against XenServer trunk, which is
> currently based on Xen 4.3-staging (plus patch queue), with a Linux
> 3.10.y dom0 kernel and a CentOS 6.4-based dom0 userspace.
>
> The tested version had Xen 4.4 (staging, as I needed the ABI fix) in
> place of Xen 4.3, but an identical dom0 kernel, dom0 userspace, qemu,
> toolstack and Windows PV drivers.
>
>
> The major issue identified is with Windows 8/8.1 and Server 2012/2012 R2,
> which have problems on live migration. Some source of time unexpectedly
> jumps forward by two days. As a result, the guest loses its DHCP lease,
> drops its IP address and networking ceases to work (it appears that
> Windows will not attempt to renew the lease itself).
>
This is caused by commit e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
"x86/viridian: Time Reference Count MSR"
After double-checking against the specification, it does appear to be
implemented as required (subject to a potential issue with multi-vCPU
guests).
I am currently experimenting to see whether hvm_get_guest_time() is
returning unexpected values, or whether it is returning expected values
and Windows is interpreting them differently.
At this point in the 4.4 release cycle, reverting the patch should be
seriously considered, although I would like to see whether it is
possible to work out why it is wrong and whether there is an obvious fix
first.
~Andrew
* Re: Results from the Xen 4.4-rc2 test day
From: Jan Beulich @ 2014-01-23 12:53 UTC (permalink / raw)
To: Andrew Cooper; +Cc: George Dunlap, Paul Durrant, xen-devel@lists.xen.org
>>> On 23.01.14 at 12:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 21/01/14 10:28, Andrew Cooper wrote:
>> The major issue identified is with Windows 8/8.1 and Server 2012/2012 R2,
>> which have problems on live migration. Some source of time unexpectedly
>> jumps forward by two days. As a result, the guest loses its DHCP lease,
>> drops its IP address and networking ceases to work (it appears that
>> Windows will not attempt to renew the lease itself).
>>
>
> This is caused by commit e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
> "x86/viridian: Time Reference Count MSR"
>
> After double-checking against the specification, it does appear to be
> implemented as required (subject to a potential issue with multi-vCPU
> guests).
>
> I am currently experimenting to see whether hvm_get_guest_time() is
> returning unexpected values, or whether it is returning expected values
> and Windows is interpreting them differently.
>
> At this point in the 4.4 release cycle, reverting the patch should be
> seriously considered, although I would like to see whether it is
> possible to work out why it is wrong and whether there is an obvious fix
> first.
I suppose you and/or Paul will let us know either way.
Jan
* Re: Results from the Xen 4.4-rc2 test day
From: Andrew Cooper @ 2014-01-23 17:17 UTC (permalink / raw)
To: Jan Beulich; +Cc: George Dunlap, Paul Durrant, xen-devel@lists.xen.org
On 23/01/14 12:53, Jan Beulich wrote:
>>>> On 23.01.14 at 12:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 21/01/14 10:28, Andrew Cooper wrote:
>>> The major issue identified is with Windows 8/8.1 and Server 2012/2012 R2,
>>> which have problems on live migration. Some source of time unexpectedly
>>> jumps forward by two days. As a result, the guest loses its DHCP lease,
>>> drops its IP address and networking ceases to work (it appears that
>>> Windows will not attempt to renew the lease itself).
>>>
>> This is caused by commit e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
>> "x86/viridian: Time Reference Count MSR"
>>
>> After double-checking against the specification, it does appear to be
>> implemented as required (subject to a potential issue with multi-vCPU
>> guests).
>>
>> I am currently experimenting to see whether hvm_get_guest_time() is
>> returning unexpected values, or whether it is returning expected values
>> and Windows is interpreting them differently.
>>
>> At this point in the 4.4 release cycle, reverting the patch should be
>> seriously considered, although I would like to see whether it is
>> possible to work out why it is wrong and whether there is an obvious fix
>> first.
> I suppose you and/or Paul will let us know either way.
>
> Jan
>
The value returned by hvm_get_guest_time() resets when the domain is
recreated with a new domid, making it an inappropriate source of time
for the function the specification describes for this MSR.
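To illustrate why a per-domain clock is the wrong backing source, here is a toy C model; every name in it is hypothetical and it is not Xen's code. It assumes, as the Hyper-V specification describes, that the Time Reference Count MSR should report a count in 100ns units that keeps increasing for the life of the partition, and it models guest time as restarting from zero whenever a new domain is created, as happens on the migration destination. Reading the MSR as guest time divided by 100 then steps backwards after migration, whereas adding a count carried across from the source (how such a value would actually be transferred is an assumption here) keeps it monotonic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a domain's clock: guest time is host time (ns)
 * minus the host time at which this domain was created, so it restarts
 * at zero for every new domid. */
struct toy_domain {
    uint64_t host_ns_at_create; /* host time when the domain was created */
    uint64_t ref_count_base;    /* 100ns ticks carried over by migration */
};

static void toy_domain_create(struct toy_domain *d, uint64_t host_now_ns,
                              uint64_t carried_ticks)
{
    d->host_ns_at_create = host_now_ns;
    d->ref_count_base = carried_ticks;
}

/* Guest time in ns since this particular domain was created. */
static uint64_t toy_guest_time_ns(const struct toy_domain *d,
                                  uint64_t host_now_ns)
{
    assert(host_now_ns >= d->host_ns_at_create);
    return host_now_ns - d->host_ns_at_create;
}

/* Naive MSR read: guest time converted to 100ns units. Resets to zero
 * on the destination after a migration. */
static uint64_t time_ref_count_naive(const struct toy_domain *d,
                                     uint64_t host_now_ns)
{
    return toy_guest_time_ns(d, host_now_ns) / 100;
}

/* Migration-safe variant: add the ticks accumulated before migration. */
static uint64_t time_ref_count_persistent(const struct toy_domain *d,
                                          uint64_t host_now_ns)
{
    return d->ref_count_base + toy_guest_time_ns(d, host_now_ns) / 100;
}
```

After an hour on the source the naive count is 36,000,000,000 ticks; on a freshly created destination domain it starts again near zero, which a guest expecting a monotonic partition counter cannot tolerate. The persistent variant needs state to be saved and restored across migration, which is in line with the expectation that exposing a suitable time source is a non-trivial change.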
I suspect Windows 8 only notices on migration because, as I understand
it, migration is the first point at which the generation ID is supposed
to change and signal a reset of state. Detecting the failure is further
complicated by what appears to be a race between the guest tools
resetting the clock back to the correct value and the DHCP lease being
flushed. XenRT only notices the failure if the DHCP lease is actually
lost (so XenRT can't communicate with its xmlrpc daemon inside the VM);
it doesn't directly notice the forward/backward stepping in time.
Anyway, please revert the patch: it will be a non-trivial change to
expose an appropriate source of time for this MSR to consume.
~Andrew
* Re: Results from the Xen 4.4-rc2 test day
From: Jan Beulich @ 2014-01-24 9:23 UTC (permalink / raw)
To: Andrew Cooper; +Cc: George Dunlap, Paul Durrant, xen-devel@lists.xen.org
>>> On 23.01.14 at 18:17, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> The value returned by hvm_get_guest_time() resets when the domain is
> recreated with a new domid, making it an inappropriate source of time
> for the function the specification describes for this MSR.
>
> I suspect Windows 8 only notices on migration because, as I understand
> it, migration is the first point at which the generation ID is supposed
> to change and signal a reset of state. Detecting the failure is further
> complicated by what appears to be a race between the guest tools
> resetting the clock back to the correct value and the DHCP lease being
> flushed. XenRT only notices the failure if the DHCP lease is actually
> lost (so XenRT can't communicate with its xmlrpc daemon inside the VM);
> it doesn't directly notice the forward/backward stepping in time.
>
> Anyway, please revert the patch: it will be a non-trivial change to
> expose an appropriate source of time for this MSR to consume.
Done, albeit not completely: I left the #defines in place.
Jan
* Re: Results from the Xen 4.4-rc2 test day
From: Andrew Cooper @ 2014-01-24 11:37 UTC (permalink / raw)
To: Jan Beulich; +Cc: George Dunlap, Paul Durrant, xen-devel@lists.xen.org
On 24/01/14 09:23, Jan Beulich wrote:
>>>> On 23.01.14 at 18:17, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> The value returned by hvm_get_guest_time() resets when the domain is
>> recreated with a new domid, making it an inappropriate source of time
>> for the function the specification describes for this MSR.
>>
>> I suspect Windows 8 only notices on migration because, as I understand
>> it, migration is the first point at which the generation ID is supposed
>> to change and signal a reset of state. Detecting the failure is further
>> complicated by what appears to be a race between the guest tools
>> resetting the clock back to the correct value and the DHCP lease being
>> flushed. XenRT only notices the failure if the DHCP lease is actually
>> lost (so XenRT can't communicate with its xmlrpc daemon inside the VM);
>> it doesn't directly notice the forward/backward stepping in time.
>>
>> Anyway, please revert the patch: it will be a non-trivial change to
>> expose an appropriate source of time for this MSR to consume.
> Done, albeit not completely: I left the #defines in place.
>
> Jan
>
Thanks. I have pulled XenServer's 4.4-rc2 branch forward to current
staging, and the w2k12 vmlifecycle tests now pass without error.
I shall organise another full nightly regression test for some time in
the next few days.
~Andrew
* Re: Results from the Xen 4.4-rc2 test day
From: George Dunlap @ 2014-01-24 16:19 UTC (permalink / raw)
To: Andrew Cooper, Jan Beulich; +Cc: Paul Durrant, xen-devel@lists.xen.org
On 01/24/2014 11:37 AM, Andrew Cooper wrote:
> On 24/01/14 09:23, Jan Beulich wrote:
>>>>> On 23.01.14 at 18:17, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> The value returned by hvm_get_guest_time() resets when the domain is
>>> recreated with a new domid, making it an inappropriate source of time
>>> for the function the specification describes for this MSR.
>>>
>>> I suspect Windows 8 only notices on migration because, as I understand
>>> it, migration is the first point at which the generation ID is supposed
>>> to change and signal a reset of state. Detecting the failure is further
>>> complicated by what appears to be a race between the guest tools
>>> resetting the clock back to the correct value and the DHCP lease being
>>> flushed. XenRT only notices the failure if the DHCP lease is actually
>>> lost (so XenRT can't communicate with its xmlrpc daemon inside the VM);
>>> it doesn't directly notice the forward/backward stepping in time.
>>>
>>> Anyway, please revert the patch: it will be a non-trivial change to
>>> expose an appropriate source of time for this MSR to consume.
>> Done, albeit not completely: I left the #defines in place.
>>
>> Jan
>>
> Thanks. I have pulled XenServer's 4.4-rc2 branch forward to current
> staging, and the w2k12 vmlifecycle tests now pass without error.
>
> I shall organise another full nightly regression test for some time in
> the next few days.
Overall, excellent news. Thanks.
-George