xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Big Bug:Time in VM running on xen goes slower
@ 2012-08-07 15:44 tupeng212
  2012-08-08  9:20 ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: tupeng212 @ 2012-08-07 15:44 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2437 bytes --]

Dear all:
I have found a big bug on xen concerning time virtualization. Please let me show you the whole process:

1 Phenomenon
when I run a JVM based program in IE browser in my Virtual Machine, I have found clearly that time at the right bottom corner in my VM gets more slower and slower.
I studied the bug deeply, and found something below.

2 Xen
vmx_vmexit_handler  --> ......... --> handle_rtc_io  --> rtc_ioport_write  --> rtc_timer_update --> set RTC's REG_A to a high rate--> create_periodic_time(disable the former timer, and init a new one)
Win7 is installed in the vm. This calling path is executed so frequent that may come down to set the RTC's REG_A hundreds of times every second but with the same rate(976.562us(1024HZ)), it is so abnormal to me to see such behavior.

3 OS
I have tried to find why the win7 setted RTC's regA so frequently. finally got the result, all that comes from a function: NtSetTimerResolution --> 0x70,0x71
when I attached windbg into the guest OS, I also found the source, they are all called from a upper system call that comes from JVM(Java Virtual Machine).

4 JVM
I don't know why JVM calls NtSetTimerResolution to set the same RTC's rate down (976.562us(1024HZ)) so frequently. 
But found something useful, in the java source code, I found many palaces written with time.scheduleAtFixedRate(), Informations from Internet told me this function scheduleAtFixedRate demands a higher time resolution. so I guess the whole process may be this: 
java wants a higher time resolution timer, so it changes the RTC's rate from 15.625ms(64HZ) to 976.562us(1024HZ), after that, it reconfirms whether the time is accurate as expected, but it's sorry to get some notice it 's not accurate either. so it sets  the RTC's rate from 15.625ms(64HZ) to 976.562us(1024HZ) again and again..., at last, results in a slow system timer in vm.
there is also a frequently called function going into my eyes: QueryPerformanceCounter.


what my problems are:
1 why the JVM sets the same RTC's rate 976562 down to the coms again and again? what he found  is abnormal?
2 even a abnormal user program calls create_periodic_time to set the rate again and again, how do we avoid the influence upon our time in vm? how do we compensate the elapsed time at the switching point? especially when the switch is so frequent.

can some big figures give me some advices on this?
thanks!



tupeng212

[-- Attachment #1.2: Type: text/html, Size: 6758 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM running on xen goes slower
  2012-08-07 15:44 Big Bug:Time in VM running on xen goes slower tupeng212
@ 2012-08-08  9:20 ` Jan Beulich
  2012-08-10 15:17   ` Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story! tupeng212
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2012-08-08  9:20 UTC (permalink / raw)
  To: tupeng212; +Cc: xen-devel

>>> On 07.08.12 at 17:44, tupeng212 <tupeng212@gmail.com> wrote:
> 2 Xen
> vmx_vmexit_handler  --> ......... --> handle_rtc_io  --> rtc_ioport_write  --> 
> rtc_timer_update --> set RTC's REG_A to a high rate--> create_periodic_time(disable 
> the former timer, and init a new one)
> Win7 is installed in the vm. This calling path is executed so frequent that 
> may come down to set the RTC's REG_A hundreds of times every second but with 
> the same rate(976.562us(1024HZ)), it is so abnormal to me to see such 
> behavior.

_If_ the problem is merely with the high rate of calls to
create_periodic_time(), I think this could be taken care of by
avoiding the call (and perhaps the call to rtc_timer_update() in
the first place) by checking whether anything actually changes
due to the current write. I don't think, however, that this would
help much, as the high rate of port accesses (and hence traps
into the hypervisor) would remain. It might, nevertheless, get
your immediate problem of the time slowing down taken care of
if that is caused inside Xen (but the cause here may as well be in
the Windows kernel).

> 3 OS
> I have tried to find why the win7 setted RTC's regA so frequently. finally 
> got the result, all that comes from a function: NtSetTimerResolution --> 
> 0x70,0x71
> when I attached windbg into the guest OS, I also found the source, they are 
> all called from a upper system call that comes from JVM(Java Virtual 
> Machine).

Getting Windows to be a little smarter and avoid the port I/O when
doing redundant writes would of course be even better, but is
clearly a difficult to achieve goal.

> 4 JVM
> I don't know why JVM calls NtSetTimerResolution to set the same RTC's rate 
> down (976.562us(1024HZ)) so frequently. 
> But found something useful, in the java source code, I found many palaces 
> written with time.scheduleAtFixedRate(), Informations from Internet told me 
> this function scheduleAtFixedRate demands a higher time resolution. so I 
> guess the whole process may be this: 
> java wants a higher time resolution timer, so it changes the RTC's rate from 
> 15.625ms(64HZ) to 976.562us(1024HZ), after that, it reconfirms whether the 
> time is accurate as expected, but it's sorry to get some notice it 's not 
> accurate either. so it sets  the RTC's rate from 15.625ms(64HZ) to 
> 976.562us(1024HZ) again and again..., at last, results in a slow system timer 
> in vm.

Now that's really the fundamental thing to find out - what makes it
think the clock isn't accurate? Is this an artifact of scheduling (as
the scheduler tick certainly is several milliseconds, whereas
"accurate" here appears to require below 1ms granularity), perhaps
as a result of the box being overloaded (i.e. the VM not being able
to get scheduled quickly enough when the timer expires)? For that,
did you try lowering the scheduler time slice and/or its rate limit
(possible via command line option)? Of course doing so may have
other undesirable side effects, but it would be worth a try.

Did you further check whether the adjustments done to the
scheduled time in create_periodic_time() are responsible for this
conclusion of the JVM (could be effectively doubling the first
interval if HVM_PARAM_VPT_ALIGN is set, and with the high rate
of re-sets this could certainly have a more visible effect than
intended)?

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story!
  2012-08-08  9:20 ` Jan Beulich
@ 2012-08-10 15:17   ` tupeng212
  2012-08-13 15:21     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: tupeng212 @ 2012-08-10 15:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 10477 bytes --]

Dear Jan Beulich:
Thanks for reply! You are a very clever man, you have sensed some thing immediately as I found lately. Please forgive my so lately reply!

1 why JVM set the same rate down so frequently ?
the first achievement I will show is I found the action in JVM and the reason by debugging disassembly code.
it seems to me like this in JVM:
======================================== 1 what happened in JVM ====================================
while (loop)//or a frequent call
{
timeBeginPeriod() --> NtSetTimerResolution(1(enble))
rc = WaitForMultipleObjects(5, 0x2222222, 0, 1); //the last parameter demands 1ms timer resolution
if (rc = TIMEOUT){
break;
}
else{
call 0x44444444;
}
timeEndPeriod() --> NtSetTimerResolution(0(disable))
}
====================================================================================================
so its behavior is totally legal, for it demands higher timer resolution (here 1ms), so it calls NtSetTimerResolution to improve the resolution. 
it is not as I guessed "unaccurate"

I also wrote a tester below, which confirms my suppose. if you are interested in it , you can build it by MS's compiler, and after running the tester in VM about 1 minutes, VM's time will slow.
=========================== 2 a test which will lead to trigger slowing VM's inner time========
#include <stdio.h>
#include <windows.h>
typedef int (__stdcall *NTSETTIMER)(IN ULONG RequestedResolution, IN BOOLEAN Set, OUT PULONG ActualResolution );
typedef int (__stdcall *NTQUERYTIMER)(OUT PULONG   MinimumResolution, OUT PULONG MaximumResolution, OUT PULONG CurrentResolution );
int main()
{
DWORD min_res = 0, max_res = 0, cur_res = 0, ret = 0;
HMODULE  hdll = NULL;
hdll = GetModuleHandle("ntdll.dll");
NTSETTIMER AddrNtSetTimer = 0;
NTQUERYTIMER AddrNtQueyTimer = 0;
AddrNtSetTimer = (NTSETTIMER) GetProcAddress(hdll, "NtSetTimerResolution");
AddrNtQueyTimer = (NTQUERYTIMER)GetProcAddress(hdll, "NtQueryTimerResolution");

while (1)
{
ret = AddrNtQueyTimer(&min_res, &max_res, &cur_res);
printf("min_res = %d, max_res = %d, cur_res = %d\n",min_res, max_res, cur_res);
Sleep(10);
ret = AddrNtSetTimer(10000, 1, &cur_res);
Sleep(10);
ret = AddrNtSetTimer(10000, 0, &cur_res);
Sleep(10);
}
return 0;
}
==============================================================================================

2 Bug in xen
JVM is OK, so left the bug to xen, I have found both the reason and solution. As Jan mentioned avoiding call create_periodic_time, it got much better. so I modified it like this, if the pt timer is created before, setting RegA down is just changing the period value, so I do nothing except just setting period to pt's field. it is OK!

I thought pt->scheduled is responsible for Accuracy of pt_process_missed_ticks, so we should not interfere it at any outer invading or interruption, so I let create_periodic_time changed everything but reserved only one filed pt->scheduled as setted before, I am very happy to find the bug disappear. After I rechecked your mail I found you are really a very smart man, you have predicted something!

Did you further check whether the adjustments done to the
scheduled time in create_periodic_time() are responsible for this
conclusion of the JVM (could be effectively doubling the first
interval if HVM_PARAM_VPT_ALIGN is set, and with the high rate
of re-sets this could certainly have a more visible effect than
intended)?

After I tracked pt->scheduled, more and more truth surfaced. I will show you the RTC's spotting as I observed.
normal spot is like this:
0               1               2               3               4               5
.               .               .               .               .               .      (normal spot)
                                |(pt->scheduled at 2)

when create_periodic_time interfere pt->schedule at NOW
0               1               2               3               4               5
.               .               .               .               .               .      
                    |(NOW)                      | (new pt->scheduled is moved to 3 after ALIGN)

so it real spot is like this:
.               .                               .               .               .       
0               1               2               3               4               5
                                |(here we missed one spot at 2)

the original pt->scheduled is at 2 before create_periodic_time will be executed, but after it, pt->scheduled is moved to 3, so missed one spot to GuestWin.


3 who is wrong?
I doubt align_timer worth suspected:
s_time_t align_timer(s_time_t firsttick, uint64_t period)
{
    if ( !period )
        return firsttick;

    return firsttick + (period - 1) - ((firsttick - 1) % period); //in xen4.x
    return firsttick  - ((firsttick - 1) % period);//it should be aligned like this.
}

I have also found another updating RTC's RegA in tools\ioemu-qemu-xen\hw\MCc146818rtc.c: 

    if (period_code != 0 && (s->cmos_data[RTC_REG_B] & REG_B_PIE)) {
#endif
        if (period_code <= 2)
            period_code += 7;
        /* period in 32 Khz cycles */
        period = 1 << (period_code - 1);
#ifdef IRQ_COALESCE_HACK
        if(period != s->period)
            s->irq_coalesced = (s->irq_coalesced * s->period) / period;
        s->period = period;
#endif
        /* compute 32 khz clock */
        cur_clock = muldiv64(current_time, 32768, ticks_per_sec); //here I don't make sense it ......
        next_irq_clock = (cur_clock & ~(period - 1)) + period;
        s->next_periodic_time = muldiv64(next_irq_clock, ticks_per_sec, 32768) + 1;
        qemu_mod_timer(s->periodic_timer, s->next_periodic_time);



I don't know what happened in real RTC, the most popular RTC chip is MC146818, I have checked its datasheet but found nothing I want. What I want to know is When a real outer 0x71 come down to set RTC's RegA, who does it change or spot the next periodic interrupt time ? In my case, the period doesn't change, it misses one spot, even if the period changes how will it spot the next scheduled time?
I need the man who knows the details deeply concerning when updating a real RTC's regA, how will it take it into effect, and make the transition smoothly in a real RTC.

I have been very anxious in another aspect, in our virtual environment, at create_periodic_time, NOW() may be far more later than pt->scheduled setted before, at this point, the new pt->scheduled may be far more behind the one after executing create_periodic_time. So the interval between both which should be treated by pt_process_missed_ticks to the former pt->scheduled's period is now all thrown away as nothing missed. So I think how to handle the interval between both pt->scheduled worth consideration in create_periodic_time.

Thanks!




tupeng212

From: Jan Beulich
Date: 2012-08-08 17:20
To: tupeng212
CC: xen-devel
Subject: Re: [Xen-devel] Big Bug:Time in VM running on xen goes slower
>>> On 07.08.12 at 17:44, tupeng212 <tupeng212@gmail.com> wrote:
> 2 Xen
> vmx_vmexit_handler  --> ......... --> handle_rtc_io  --> rtc_ioport_write  --> 
> rtc_timer_update --> set RTC's REG_A to a high rate--> create_periodic_time(disable 
> the former timer, and init a new one)
> Win7 is installed in the vm. This calling path is executed so frequent that 
> may come down to set the RTC's REG_A hundreds of times every second but with 
> the same rate(976.562us(1024HZ)), it is so abnormal to me to see such 
> behavior.

_If_ the problem is merely with the high rate of calls to
create_periodic_time(), I think this could be taken care of by
avoiding the call (and perhaps the call to rtc_timer_update() in
the first place) by checking whether anything actually changes
due to the current write. I don't think, however, that this would
help much, as the high rate of port accesses (and hence traps
into the hypervisor) would remain. It might, nevertheless, get
your immediate problem of the time slowing down taken care of
if that is caused inside Xen (but the cause here may as well be in
the Windows kernel).

> 3 OS
> I have tried to find why the win7 setted RTC's regA so frequently. finally 
> got the result, all that comes from a function: NtSetTimerResolution --> 
> 0x70,0x71
> when I attached windbg into the guest OS, I also found the source, they are 
> all called from a upper system call that comes from JVM(Java Virtual 
> Machine).

Getting Windows to be a little smarter and avoid the port I/O when
doing redundant writes would of course be even better, but is
clearly a difficult to achieve goal.

> 4 JVM
> I don't know why JVM calls NtSetTimerResolution to set the same RTC's rate 
> down (976.562us(1024HZ)) so frequently. 
> But found something useful, in the java source code, I found many palaces 
> written with time.scheduleAtFixedRate(), Informations from Internet told me 
> this function scheduleAtFixedRate demands a higher time resolution. so I 
> guess the whole process may be this: 
> java wants a higher time resolution timer, so it changes the RTC's rate from 
> 15.625ms(64HZ) to 976.562us(1024HZ), after that, it reconfirms whether the 
> time is accurate as expected, but it's sorry to get some notice it 's not 
> accurate either. so it sets  the RTC's rate from 15.625ms(64HZ) to 
> 976.562us(1024HZ) again and again..., at last, results in a slow system timer 
> in vm.

Now that's really the fundamental thing to find out - what makes it
think the clock isn't accurate? Is this an artifact of scheduling (as
the scheduler tick certainly is several milliseconds, whereas
"accurate" here appears to require below 1ms granularity), perhaps
as a result of the box being overloaded (i.e. the VM not being able
to get scheduled quickly enough when the timer expires)? For that,
did you try lowering the scheduler time slice and/or its rate limit
(possible via command line option)? Of course doing so may have
other undesirable side effects, but it would be worth a try.

Did you further check whether the adjustments done to the
scheduled time in create_periodic_time() are responsible for this
conclusion of the JVM (could be effectively doubling the first
interval if HVM_PARAM_VPT_ALIGN is set, and with the high rate
of re-sets this could certainly have a more visible effect than
intended)?

Jan

[-- Attachment #1.2: Type: text/html, Size: 24018 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story!
  2012-08-10 15:17   ` Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story! tupeng212
@ 2012-08-13 15:21     ` Jan Beulich
       [not found]       ` <201208140024353598835@gmail.com>
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jan Beulich @ 2012-08-13 15:21 UTC (permalink / raw)
  To: tupeng212; +Cc: Tim Deegan, Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 9546 bytes --]

>>> On 10.08.12 at 17:17, tupeng212 <tupeng212@gmail.com> wrote:
> 2 Bug in xen
> JVM is OK, so left the bug to xen, I have found both the reason and 
> solution. As Jan mentioned avoiding call create_periodic_time, it got much 
> better. so I modified it like this, if the pt timer is created before, 
> setting RegA down is just changing the period value, so I do nothing except 

What you describe doesn't sound accurate (i.e. I'm getting the
impression that you might have suppressed the call in cases
where you shouldn't).

Below/attached a first draft of a patch to fix not only this issue,
but a few more with the RTC emulation. Would you give this a
try?

Keir, Tim, others - the change to xen/arch/x86/hvm/vpt.c really
looks more like a hack than a solution, but I don't see another
way without much more intrusive changes. The point is that we
want the RTC code to decide whether to generate an interrupt
(so that RTC_PF can become set correctly even without RTC_PIE
getting enabled by the guest).

Jan

x86/HVM: assorted RTC emulation adjustments

- don't call rtc_timer_update() on REG_A writes when the value didn't
  change (doing the call always was reported to cause wall clock time
  lagging with the JVM under Windows)
- in the same spirit, don't call rtc_timer_update() or
  alarm_timer_update() on REG_B writes when the respective RTC_xIE bit
  didn't change
- raise the RTC IRQ not only when RTC_UIE gets set while RTC_UF was
  already set, but generalize this to alarm and periodic interrupts as
  well
- properly handle RTC_PF when the guest is not also setting RTC_PIE
- also handle the two other clock bases

--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -50,11 +50,24 @@ static void rtc_set_time(RTCState *s);
 static inline int from_bcd(RTCState *s, int a);
 static inline int convert_hour(RTCState *s, int hour);
 
-static void rtc_periodic_cb(struct vcpu *v, void *opaque)
+static void rtc_toggle_irq(RTCState *s)
+{
+    struct domain *d = vrtc_domain(s);
+
+    ASSERT(spin_is_locked(&s->lock));
+    s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
+    hvm_isa_irq_deassert(d, RTC_IRQ);
+    hvm_isa_irq_assert(d, RTC_IRQ);
+}
+
+void rtc_periodic_interrupt(void *opaque)
 {
     RTCState *s = opaque;
+
     spin_lock(&s->lock);
-    s->hw.cmos_data[RTC_REG_C] |= 0xc0;
+    s->hw.cmos_data[RTC_REG_C] |= RTC_PF;
+    if ( s->hw.cmos_data[RTC_REG_B] & RTC_PIE )
+        rtc_toggle_irq(s);
     spin_unlock(&s->lock);
 }
 
@@ -68,19 +81,25 @@ static void rtc_timer_update(RTCState *s
     ASSERT(spin_is_locked(&s->lock));
 
     period_code = s->hw.cmos_data[RTC_REG_A] & RTC_RATE_SELECT;
-    if ( (period_code != 0) && (s->hw.cmos_data[RTC_REG_B] & RTC_PIE) )
+    switch ( s->hw.cmos_data[RTC_REG_A] & RTC_DIV_CTL )
     {
-        if ( period_code <= 2 )
+    case RTC_REF_CLCK_32KHZ:
+        if ( (period_code != 0) && (period_code <= 2) )
             period_code += 7;
-
-        period = 1 << (period_code - 1); /* period in 32 Khz cycles */
-        period = DIV_ROUND((period * 1000000000ULL), 32768); /* period in ns */
-        create_periodic_time(v, &s->pt, period, period, RTC_IRQ,
-                             rtc_periodic_cb, s);
-    }
-    else
-    {
+        /* fall through */
+    case RTC_REF_CLCK_1MHZ:
+    case RTC_REF_CLCK_4MHZ:
+        if ( period_code != 0 )
+        {
+            period = 1 << (period_code - 1); /* period in 32 Khz cycles */
+            period = DIV_ROUND(period * 1000000000ULL, 32768); /* in ns */
+            create_periodic_time(v, &s->pt, period, period, RTC_IRQ, NULL, s);
+            break;
+        }
+        /* fall through */
+    default:
         destroy_periodic_time(&s->pt);
+        break;
     }
 }
 
@@ -144,7 +163,6 @@ static void rtc_update_timer(void *opaqu
 static void rtc_update_timer2(void *opaque)
 {
     RTCState *s = opaque;
-    struct domain *d = vrtc_domain(s);
 
     spin_lock(&s->lock);
     if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
@@ -152,11 +170,7 @@ static void rtc_update_timer2(void *opaq
         s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
         s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
         if ((s->hw.cmos_data[RTC_REG_B] & RTC_UIE))
-        {
-            s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-            hvm_isa_irq_deassert(d, RTC_IRQ);
-            hvm_isa_irq_assert(d, RTC_IRQ);
-        }
+            rtc_toggle_irq(s);
         check_update_timer(s);
     }
     spin_unlock(&s->lock);
@@ -343,7 +357,6 @@ static void alarm_timer_update(RTCState 
 static void rtc_alarm_cb(void *opaque)
 {
     RTCState *s = opaque;
-    struct domain *d = vrtc_domain(s);
 
     spin_lock(&s->lock);
     if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
@@ -351,11 +364,7 @@ static void rtc_alarm_cb(void *opaque)
         s->hw.cmos_data[RTC_REG_C] |= RTC_AF;
         /* alarm interrupt */
         if (s->hw.cmos_data[RTC_REG_B] & RTC_AIE)
-        {
-            s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-            hvm_isa_irq_deassert(d, RTC_IRQ);
-            hvm_isa_irq_assert(d, RTC_IRQ);
-        }
+            rtc_toggle_irq(s);
         alarm_timer_update(s);
     }
     spin_unlock(&s->lock);
@@ -365,6 +374,7 @@ static int rtc_ioport_write(void *opaque
 {
     RTCState *s = opaque;
     struct domain *d = vrtc_domain(s);
+    uint32_t orig, mask;
 
     spin_lock(&s->lock);
 
@@ -382,6 +392,7 @@ static int rtc_ioport_write(void *opaque
         return 0;
     }
 
+    orig = s->hw.cmos_data[s->hw.cmos_index];
     switch ( s->hw.cmos_index )
     {
     case RTC_SECONDS_ALARM:
@@ -405,9 +416,9 @@ static int rtc_ioport_write(void *opaque
         break;
     case RTC_REG_A:
         /* UIP bit is read only */
-        s->hw.cmos_data[RTC_REG_A] = (data & ~RTC_UIP) |
-            (s->hw.cmos_data[RTC_REG_A] & RTC_UIP);
-        rtc_timer_update(s);
+        s->hw.cmos_data[RTC_REG_A] = (data & ~RTC_UIP) | (orig & RTC_UIP);
+        if ( (data ^ orig) & (RTC_RATE_SELECT | RTC_DIV_CTL) )
+            rtc_timer_update(s);
         break;
     case RTC_REG_B:
         if ( data & RTC_SET )
@@ -415,7 +426,7 @@ static int rtc_ioport_write(void *opaque
             /* set mode: reset UIP mode */
             s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
             /* adjust cmos before stopping */
-            if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
+            if (!(orig & RTC_SET))
             {
                 s->current_tm = gmtime(get_localtime(d));
                 rtc_copy_date(s);
@@ -424,21 +435,27 @@ static int rtc_ioport_write(void *opaque
         else
         {
             /* if disabling set mode, update the time */
-            if ( s->hw.cmos_data[RTC_REG_B] & RTC_SET )
+            if ( orig & RTC_SET )
                 rtc_set_time(s);
         }
-        /* if the interrupt is already set when the interrupt become
-         * enabled, raise an interrupt immediately*/
-        if ((data & RTC_UIE) && !(s->hw.cmos_data[RTC_REG_B] & RTC_UIE))
-            if (s->hw.cmos_data[RTC_REG_C] & RTC_UF)
+        /*
+         * If the interrupt is already set when the interrupt becomes
+         * enabled, raise an interrupt immediately.
+         * NB: RTC_{A,P,U}IE == RTC_{A,P,U}F respectively.
+         */
+        for ( mask = RTC_UIE; mask <= RTC_PIE; mask <<= 1 )
+            if ( (data & mask) && !(orig & mask) &&
+                 (s->hw.cmos_data[RTC_REG_C] & mask) )
             {
-                hvm_isa_irq_deassert(d, RTC_IRQ);
-                hvm_isa_irq_assert(d, RTC_IRQ);
+                rtc_toggle_irq(s);
+                break;
             }
         s->hw.cmos_data[RTC_REG_B] = data;
-        rtc_timer_update(s);
+        if ( (data ^ orig) & RTC_PIE )
+            rtc_timer_update(s);
         check_update_timer(s);
-        alarm_timer_update(s);
+        if ( (data ^ orig) & RTC_AIE )
+            alarm_timer_update(s);
         break;
     case RTC_REG_C:
     case RTC_REG_D:
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -22,6 +22,7 @@
 #include <asm/hvm/vpt.h>
 #include <asm/event.h>
 #include <asm/apic.h>
+#include <asm/mc146818rtc.h>
 
 #define mode_is(d, name) \
     ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)
@@ -218,6 +219,7 @@ void pt_update_irq(struct vcpu *v)
     struct periodic_time *pt, *temp, *earliest_pt = NULL;
     uint64_t max_lag = -1ULL;
     int irq, is_lapic;
+    void *pt_priv;
 
     spin_lock(&v->arch.hvm_vcpu.tm_lock);
 
@@ -251,13 +253,14 @@ void pt_update_irq(struct vcpu *v)
     earliest_pt->irq_issued = 1;
     irq = earliest_pt->irq;
     is_lapic = (earliest_pt->source == PTSRC_lapic);
+    pt_priv = earliest_pt->priv;
 
     spin_unlock(&v->arch.hvm_vcpu.tm_lock);
 
     if ( is_lapic )
-    {
         vlapic_set_irq(vcpu_vlapic(v), irq, 0);
-    }
+    else if ( irq == RTC_IRQ )
+        rtc_periodic_interrupt(pt_priv);
     else
     {
         hvm_isa_irq_deassert(v->domain, irq);
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -181,6 +181,7 @@ void rtc_migrate_timers(struct vcpu *v);
 void rtc_deinit(struct domain *d);
 void rtc_reset(struct domain *d);
 void rtc_update_clock(struct domain *d);
+void rtc_periodic_interrupt(void *);
 
 void pmtimer_init(struct vcpu *v);
 void pmtimer_deinit(struct domain *d);


[-- Attachment #2: x86-hvm-rtc.patch --]
[-- Type: text/plain, Size: 8506 bytes --]

x86/HVM: assorted RTC emulation adjustments

- don't call rtc_timer_update() on REG_A writes when the value didn't
  change (doing the call always was reported to cause wall clock time
  lagging with the JVM under Windows)
- in the same spirit, don't call rtc_timer_update() or
  alarm_timer_update() on REG_B writes when the respective RTC_xIE bit
  didn't change
- raise the RTC IRQ not only when RTC_UIE gets set while RTC_UF was
  already set, but generalize this to alarm and periodic interrupts as
  well
- properly handle RTC_PF when the guest is not also setting RTC_PIE
- also handle the two other clock bases

--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -50,11 +50,24 @@ static void rtc_set_time(RTCState *s);
 static inline int from_bcd(RTCState *s, int a);
 static inline int convert_hour(RTCState *s, int hour);
 
-static void rtc_periodic_cb(struct vcpu *v, void *opaque)
+static void rtc_toggle_irq(RTCState *s)
+{
+    struct domain *d = vrtc_domain(s);
+
+    ASSERT(spin_is_locked(&s->lock));
+    s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
+    hvm_isa_irq_deassert(d, RTC_IRQ);
+    hvm_isa_irq_assert(d, RTC_IRQ);
+}
+
+void rtc_periodic_interrupt(void *opaque)
 {
     RTCState *s = opaque;
+
     spin_lock(&s->lock);
-    s->hw.cmos_data[RTC_REG_C] |= 0xc0;
+    s->hw.cmos_data[RTC_REG_C] |= RTC_PF;
+    if ( s->hw.cmos_data[RTC_REG_B] & RTC_PIE )
+        rtc_toggle_irq(s);
     spin_unlock(&s->lock);
 }
 
@@ -68,19 +81,25 @@ static void rtc_timer_update(RTCState *s
     ASSERT(spin_is_locked(&s->lock));
 
     period_code = s->hw.cmos_data[RTC_REG_A] & RTC_RATE_SELECT;
-    if ( (period_code != 0) && (s->hw.cmos_data[RTC_REG_B] & RTC_PIE) )
+    switch ( s->hw.cmos_data[RTC_REG_A] & RTC_DIV_CTL )
     {
-        if ( period_code <= 2 )
+    case RTC_REF_CLCK_32KHZ:
+        if ( (period_code != 0) && (period_code <= 2) )
             period_code += 7;
-
-        period = 1 << (period_code - 1); /* period in 32 Khz cycles */
-        period = DIV_ROUND((period * 1000000000ULL), 32768); /* period in ns */
-        create_periodic_time(v, &s->pt, period, period, RTC_IRQ,
-                             rtc_periodic_cb, s);
-    }
-    else
-    {
+        /* fall through */
+    case RTC_REF_CLCK_1MHZ:
+    case RTC_REF_CLCK_4MHZ:
+        if ( period_code != 0 )
+        {
+            period = 1 << (period_code - 1); /* period in 32 Khz cycles */
+            period = DIV_ROUND(period * 1000000000ULL, 32768); /* in ns */
+            create_periodic_time(v, &s->pt, period, period, RTC_IRQ, NULL, s);
+            break;
+        }
+        /* fall through */
+    default:
         destroy_periodic_time(&s->pt);
+        break;
     }
 }
 
@@ -144,7 +163,6 @@ static void rtc_update_timer(void *opaqu
 static void rtc_update_timer2(void *opaque)
 {
     RTCState *s = opaque;
-    struct domain *d = vrtc_domain(s);
 
     spin_lock(&s->lock);
     if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
@@ -152,11 +170,7 @@ static void rtc_update_timer2(void *opaq
         s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
         s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
         if ((s->hw.cmos_data[RTC_REG_B] & RTC_UIE))
-        {
-            s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-            hvm_isa_irq_deassert(d, RTC_IRQ);
-            hvm_isa_irq_assert(d, RTC_IRQ);
-        }
+            rtc_toggle_irq(s);
         check_update_timer(s);
     }
     spin_unlock(&s->lock);
@@ -343,7 +357,6 @@ static void alarm_timer_update(RTCState 
 static void rtc_alarm_cb(void *opaque)
 {
     RTCState *s = opaque;
-    struct domain *d = vrtc_domain(s);
 
     spin_lock(&s->lock);
     if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
@@ -351,11 +364,7 @@ static void rtc_alarm_cb(void *opaque)
         s->hw.cmos_data[RTC_REG_C] |= RTC_AF;
         /* alarm interrupt */
         if (s->hw.cmos_data[RTC_REG_B] & RTC_AIE)
-        {
-            s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-            hvm_isa_irq_deassert(d, RTC_IRQ);
-            hvm_isa_irq_assert(d, RTC_IRQ);
-        }
+            rtc_toggle_irq(s);
         alarm_timer_update(s);
     }
     spin_unlock(&s->lock);
@@ -365,6 +374,7 @@ static int rtc_ioport_write(void *opaque
 {
     RTCState *s = opaque;
     struct domain *d = vrtc_domain(s);
+    uint32_t orig, mask;
 
     spin_lock(&s->lock);
 
@@ -382,6 +392,7 @@ static int rtc_ioport_write(void *opaque
         return 0;
     }
 
+    orig = s->hw.cmos_data[s->hw.cmos_index];
     switch ( s->hw.cmos_index )
     {
     case RTC_SECONDS_ALARM:
@@ -405,9 +416,9 @@ static int rtc_ioport_write(void *opaque
         break;
     case RTC_REG_A:
         /* UIP bit is read only */
-        s->hw.cmos_data[RTC_REG_A] = (data & ~RTC_UIP) |
-            (s->hw.cmos_data[RTC_REG_A] & RTC_UIP);
-        rtc_timer_update(s);
+        s->hw.cmos_data[RTC_REG_A] = (data & ~RTC_UIP) | (orig & RTC_UIP);
+        if ( (data ^ orig) & (RTC_RATE_SELECT | RTC_DIV_CTL) )
+            rtc_timer_update(s);
         break;
     case RTC_REG_B:
         if ( data & RTC_SET )
@@ -415,7 +426,7 @@ static int rtc_ioport_write(void *opaque
             /* set mode: reset UIP mode */
             s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
             /* adjust cmos before stopping */
-            if (!(s->hw.cmos_data[RTC_REG_B] & RTC_SET))
+            if (!(orig & RTC_SET))
             {
                 s->current_tm = gmtime(get_localtime(d));
                 rtc_copy_date(s);
@@ -424,21 +435,27 @@ static int rtc_ioport_write(void *opaque
         else
         {
             /* if disabling set mode, update the time */
-            if ( s->hw.cmos_data[RTC_REG_B] & RTC_SET )
+            if ( orig & RTC_SET )
                 rtc_set_time(s);
         }
-        /* if the interrupt is already set when the interrupt become
-         * enabled, raise an interrupt immediately*/
-        if ((data & RTC_UIE) && !(s->hw.cmos_data[RTC_REG_B] & RTC_UIE))
-            if (s->hw.cmos_data[RTC_REG_C] & RTC_UF)
+        /*
+         * If the interrupt is already set when the interrupt becomes
+         * enabled, raise an interrupt immediately.
+         * NB: RTC_{A,P,U}IE == RTC_{A,P,U}F respectively.
+         */
+        for ( mask = RTC_UIE; mask <= RTC_PIE; mask <<= 1 )
+            if ( (data & mask) && !(orig & mask) &&
+                 (s->hw.cmos_data[RTC_REG_C] & mask) )
             {
-                hvm_isa_irq_deassert(d, RTC_IRQ);
-                hvm_isa_irq_assert(d, RTC_IRQ);
+                rtc_toggle_irq(s);
+                break;
             }
         s->hw.cmos_data[RTC_REG_B] = data;
-        rtc_timer_update(s);
+        if ( (data ^ orig) & RTC_PIE )
+            rtc_timer_update(s);
         check_update_timer(s);
-        alarm_timer_update(s);
+        if ( (data ^ orig) & RTC_AIE )
+            alarm_timer_update(s);
         break;
     case RTC_REG_C:
     case RTC_REG_D:
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -22,6 +22,7 @@
 #include <asm/hvm/vpt.h>
 #include <asm/event.h>
 #include <asm/apic.h>
+#include <asm/mc146818rtc.h>
 
 #define mode_is(d, name) \
     ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)
@@ -218,6 +219,7 @@ void pt_update_irq(struct vcpu *v)
     struct periodic_time *pt, *temp, *earliest_pt = NULL;
     uint64_t max_lag = -1ULL;
     int irq, is_lapic;
+    void *pt_priv;
 
     spin_lock(&v->arch.hvm_vcpu.tm_lock);
 
@@ -251,13 +253,14 @@ void pt_update_irq(struct vcpu *v)
     earliest_pt->irq_issued = 1;
     irq = earliest_pt->irq;
     is_lapic = (earliest_pt->source == PTSRC_lapic);
+    pt_priv = earliest_pt->priv;
 
     spin_unlock(&v->arch.hvm_vcpu.tm_lock);
 
     if ( is_lapic )
-    {
         vlapic_set_irq(vcpu_vlapic(v), irq, 0);
-    }
+    else if ( irq == RTC_IRQ )
+        rtc_periodic_interrupt(pt_priv);
     else
     {
         hvm_isa_irq_deassert(v->domain, irq);
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -181,6 +181,7 @@ void rtc_migrate_timers(struct vcpu *v);
 void rtc_deinit(struct domain *d);
 void rtc_reset(struct domain *d);
 void rtc_update_clock(struct domain *d);
+void rtc_periodic_interrupt(void *);
 
 void pmtimer_init(struct vcpu *v);
 void pmtimer_deinit(struct domain *d);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story!
       [not found]       ` <201208140024353598835@gmail.com>
@ 2012-08-14  6:27         ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2012-08-14  6:27 UTC (permalink / raw)
  To: tupeng212; +Cc: xen-devel

(re-adding xen-devel to Cc)

>>> On 13.08.12 at 18:24, tupeng212 <tupeng212@gmail.com> wrote:
> I am home sending you this letter,  I can tell the details:
> I only compared the rate setted down with RegA, if same, don't call 
> create_periodic_time, and it is OK.
> but after a while it changed the rate to 152...(normal rate), and it changed 
> back to 97...(high rate), and 
> this action repeated, at this point, VM's time slowed again. so only block 
> the same rate's setting doesn't work.

But in this case the hypervisor has no choice - it has to update
the periodic timer. I don't think we could reliably guess that the
guest might plan on quickly toggling between two rates.

> but when I let pt->scheduled to be static(no changed) when setting regA down, 
> even at the rate's switching I notice above,
> it works very well, bug of time slowing never appear again.

Yeah, likely that solves the problem for you, but as it's not
correct, it's likely to cause problems for others.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story!
  2012-08-13 15:21     ` Jan Beulich
       [not found]       ` <201208140024353598835@gmail.com>
@ 2012-08-14  6:37       ` Jan Beulich
  2012-08-16 13:15       ` Tim Deegan
  2 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2012-08-14  6:37 UTC (permalink / raw)
  To: xen-devel; +Cc: tupeng212, Tim Deegan, Keir Fraser

>>> On 13.08.12 at 17:21, "Jan Beulich" <JBeulich@suse.com> wrote:
> - don't call rtc_timer_update() on REG_A writes when the value didn't
>   change (doing the call always was reported to cause wall clock time
>   lagging with the JVM under Windows)
> - in the same spirit, don't call rtc_timer_update() or
>   alarm_timer_update() on REG_B writes when the respective RTC_xIE bit
>   didn't change

Actually, this didn't go far enough yet: REG_B writes should
never cause any timers to get updated when merely one of the
xIE bits changes, as those bits shouldn't control the timers'
activity (and as a result, the eventual setting of the xF bits in
REG_C).

> - raise the RTC IRQ not only when RTC_UIE gets set while RTC_UF was
>   already set, but generalize this to alarm and periodic interrupts as
>   well
> - properly handle RTC_PF when the guest is not also setting RTC_PIE

In line with the above, this ought to also be done for AF and UF
(it may be the case for UF already).

Jan

> - also handle the two other clock bases

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story!
  2012-08-13 15:21     ` Jan Beulich
       [not found]       ` <201208140024353598835@gmail.com>
  2012-08-14  6:37       ` Jan Beulich
@ 2012-08-16 13:15       ` Tim Deegan
  2 siblings, 0 replies; 7+ messages in thread
From: Tim Deegan @ 2012-08-16 13:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tupeng212, Keir Fraser, xen-devel

At 16:21 +0100 on 13 Aug (1344874869), Jan Beulich wrote:
> Below/attached a first draft of a patch to fix not only this issue,
> but a few more with the RTC emulation. Would you give this a
> try?
> 
> Keir, Tim, others - the change to xen/arch/x86/hvm/vpt.c really
> looks more like a hack than a solution, but I don't see another
> way without much more intrusive changes.

It seems no worse than the code that's already there to special-case
lapic time interrupts.  After 4.3 it might be nice to adjust the vpt
interface to use explicit callbacks instead.

Tim.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-08-16 13:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-07 15:44 Big Bug:Time in VM running on xen goes slower tupeng212
2012-08-08  9:20 ` Jan Beulich
2012-08-10 15:17   ` Big Bug:Time in VM goes slower; foud Solution but demand Judgement! A Interesting Story! tupeng212
2012-08-13 15:21     ` Jan Beulich
     [not found]       ` <201208140024353598835@gmail.com>
2012-08-14  6:27         ` Jan Beulich
2012-08-14  6:37       ` Jan Beulich
2012-08-16 13:15       ` Tim Deegan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).