* PV timer issues in FreeBSD
@ 2013-04-19 18:20 Roger Pau Monné
2013-04-22 7:15 ` Jan Beulich
0 siblings, 1 reply; 3+ messages in thread
From: Roger Pau Monné @ 2013-04-19 18:20 UTC (permalink / raw)
To: xen-devel; +Cc: Justin T. Gibbs, will@FreeBSD.org
[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]
Hello,
While trying to get PV timers on FreeBSD to work, we found that when the
system is under stress sometimes FreeBSD would stop receiving timer
interrupts on certain vCPUs. I've added some trace points to Xen in
order to see what was happening with this timers and got the following
trace. I'm also attaching the crappy changes I've did to Xen in order to
add the trace points, maybe I didn't put the trace points in the right
position and that's why I'm seeing this:
169.097952305 |..x.... d32768v3 d32768 fire timer vCPU3: 1204944804808
169.097966813 |..x.... d32768v3 d32768 set timer vCPU3: 1204946614221
169.097977454 |..x.... d32768v3 d32768 set timer vCPU3: 1204963746273
169.103410517 |..x.... d32768v3 d32768 set timer vCPU3: 1204952682637
169.103429902 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356797
169.124982369 |..x.... d32768v3 d32768 set timer vCPU3: 1204976302984
169.124988972 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356855
169.128470252 |..x.... d32768v3 d32768 set timer vCPU3: 1204976300693
169.128639545 |..x.... d32768v3 d32768 set timer vCPU3: 1205000356796
169.136240633 |..x.... d32768v3 d32768 set timer vCPU3: 1204984176928
169.136246955 |..x.... d32768v3 d32768 set timer vCPU3: 1205000357030
169.139729248 |..x.... d32768v3 d32768 set timer vCPU3: 1204992115770
169.139892831 |..x.... d32768v3 d32768 set timer vCPU3: 1205032863201
169.151133497 |..x.... d32768v3 d32768 set timer vCPU3: 1204999923870
169.151137341 |..x.... d32768v3 d32768 set timer vCPU3: 1205032863121
169.352765068 |..x.... d32768v3 d32768 set timer vCPU3: 1205204647080
169.352769008 |..x.... d32768v3 d32768 set timer vCPU3: 1205633794418
176.818699597 |..x.... d32768v3 d32768 set timer vCPU3: 1212676613211
176.818702670 |..x.... d32768v3 d32768 set timer vCPU3: 1213055669298
318.489693649 |..x.... d32768v3 d32768 set timer vCPU3: 1354464489030 <
318.489698326 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606923
318.491853105 |..x.... d32768v3 d32768 set timer vCPU3: 1354464489183
318.491857363 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606862
318.494243541 |..x.... d32768v3 d32768 set timer vCPU3: 1354466611397
318.494253173 |..x.... d32768v3 d32768 set timer vCPU3: 1354766607167
318.497005208 |..x.... d32768v3 d32768 set timer vCPU3: 1354472363654
318.497009517 |..x.... d32768v3 d32768 set timer vCPU3: 1354766606858
As it can be seen above, the vCPU was expecting a timer to fire at
1213055669298, but as far as I can tell the timer never reached the
guest, neither Xen called execute_timer to fire the timer.
Then in order to reset the timer I've started some other task that
probably sends an IPI to the stuck processor and it tries to set the
timer again (this time successfully).
Is there any reason that could prevent the timer from being fired?
Thanks, Roger.
[-- Attachment #2: trace_timer.patch --]
[-- Type: text/plain, Size: 3515 bytes --]
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 64ee29d..ac59452 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -937,18 +937,27 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
{
struct vcpu_set_singleshot_timer set;
- if ( v != current )
+ if ( v != current ) {
+ TRACE_1D(TRC_PV_TIMER_FAIL, 1);
return -EINVAL;
+ }
- if ( copy_from_guest(&set, arg, 1) )
+ if ( copy_from_guest(&set, arg, 1) ) {
+ TRACE_1D(TRC_PV_TIMER_FAIL, 2);
return -EFAULT;
+ }
if ( (set.flags & VCPU_SSHOTTMR_future) &&
- (set.timeout_abs_ns < NOW()) )
+ (set.timeout_abs_ns < NOW()) ) {
+ TRACE_1D(TRC_PV_TIMER_FAIL, 3);
return -ETIME;
+ }
migrate_timer(&v->singleshot_timer, smp_processor_id());
+ v->singleshot_timer.debug = 1;
set_timer(&v->singleshot_timer, set.timeout_abs_ns);
+ TRACE_2D(TRC_PV_SET_TIMER, (uint32_t)set.timeout_abs_ns,
+ (uint32_t)(set.timeout_abs_ns >> 32));
break;
}
@@ -958,6 +967,8 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
return -EINVAL;
stop_timer(&v->singleshot_timer);
+ TRACE_2D(TRC_PV_STOP_TIMER, (uint32_t)v->singleshot_timer.expires,
+ (uint32_t)(v->singleshot_timer.expires >> 32));
break;
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 9ed74e9..61e4383 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -21,6 +21,7 @@
#include <xen/cpu.h>
#include <xen/rcupdate.h>
#include <xen/symbols.h>
+#include <xen/trace.h>
#include <asm/system.h>
#include <asm/desc.h>
#include <asm/atomic.h>
@@ -421,6 +422,9 @@ static void execute_timer(struct timers *ts, struct timer *t)
list_add(&t->inactive, &ts->inactive);
ts->running = t;
+ if (t->debug)
+ TRACE_2D(TRC_PV_FIRE_TIMER, (uint32_t)t->expires,
+ (uint32_t)(t->expires >> 32));
spin_unlock_irq(&ts->lock);
(*fn)(data);
spin_lock_irq(&ts->lock);
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index e2f60a6..4571d74 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -122,6 +122,7 @@
#define TRC_PV_ENTRY 0x00201000 /* Hypervisor entry points for PV guests. */
#define TRC_PV_SUBCALL 0x00202000 /* Sub-call in a multicall hypercall */
+#define TRC_PV_TIMER 0x00208000 /* PV timer */
#define TRC_PV_HYPERCALL (TRC_PV_ENTRY + 1)
#define TRC_PV_TRAP (TRC_PV_ENTRY + 3)
@@ -136,6 +137,10 @@
#define TRC_PV_PTWR_EMULATION_PAE (TRC_PV_ENTRY + 12)
#define TRC_PV_HYPERCALL_V2 (TRC_PV_ENTRY + 13)
#define TRC_PV_HYPERCALL_SUBCALL (TRC_PV_SUBCALL + 14)
+#define TRC_PV_SET_TIMER (TRC_PV_TIMER + 15)
+#define TRC_PV_FIRE_TIMER (TRC_PV_TIMER + 16)
+#define TRC_PV_TIMER_FAIL (TRC_PV_TIMER + 17)
+#define TRC_PV_STOP_TIMER (TRC_PV_TIMER + 18)
/*
* TRC_PV_HYPERCALL_V2 format
diff --git a/xen/include/xen/timer.h b/xen/include/xen/timer.h
index 9531800..b69ae20 100644
--- a/xen/include/xen/timer.h
+++ b/xen/include/xen/timer.h
@@ -43,6 +43,7 @@ struct timer {
#define TIMER_STATUS_in_heap 3 /* In use; on timer heap. */
#define TIMER_STATUS_in_list 4 /* In use; on overflow linked list. */
uint8_t status;
+ uint8_t debug;
};
/*
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: PV timer issues in FreeBSD
2013-04-19 18:20 PV timer issues in FreeBSD Roger Pau Monné
@ 2013-04-22 7:15 ` Jan Beulich
2013-04-22 8:16 ` Roger Pau Monné
0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2013-04-22 7:15 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Justin T. Gibbs, will@FreeBSD.org, xen-devel
>>> On 19.04.13 at 20:20, Roger Pau Monné<roger.pau@citrix.com> wrote:
> While trying to get PV timers on FreeBSD to work, we found that when the
> system is under stress sometimes FreeBSD would stop receiving timer
> interrupts on certain vCPUs. I've added some trace points to Xen in
> order to see what was happening with this timers and got the following
> trace. I'm also attaching the crappy changes I've did to Xen in order to
> add the trace points, maybe I didn't put the trace points in the right
> position and that's why I'm seeing this:
For those of us not familiar with FreeBSD - which interrupt is it
using as timer interrupt (IRQ0, LAPIC, HPET, RTC)?
Considering that you had problems with the RTC running FreeBSD
on unstable, I'm particularly wondering whether there is still an
issue left, even more so as I think that the 10-instance grace
period for not shutting down the periodic interrupt may be too
small when the system is under stress and the period is rather
short (I'm thinking that the period should really be dependent on
vCPU execution time, not absolute [host] time, as a mostly
stalled - due to higher priority load - vCPU may not even have a
chance to get its interrupt handler to run).
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PV timer issues in FreeBSD
2013-04-22 7:15 ` Jan Beulich
@ 2013-04-22 8:16 ` Roger Pau Monné
0 siblings, 0 replies; 3+ messages in thread
From: Roger Pau Monné @ 2013-04-22 8:16 UTC (permalink / raw)
To: Jan Beulich; +Cc: Justin T. Gibbs, will@FreeBSD.org, xen-devel
On 22/04/13 09:15, Jan Beulich wrote:
>>>> On 19.04.13 at 20:20, Roger Pau Monné<roger.pau@citrix.com> wrote:
>> While trying to get PV timers on FreeBSD to work, we found that when the
>> system is under stress sometimes FreeBSD would stop receiving timer
>> interrupts on certain vCPUs. I've added some trace points to Xen in
>> order to see what was happening with this timers and got the following
>> trace. I'm also attaching the crappy changes I've did to Xen in order to
>> add the trace points, maybe I didn't put the trace points in the right
>> position and that's why I'm seeing this:
>
> For those of us not familiar with FreeBSD - which interrupt is it
> using as timer interrupt (IRQ0, LAPIC, HPET, RTC)?
It is using the one-shot PV timer (VCPUOP_set_singleshot_timer) and the
PVHVM vector injection. I don't think this is related to the previous
issue, which was related to the emulated RTC. I've also tried it with
different Xen versions, -unstable, 4.2 and 4.1 and the result is the
same, so I guess we are missing something regarding the PV timer
implementation in FreeBSD.
>
> Considering that you had problems with the RTC running FreeBSD
> on unstable, I'm particularly wondering whether there is still an
> issue left, even more so as I think that the 10-instance grace
> period for not shutting down the periodic interrupt may be too
> small when the system is under stress and the period is rather
> short (I'm thinking that the period should really be dependent on
> vCPU execution time, not absolute [host] time, as a mostly
> stalled - due to higher priority load - vCPU may not even have a
> chance to get its interrupt handler to run).
FreeBSD sets the timer using:
HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer,...
This operation returns 0, so the timer is set OK, but then I don't see
any calls to execute_timer with xentrace, and if I dump the timers on
the console using 'a' I can see that the timer is gone. So I was trying
to figure out what could make the timer disappear without calling
execute_timer. Also, I should mention that the PV timer seems to work OK
most of the time, the issue that I'm describing above only happens when
the system is under heavy load. This is an excerpt from xenalyze:
179.449545526 ...x.... d32768v3 d32768 set timer vCPU3: 373780840651
179.457154834 ...x.... d32768v3 d32768 fire timer vCPU3: 373780840651
179.457185647 ...x.... d32768v3 d32768 set timer vCPU3: 373781172033
179.457396161 ...x.... d32768v3 d32768 fire timer vCPU3: 373781172033
179.457423727 ...x.... d32768v3 d32768 set timer vCPU3: 373788955311
179.465247662 ...x.... d32768v3 d32768 fire timer vCPU3: 373788955311
179.465279399 ...x.... d32768v3 d32768 set timer vCPU3: 373790840795
179.467133153 ...x.... d32768v3 d32768 fire timer vCPU3: 373790840795
179.467163876 ...x.... d32768v3 d32768 set timer vCPU3: 373796829048
the timer seems to work reliably until it suddenly stops...
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-04-22 8:16 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-19 18:20 PV timer issues in FreeBSD Roger Pau Monné
2013-04-22 7:15 ` Jan Beulich
2013-04-22 8:16 ` Roger Pau Monné
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.