xen-devel.lists.xenproject.org archive mirror
* recurring boot time scalability issues affecting time management
@ 2010-05-11  6:59 Jan Beulich
  2010-05-11  7:57 ` Keir Fraser
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2010-05-11  6:59 UTC
  To: xen-devel

The patch titled "VT-d: prevent watchdog timer from kicking in when
initializing on systems with huge amounts of memory", submitted
yesterday, unexpectedly also fully fixed a separate late boot hang
(one occurring when Dom0 was almost fully up), i.e. one apparently
unrelated to issues that may exist before Dom0 even gets started.
None of the logs gave any indication of time problems, yet there must
have been some, given that the boot hung without that change but
didn't with it in place.

I wonder whether the time handling code in Xen itself shouldn't/can't
therefore be made more robust, or at least reliably detect this sort of
issue (from past analysis of similar problems, the platform timer is
rolling over due to there not being frequent enough invocations of
plt_overflow()) to make analysis of the problem easier.

Jan
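
A minimal sketch of the failure mode described above, assuming a
hardware counter narrower than 64 bits that software extends by
sampling it periodically. The struct and function names are
hypothetical stand-ins, not the actual definitions in
xen/arch/x86/time.c; only the masked-delta update mirrors what
plt_overflow() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the platform timer bookkeeping. */
struct plt_state {
    uint64_t mask;     /* (1 << counter_width) - 1 */
    uint64_t stamp;    /* last raw hardware counter value seen */
    uint64_t stamp64;  /* software-extended 64-bit tick count */
};

/* Fold a new raw reading into the 64-bit count (masked delta). */
static void plt_accumulate(struct plt_state *s, uint64_t raw)
{
    assert((raw & ~s->mask) == 0);   /* raw must fit the counter width */
    s->stamp64 += (raw - s->stamp) & s->mask;
    s->stamp = raw;
}

/*
 * Let `ticks` hardware ticks elapse with only a single sample taken at
 * the end, and return how many ticks the 64-bit count fails to see.
 */
static uint64_t ticks_lost(uint64_t mask, uint64_t ticks)
{
    struct plt_state s = { .mask = mask, .stamp = 0, .stamp64 = 0 };

    plt_accumulate(&s, ticks & mask);
    return ticks - s.stamp64;
}
```

As long as fewer than mask + 1 ticks pass between two samples, nothing
is lost. Once the gap exceeds one counter period, each full wrap
silently drops mask + 1 ticks, which is why such a stall leaves no
trace in the logs.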


* Re: recurring boot time scalability issues affecting time management
  2010-05-11  6:59 recurring boot time scalability issues affecting time management Jan Beulich
@ 2010-05-11  7:57 ` Keir Fraser
  2010-05-11  8:51   ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Keir Fraser @ 2010-05-11  7:57 UTC
  To: Jan Beulich, xen-devel@lists.xensource.com

On 11/05/2010 07:59, "Jan Beulich" <JBeulich@novell.com> wrote:

> I wonder whether the time handling code in Xen itself shouldn't/can't
> therefore be made more robust, or at least reliably detect this sort of
> issue (from past analysis of similar problems, the platform timer is
> rolling over due to there not being frequent enough invocations of
> plt_overflow()) to make analysis of the problem easier.

Something like the attached?

 -- Keir

> Jan


[-- Attachment #2: 00-plt-ovf --]
[-- Type: application/octet-stream, Size: 1905 bytes --]

diff -r 4ede18e45935 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Tue May 11 08:39:01 2010 +0100
+++ b/xen/arch/x86/time.c	Tue May 11 08:56:56 2010 +0100
@@ -571,19 +571,6 @@
 static u64 plt_stamp;            /* hardware-width platform counter stamp   */
 static struct timer plt_overflow_timer;
 
-static void plt_overflow(void *unused)
-{
-    u64 count;
-
-    spin_lock_irq(&platform_timer_lock);
-    count = plt_src.read_counter();
-    plt_stamp64 += (count - plt_stamp) & plt_mask;
-    plt_stamp = count;
-    spin_unlock_irq(&platform_timer_lock);
-
-    set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
-}
-
 static s_time_t __read_platform_stime(u64 platform_time)
 {
     u64 diff = platform_time - platform_timer_stamp;
@@ -591,6 +578,41 @@
     return (stime_platform_stamp + scale_delta(diff, &plt_scale));
 }
 
+static void plt_overflow(void *unused)
+{
+    int i;
+    u64 count;
+    s_time_t now, plt_now, plt_wrap;
+
+    spin_lock_irq(&platform_timer_lock);
+
+    count = plt_src.read_counter();
+    plt_stamp64 += (count - plt_stamp) & plt_mask;
+    plt_stamp = count;
+
+    now = NOW();
+    plt_wrap = __read_platform_stime(plt_stamp64);
+    for ( i = 0; i < 10; i++ )
+    {
+        plt_now = plt_wrap;
+        plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);
+        if ( __builtin_abs(plt_wrap - now) > __builtin_abs(plt_now - now) )
+            break;
+        plt_stamp64 += plt_mask + 1;
+    }
+    if ( i != 0 )
+    {
+        static bool_t warned_once;
+        if ( !test_and_set_bool(warned_once) )
+            printk("Platform timer appears to have unexpectedly wrapped "
+                   "%u%s times.\n", i, (i == 10) ? " or more" : "");
+    }
+
+    spin_unlock_irq(&platform_timer_lock);
+
+    set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
+}
+
 static s_time_t read_platform_stime(void)
 {
     u64 count;
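
The detection loop in the patch above can be sketched in isolation as
follows. This is an illustration under stated assumptions, not the
committed code: the function name and parameters are hypothetical,
times are plain signed 64-bit nanosecond values, and period_ns stands
for the system-time equivalent of one full counter period
(plt_mask + 1 ticks after scaling).

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Count how many whole counter periods have to be added to the
 * platform-derived time `plt_ns` to bring it as close as possible to
 * the reference time `now_ns`, giving up after `max_wraps` steps.
 */
static int missed_wraps(int64_t now_ns, int64_t plt_ns,
                        int64_t period_ns, int max_wraps)
{
    int i;

    for (i = 0; i < max_wraps; i++) {
        int64_t next = plt_ns + period_ns;

        /* Stop once one more period would move us further from now. */
        if (llabs(next - now_ns) > llabs(plt_ns - now_ns))
            break;
        plt_ns = next;
    }
    return i;
}
```

A return value of 0 means the platform time already is the closest
candidate. Any other value is the number of wraps that went unnoticed,
and a value equal to max_wraps corresponds to the "or more" case in
the patch's warning message.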



* Re: recurring boot time scalability issues affecting time management
  2010-05-11  7:57 ` Keir Fraser
@ 2010-05-11  8:51   ` Jan Beulich
  2010-05-11 10:22     ` Keir Fraser
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2010-05-11  8:51 UTC
  To: Keir Fraser, xen-devel@lists.xensource.com

>>> Keir Fraser <keir.fraser@eu.citrix.com> 11.05.10 09:57 >>>
>On 11/05/2010 07:59, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> I wonder whether the time handling code in Xen itself shouldn't/can't
>> therefore be made more robust, or at least reliably detect this sort of
>> issue (from past analysis of similar problems, the platform timer is
>> rolling over due to there not being frequent enough invocations of
>> plt_overflow()) to make analysis of the problem easier.
>
>Something like the attached?

Yes, except that you probably mean __builtin_llabs() rather than
__builtin_abs().

Thanks! Jan
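
A short sketch of why the width of the absolute-value helper matters
here, assuming s_time_t is a signed 64-bit nanosecond count and int is
32 bits wide. The helper names are hypothetical; abs_narrow() models
what an int-width abs sees once its argument has been converted to int.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Model an int-width abs(): only the low 32 bits of the 64-bit delta
 * survive the conversion. The uint32_t to int32_t step is
 * implementation-defined in ISO C for values above INT32_MAX; GCC
 * documents it as wrapping modulo 2^32.
 */
static int64_t abs_narrow(int64_t delta)
{
    int32_t low = (int32_t)(uint32_t)delta;

    return low < 0 ? -(int64_t)low : (int64_t)low;
}

/* A 64-bit absolute value keeps the full magnitude. */
static int64_t abs_wide(int64_t delta)
{
    return llabs(delta);
}
```

With a delta of 2^32 + 5 ns (about 4.3 seconds), abs_narrow() yields 5
while abs_wide() yields 4294967301, so a comparison built on the narrow
form could treat a far-off candidate as the nearest one.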


* Re: recurring boot time scalability issues affecting time management
  2010-05-11  8:51   ` Jan Beulich
@ 2010-05-11 10:22     ` Keir Fraser
  0 siblings, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2010-05-11 10:22 UTC
  To: Jan Beulich, xen-devel@lists.xensource.com

On 11/05/2010 09:51, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> Keir Fraser <keir.fraser@eu.citrix.com> 11.05.10 09:57 >>>
>> On 11/05/2010 07:59, "Jan Beulich" <JBeulich@novell.com> wrote:
>> 
>>> I wonder whether the time handling code in Xen itself shouldn't/can't
>>> therefore be made more robust, or at least reliably detect this sort of
>>> issue (from past analysis of similar problems, the platform timer is
>>> rolling over due to there not being frequent enough invocations of
>>> plt_overflow()) to make analysis of the problem easier.
>> 
>> Something like the attached?
> 
> Yes, except that you probably mean __builtin_llabs() rather than
> __builtin_abs().

xen-unstable:21346

 -- Keir

