* xen tsc problems?
From: Stefano Stabellini @ 2010-07-13 14:37 UTC
To: xen-devel; +Cc: Dan Magenheimer

Hi all,

I get this warning from the HVM DomU kernel (both PV on HVM and normal HVM):

checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 116836520 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed

The host cpu is the following:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 6
model name      : Genuine Intel(R) CPU 3.00GHz
stepping        : 2
cpu MHz         : 3000.050
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni est cid hypervisor arat
bogomips        : 6000.10
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

It happens on both xen 4.0 (pre 21236) and 4.1.

If I specify tsc_mode=2, first I get:

checking TSC synchronization [CPU#0 -> CPU#1]: passed.

but a little bit afterwards:

Clocksource tsc unstable (delta = 116372610 ns)

If I use a PV on HVM kernel (pv timer enabled) and tsc_mode=1, besides
these messages I get about 20-30 messages like the following:

CE: xen increased min_delta_ns to 506250 nsec

Tracing them back to xen, I found out that they happen when the guest
kernel tries to set the next timer event in the past.

Does this mean that the host has some serious tsc issues?
Can this be a symptom of a bug in xen?
Suggestions are welcome.

Cheers,

Stefano

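To put the two figures from the guest log on a common scale, the warp reported in cycles can be converted to wall-clock time. The following is a minimal Python sketch, assuming the guest TSC ticks at the 3000.050 MHz shown in the cpuinfo output; the function name cycles_to_ms is made up for illustration and is not part of Linux or Xen.

```python
def cycles_to_ms(cycles, cpu_mhz):
    """Convert a TSC cycle count to milliseconds at the given clock rate."""
    # cycles / (MHz * 1e6) gives seconds; multiply by 1e3 for milliseconds.
    return cycles / (cpu_mhz * 1e6) * 1e3


# Values copied from the guest kernel messages and cpuinfo above.
warp_cycles = 116836520      # "Measured 116836520 cycles TSC warp"
cpu_mhz = 3000.050           # "cpu MHz : 3000.050"
delta_ns = 116372610         # "Clocksource tsc unstable (delta = 116372610 ns)"

warp_ms = cycles_to_ms(warp_cycles, cpu_mhz)
delta_ms = delta_ns / 1e6

print(f"TSC warp:          {warp_ms:.2f} ms")
print(f"clocksource delta: {delta_ms:.2f} ms")
```

Under that assumption both values are in the tens of milliseconds or more, far above the sub-microsecond skew one would expect from synchronized TSCs.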
* Re: xen tsc problems?
From: Keir Fraser @ 2010-07-13 14:43 UTC
To: Stefano Stabellini, xen-devel@lists.xensource.com; +Cc: Dan Magenheimer

On 13/07/2010 15:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> Does this mean that the host has some serious tsc issues?
> Can this be a symptom of a bug in xen?
> Suggestions are welcome.

The 's' and 't' debug key handlers will be useful to get an idea of how
stable host TSCs are.

 -- Keir

* RE: xen tsc problems?
From: Dan Magenheimer @ 2010-07-13 15:37 UTC
To: Keir Fraser, Stefano Stabellini, xen-devel

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> On 13/07/2010 15:37, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
>
> > Does this mean that the host has some serious tsc issues?
> > Can this be a symptom of a bug in xen?
> > Suggestions are welcome.
>
> The 's' and 't' debug key handlers will be useful to get an idea of how
> stable host TSCs are.
>
>  -- Keir

Also you can try max_cstate=0 as a Xen boot parameter to rule
out power management screwing up the tsc.

> > Does this mean that the host has some serious tsc issues?

Probably. But the default tsc_mode (0) is intended to hide all
such issues. Could you check the 's' debug-key output to
ensure your guest is actually running with tsc_mode=0?

> > Can this be a symptom of a bug in xen?

Well, if the guest has problems with the default tsc_mode (0),
which does complete tsc emulation, I suppose it could be a
bug in Xen. In particular, I wonder if the code that recovers
from deep C-states (and writes to the TSC) is broken. IIRC,
there were some changesets in that area recently. If the
problem goes away with max_cstate=0, that would be a
good place to start.

Dan

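Checking which tsc_mode a guest runs with comes down to reading the per-domain line in the 's' debug-key output. The following is a small Python sketch for pulling the fields out of such a line, assuming it has the shape "(XEN) dom3(hvm): mode=0,ofs=...,khz=...,inc=..." as posted in this thread, and assuming the mode= field is the tsc_mode. The regular expression and the function name parse_domain_tsc_line are illustrative only and not part of Xen.

```python
import re

# Pattern modelled on the per-domain line posted in this thread; the exact
# format may differ between Xen versions.
DOMAIN_LINE = re.compile(
    r"dom(?P<domid>\d+)\((?P<type>\w+)\): mode=(?P<mode>\d+),"
    r"ofs=(?P<ofs>0x[0-9a-fA-F]+),khz=(?P<khz>\d+),inc=(?P<inc>\d+)"
)


def parse_domain_tsc_line(line):
    """Return the TSC fields of one per-domain line, or None if it does not match."""
    m = DOMAIN_LINE.search(line)
    if m is None:
        return None
    return {
        "domid": int(m.group("domid")),
        "type": m.group("type"),
        "tsc_mode": int(m.group("mode")),
        "ofs": int(m.group("ofs"), 16),
        "khz": int(m.group("khz")),
        "inc": int(m.group("inc")),
    }


sample = ("(XEN) dom3(hvm): mode=0,ofs=0x2b2e19a77ea,khz=3000048,"
          "inc=1,vtsc count: 1211682 total")
info = parse_domain_tsc_line(sample)
print(info)
```

A line that does not match returns None rather than raising, so the function can be run over a whole console log.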
* RE: xen tsc problems?
From: Stefano Stabellini @ 2010-07-13 17:39 UTC
To: Dan Magenheimer
Cc: xen-devel@lists.xensource.com, Keir Fraser, Stefano Stabellini

On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >
> > On 13/07/2010 15:37, "Stefano Stabellini"
> > <Stefano.Stabellini@eu.citrix.com>
> > wrote:
> >
> > > Does this mean that the host has some serious tsc issues?
> > > Can this be a symptom of a bug in xen?
> > > Suggestions are welcome.
> >
> > The 's' and 't' debug key handlers will be useful to get an idea of how
> > stable host TSCs are.
> >
> >  -- Keir
>
> Also you can try max_cstate=0 as a Xen boot parameter to rule
> out power management screwing up the tsc.
>
> > > Does this mean that the host has some serious tsc issues?
>
> Probably. But the default tsc_mode (0) is intended to hide all
> such issues. Could you check the 's' debug-key output to
> ensure your guest is actually running with tsc_mode=0?
>

This is the output of 's' and 't' without max_cstate=0:

(XEN) Synced stime skew: max=245ns avg=202ns samples=2 current=160ns
(XEN) Synced cycles skew: max=615 avg=577 samples=2 current=540
(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=0 (count=2)
(XEN) dom3(hvm): mode=0,ofs=0x2b2e19a77ea,khz=3000048,inc=1,vtsc count: 1211682 total

This is the output of 's' and 't' with max_cstate=0:

(XEN) Synced stime skew: max=110ns avg=105ns samples=2 current=110ns
(XEN) Synced cycles skew: max=1020 avg=652 samples=2 current=285
(XEN) TSC has constant rate, no deep Cstates, passed warp test, deemed reliable, warp=0 (count=2)
(XEN) dom2(hvm): mode=0,ofs=0xb748091f5,khz=3000032,inc=1,vtsc count: 758954 total

I still get the same warning from the guest.

I started to wonder why the guest is seeing such a big tsc warp when xen
is seeing 0, so I added more tracing and eventually I found out that the
value of v->arch.hvm_vcpu.stime_offset is significantly different
between the two vcpus and the difference increases after the scaling.
Then I added timer_mode=1 to my vm config file and the problem went
away.
I think that delay_for_missed_ticks shouldn't cause tsc skew in
the guest.

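The observation above, that differing per-vcpu stime_offset values show up as a larger difference after scaling, can be illustrated with a simplified model. The following is a Python sketch only, assuming the emulated guest TSC is derived as (host system time + per-vcpu offset) in nanoseconds, scaled by the domain's khz value. The function names and the 1 ms offset difference are hypothetical and are not taken from the Xen source or from the traces.

```python
def guest_tsc(host_stime_ns, stime_offset_ns, tsc_khz):
    """Simplified model: guest time in ns, scaled to guest TSC cycles."""
    guest_time_ns = host_stime_ns + stime_offset_ns
    # khz cycles per millisecond == khz / 1e6 cycles per nanosecond.
    return guest_time_ns * tsc_khz // 1_000_000


def apparent_warp(host_stime_ns, offset_a_ns, offset_b_ns, tsc_khz):
    """TSC difference two vcpus would observe at the same host time."""
    return abs(guest_tsc(host_stime_ns, offset_a_ns, tsc_khz)
               - guest_tsc(host_stime_ns, offset_b_ns, tsc_khz))


TSC_KHZ = 3000048          # khz value from the dom3 line of the 's' output
HOST_STIME_NS = 5_000_000_000

# Identical offsets: the vcpus agree, matching the host-side warp=0.
same = apparent_warp(HOST_STIME_NS, 250, 250, TSC_KHZ)

# Hypothetical 1 ms difference between the two vcpus' offsets.
skewed = apparent_warp(HOST_STIME_NS, 0, 1_000_000, TSC_KHZ)

print(f"equal offsets:    {same} cycles")
print(f"1 ms offset diff: {skewed} cycles")
```

In this model every nanosecond of offset difference becomes roughly three cycles at about 3 GHz, so the guest can measure a large warp even while the host TSCs themselves report warp=0. Under the same assumptions, a warp of 116836520 cycles would correspond to an offset difference of roughly 39 ms.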
* Re: xen tsc problems?
From: Keir Fraser @ 2010-07-13 17:48 UTC
To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel@lists.xensource.com

On 13/07/2010 18:39, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> I started to wonder why the guest is seeing such a big tsc warp when xen
> is seeing 0, so I added more tracing and eventually I found out that the
> value of v->arch.hvm_vcpu.stime_offset is significantly different
> between the two vcpus and the difference increases after the scaling.
> Then I added timer_mode=1 to my vm config file and the problem went
> away.
> I think that delay_for_missed_ticks shouldn't cause tsc skew in
> the guest.

Well, timer_mode=1 is the default and I doubt in all seriousness that the
other modes get any use or testing.

 -- Keir

* RE: xen tsc problems?
From: Dan Magenheimer @ 2010-07-13 18:06 UTC
To: Keir Fraser, Stefano Stabellini; +Cc: xen-devel

/me wonders if timer_mode=1 is the default for xl?
Or only for xm?

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 13, 2010 11:48 AM
> To: Stefano Stabellini; Dan Magenheimer
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] xen tsc problems?
>
> On 13/07/2010 18:39, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
>
> > I started to wonder why the guest is seeing such a big tsc warp when xen
> > is seeing 0, so I added more tracing and eventually I found out that the
> > value of v->arch.hvm_vcpu.stime_offset is significantly different
> > between the two vcpus and the difference increases after the scaling.
> > Then I added timer_mode=1 to my vm config file and the problem went
> > away.
> > I think that delay_for_missed_ticks shouldn't cause tsc skew in
> > the guest.
>
> Well, timer_mode=1 is the default and I doubt in all seriousness that the
> other modes get any use or testing.
>
>  -- Keir

* RE: xen tsc problems?
From: Stefano Stabellini @ 2010-07-13 18:14 UTC
To: Dan Magenheimer
Cc: xen-devel@lists.xensource.com, Keir Fraser, Stefano Stabellini

On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> /me wonders if timer_mode=1 is the default for xl?
> Or only for xm?

no, it is not.
Xl defaults to 0, I am going to change it right now.

* Re: xen tsc problems?
From: Keir Fraser @ 2010-07-13 18:59 UTC
To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel@lists.xensource.com

On 13/07/2010 19:14, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> On Tue, 13 Jul 2010, Dan Magenheimer wrote:
>> /me wonders if timer_mode=1 is the default for xl?
>> Or only for xm?
>
> no, it is not.
> Xl defaults to 0, I am going to change it right now.

Possibly we should make timer_mode=1 the default in Xen as well, and
actually disallow setting it to 0. Clearly no good comes of it.

 -- Keir

* RE: xen tsc problems?
From: Dan Magenheimer @ 2010-07-13 19:32 UTC
To: Keir Fraser, Stefano Stabellini; +Cc: xen-devel

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 13, 2010 12:59 PM
> To: Stefano Stabellini; Dan Magenheimer
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] xen tsc problems?
>
> On 13/07/2010 19:14, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
>
> > On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> >> /me wonders if timer_mode=1 is the default for xl?
> >> Or only for xm?
> >
> > no, it is not.
> > Xl defaults to 0, I am going to change it right now.
>
> Possibly we should make timer_mode=1 the default in Xen as well, and
> actually disallow setting it to 0. Clearly no good comes of it.

IIRC from >2 years ago, timer_mode=0 was best for older HVM 32-bit
Linux guests. Obviously if the code (interacting with the tsc code)
has bit-rotted, "best" is a relative term. :-)

Dan

* Re: xen tsc problems?
From: Keir Fraser @ 2010-07-13 18:12 UTC
To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel@lists.xensource.com

On 13/07/2010 18:48, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> I started to wonder why the guest is seeing such a big tsc warp when xen
>> is seeing 0, so I added more tracing and eventually I found out that the
>> value of v->arch.hvm_vcpu.stime_offset is significantly different
>> between the two vcpus and the difference increases after the scaling.
>> Then I added timer_mode=1 to my vm config file and the problem went
>> away.
>> I think that delay_for_missed_ticks shouldn't cause tsc skew in
>> the guest.
>
> Well, timer_mode=1 is the default and I doubt in all seriousness that the
> other modes get any use or testing.

To give you an idea how long it's probably been broken, my suspicion is that
the culprit is xen-unstable:17716, which is over two years old. That patch
changed HVM time handling to base it more on Xen system time. The fact that
hvm_set_guest_time() no longer directly affects guest TSC is probably the
problem here. I think delay_for_missed_ticks might depend on that. Anyway,
I'm not certain but I'd put money on it.

 -- Keir
