* issue with PLE and/or scheduler
@ 2011-12-19 23:10 andrew thomas
From: andrew thomas @ 2011-12-19 23:10 UTC
  To: xen-devel@lists.xensource.com; +Cc: andrew.thomas

This is with xen-4.1-testing changeset 23201:1c89f7d29fbb,
using the default "credit" scheduler.

I've run into an interesting issue with HVM guests that
make use of Pause Loop Exiting (PLE; i.e. on Westmere systems,
and also on Romley systems): after yielding the CPU, guests
don't seem to receive timer interrupts correctly.

Some background: for historical reasons (i.e. old templates) we boot
OL/RHEL guests with the following settings:

kernel parameters: clock=pit nohpet nopmtimer
vm.cfg: timer_mode = 2

With PLE enabled, 2.6.32 guests will crash early on with:
  ..MP-BIOS bug: 8254 timer not connected to IO-APIC
  # a few lines omitted..
  Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug

2.6.18-238 guests (i.e. OL/RHEL 5u6), on the other hand, fail to find
the timer but continue booting, and then lock up during serial line
initialization:

  ..MP-BIOS bug: 8254 timer not connected to IO-APIC
  # continues until lock up here:
  Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled

Instrumenting the 2.6.32 code (i.e. timer_irq_works()) shows that jiffies
isn't advancing, or that only 1 or 2 ticks are received, which is
insufficient for the timer to count as "working". This is on a "quiet"
system with no other activity. So, even though the guest has voluntarily
yielded the CPU (through PLE), I would still expect it to receive every
clock tick (even with timer_mode=2), as there is no other work to do on
the system.

Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot. [As an
aside, so does setting ple_gap to 41 (i.e. its value prior to
21355:727ccaaa6cce): the perf counters then show no exits happening,
so this is equivalent to disabling PLE.]

I'm hoping someone who knows the scheduler well will be able to quickly
decide whether this is a bug or a feature...

Andrew


* Re: issue with PLE and/or scheduler
@ 2011-12-23  2:44 ` Zhang, Xiantao
From: Zhang, Xiantao @ 2011-12-23  2:44 UTC
  To: andrew thomas, xen-devel@lists.xensource.com

Andrew,
   Can you try this patch to see whether it fixes your issue?
Xiantao

diff -r 381ab77db71a xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Mon Apr 18 10:10:02 2011 +0100
+++ b/xen/arch/x86/hvm/vpt.c    Thu Dec 22 11:35:36 2011 +0800
@@ -185,7 +185,7 @@

     list_for_each_entry ( pt, head, list )
     {
-        if ( pt->pending_intr_nr == 0 )
+        if ( pt->pending_intr_nr == 0 && !pt->do_not_freeze)
         {
             pt_process_missed_ticks(pt);
             set_timer(&pt->timer, pt->scheduled);


