* Xen 4.1 interrupts not delivered.
@ 2010-10-12 17:17 Konrad Rzeszutek Wilk
2010-10-12 23:34 ` Keir Fraser
0 siblings, 1 reply; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-12 17:17 UTC (permalink / raw)
To: bruce.edge, Ray.Lin, linux, caker, xen-devel, keir, Ian.Campbell,
m.a.young, jeremy
Hey folks,
Over the last month there has been a flurry of emails about PCI passthrough
devices, and some about cases in which a high load is put on the machine and
interrupts are not being delivered (at least that is how it looks to Dom0),
even in cases where no PCI passthrough is enabled.
Three things that pop up are:
1). Xen 3.4.4-rc1-pre does not have this problem (this is from Mr. Christopher's
email about the 3Ware 9690SA).
2). Moving all domains off physical cpu 0 makes the problem go away
(this is from Mr. Sander's email about High cpu load for events/0).
3). No fixes yet.
A couple of things that might fix the problems are:
1). Ian's fix to the event channels:
http://xenbits.xen.org/gitweb?p=people/ianc/linux-2.6.git;a=commit;h=5d30cb2a85912ffb5f6556d55472c26801eef2ea
2). Disable IRQ balancing in Xen (and also in Linux kernel). "noirqbalance"
3). Pin domains, but nothing to Domain 0.
But it might be worth trying them out?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Xen 4.1 interrupts not delivered.
2010-10-12 17:17 Xen 4.1 interrupts not delivered Konrad Rzeszutek Wilk
@ 2010-10-12 23:34 ` Keir Fraser
2010-10-13 7:00 ` Sander Eikelenboom
0 siblings, 1 reply; 5+ messages in thread
From: Keir Fraser @ 2010-10-12 23:34 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Bruce Edge, Ray.Lin, linux, caker,
xen-devel, Ia
On 12/10/2010 18:17, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> A couple of that might fix the problems are:
>
> 1). Ian's fix to the event channels:
> http://xenbits.xen.org/gitweb?p=people/ianc/linux-2.6.git;a=commit;h=5d30cb2a85912ffb5f6556d55472c26801eef2ea
> 2). Disable IRQ balancing in Xen (and also in Linux kernel). "noirqbalance"
> 3). Pin domains, but nothing to Domain 0.
ITYM cpu 0. Not that this should rightly make any difference that I can see.
My suspicion would be the per-CPU IDT patches introduced during 4.0
development. Or changes to enable deep C-state sleeps by default. One or the
other causing lost interrupts. I think the latter can be discounted by
max_cstate=1 as a Xen boot parameter. The former would require trying a
build of Xen before and after changesets 20072/20073 -- they are the ones
that did the heavy lifting to implement per-CPU IDTs.
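The two-way split above can be written down as a small test plan. This is only a rough sketch: the option string max_cstate=1 and the changeset numbers come from this thread, while the build labels, function names, and the decision rule are illustrative assumptions, not an established procedure.

```python
from itertools import product

# Labels are hypothetical; they stand for Xen builds just before and just
# after the per-CPU IDT changesets (20072/20073) named in this thread.
XEN_BUILDS = ["before-20072", "after-20073"]
# Empty string = default C-state behaviour, otherwise limit deep C-states.
CSTATE_OPTS = ["", "max_cstate=1"]


def experiment_matrix():
    """Return every (build, xen_boot_option) combination to boot and load-test."""
    return list(product(XEN_BUILDS, CSTATE_OPTS))


def suspect(freezes):
    """Guess the culprit from results.

    freezes maps (build, option) -> True if the machine froze under load.
    """
    # Freeze follows the build, regardless of the C-state setting.
    idt = (all(not freezes[("before-20072", o)] for o in CSTATE_OPTS)
           and all(freezes[("after-20073", o)] for o in CSTATE_OPTS))
    # Freeze follows the C-state setting, regardless of the build.
    cstate = (all(freezes[(b, "")] for b in XEN_BUILDS)
              and all(not freezes[(b, "max_cstate=1")] for b in XEN_BUILDS))
    if idt:
        return "per-CPU IDT changes"
    if cstate:
        return "deep C-states"
    return "inconclusive"
```

Anything that does not fall cleanly into one of the two patterns is reported as inconclusive, which is probably the honest answer for an intermittent freeze.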
-- Keir
> But it might be worth trying them out?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Xen 4.1 interrupts not delivered.
2010-10-12 23:34 ` Keir Fraser
@ 2010-10-13 7:00 ` Sander Eikelenboom
2010-10-13 7:52 ` Keir Fraser
0 siblings, 1 reply; 5+ messages in thread
From: Sander Eikelenboom @ 2010-10-13 7:00 UTC (permalink / raw)
To: Keir Fraser
Cc: Jeremy Fitzhardinge, xen-devel, Konrad Rzeszutek Wilk, Ray.Lin,
Ian Campbell, m.a.young, Bruce Edge
Hello Keir,
OK, let's rephrase: in what cases is it logical that the Xen serial console freezes together with dom0?
For example, some deadlock that causes cpu0 to stall on a heavily loaded system?
I think having the serial console available to dump the machine's state is quite vital :-(
I have tried max_cstate=1 together with the latest 2.6.32-xen-next-pvops kernel as the dom0 kernel (which includes Ian's fix to the event channels).
But with the compile test it freezes just as fast.
I will now try Xen from before changesets 20072/20073, probably with 2.6.31 pvops, since 2.6.32 would need a more recent hypervisor.
--
Sander
--
Best regards,
Sander mailto:linux@eikelenboom.it
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Xen 4.1 interrupts not delivered.
2010-10-13 7:00 ` Sander Eikelenboom
@ 2010-10-13 7:52 ` Keir Fraser
2010-10-16 17:56 ` Sander Eikelenboom
0 siblings, 1 reply; 5+ messages in thread
From: Keir Fraser @ 2010-10-13 7:52 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Jeremy Fitzhardinge, xen-devel, Konrad Rzeszutek Wilk, Ray.Lin,
Ian Campbell, m.a.young, Bruce Edge
On 13/10/2010 08:00, "Sander Eikelenboom" <linux@eikelenboom.it> wrote:
> Hello Keir,
>
> OK let's rephrase, in what cases is it logical that the xen serial console
> freezes together with dom0 ?
> For example some deadlock causes cpu0 to stall on a heavily loaded system ..
> I think having the serial console available to dump the machines state is
> quite vital :-(
Oh, there was a fix for serial interrupt routing: xen-unstable:22148 or
xen-4.0-testing:21342. Are you running a more recent hypervisor than that?
The fix prevents serial interrupt from being migrated away from pcpu0, which
will not work as there is no vector allocated for it on other pcpus. This
kind of fits with the bug you're seeing, which doesn't manifest if you leave
pcpu0 unloaded (and hence presumably serial interrupt binding prefers to
stay with unloaded pcpu0).
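A toy model of that failure mode, purely for illustration: this is not Xen code, and the class, function names, and vector number are all made up. It only shows why an interrupt routed to a CPU whose table has no entry for its vector goes nowhere.

```python
class ToyCpu:
    """A CPU with its own vector table (stand-in for a per-CPU IDT)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.idt = {}  # vector number -> handler name

    def deliver(self, vector):
        # None means no handler is installed here: the interrupt is lost.
        return self.idt.get(vector)


def route_and_deliver(cpus, target_cpu, vector):
    """Send the given vector to one CPU and report which handler ran, if any."""
    return cpus[target_cpu].deliver(vector)


SERIAL_VECTOR = 0x30  # arbitrary number chosen for the example

cpus = [ToyCpu(i) for i in range(4)]
# The vector is only allocated on pcpu0, as described above.
cpus[0].idt[SERIAL_VECTOR] = "serial_rx"

on_cpu0 = route_and_deliver(cpus, 0, SERIAL_VECTOR)   # handled
migrated = route_and_deliver(cpus, 2, SERIAL_VECTOR)  # nothing installed there
```

In this model, keeping the interrupt bound to pcpu0 is the only way it is ever handled, which matches the observation that the problem does not show up when pcpu0 is left unloaded.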
-- Keir
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Xen 4.1 interrupts not delivered.
2010-10-13 7:52 ` Keir Fraser
@ 2010-10-16 17:56 ` Sander Eikelenboom
0 siblings, 0 replies; 5+ messages in thread
From: Sander Eikelenboom @ 2010-10-16 17:56 UTC (permalink / raw)
To: Keir Fraser
Cc: Jeremy Fitzhardinge, xen-devel, Konrad Rzeszutek Wilk, Ray.Lin,
Ian Campbell, m.a.young, Bruce Edge
[-- Attachment #1: Type: text/plain, Size: 4459 bytes --]
Hi Keir,
I don't know if it gives any insight, but I tried running xentrace; the only thing I don't know is how close to the actual freeze the data that made it to disk extends ...
In the last 2 seconds of the trace I do sometimes see:
169.940118823 ||xl d1v0 hypercall 17 (iret) eip ffffffff810012eb
169.940119616 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
169.940120050 ||xl d1v0 hypercall 11 (xen_version) eip ffffffff8100122a
169.940120540 ||xl d1v0 hypercall 1d (sched_op) eip ffffffff810013aa
]169.940120843 ||xl d1v0 28006(2:8:6) 2 [ 1 0 ]
]169.940122066 ||xl d1v0 2800e(2:8:e) 2 [ 1 6db9 ]
]169.940122206 ||xl d1v0 2800f(2:8:f) 3 [ 0 6db9 1c9c380 ]
]169.940122393 ||xl d1v0 2800a(2:8:a) 4 [ 1 0 0 2 ]
169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked
sched_runstate_process: 1 lost cpus, setting d1v0 runstate to RUNSTATE_LOST
169.940122820 ||xl d?v? runstate_change d0v2 runnable->running
169.940124900 |x|l d0v0 page_fault[ db3124a0 2b9e dc0d1000 2b9e 6 ]
169.940125350 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
169.940125986 ||xl d0v2 hypercall 11 (xen_version) eip ffffffff8100922a
169.940126983 |x|l d0v0 hypercall 11 (xen_version) eip ffffffff8100922a
169.940127210 ||xl d0v2 emulate privop[ 8167dc5e ffffffff ]
169.940127773 ||xl d0v2 emulate privop[ 8167dca6 ffffffff ]
But perhaps that sounds worse than it actually is.
This trace was done on:
- Intel Quad core
- only 1 domU started, with videograbbing on pci-e xhci controller, device using msi-x interrupts
- xen_changeset : Fri Oct 08 11:41:57 2010 +0100 22230:a33886146b45
- dom0 kernel jeremy's pvops xen/next last commit 4ac23c27f34a5ea45a098b0f6e08bf5cc6e74756
- domU kernel konrad's pcifront-0.8.1 tree last commit 369bae8ae5c5e4b122f77726a4c957108ad724ad
Attached:
- last piece of the trace bzip2'ed
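For anyone who wants to skim the attached trace for the same pattern, here is a minimal sketch that counts runstate transitions and lost-record warnings per vcpu. It assumes the text dump looks exactly like the lines quoted above; the regular expressions and function name are my own and not part of any Xen tool.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the lines quoted in this mail.
RUNSTATE_RE = re.compile(r"runstate_change (d\d+v\d+) (\w+)->(\w+)")
LOST_RE = re.compile(
    r"(\d+) lost cpus, setting (d\d+v\d+) runstate to RUNSTATE_LOST")


def summarize(lines):
    """Count (vcpu, old_state, new_state) transitions and lost warnings per vcpu."""
    transitions = Counter()
    lost = Counter()
    for line in lines:
        m = RUNSTATE_RE.search(line)
        if m:
            transitions[m.groups()] += 1
            continue
        m = LOST_RE.search(line)
        if m:
            lost[m.group(2)] += 1
    return transitions, lost


SAMPLE = [
    "169.940122586 ||xl d1v0 runstate_change d1v0 running->blocked",
    "sched_runstate_process: 1 lost cpus, setting d1v0 runstate to RUNSTATE_LOST",
    "169.940122820 ||xl d?v? runstate_change d0v2 runnable->running",
]

transitions, lost = summarize(SAMPLE)
```

If the lost warnings cluster on one vcpu shortly before the end of the file, that would at least show where the trace stops being reliable.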
--
Sander
--
Best regards,
Sander mailto:linux@eikelenboom.it
[-- Attachment #2: xen-trace.bz2 --]
[-- Type: application/octet-stream, Size: 1776156 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread