Hello again,

for the last two weeks there was no crash with dom0_vcpus_pin set and dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed the command line again to:

iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0 console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M cpuid_mask_xsave_eax=0

Today the server crashed again and produced a lot of debugging messages, see attached. A "..." in the log files means that the message above it was repeated very often.

My summary so far:
- With only 1 CPU attached to dom0 the server was stable for 2 weeks; the crash there did not really show any IRQ problems, see crash20130903.txt. You can find Andrew's ideas on this in http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
- With more than 1 CPU and irqbalance the server produced the crashes I've already posted before.
- Without irqbalance it crashes with some other fancy output, see crash20130904.txt.

The next step is to change the network card.

Zhang, any update from your side? Or do the others have any idea? Could "ioapic_ack=old" help somewhere?

Best regards
Thimo

On 27.08.2013 03:03, Zhang, Yang Z wrote:
> Zhang, Yang Z wrote on 2013-08-23:
>> Thimo Eichstädt wrote on 2013-08-23:
>>> Hello Yang,
>>>
>>> any update from your side? Did your expert have any idea? Possible
>>> hardware problem?
>> Sorry, no update on this. I am still waiting for the answer from the hardware team.
> Hi Thimo,
>
> I remember that the CPU is always in idle state when this issue happens. So can you try disabling the C states in Xen to see if it helps?
>
>>> Best regards
>>> Thimo
>>> On 20.08.2013 10:50, Zhang, Yang Z wrote:
>>>> Jan Beulich wrote on 2013-08-20:
>>>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt wrote:
>>>>>> (XEN) **Pending EOI error
>>>>>> (XEN) irq 29, vector 0x21
>>>>>> (XEN) s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>>> (XEN) All LAPIC state:
>>>>>> (XEN) [vector] ISR TMR IRR
>>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>>>> It ought to be plain impossible to receive an interrupt at vector
>>>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>>>>
>>>>> Intel folks - any input on this?
>>>> I have no idea about this. But I will forward the information to
>>>> some experts internally for help.
>>>>
>>>>> Jan
>>>> Best regards,
>>>> Yang
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>>
>> Best regards,
>> Yang
>>
>
> Best regards,
> Yang
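
Note on the quoted LAPIC dump: the following is only a minimal sketch, assuming that a row such as [3f:20] covers vectors 0x20 to 0x3f and that bit n of the word stands for vector 0x20 + n (the usual layout of the local APIC ISR/TMR/IRR registers). The helper names set_vectors and priority_class are made up for illustration and are not part of Xen. The sketch decodes the ISR word 00020002 from the log and shows why Jan calls the situation impossible: a local APIC should only deliver a vector whose priority class (vector >> 4) is higher than that of every vector already in service.

```python
def set_vectors(base, word):
    """Return the interrupt vectors whose bits are set in one 32-bit
    LAPIC register word; bit n is assumed to map to vector base + n."""
    return [base + bit for bit in range(32) if word & (1 << bit)]


def priority_class(vector):
    """Priority class of a vector: its upper four bits."""
    return vector >> 4


# ISR word from the "[3f:20]" row of the quoted log.
in_service = set_vectors(0x20, 0x00020002)
print([hex(v) for v in in_service])  # ['0x21', '0x31']

# 0x21 (class 2) was accepted while 0x31 (class 3) was still in service,
# although a lower class should not interrupt a higher one.
low, high = in_service
print(priority_class(low) < priority_class(high))  # True
```

Both in-service bits match the vectors named in the "Pending EOI error" message (irq 29 at 0x21, irq 30 at 0x31), which is consistent with the assumed bit layout.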