From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pasi Kärkkäinen
Subject: Re: XCP: Crashes on dual Xeon HP ProLiant systems
Date: Fri, 30 Apr 2010 21:20:07 +0300
Message-ID: <20100430182007.GA17817@reaktio.net>
References: <201004300932.37495.dwight@supercomputer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201004300932.37495.dwight@supercomputer.org>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "dwight at supercomputer.org"
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Fri, Apr 30, 2010 at 09:32:37AM -0700, dwight at supercomputer.org wrote:
> Is anyone else running the latest XCP on HP ProLiant DL380
> systems? Or a similar dual Xeon 8-core system? I'm seeing
> spontaneous reboots when under a load.
>
> Specifically, when 4 Windows HVMs are loaded, I haven't noticed
> any reboots yet. But when running 7 or 8, the system will
> reboot within minutes. Very little information appears on
> the console.
>
> I built a debugging version of the hypervisor, which changed
> the behavior; the system managed to stay up for 2-3 hours
> with 7 VMs running. However, it again spontaneously rebooted,
> with no real messages on the console as to why.
>
> I can send out the console log messages this evening, along
> with the system information if there's interest. Alas, I
> don't have access to these items at the moment.
>
> I have also been running memtest86 overnight. As of 1.5 hours into
> the test, there were no errors. But there are 48 GB of RAM
> on the system, so the testing wasn't complete when I left.
>
> Any suggestions here? I was going to build a 32-bit kernel
> from the latest patches, but it appears Centos 5.4 Xen is
> also not stable on these systems. I had trouble getting
> the kernel to build here, with various errors. The most
> notable of which was:
>
> ----------------------
> CC arch/x86/kernel/acpi/processor.o
> In file included from arch/x86/kernel/acpi/processor.c:8:
> include/linux/kernel.h:185: internal compiler error: Segmentation
> fault
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See for instructions.
> The bug is not reproducible, so it is likely a hardware or OS
> problem.
> make[2]: *** [arch/x86/kernel/acpi/processor.o] Error 1
> make[1]: *** [arch/x86/kernel/acpi] Error 2
> make: *** [arch/x86/kernel] Error 2
> ----------------------
>

Uhm.. the compiler really shouldn't crash. Are you sure your hardware is OK?
If the stock EL5.4 Xen also crashes, it could be broken hardware?

Did you try running memtest86+?

Is baremetal Linux stable, if you run for example
"make -j8 bzImage && make -j8 modules && make clean" kernel build in a loop?

-- Pasi
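
The kernel build loop suggested above could be scripted roughly like this. It is only a minimal sketch: stress_loop is a hypothetical helper name, and the pass count and log file name in the usage comment are assumptions, not anything from the thread. It reruns a command until it fails or a pass limit is reached, so an intermittent failure (such as the compiler segfault quoted above) shows up with a pass number and timestamp.

```shell
#!/bin/sh
# stress_loop (hypothetical helper): repeat a command until it fails
# or until MAX_PASSES passes have completed.
# usage: stress_loop MAX_PASSES COMMAND [ARGS...]
stress_loop() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        # Timestamp each pass so a failure can be matched against other logs.
        echo "pass $pass: $(date)"
        if ! "$@"; then
            echo "pass $pass FAILED"
            return 1
        fi
        pass=$((pass + 1))
    done
    echo "all $max passes completed"
    return 0
}

# Example (assumed values), run from the top of a configured kernel tree:
#   stress_loop 50 sh -c 'make -j8 bzImage && make -j8 modules && make clean' \
#       2>&1 | tee build-loop.log
```

If the loop fails at a different file on each run while booted on bare metal, that would point towards hardware rather than Xen.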