From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bruce Edge
Subject: Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
Date: Mon, 27 Sep 2010 07:32:30 -0700
Message-ID:
References: <4C9DE72E.1000006@hfp.de> <4CA0A8AF.6010908@hfp.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1573531834=="
Return-path:
In-Reply-To: <4CA0A8AF.6010908@hfp.de>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andreas Kinzler
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

--===============1573531834==
Content-Type: multipart/alternative; boundary=001485f6db2c3523c004913e99bf

--001485f6db2c3523c004913e99bf
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Sep 27, 2010 at 7:22 AM, Andreas Kinzler wrote:

> On 27.09.2010 16:06, Bruce Edge wrote:
>
>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>>> four dom0s share a nfs server for domU images. a total number of 24
>>>> domUs (6 domUs on each dom0). When the system under heavy load, busy
>>>> processing e-commerce requests, one or two of the dom0s hanged. no
>>>> input can be accepted and reboot is necessary.
>>>> Anyone had the same experience? The causes I can come up are following:
>>>>
>>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>>> I have found a severe problem on Lynnfield systems.
>>>
>> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel
>> is causing grief for us too. I was wondering if this was related.
>>
>
> I am still researching this. For testing I bought a test system with
> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
> Intel still lists it as having the C6 errata. This leads me to the
> conclusion that the HPET timer migration code (called HPET broadcast) from
> Xen is the root cause. This affects all CPUs that use it - but mainly
> Nehalem because of turbo mode.
>
> Regards Andreas
>

Andreas,

Thanks for the info. I'll try disabling turbo mode in the BIOS and see if that helps.
Let me know if there's anything I can run/do/test/etc.

-Bruce

--001485f6db2c3523c004913e99bf
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--001485f6db2c3523c004913e99bf--

--===============1573531834==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1573531834==--