From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bruce Edge <bruce.edge@gmail.com>
Subject: Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
Date: Mon, 27 Sep 2010 07:32:30 -0700
Message-ID: <AANLkTi=zOvevpMMX1oQCfWQomzvSnrv+qhapx-TNtk3H@mail.gmail.com>
References: <AANLkTimPVj-AXyR8DuQRxuAwcFwHm0sVkgiXvkA1+f7-@mail.gmail.com>
	<4C9DE72E.1000006@hfp.de>
	<AANLkTi=jxHQp3_GDML9JcoYNNkGTGLR3_okBspWnFdfC@mail.gmail.com>
	<4CA0A8AF.6010908@hfp.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1573531834=="
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <4CA0A8AF.6010908@hfp.de>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andreas Kinzler <ml-xen-users@hfp.de>
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

--===============1573531834==
Content-Type: multipart/alternative; boundary=001485f6db2c3523c004913e99bf

--001485f6db2c3523c004913e99bf
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Sep 27, 2010 at 7:22 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:

> On 27.09.2010 16:06, Bruce Edge wrote:
>
>>>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>>> Four dom0s share an NFS server for domU images, with a total of 24 domUs
>>>> (6 domUs on each dom0). When the system is under heavy load, busy
>>>> processing e-commerce requests, one or two of the dom0s hang. No input
>>>> is accepted and a reboot is necessary.
>>>> Has anyone had the same experience? The causes I can come up with are
>>>> the following:
>>>>
>>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>>> I have found a severe problem on Lynnfield systems.
>>>
>> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel
>> is causing grief for us too. I was wondering if this was related.
>>
>
> I am still researching this. For testing I bought a test system with
> Westmere-EP (Xeon E5620), which has ARAT. This system ran stably even though
> Intel still lists it as having the C6 erratum. This leads me to the
> conclusion that the HPET timer migration code (called HPET broadcast) in
> Xen is the root cause. This affects all CPUs that use it, but mainly
> Nehalem because of turbo mode.
>
> Regards Andreas
>

Andreas,
Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
that helps.
Let me know if there's anything I can run/do/test/etc.

-Bruce

--001485f6db2c3523c004913e99bf--


--===============1573531834==--