From: Keir Fraser
Subject: Re: XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Date: Wed, 28 Apr 2010 04:53:54 +0100
In-Reply-To: <4BD7DA02.3030107@cs.ucsd.edu>
To: John McCullough
Cc: Ian Campbell, Jeremy Fitzhardinge, "xen-devel@lists.xensource.com", Yuvraj Agarwal
List-Id: xen-devel@lists.xenproject.org

Cc'ing Ian and Jeremy -- one of them should be able to answer definitively
on this. I'm *pretty* sure there's an easy way to bump the irq limit you're
hitting on the pv_ops kernels, but 'nr_irqs=' on the dom0 cmdline clearly
isn't it, as it had no effect at all!

 -- Keir

On 28/04/2010 07:47, "John McCullough" wrote:

> I did a little testing.
>
> With no kernel option:
> # dmesg | grep -i nr_irqs
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> w/nr_irqs=65536:
> # dmesg | grep -i nr_irqs
> [ 0.000000] Command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] Kernel command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> Tweaking the NR_IRQS macro in the kernel will change the NR_IRQS output,
> but unfortunately that doesn't change nr_irqs, and I run into the same
> limit (36 domUs on a less-beefy dual-core machine).
>
> I did find this:
> http://blogs.sun.com/fvdl/entry/a_million_vms
> which references NR_DYNIRQS, which is in 2.6.18 but not in the pvops
> kernel.
>
> Watching /proc/interrupts, the domain irqs seem to be getting allocated
> from 248 downward until they hit some other limit:
> ...
>  64:  59104  xen-pirq-ioapic-level  ioc0
>  89:      1  xen-dyn-event          evtchn:xenconsoled
>  90:      1  xen-dyn-event          evtchn:xenstored
>  91:      6  xen-dyn-event          vif36.0
>  92:    140  xen-dyn-event          blkif-backend
>  93:     97  xen-dyn-event          evtchn:xenconsoled
>  94:    139  xen-dyn-event          evtchn:xenstored
>  95:      7  xen-dyn-event          vif35.0
>  96:    301  xen-dyn-event          blkif-backend
>  97:    261  xen-dyn-event          evtchn:xenconsoled
>  98:    145  xen-dyn-event          evtchn:xenstored
>  99:      7  xen-dyn-event          vif34.0
> ...
> Perhaps the xen irqs are getting allocated out of the nr_irqs pool,
> while they could be allocated from the NR_IRQS pool?
>
> -John
>
> On 04/27/2010 08:45 PM, Keir Fraser wrote:
>> I think nr_irqs is specifiable on the command line on newer kernels. You may
>> be able to do nr_irqs=65536 as a kernel boot parameter, or something like
>> that, without needing to rebuild the kernel.
>>
>>  -- Keir
>>
>> On 28/04/2010 02:02, "Yuvraj Agarwal" wrote:
>>
>>> Actually, I did identify the problem (don’t know the fix), at least from
>>> the console logs. It's related to running out of nr_irqs (attached JPG
>>> for the console log).
>>>
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com
>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>>> Sent: Tuesday, April 27, 2010 5:44 PM
>>> To: Yuvraj Agarwal; xen-devel@lists.xensource.com
>>> Subject: Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes
>>> on starting 155th domU
>>>
>>> On 27/04/2010 08:41, "Yuvraj Agarwal" wrote:
>>>
>>>> Attached is the output of /var/log/daemon.log and /var/log/xen/xend.log,
>>>> but as far as we can see we don’t quite know what might be causing the
>>>> system to crash (no console access anymore, and the system becomes
>>>> unresponsive and needs to be power-cycled).
>>>> I have pasted only the relevant bits of information (the last domU that
>>>> did successfully start and the next one that failed). It may be the case
>>>> that not all of the log messages were flushed before the system crashed…
>>>>
>>>> Does anyone know where this limit of 155 domUs is coming from and how we
>>>> can fix/increase it?
>>>>
>>> Get a serial line on a test box, and capture Xen logging output on it. You
>>> can both see if any crash messages come from Xen when the 155th domain is
>>> created, and also try the serial debug keys (e.g., try 'h' to get help to
>>> start with) to see whether Xen itself is still alive.
>>>
>>>  -- Keir
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
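John's observation that the per-guest event-channel IRQs are handed out from the top of the range downward can be watched from userspace while guests are started. What follows is a minimal Python sketch, not something posted in the thread, assuming the /proc/interrupts layout shown in the excerpt above (IRQ number, one count column per CPU, chip name, device name) and the dmesg wording John quoted. The function names are hypothetical. It pulls the NR_IRQS / nr_irqs / nr_irqs_gsi figures out of dmesg text, counts the xen-dyn-event IRQs, reports the lowest one allocated, and groups them by device type; in the excerpt each guest costs roughly four of them (vif, blkif-backend, xenconsoled and xenstored event channels).

```python
import re
from collections import Counter

# Sample taken from the /proc/interrupts excerpt quoted in the thread.
SAMPLE = """\
 64:  59104  xen-pirq-ioapic-level  ioc0
 89:      1  xen-dyn-event          evtchn:xenconsoled
 90:      1  xen-dyn-event          evtchn:xenstored
 91:      6  xen-dyn-event          vif36.0
 92:    140  xen-dyn-event          blkif-backend
 93:     97  xen-dyn-event          evtchn:xenconsoled
 94:    139  xen-dyn-event          evtchn:xenstored
 95:      7  xen-dyn-event          vif35.0
 96:    301  xen-dyn-event          blkif-backend
 97:    261  xen-dyn-event          evtchn:xenconsoled
 98:    145  xen-dyn-event          evtchn:xenstored
 99:      7  xen-dyn-event          vif34.0
"""


def parse_irq_limits(dmesg_text):
    """Hypothetical helper: extract the IRQ limits printed at boot.

    Assumes the wording seen in the thread, e.g.
    "NR_IRQS:4352 nr_irqs:256" and "nr_irqs_gsi: 88".
    """
    limits = {}
    m = re.search(r"NR_IRQS:(\d+)\s+nr_irqs:(\d+)", dmesg_text)
    if m:
        limits["NR_IRQS"] = int(m.group(1))
        limits["nr_irqs"] = int(m.group(2))
    m = re.search(r"nr_irqs_gsi:\s*(\d+)", dmesg_text)
    if m:
        limits["nr_irqs_gsi"] = int(m.group(1))
    return limits


def parse_interrupts(text):
    """Return (irq, chip, device) tuples for numbered rows of /proc/interrupts.

    The CPU header row and named rows such as "NMI:" or "LOC:" are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        label = fields[0][:-1]
        if not label.isdigit():
            continue
        rest = fields[1:]
        # Skip the per-CPU counters, which are plain integers.
        i = 0
        while i < len(rest) and rest[i].isdigit():
            i += 1
        chip = rest[i] if i < len(rest) else ""
        device = " ".join(rest[i + 1:])
        entries.append((int(label), chip, device))
    return entries


def summarize_dyn_irqs(entries):
    """Count xen-dyn-event IRQs, find the lowest one, group by device type."""
    dyn = [(irq, dev) for irq, chip, dev in entries if chip == "xen-dyn-event"]
    lowest = min((irq for irq, _ in dyn), default=None)
    # Strip per-domain suffixes such as "36.0" so "vif36.0" counts as "vif".
    kinds = Counter(re.sub(r"\d+(\.\d+)?$", "", dev) for _, dev in dyn)
    return len(dyn), lowest, kinds


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the quoted excerpt
    count, lowest, kinds = summarize_dyn_irqs(parse_interrupts(text))
    print(f"{count} xen-dyn-event IRQs, lowest allocated IRQ: {lowest}")
    for kind, n in kinds.most_common():
        print(f"  {kind}: {n}")
```

Running it after each domU start and watching the lowest allocated IRQ fall toward the GSI range is one way to see how much headroom is left before the next guest fails; it only observes the allocation and does not change any limit.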