From mboxrd@z Thu Jan 1 00:00:00 1970
From: Birger Tödtmann
Subject: Re: kernel oops/IRQ exception when networking between many domUs
Date: Mon, 06 Jun 2005 14:30:10 +0200
Message-ID: <1118061010.7357.10.camel@lomin>
References: <1117904746.7507.31.camel@lomin>
 <20050605165716.GA1231@exp-math.uni-essen.de>
 <49e83a846cc77d6605f4adc2c0f34858@cl.cam.ac.uk>
 <1118047945.1972.9.camel@lomin>
Sender: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Monday, 06.06.2005, at 10:26 +0100, Keir Fraser wrote:
[...]
> > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > crash happens - could this hint to something?
>
> The crashes you see with free_mfn removed will be impossible to debug
> -- things are very screwed by that point. Even the crash within
> free_mfn might be far removed from the cause of the crash, if it's due
> to memory corruption.
>
> It's perhaps worth investigating what critical limit you might be
> hitting, and what resource it is that's limited. e.g., can you
> create a few vifs, but connected together by some very large number of
> bridges (daisy chained together)? Or can you create a large number of
> vifs if they are connected together by just one bridge?

This is getting really weird - I found that I run into problems with
far fewer vifs/bridges than suspected. I just fired up a network with
7 nodes, each with four interfaces connected to the same four bridge
interfaces. The nodes can ping through the network; however, after a
short time the system (dom0) crashes as well.
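
For reference, a minimal sketch of the topology described above. It only
does the arithmetic on interface counts; the bridge names (br0..br3) and
the helper names are assumptions made for illustration, not taken from
the actual setup, and dom0's own interfaces are not counted.

```python
# Hypothetical sketch: every domU gets one vif per bridge, and all domUs
# share the same four bridges. Bridge names are assumed for illustration.
NUM_BRIDGES = 4
BRIDGES = [f"br{i}" for i in range(NUM_BRIDGES)]


def vif_config(bridges):
    """Return an xm-style vif list with one interface per bridge."""
    return [f"bridge={name}" for name in bridges]


def total_vifs(num_domus, bridges):
    """Total number of domU vifs attached to the shared bridges."""
    return num_domus * len(bridges)


if __name__ == "__main__":
    # 5 domUs could not reproduce the crash; 7 domUs did.
    for num_domus in (5, 7):
        print(num_domus, "domUs:", total_vifs(num_domus, BRIDGES),
              "vifs on", len(BRIDGES), "bridges")
```

Under these assumptions the crashing case has 28 domU vifs and the
non-crashing case 20, both well below the 128 mentioned earlier.
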
This time, it dies in net_rx_action() at a slightly different place:

[...]
 [] kfree_skbmem+0x12/0x29
 [] __kfree_skb+0xa5/0x13f
 [] net_rx_action+0x23d/0x4df
[...]

Oddly, I cannot reproduce this with 5 nodes (domUs) running. I'm a bit
unsure where to go from here... Maybe I should try a different machine
for further testing.

Regards
-- 
Birger Tödtmann
Technik der Rechnernetze,
Institut für Experimentelle Mathematik
Universität Duisburg-Essen, Campus Essen
email:btoedtmann@iem.uni-due.de skype:birger.toedtmann pgp:0x6FB166C9