From: Birger Tödtmann
Subject: Re: kernel oops/IRQ exception when networking between many domUs
Date: Mon, 06 Jun 2005 10:52:25 +0200
Message-ID: <1118047945.1972.9.camel@lomin>
References: <1117904746.7507.31.camel@lomin> <20050605165716.GA1231@exp-math.uni-essen.de> <49e83a846cc77d6605f4adc2c0f34858@cl.cam.ac.uk>
In-Reply-To: <49e83a846cc77d6605f4adc2c0f34858@cl.cam.ac.uk>
To: Keir Fraser
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Monday, 06.06.2005, at 09:23 +0100, Keir Fraser wrote:
> On 5 Jun 2005, at 17:57, Birger Toedtmann wrote:
>
> > Apparently it is happening somewhere here:
> >
> > [...]
> > 0xc028cbe5 : test %eax,%eax
> > 0xc028cbe7 : je 0xc028ca82
> >
> > 0xc028cbed : mov %esi,%eax
> > 0xc028cbef : shr $0xc,%eax
> > 0xc028cbf2 : mov %eax,(%esp)
> > 0xc028cbf5 : call 0xc028c4c4
> > 0xc028cbfa : mov $0xffffffff,%ecx
> > ^^^^^^^^^^
>
> Most likely the driver has tried to send a bogus page to a domU.
> Because it's bogus the transfer fails. The driver then tries to free
> the page back to Xen, but that also fails because the page is bogus.
> This confuses the driver, which then BUG()s out.

I commented out the free_mfn() and status= lines: the kernel now
reports the following after it configured the 10th domU and ~80th vif,
with approx. 20-25 bridges up. Just an idea: the number of vifs +
bridges is somewhere around the magic 128 (NR_IRQS problem in 2.0.x!)
when the crash happens - could this hint at something?

[...]
Jun 6 10:12:14 lomin kernel: 10.2.23.8: port 2(vif10.3) entering forwarding state
Jun 6 10:12:14 lomin kernel: 10.2.35.16: topology change detected, propagating
Jun 6 10:12:14 lomin kernel: 10.2.35.16: port 2(vif10.4) entering forwarding state
Jun 6 10:12:14 lomin kernel: 10.2.35.20: topology change detected, propagating
Jun 6 10:12:14 lomin kernel: 10.2.35.20: port 2(vif10.5) entering forwarding state
Jun 6 10:12:20 lomin kernel: c014cea4
Jun 6 10:12:20 lomin kernel: [do_page_fault+643/1665] do_page_fault+0x469/0x738
Jun 6 10:12:20 lomin kernel: [] do_page_fault+0x469/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: PREEMPT
Jun 6 10:12:20 lomin kernel: Modules linked in: dm_snapshot pcmcia bridge ipt_REJECT ipt_state iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables autofs4 snd_seq snd_seq_device evdev usbhid rfcomm l2cap bluetooth dm_mod cryptoloop snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc tun uhci_hcd usb_storage usbcore irtty_sir sir_dev ircomm_tty ircomm irda yenta_socket rsrc_nonstatic pcmcia_core 3c59x
Jun 6 10:12:20 lomin kernel: CPU: 0
Jun 6 10:12:20 lomin kernel: EIP: 0061:[do_wp_page+622/1175] Not tainted VLI
Jun 6 10:12:20 lomin kernel: EIP: 0061:[] Not tainted VLI
Jun 6 10:12:20 lomin kernel: EFLAGS: 00010206 (2.6.11.11-xen0)
Jun 6 10:12:20 lomin kernel: EIP is at handle_mm_fault+0x5d/0x222
Jun 6 10:12:20 lomin kernel: eax: 15555b18 ebx: d8788000 ecx: 00000b18 edx: 15555b18
Jun 6 10:12:20 lomin kernel: esi: dcfc3b4c edi: dcaf5580 ebp: d8789ee4 esp: d8789ebc
Jun 6 10:12:20 lomin kernel: ds: 0069 es: 0069 ss: 0069
Jun 6 10:12:20 lomin kernel: Process python (pid: 4670, threadinfo=d8788000 task=de1a1520)
Jun 6 10:12:20 lomin kernel: Stack: 00000040 00000001 d40e687c d40e6874 00000006 d40e685c d8789f14 dcaf5580
Jun 6 10:12:20 lomin kernel: dcaf55ac d40e6b1c d8789fbc c01154ce dcaf5580 d40e6b1c b4ec6ff0 00000001
Jun 6 10:12:20 lomin kernel: 00000001 de1a1520 b4ec6ff0 00000006 d8789fc4 d8789fc4 c03405b0 00000006
Jun 6 10:12:20 lomin kernel: Call Trace:
Jun 6 10:12:20 lomin kernel: [dump_stack+16/32] show_stack+0x80/0x96
Jun 6 10:12:20 lomin kernel: [] show_stack+0x80/0x96
Jun 6 10:12:20 lomin kernel: [show_registers+384/457] show_registers+0x15a/0x1d1
Jun 6 10:12:20 lomin kernel: [] show_registers+0x15a/0x1d1
Jun 6 10:12:20 lomin kernel: [die+301/458] die+0x106/0x1c4
Jun 6 10:12:20 lomin kernel: [] die+0x106/0x1c4
Jun 6 10:12:20 lomin kernel: [do_page_fault+675/1665] do_page_fault+0x489/0x738
Jun 6 10:12:20 lomin kernel: [] do_page_fault+0x489/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: Code: 8b 47 1c c1 ea 16 83 43 14 01 8d 34 90 85 f6 0f 84 52 01 00 00 89 f2 8b 4d 10 89 f8 e8 4a d1 ff ff 85 c0 89 c2 0f 84 3c 01 00 00 <8b> 00 a8 81 75 3d 85 c0 0f 84 01 01 00 00 a8 40 0f 84 a4 00 00

> It's not at all clear where the bogus address comes from: the driver
> basically just reads the address out of an skbuff, and converts it from
> virtual to physical address. But something is obviously going wrong,
> perhaps under memory pressure. :-(

Where, within the domUs or dom0? The latter has lots of memory at hand,
while the domUs are quite short of memory. I'll try to find out...

Regards,
-- 
Birger Tödtmann
Technik der Rechnernetze, Institut für Experimentelle Mathematik
Universität Duisburg-Essen, Campus Essen
email:btoedtmann@iem.uni-due.de skype:birger.toedtmann pgp:0x6FB166C9