From: David Vrabel
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Thu, 19 Jun 2014 15:35:11 +0100
Message-ID: <53A2F51F.5010905@citrix.com>
In-Reply-To: <20140619141252.GO20819@zion.uk.xensource.com>
To: Wei Liu, Philipp Hahn
Cc: xen-devel, Erik Damrose, Ian Campbell, Zoltan Kiss

On 19/06/14 15:12, Wei Liu wrote:
> On Wed, Jun 18, 2014 at 06:48:31PM +0200, Philipp Hahn wrote:
> [...]
>>
>> (gdb) list *(xen_netbk_rx_action+0x18b)
>> 0xffffffffa04287dc is in xen_netbk_rx_action
>> (/var/build/temp/tmp.hW3dNilayw/pbuilder/linux-3.10.11/drivers/net/xen-netback/netback.c:611).
>> 606                 meta->gso_size = skb_shinfo(skb)->gso_size;
>> 607         else
>> 608                 meta->gso_size = 0;
>> 609
>> 610         meta->size = 0;
>> 611         meta->id = req->id;
>> 612         npo->copy_off = 0;
>> 613         npo->copy_gref = req->gref;
>> 614
>> 615         data = skb->data;
>>
>>
>> After more debugging today, I think something like this happens:
>>
>> 1. The VM is receiving packets through bonding + bridge + netback +
>>    netfront.
>>
>> 2. For some unknown reason, at least one packet remains in the rx queue
>>    and is not delivered to the domU immediately by netback.
>>
>> 3. The VM finishes shutting down.
>>
>> 4. The shared ring between dom0 and domU is freed.
>>
>> 5. Then xen-netback continues processing the pending requests and tries
>>    to put the packet into the already released shared ring.
>>
>>
>> From reading the attached disassembly, I guess that
>>   AX  = &meta
>>   CX  = &rx->sring
>>   DX  =~ rx.req_cons
>>   CR2 = &req->id
>> where
>>   CX + DX * sizeof(union struct xen_netif_rx_{request,response}) = CR2
>> with that sizeof being 8.
>>
>>
>> Any additional ideas or insight is appreciated.
>>
>
> I think your analysis makes sense. Netback does have its internal queue,
> and the kthread can certainly be scheduled away. There doesn't seem to be
> a synchronisation point between a vif getting disconnected and its
> internal queue getting processed. I attach a quick hack. If it works to
> some degree, we can then try to work out a proper fix.

The kthread_stop() in xenvif_disconnect() waits for the kthread to exit,
so I don't see how Philipp's analysis can be right.

David
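The ordering argument in the reply can be illustrated with a userspace analogue. kthread_stop() sets the flag reported by kthread_should_stop(), wakes the thread and waits for it to exit; in the sketch below, setting a stop flag and then calling pthread_join() plays that role. This is a sketch only, not the netback code: struct vif, rx_worker(), vif_disconnect(), run_session() and RING_SLOTS are hypothetical names, and the "ring" is a plain heap buffer rather than a granted Xen page. It assumes, as David states, that the ring is released only after the worker has been stopped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SLOTS 256 /* hypothetical ring size for this sketch */

/* Hypothetical stand-in for the shared ring page; not the real Xen layout. */
struct shared_ring {
    int slots[RING_SLOTS];
};

/* Hypothetical per-vif state; only the stop/free ordering matters here. */
struct vif {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct shared_ring *ring; /* freed by vif_disconnect() */
    int pending;              /* packets still in the internal queue */
    long delivered;           /* packets copied into the ring */
    bool should_stop;         /* analogue of kthread_should_stop() */
    pthread_t worker;
};

/* Analogue of the rx kthread: drains the internal queue into the ring. */
static void *rx_worker(void *arg)
{
    struct vif *vif = arg;

    pthread_mutex_lock(&vif->lock);
    for (;;) {
        while (vif->pending == 0 && !vif->should_stop)
            pthread_cond_wait(&vif->wake, &vif->lock);
        if (vif->should_stop)
            break; /* anything still queued is dropped, never copied */
        vif->pending--;
        vif->ring->slots[vif->delivered % RING_SLOTS] = 1; /* touches the ring */
        vif->delivered++;
    }
    pthread_mutex_unlock(&vif->lock);
    return NULL;
}

/* Analogue of a disconnect path: stop the worker, wait for it, then free. */
static void vif_disconnect(struct vif *vif)
{
    assert(vif->ring != NULL); /* disconnect must only run once */

    pthread_mutex_lock(&vif->lock);
    vif->should_stop = true;
    pthread_cond_signal(&vif->wake);
    pthread_mutex_unlock(&vif->lock);

    /* Like kthread_stop(): does not return until the worker has exited. */
    pthread_join(vif->worker, NULL);

    /* Only now is it safe to release the ring: the worker cannot run. */
    free(vif->ring);
    vif->ring = NULL;
}

/* Queue `packets` packets, then disconnect while some may still be pending. */
static void run_session(int packets, long *delivered, int *dropped)
{
    struct vif vif = { .pending = 0, .delivered = 0, .should_stop = false };

    pthread_mutex_init(&vif.lock, NULL);
    pthread_cond_init(&vif.wake, NULL);
    vif.ring = calloc(1, sizeof(*vif.ring));
    if (vif.ring == NULL ||
        pthread_create(&vif.worker, NULL, rx_worker, &vif) != 0)
        abort();

    for (int i = 0; i < packets; i++) {
        pthread_mutex_lock(&vif.lock);
        vif.pending++;
        pthread_cond_signal(&vif.wake);
        pthread_mutex_unlock(&vif.lock);
    }

    vif_disconnect(&vif);
    *delivered = vif.delivered;
    *dropped = vif.pending;

    pthread_cond_destroy(&vif.wake);
    pthread_mutex_destroy(&vif.lock);
}
```

Because free() runs only after pthread_join() returns, the worker can never write into a released ring; packets still queued at that moment are left uncopied and show up as dropped, so delivered + dropped always equals the number queued.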