From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zoltan Kiss
Subject: Re: Trying to unmap invalid handle! pending_idx: @ drivers/net/xen-netback/netback.c:998 causes kernel panic/reboot
Date: Mon, 21 Jul 2014 11:24:49 +0100
Message-ID: <53CCEA71.4080107@citrix.com>
References: <53C33FB2.2000401@ezit.hu> <53C3C4FF.7050204@citrix.com> <53C3C995.3070204@ezit.hu> <53C82157.9080007@citrix.com> <53CBFA65.9000209@ezit.hu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <53CBFA65.9000209@ezit.hu>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Armin Zentai, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 20/07/14 18:20, Armin Zentai wrote:
> Hi!
>
> On 17/07/14 21:17, Zoltan Kiss wrote:
>> Hi,
>>
>> I've just submitted 2 patch series to xen-devel, one for the net tree
>> (with multiqueue) and one for 3.15. I believe the first two patches in
>> the series should address this problem; the other two are for related
>> problems. Armin, can you test all four of them together?
>>
>> Thanks,
>>
>> Zoltan
>>
>> p.s.: if you have the test setup running with increased loglevel, it
>> would be good to see the result of that, as it can confirm that it is
>> the problem I've solved in those patches
>
> I'll compile the patches next week, and I'll post the test results.
>
> It's a live service, not a developers' playground, so that makes the
> testing harder. We've moved back to 3.10.43-11 from the Xen4CentOS
> repository instead of using a self-compiled kernel. The problem seems
> to be solved: no crashes in over 5 days.
>
> Before moving the problematic VMs back to the old kernel, we cloned
> two of them and ran them on a HV with the 3.15 kernel and debug
> logging enabled. These VMs produced only one HV crash after the
> cloning, but we had no usable logs. It happened at night; we were
> continuously getting very high load alerts from the HV, and after a
> few minutes we lost the connection to it and to every VM on it. We
> checked it via KVM and found it continuously printing the "Draining
> TX queue" message for all vifs. We could only do a hard reset on it;
> the logs remained intact, but no related log entries were found in
> them. The netconsole module was useless too: no messages were
> forwarded after the _crash_.

That seems to be a different issue; it might not even be netback
related. The "Draining ..." message means netback is still alive and is
purging the VM's queue, because the guest doesn't empty its ring at all
(a simplified sketch of the code path that prints it is at the end of
this mail). This usually means the guest OS has crashed in some way but
hasn't rebooted, so the domain is still there. Can you check the logs
from the VM around the time of the crash? If something similar happens
next time, I recommend checking top/xentop and the console of the
guest.

> So I'll apply your patches, and wait a week or two.
>
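
p.s.: for context on that message, below is a minimal, hypothetical
sketch of the code path that prints it, from my recollection of the
3.14/3.15-era drivers/net/xen-netback code; the function and field
names approximate the real driver and may not match your tree exactly.
When the guest's ring is full, netback stops the netdev queue and arms
a timer; if the guest still hasn't freed any slots when the timer
fires, netback logs the message and drops the backlog instead of
stalling the host side.

/* Hypothetical, simplified reconstruction; not meant to compile
 * standalone, as it references xen-netback's internal struct xenvif
 * and helpers. */
#include <linux/netdevice.h>

static void xenvif_wake_queue_callback(unsigned long data)
{
	struct xenvif *vif = (struct xenvif *)data;

	/* Queue still stopped after the grace period: the guest made no
	 * room in its ring, so it is stuck (crashed or wedged frontend). */
	if (netif_queue_stopped(vif->dev)) {
		netdev_err(vif->dev, "Draining TX queue\n");
		vif->rx_queue_purge = true;   /* tell the kthread to purge rx_queue */
		xenvif_kick_thread(vif);      /* wake the per-vif kthread */
		netif_wake_queue(vif->dev);   /* accept (and then drop) new skbs */
	}
}

The point for debugging is that this path only runs while netback
itself is alive and well, so a flood of these messages for all vifs
points at the guests (or their frontends) not consuming their rings
rather than at a netback crash on the host.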