From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brad Campbell
Subject: Re: XP machine freeze
Date: Tue, 31 Mar 2015 15:18:26 +0800
Message-ID: <551A4A42.10309@fnarfbargle.com>
References: <009701d05ffb$5e37a740$1aa6f5c0$@astim.si> <550EE047.3030605@fnarfbargle.com> <5519BBF4.7080600@redhat.com> <5519EA01.4010102@fnarfbargle.com> <004f01d06b7c$08b96970$1a2c3c50$@astim.si>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
To: Saso Slavicic, 'Paolo Bonzini', kvm@vger.kernel.org
Return-path:
Received: from ns3.fnarfbargle.com ([103.4.17.7]:44145 "EHLO ns3.fnarfbargle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750783AbbCaHSb (ORCPT ); Tue, 31 Mar 2015 03:18:31 -0400
In-Reply-To: <004f01d06b7c$08b96970$1a2c3c50$@astim.si>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 31/03/15 14:29, Saso Slavicic wrote:
>> From: Brad Campbell
>> Sent: Tuesday, March 31, 2015 2:28 AM
>>
>>
>> If someone could give me some hard tests to do along the lines of what
>> Saso is up to I could probably get that done faster. With the right
>> bad kernel I can reproduce this lockup in a matter of hours.
> Hi,
>
> My machine usually (but not always) locks during backup. At around 3AM, a
> samba machine (a kvm machine on the same server actually) cifs mounts C$ and
> starts copying files off of it. The last stacktrace also shows network code.
> Is your machine actively working over network (sharing files)?
>

Better than that, it's recording h264 rtsp streams from 3 CCTV cameras,
so there is a constant network load of about 1.5-2MB/s (bytes, not bits).

Come to think of it, out of the 3 XP VMs I have that share an identical
config and actually come from the same qcow2 base image, this is the one
that hits the network hard. The other 2 hardly touch the network. The
network interface is virtio.

I can get it to lock up in hours with the right kernel, and repeat
lockups after unlocking it with virt-viewer usually happen within an
hour.

My issue is that my first bisect proved to be inconclusive, and the
second one is about 3 steps from done, but there are no kvm commits in
the current set under investigation.

I *know* that 3.15.6 was good, as I ran that kernel for months. It all
started when I upgraded to 3.18, and I think I've narrowed it down, but
like I said the bisects are just not falling out as plausible, and at 5
days to confirm a good kernel and up to 24 hours to hit a bad one, it's
slow going.

I'll finish this bisect and then have a crack at the good/bad range
suggested by Paolo. The issue is that, this being a production box, I
have to schedule the reboots.

I'm just not sure bisection is the right answer to tracking this down,
and I don't have the background to know what to poke to try and debug
this any other way.

Regards,
Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.