From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH v2 2/6] reuse env stop and stopped states
Date: Tue, 28 Jul 2009 16:45:46 +0300
Message-ID: <4A6F010A.1090004@redhat.com>
References: <1248214392-12533-1-git-send-email-glommer@redhat.com>
	<1248214392-12533-2-git-send-email-glommer@redhat.com>
	<1248214392-12533-3-git-send-email-glommer@redhat.com>
	<4A6DCB33.5020008@redhat.com>
	<20090728004822.GQ4776@poweredge.glommer>
	<4A6E97E1.9050902@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Marcelo Tosatti
To: Glauber Costa
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:42510 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754086AbZG1NlJ (ORCPT ); Tue, 28 Jul 2009 09:41:09 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26])
	by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n6SDf9Fe020550
	for ; Tue, 28 Jul 2009 09:41:09 -0400
In-Reply-To: <4A6E97E1.9050902@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 07/28/2009 09:17 AM, Avi Kivity wrote:
>> I found out that doing kill -38 makes it run again, so we're likely
>> hanging somewhere while holding qemu_mutex. The state of the process
>> is "D", so we're holding qemu_mutex, and then calling something that
>> can block.
>
> Sounds like we call a vcpu ioctl from the iothread (or from a
> different vcpu thread).

That's indeed the case. We reload the local apic state from the
iothread instead of the vcpu thread. Please write a patch to fix this.

>> It's hard for me to believe that this patch introduced it. At best,
>> it might have made it more likely. Also, I also verified that it
>> sometimes takes a while until it happen for the first time. Are you
>> sure this is the first patch that makes it happen?
>
> I haven't been able to reproduce it before this patch. Maybe this
> patch doesn't introduce it, only exposes it.
>

It does. The root problem is that env->stopped is cleared during reset,
so pause_all_threads() doesn't work:

    uint32_t stop;   /* Stop request */                                 \
    uint32_t stopped; /* Artificially stopped */                        \
    ...
    /* from this point: preserved by CPU reset */                       \

This kind of bug is incredibly hard to find - you now owe Gleb a solar
mass worth of beer.

IMO we shouldn't be coding like this, please patch upstream to
explicitly clear what needs clearing.

I'm now testing the simple fix (moving the variables after the memset
point).

-- 
error compiling committee.c: too many arguments to function
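
For anyone following the thread, here is a minimal sketch of the failure mode and of the simple fix. It assumes reset zeroes everything in front of a marker field with memset() and offsetof(). The struct layouts, the preserved_marker field and the function names are all hypothetical stand-ins for illustration, not the real CPU_COMMON definition or the actual reset code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout BEFORE the fix: stop/stopped sit in the zeroed region. */
struct cpu_before_fix {
    uint32_t stop;             /* Stop request */
    uint32_t stopped;          /* Artificially stopped */
    uint32_t preserved_marker; /* from this point: preserved by CPU reset */
    uint32_t cpu_index;
};

/* Hypothetical layout AFTER the fix: stop/stopped moved past the marker. */
struct cpu_after_fix {
    uint32_t scratch;          /* some state that reset is meant to clear */
    uint32_t preserved_marker; /* from this point: preserved by CPU reset */
    uint32_t cpu_index;
    uint32_t stop;             /* Stop request */
    uint32_t stopped;          /* Artificially stopped */
};

/* Assumed reset pattern: zero every byte in front of the marker field. */
void reset_before_fix(struct cpu_before_fix *env)
{
    memset(env, 0, offsetof(struct cpu_before_fix, preserved_marker));
}

void reset_after_fix(struct cpu_after_fix *env)
{
    memset(env, 0, offsetof(struct cpu_after_fix, preserved_marker));
}

/* Returns 1 if a pending "stopped" flag is still set after reset, else 0. */
int stopped_survives_before_fix(void)
{
    struct cpu_before_fix env = { .stopped = 1, .cpu_index = 3 };
    reset_before_fix(&env);
    return env.stopped == 1;
}

int stopped_survives_after_fix(void)
{
    struct cpu_after_fix env = { .scratch = 7, .cpu_index = 3, .stopped = 1 };
    reset_after_fix(&env);
    return env.stopped == 1;
}
```

With the first layout the flag is silently wiped by reset; with the second it survives, which is all that moving the variables after the memset point achieves. Clearing fields explicitly, as suggested above, would avoid depending on field order at all.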