From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Lieven
Subject: Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
Date: Wed, 22 Aug 2012 14:52:10 +0200
Message-ID: <5034D5FA.70906@dlhnet.de>
References: <4FEC56B2.6050502@dlhnet.de> <502E42E9.2020402@siemens.com> <502E56D3.6060607@siemens.com> <502E5800.5060609@siemens.com> <502E5D66.1060003@siemens.com> <5030B51E.3010704@redhat.com> <50333717.6050207@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: Stefan Hajnoczi
Cc: Paolo Bonzini, Jan Kiszka, Avi Kivity, "kvm@vger.kernel.org", "qemu-devel@nongnu.org"
List-Id: kvm.vger.kernel.org

On 08/21/12 10:23, Stefan Hajnoczi wrote:
> On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka wrote:
>> On 2012-08-19 11:42, Avi Kivity wrote:
>>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>>> is currently possible that env->stop is evaluated before we called into
>>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>>> signal without properly processing its reason (stop).
>>>> Should not be required (TM): Both signal eating / stop checking and stop
>>>> setting / signal generation happens under the BQL, thus the ordering
>>>> must not make a difference here.
>>> Agree.
>>>
>>>
>>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>>> this way.
>>> Cannot be ruled out, yet too much of a coincidence.
>>>
>>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>>> before in this area.
>>>
>>> Is this reproducible?
>> Not for me.
>> Peter only hit it very rarely, Peter obviously more easily.
> I have only hit this once and was not able to reproduce it.

For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter

> Stefan