From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark McLoughlin
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal
Date: Wed, 16 Jul 2008 16:33:30 +0100
Message-ID: <1216222410.29458.13.camel@muff>
References: <1216219846-663-1-git-send-email-markmc@redhat.com>
Reply-To: Mark McLoughlin
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
To: kvm
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:43059 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755659AbYGPPdc (ORCPT ); Wed, 16 Jul 2008 11:33:32 -0400
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id m6GFXVaT026399
	for ; Wed, 16 Jul 2008 11:33:31 -0400
Received: from mail.boston.redhat.com (mail.boston.redhat.com [10.16.255.12])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m6GFXVRu009582
	for ; Wed, 16 Jul 2008 11:33:31 -0400
Received: from [127.0.0.1] (sebastian-int.corp.redhat.com [172.16.52.221])
	by mail.boston.redhat.com (8.13.1/8.13.1) with ESMTP id m6GFXUSV014782
	for ; Wed, 16 Jul 2008 11:33:30 -0400
In-Reply-To: <1216219846-663-1-git-send-email-markmc@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, 2008-07-16 at 15:50 +0100, Mark McLoughlin wrote:

> The race was observed with a modified kvm-userspace when
> running a guest under heavy network load. When it occurs,
> KVM never sees another SIGALRM signal because although
> the signal is queued up the appropriate bit is never set
> in the pending mask. Manually sending the process a SIGALRM
> kicks it out of this state.

I should clarify what I mean by "modified kvm-userspace". Basically, I
was trying out a suggestion of Marcelo's to drop the global qemu mutex
when reading GSO packets from a tap device i.e.

@@ -4299,7 +4299,9 @@ static void tap_send(void *opaque)
     sbuf.buf = s->buf;
     s->size = getmsg(s->fd, NULL, &sbuf, &f) >=0 ? sbuf.len : -1;
 #else
+    kvm_mutex_unlock();
     s->size = read(s->fd, s->buf, sizeof(s->buf));
+    kvm_mutex_lock();

It seems to work fine, but more on that later ... important thing is
that if people see a hard-to-reproduce condition where things seem to
slow down or lock up, try manually doing a "kill -ALRM $(qemu)" and if
that fixes it, then you're probably seeing this bug.

Cheers,
Mark.
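
For readers who want the locking change from the diff in isolation, here is a minimal sketch of the same pattern: release a global lock around a potentially blocking read() and take it again afterwards. This is not the kvm-userspace code. The names global_lock and read_unlocked() are hypothetical stand-ins for the global qemu mutex and the tap read path, and a plain pthread mutex is assumed.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the global qemu mutex. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read from fd with global_lock released for the duration of the
 * (possibly blocking) read() call.
 *
 * The caller must hold global_lock on entry; it is held again on
 * return. Any state protected by the lock may have changed in the
 * meantime, so the caller should not rely on values it cached before
 * the call.
 */
static ssize_t read_unlocked(int fd, void *buf, size_t len)
{
    ssize_t n;

    pthread_mutex_unlock(&global_lock);  /* let other threads run */
    n = read(fd, buf, len);              /* may block */
    pthread_mutex_lock(&global_lock);    /* restore the caller's invariant */

    return n;
}
```

The point of the pattern is that other threads holding work for the lock can make progress while the reader is blocked in the kernel. The cost is a window in which timers and signals can be handled concurrently with the read, which is presumably how the race described above became visible.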