From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=43277 helo=eggs.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1OOFP4-0003ui-30
    for qemu-devel@nongnu.org; Mon, 14 Jun 2010 15:33:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
    (envelope-from )
    id 1OOFP2-0003En-Pm
    for qemu-devel@nongnu.org; Mon, 14 Jun 2010 15:33:01 -0400
Received: from mail-gw0-f45.google.com ([74.125.83.45]:54986)
    by eggs.gnu.org with esmtp (Exim 4.69)
    (envelope-from )
    id 1OOFP2-0003Eg-MG
    for qemu-devel@nongnu.org; Mon, 14 Jun 2010 15:33:00 -0400
Received: by gwaa20 with SMTP id a20so1107749gwa.4
    for ; Mon, 14 Jun 2010 12:33:00 -0700 (PDT)
Message-ID: <4C1683EC.3010609@codemonkey.ws>
Date: Mon, 14 Jun 2010 14:33:00 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] stop cpus before forking.
References: <1276543644-32689-1-git-send-email-glommer@redhat.com>
In-Reply-To: <1276543644-32689-1-git-send-email-glommer@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Glauber Costa
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, avi@redhat.com

On 06/14/2010 02:27 PM, Glauber Costa wrote:
> This patch fixes a bug that happens with kvm, irqchip-in-kernel,
> while adding a netdev. Despite the situations of reproduction being
> specific to kvm, I believe this fix is pretty generic, and fits here.
> Especially if we ever want to have our own irqchip in kernel too.
>
> The problem happens after the fork system call, and although it is not
> 100% reproducible, happens pretty often. After fork, the memory where
> the apic is mapped is present in both processes. It ends up confusing
> the vcpus somewhere in the irq <-> ack path, and qemu hangs, with no
> irqs being delivered at all from that point on.
>
> Making sure the vcpus are stopped before forking makes the problem go
> away. Besides, this is a pretty infrequent operation, which already hangs
> the io-thread for a while. So it should not hurt performance.
>
> Signed-off-by: Glauber Costa
>

This doesn't make much sense to me, and it smells like a kernel bug.
Even if it isn't, I can't rationalize why stopping the vm like this is
enough to fix such a problem.

Is the problem that the KVM VCPU threads get duplicated while
potentially running or something like that?

Regards,

Anthony Liguori

> ---
>  net/tap.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/net/tap.c b/net/tap.c
> index 0147dab..f34dd9c 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -330,6 +330,9 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
>      sigaddset(&mask, SIGCHLD);
>      sigprocmask(SIG_BLOCK, &mask, &oldmask);
>
> +    /* make sure no cpus are running, so the apic does not
> +     * get confused */
> +    vm_stop(0);
>      /* try to launch network script */
>      pid = fork();
>      if (pid == 0) {
> @@ -350,6 +353,7 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
>          execv(setup_script, args);
>          _exit(1);
>      } else if (pid > 0) {
> +        vm_start();
>          while (waitpid(pid, &status, 0) != pid) {
>              /* loop */
>          }
>
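
For reference, here is a minimal standalone sketch of the pause/fork/resume
pattern the hunks above are going for. It is not QEMU code: pause_workers(),
resume_workers() and run_script_paused() are hypothetical names, and the two
hooks only flip a flag where the patch calls vm_stop(0) and vm_start(). The
one deliberate difference is that the sketch resumes in the parent as soon as
fork() returns, so a failed fork() does not leave things paused; the quoted
hunk only shows vm_start() in the pid > 0 branch.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for vm_stop(0)/vm_start(); they only track state. */
static int workers_paused;

static void pause_workers(void)  { workers_paused = 1; }
static void resume_workers(void) { workers_paused = 0; }

/*
 * Run `path` with `argv` in a child process, with workers paused across
 * the fork(). Returns the child's exit status, 127 if exec failed in the
 * child, or -1 if fork()/waitpid() failed or the child did not exit
 * normally.
 */
static int run_script_paused(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    pause_workers();
    pid = fork();
    if (pid == 0) {
        /* child: POSIX fork() copies only the calling thread */
        execv(path, argv);
        _exit(127);             /* only reached if execv() failed */
    }

    /* parent: resume whether fork() succeeded or not */
    resume_workers();

    if (pid < 0) {
        return -1;
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;          /* give up on anything but an interrupted wait */
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_script_paused("/bin/sh", argv) with argv = { "sh", "-c", "exit 3",
NULL } should return the script's exit status with workers_paused back at 0
afterwards. Whether pausing like this addresses the actual hang is a separate
question; the sketch only shows the ordering.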