From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dor Laor
Subject: Re: Fix migration issue when the destination is loaded
Date: Thu, 16 Jul 2009 15:00:47 +0300
Message-ID: <4A5F166F.2000802@redhat.com>
References: <4A5DEFB3.4060702@redhat.com> <1247737179.3038.21.camel@blaa>
Reply-To: dlaor@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Mark McLoughlin
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:59152 "EHLO mx2.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1750958AbZGPMAZ (ORCPT ); Thu, 16 Jul 2009 08:00:25 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26])
    by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n6GC0Qxf024132
    for ; Thu, 16 Jul 2009 08:00:26 -0400
In-Reply-To: <1247737179.3038.21.camel@blaa>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 07/16/2009 12:39 PM, Mark McLoughlin wrote:
> Hi Dor,
>
> On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
>> If the migration socket is full, we get EAGAIN for the write.
>> The set_fd_handler2 defers the write for later on. The function
>> tries to wake up the iothread by qemu_kvm_notify_work.
>> Since this happens in a loop, multiple times, the pipe that emulates
>> eventfd becomes full and we get a deadlock.
>
> I'm not sure I follow:
>
>   - You're seeing qemu_kvm_notify_work() being called many times
>
>   - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
>     main_loop_break()
>
>   - This happens when write() in migrate_fd_put_buffer() returns EAGAIN
>     because the socket buffer has filled up
>
> Correct?
>
> That sounds like migrate_fd_put_buffer() is being called repeatedly
> while we know the socket isn't writable?
>
> Shouldn't the buffered file could stop attempting to call put_buffer()
> until it has been notified that the underlying fd is writable?

There are two fds here. The one returning EAGAIN is the migration socket.
That's why migrate_fd_put_buffer is called and a callback is rescheduled
by qemu_set_fd_handler2. Inside qemu_set_fd_handler2, main_loop_break is
called, because it needs to notify the iothread. It does this via
qemu_eventfd; since eventfd is emulated on older kernels, we use a pipe
there. The pipe fd is the one that blocks.

I thought of setting it to non-blocking, but we might get partial writes
with a non-blocking fd (in theory) too.

>
> Cheers,
> Mark.
>
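
For illustration only, here is a minimal sketch of the non-blocking variant mentioned above. It is not the qemu-kvm code: set_nonblocking(), notify_wakeup() and drain_wakeups() are hypothetical helper names, and the sketch assumes a one-byte token per wakeup. POSIX guarantees that a write of at most PIPE_BUF bytes to a pipe is atomic, so with O_NONBLOCK a one-byte write either succeeds completely or fails with EAGAIN; it cannot be partial. A full pipe can then be treated as "a wakeup is already pending" rather than blocking the writer.

```c
/* Sketch only: hypothetical helpers, not the actual qemu-kvm code. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Post one wakeup on the write end of the notification pipe.
 * Returns 1 if a byte was queued, 0 if the pipe was already full
 * (so a wakeup is pending anyway), and -1 on any other error.
 */
static int notify_wakeup(int wfd)
{
    const char token = 1;
    ssize_t n;

    assert(wfd >= 0);

    /* Retry only when interrupted by a signal. */
    do {
        n = write(wfd, &token, 1);
    } while (n == -1 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* pipe full: the reader has not drained it yet */
    return -1;
}

/*
 * Drain every pending wakeup from the (non-blocking) read end.
 * Returns the number of bytes consumed, or -1 on error.
 */
static long drain_wakeups(int rfd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(rfd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;               /* write end was closed */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* pipe is empty now */
        return -1;
    }
}
```

In this sketch both ends of the pipe would be switched to non-blocking right after pipe() succeeds; the notifier then ignores a 0 return from notify_wakeup(), and the woken thread calls drain_wakeups() before handling its work.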