From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44439)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1V4AgQ-0000Jn-Kz for qemu-devel@nongnu.org;
    Tue, 30 Jul 2013 10:13:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1V4AgH-0003Zw-QO for qemu-devel@nongnu.org;
    Tue, 30 Jul 2013 10:13:50 -0400
Received: from mail-wi0-x22e.google.com ([2a00:1450:400c:c05::22e]:40453)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1V4AgH-0003Zq-LE for qemu-devel@nongnu.org;
    Tue, 30 Jul 2013 10:13:41 -0400
Received: by mail-wi0-f174.google.com with SMTP id j17so3800167wiw.7 for ;
    Tue, 30 Jul 2013 07:13:41 -0700 (PDT)
Date: Tue, 30 Jul 2013 16:13:38 +0200
From: Stefan Hajnoczi
Message-ID: <20130730141338.GD7471@stefanha-thinkpad.redhat.com>
References: <1374819052-4292-1-git-send-email-morita.kazutaka@lab.ntt.co.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1374819052-4292-1-git-send-email-morita.kazutaka@lab.ntt.co.jp>
Subject: Re: [Qemu-devel] [PATCH v4 00/10] sheepdog: reconnect server after connection failure
To: MORITA Kazutaka
Cc: Kevin Wolf, sheepdog@lists.wpkg.org, nick@bytemark.co.uk,
    qemu-devel@nongnu.org, Stefan Hajnoczi, Liu Yuan, Paolo Bonzini

On Fri, Jul 26, 2013 at 03:10:42PM +0900, MORITA Kazutaka wrote:
> Currently, if a sheepdog server exits, all the connecting VMs need to
> be restarted. This series implements a feature to reconnect the
> server, and enables us to do online sheepdog upgrade and avoid
> restarting VMs when sheepdog servers crash unexpectedly.
>
> v4:
>  - Added comment to explain why we need a failed queue.
>  - Fixed a return value of sd_acb_cancelable().
>
> v3:
>  - Check return values of qemu_co_recv/send more strictly.
>  - Move inflight requests to the failed list after reconnection
>    completes. This is necessary to resend I/Os while connection is
>    lost.
>  - Check simultaneous create in resend_aioreq().
>
> v2:
>  - Dropped nonblocking connect patches.
>
> MORITA Kazutaka (10):
>   ignore SIGPIPE in qemu-img and qemu-io
>   iov: handle EOF in iov_send_recv
>   sheepdog: check return values of qemu_co_recv/send correctly
>   sheepdog: handle vdi objects in resend_aio_req
>   sheepdog: reload inode outside of resend_aioreq
>   coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
>   sheepdog: try to reconnect to sheepdog after network error
>   sheepdog: make add_aio_request and send_aioreq void functions
>   sheepdog: cancel aio requests if possible
>   sheepdog: check simultaneous create in resend_aioreq
>
>  block/sheepdog.c          | 320 +++++++++++++++++++++++++++++-----------------
>  include/block/coroutine.h |   8 ++
>  qemu-coroutine-sleep.c    |  47 +++++++
>  qemu-img.c                |   4 +
>  qemu-io.c                 |   4 +
>  util/iov.c                |   6 +
>  6 files changed, 269 insertions(+), 120 deletions(-)

I have done a brief review.

The biggest change that I suggest is using the new AioContext timer
support that Alex Bligh and Ping Fan are working on (see qemu-devel for
the latest patches). It provides a way to use a timer during
qemu_aio_wait() without spinning.

CCed Nick Thomas, who worked on NBD reconnect. Maybe your series will
motivate him to push his patches again, or he might have some review
suggestions for you.