From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:41443)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TQG0A-0006tK-Rx for qemu-devel@nongnu.org;
	Mon, 22 Oct 2012 07:17:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1TQG06-0004rh-SN for qemu-devel@nongnu.org;
	Mon, 22 Oct 2012 07:16:58 -0400
Received: from bacon.sh.bytemark.co.uk ([212.110.161.169]:35621)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TQG06-0004qv-MD for qemu-devel@nongnu.org;
	Mon, 22 Oct 2012 07:16:54 -0400
From: nick@bytemark.co.uk
Date: Mon, 22 Oct 2012 12:09:16 +0100
Message-Id:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.7.2.5"
Subject: [Qemu-devel] [PATCH 0/3] NBD reconnection behaviour
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, Nick Thomas
From: Nick Thomas

This is a multi-part message in MIME format.
--------------1.7.2.5
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

Hi all,

This patchset is about making the NBD backend more useful. Currently,
when the NBD server disconnects, the block device in the guest becomes
unusable with no option to recover except to restart the QEMU process.

These patches introduce a reconnect timer that fires every five seconds
until we successfully reconnect. I/O requests that are inflight when the
disconnection occurs, or requested while disconnected, are failed with
an EIO - so the usual werror/rerror settings apply in those
circumstances.

All this means that, assuming you can get the NBD server up again, only
some I/O requests are failed, rather than all of them.
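
To make the intended behaviour concrete, here is a rough, self-contained
model of it. This is a sketch only and not the code in the patches: the
NbdModel struct, the model_* function names and MAX_INFLIGHT are all
made up for illustration, and time is passed in as plain seconds instead
of coming from a QEMU timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RECONNECT_DELAY_SECS 5   /* fixed delay used by this series */
#define MAX_INFLIGHT 16          /* hypothetical cap, for this sketch only */

/* Hypothetical model of the client state; not the structs in block/nbd.c. */
typedef struct {
    bool connected;
    int  inflight;       /* requests sent, reply not yet received */
    int  failed;         /* requests completed with -EIO */
    long next_attempt;   /* when the reconnect timer fires next (seconds) */
} NbdModel;

/* Submit a request: it fails straight away with -EIO while disconnected. */
static int model_submit(NbdModel *m)
{
    if (!m->connected) {
        m->failed++;
        return -EIO;
    }
    assert(m->inflight < MAX_INFLIGHT);
    m->inflight++;
    return 0;
}

/* Connection lost: fail everything in flight, then arm the timer. */
static void model_disconnect(NbdModel *m, long now)
{
    m->connected = false;
    m->failed += m->inflight;
    m->inflight = 0;
    m->next_attempt = now + RECONNECT_DELAY_SECS;
}

/*
 * Timer tick: if a reconnect attempt is due, try it. On failure the timer
 * is re-armed for another RECONNECT_DELAY_SECS. Returns the connection
 * state after the tick.
 */
static bool model_timer_tick(NbdModel *m, long now, bool server_up)
{
    if (m->connected || now < m->next_attempt) {
        return m->connected;
    }
    if (server_up) {
        m->connected = true;
    } else {
        m->next_attempt = now + RECONNECT_DELAY_SECS;
    }
    return m->connected;
}
```

In this model the guest sees -EIO only for requests that were in flight
at the disconnect or were issued before a reconnect attempt succeeded.
Anything submitted after that goes through as normal.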

I've got a few more changes to make - specifically:

  * Allowing the reconnect timer delay to be configurable
  * Queuing and retrying I/O requests instead of failing them with EIO
  * Proactively killing the TCP connection if the server doesn't
    respond after a timeout.

The rationale for the second is that some guests remount discs r/o if
I/O requests fail (rather than apparently hang), which is a pain. The
third allows us to quickly detect if a TCP connection disappears without
a trace.

However, I think these patches stand as an improvement on the current
situation, and I'd rather like some feedback on the best way to do the
further bits - assuming these patches eventually get accepted!

Nick Thomas (3):
  nbd: Only try to send flush/discard commands if connected to the NBD
    server
  nbd: Explicitly disconnect and fail inflight I/O requests on error,
    then reconnect next I/O request.
  nbd: Move reconnection attempts from each new I/O request to a
    5-second timer

 block/nbd.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 93 insertions(+), 24 deletions(-)

-- 
1.7.2.5

--------------1.7.2.5--