From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:41443)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <nick@bytemark.co.uk>) id 1TQG0A-0006tK-Rx
	for qemu-devel@nongnu.org; Mon, 22 Oct 2012 07:17:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <nick@bytemark.co.uk>) id 1TQG06-0004rh-SN
	for qemu-devel@nongnu.org; Mon, 22 Oct 2012 07:16:58 -0400
Received: from bacon.sh.bytemark.co.uk ([212.110.161.169]:35621)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <nick@bytemark.co.uk>) id 1TQG06-0004qv-MD
	for qemu-devel@nongnu.org; Mon, 22 Oct 2012 07:16:54 -0400
From: nick@bytemark.co.uk
Date: Mon, 22 Oct 2012 12:09:16 +0100
Message-Id: <cover.1350901963.git.nick@bytemark.co.uk>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.7.2.5"
Subject: [Qemu-devel] [PATCH 0/3] NBD reconnection behaviour
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, Nick Thomas <nick@bytemark.co.uk>

From: Nick Thomas <nick@bytemark.co.uk>

This is a multi-part message in MIME format.
--------------1.7.2.5
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Hi all,

This patchset is about making the NBD backend more useful. Currently,
when the NBD server disconnects, the block device in the guest becomes
unusable with no option to recover except to restart the QEMU process.
These patches introduce a reconnect timer that fires every five
seconds until we successfully reconnect.
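
To illustrate the timer behaviour (this is a standalone sketch, not the
code in the patches): ReconnectState, reconnect_on_disconnect(),
reconnect_poll() and demo_connect() are hypothetical names, and time is
passed in by the caller rather than taken from QEMU's own timer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECONNECT_DELAY_MS 5000 /* five seconds, as described above */

/* Hypothetical connection attempt: returns true on success. */
typedef bool (*connect_fn)(void *opaque);

typedef struct {
    bool connected;
    bool timer_armed;
    int64_t next_attempt_ms; /* absolute time of the next attempt */
    connect_fn try_connect;
    void *opaque;
} ReconnectState;

/* Connection lost: mark it down and arm the reconnect timer. */
static void reconnect_on_disconnect(ReconnectState *s, int64_t now_ms)
{
    assert(s != NULL);
    s->connected = false;
    s->timer_armed = true;
    s->next_attempt_ms = now_ms + RECONNECT_DELAY_MS;
}

/*
 * Called from the event loop with the current time. If the deadline has
 * passed, try to connect once; on failure, re-arm for another interval.
 */
static void reconnect_poll(ReconnectState *s, int64_t now_ms)
{
    assert(s != NULL && s->try_connect != NULL);
    if (!s->timer_armed || now_ms < s->next_attempt_ms) {
        return;
    }
    if (s->try_connect(s->opaque)) {
        s->connected = true;
        s->timer_armed = false;
    } else {
        s->next_attempt_ms = now_ms + RECONNECT_DELAY_MS;
    }
}

/* Demo stub: fails until the counter behind opaque reaches zero. */
static bool demo_connect(void *opaque)
{
    int *remaining_failures = opaque;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

With demo_connect() set to fail twice, a disconnect at t=1000ms leads to
attempts at 6000ms, 11000ms and 16000ms, the last of which succeeds and
disarms the timer.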

I/O requests that are inflight when the disconnection occurs, or that
are issued while disconnected, fail with EIO, so the usual
werror/rerror settings apply in those circumstances.
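
A rough sketch of that failure path, assuming a fixed-size table of
inflight requests (Client, Request, client_submit() and
client_fail_inflight() are hypothetical names for illustration, not
the structures in block/nbd.c):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INFLIGHT 16 /* arbitrary size for the sketch */

typedef struct {
    bool in_use;
    bool done;
    int result; /* 0 while pending or on success, -errno on failure */
} Request;

typedef struct {
    bool connected;
    Request inflight[MAX_INFLIGHT];
} Client;

/*
 * Submit a request. Returns the slot index on success, -EIO if we are
 * disconnected, or -EBUSY if every slot is taken.
 */
static int client_submit(Client *c)
{
    assert(c != NULL);
    if (!c->connected) {
        return -EIO;
    }
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (!c->inflight[i].in_use) {
            c->inflight[i].in_use = true;
            c->inflight[i].done = false;
            c->inflight[i].result = 0;
            return i;
        }
    }
    return -EBUSY;
}

/*
 * On a connection error: mark the client disconnected and complete
 * every pending request with -EIO. Returns how many were failed.
 */
static int client_fail_inflight(Client *c)
{
    int failed = 0;

    assert(c != NULL);
    c->connected = false;
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (c->inflight[i].in_use && !c->inflight[i].done) {
            c->inflight[i].result = -EIO;
            c->inflight[i].done = true;
            failed++;
        }
    }
    return failed;
}
```

In this sketch the caller sees -EIO either as the completion result of
a request that was already inflight or as the immediate return value of
a submit made while disconnected; what happens next is left to the
werror/rerror policy.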

All this means that, assuming you can get the NBD server up again,
only some I/O requests are failed, rather than all of them.

I've got a few more changes to make - specifically:

  * Allowing the reconnect timer delay to be configurable
  * Queuing and retrying I/O requests instead of EIO
  * Proactively killing the TCP connection if the server doesn't
    respond after a timeout.

The rationale for the second is that some guests remount discs r/o
if I/O requests fail (rather than apparently hang), which is a pain.
The third allows us to quickly detect if a TCP connection disappears
without a trace. 

However, I think these patches stand as an improvement on the current
situation, and I'd rather like some feedback on the best way to do
the further bits - assuming these patches eventually get accepted!


Nick Thomas (3):
  nbd: Only try to send flush/discard commands if connected to the NBD
    server
  nbd: Explicitly disconnect and fail inflight I/O requests on error,
    then reconnect next I/O request.
  nbd: Move reconnection attempts from each new I/O request to a
    5-second timer

 block/nbd.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 93 insertions(+), 24 deletions(-)

-- 
1.7.2.5


--------------1.7.2.5--