qemu-devel.nongnu.org archive mirror
From: "Richard W.M. Jones" <rjones@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	vsementsov@virtuozzo.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date: Thu, 2 Apr 2020 15:04:26 +0100
Message-ID: <20200402140426.GJ3888@redhat.com>
In-Reply-To: <6b66952d-24a4-593c-2160-8c2877e42f49@redhat.com>

On Thu, Apr 02, 2020 at 08:41:31AM -0500, Eric Blake wrote:
> On 4/2/20 3:38 AM, Richard W.M. Jones wrote:
> >For the case I care about (long running virt-v2v conversions with an
> >intermittent network) we don't expect that nbdkit will be killed nor
> >gracefully shut down.  Instead what we expect is that nbdkit returns
> >an error such as NBD_EIO and keeps running.
> >
> >Indeed if nbdkit actually dies then reconnecting will not help since
> >there will be no server to reconnect to.

To put this in context for other people reading, virt-v2v uses this
sort of setup:

<pre>
                          +---------- same machine ----------+
                          |                                  |
  +------------+            +----------+        +----------+
  | remote     |            | nbdkit   |        | qemu-img |
  | VMware     |----------->| + VDDK   |------->| convert  |--> output
  | server     |            |          |        |          |
  +------------+            +----------+        +----------+
             VMware proprietary      NBD over Unix skt
             protocol over TCP
</pre>

The problem being addressed is that the whole task can run for many
hours, and a single interruption in the network between virt-v2v and
the remote VMware server can cause the entire process to fail.
nbdkit-retry-filter[0] attempts to address the problem by allowing the
VMware side of the protocol to be restarted without qemu-img seeing
any interruption (nor any error) on the NBD connection.

[0] http://libguestfs.org/nbdkit-retry-filter.1.html
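
For anyone who hasn't looked at the filter, here is a rough sketch of
the idea in Python.  The class and method names are made up for
illustration and are not nbdkit's API, and the retry count is
arbitrary.  The real filter also waits between attempts, which is
left out here.

```python
import errno


class Plugin:
    """Hypothetical plugin: each open() makes a fresh remote connection."""

    def __init__(self, data, failures):
        self.data = data
        self.failures = failures  # reads that fail before one succeeds
        self.open_count = 0
        self.connected = False

    def open(self):
        self.open_count += 1
        self.connected = True

    def close(self):
        self.connected = False

    def pread(self, count, offset):
        if not self.connected:
            raise OSError(errno.ENOTCONN, "not connected")
        if self.failures > 0:
            self.failures -= 1
            self.connected = False  # simulate the remote link dropping
            raise OSError(errno.EIO, "remote connection lost")
        return self.data[offset:offset + count]


class RetryFilter:
    """On error, close and reopen the plugin, then retry the request."""

    def __init__(self, plugin, retries=5):
        self.plugin = plugin
        self.retries = retries
        self.plugin.open()

    def pread(self, count, offset):
        attempt = 0
        while True:
            try:
                return self.plugin.pread(count, offset)
            except OSError:
                if attempt >= self.retries:
                    raise  # give up: only now does the client see an error
                attempt += 1
                self.plugin.close()
                self.plugin.open()  # reconnects to the remote server


plugin = Plugin(b"0123456789", failures=2)
flt = RetryFilter(plugin)
result = flt.pread(4, 3)
```

The caller of flt.pread() plays the role of qemu-img: it gets its data
back even though the simulated remote side failed twice, and the only
visible effect is that the plugin was opened more than once.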

> Hmm.  The idea of reconnect-delay in qemu is that if the connection
> to the server is dropped, we try to reconnect and then retry the I/O
> operation.  Maybe what we want is an nbdkit filter which turns EIO
> errors from the v2v plugin into forced server connection drops, but
> leave nbdkit up and running to allow the next client to connect.

Note that of the three nbdkit plugins we currently use (vddk[1], curl
and ssh), at least two have the property that closing and
reopening the plugin handle (which is what nbdkit-retry-filter does)
reconnects to the remote server.  To take nbdkit-ssh-plugin as a
specific example[2], the .open callback calls ssh_connect() and the
.close callback calls ssh_disconnect().  VDDK works the same way.  I'm
a bit unclear on nbdkit-curl-plugin because IIRC underlying HTTPS
connections may be managed in a pool inside Curl.

[1] All in this file, starting here:
https://github.com/libguestfs/virt-v2v/blob/8cf4488d6bcde8dd0b84c199c96ff5763e6a08fa/v2v/nbdkit_sources.ml#L142

[2] https://github.com/libguestfs/nbdkit/blob/d085b87dcbe05c9c2d0049f0fc613455490c1032/plugins/ssh/ssh.c#L468

> That's different from the existing --filter=retry behavior (where we
> try to keep the client connection alive and reopen the plugin), but
> has a similar effect (because we force the connection to the client
> to drop, the client would have to reconnect to get more data, and
> reconnecting triggers a retry on connecting to the plugin).

I get that this is different from the retry filter, but isn't this
just working around behaviour in qemu's NBD client?  Couldn't qemu's
NBD client be changed to reconnect on EIO?  Or retry?  (Optionally of
course, and this would be orthogonal to the current patch.)
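
To make the question concrete, this is roughly the client behaviour I
have in mind, as a Python sketch.  It is purely hypothetical and is
not qemu code: Server, client_read, reconnect_on_eio and max_attempts
are all invented names.

```python
import errno


class Server:
    """Hypothetical server that stays up but sometimes replies with EIO."""

    def __init__(self, data, eio_count):
        self.data = data
        self.eio_count = eio_count  # number of requests answered with EIO
        self.connections = 0

    def connect(self):
        self.connections += 1
        return self

    def read(self, offset, count):
        if self.eio_count > 0:
            self.eio_count -= 1
            return errno.EIO, b""
        return 0, self.data[offset:offset + count]


def client_read(server, offset, count, reconnect_on_eio=False, max_attempts=3):
    """Read from the server, optionally reconnecting and retrying on EIO."""
    conn = server.connect()
    for _ in range(max_attempts):
        err, buf = conn.read(offset, count)
        if err == 0:
            return buf
        if err != errno.EIO or not reconnect_on_eio:
            break  # default behaviour: pass the error straight up
        conn = server.connect()  # drop the old connection and reconnect
    raise OSError(err, "read failed")


server = Server(b"abcdefgh", eio_count=1)
data = client_read(server, 2, 3, reconnect_on_eio=True)
```

With the option off the first EIO is reported to the caller as it is
today.  With it on, the client reconnects and reissues the request, so
no new server-side filter would be needed.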

> And it's different from --filter=exitlast (that says to quit nbdkit
> altogether, rather than just the current connection with a client).

We'd certainly need a new nbdkit_* API, rather like the way we added
nbdkit_shutdown to make nbdkit-exitlast-filter possible.  However I'm
still unclear if the new filter's behaviour would be necessary.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


