qemu-devel.nongnu.org archive mirror
* [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
@ 2020-04-01 22:38 Eric Blake
  2020-04-02  6:41 ` Vladimir Sementsov-Ogievskiy
  2020-04-02  8:38 ` Richard W.M. Jones
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Blake @ 2020-04-01 22:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, vsementsov, rjones, qemu-block, Max Reitz

I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume.  A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine; but
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly failing the I/O request with ESHUTDOWN
from the server instead of attempting to reconnect.

While most NBD server failures are unlikely to go away by merely
retrying the same transaction, our decision not to start a retry loop
in the common case is correct.  But NBD_ESHUTDOWN is rare enough, and
genuinely indicative of a transient situation, that it is worth
special-casing.
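
The logic of the change can be paraphrased in a few lines (illustrative
Python sketch, not the actual qemu code; errno.ESHUTDOWN is the Linux
errno value that the NBD error maps onto):

```python
import errno

def classify_reply(ret, request_ret, reconnect_delay):
    """Sketch of the decision in nbd_co_receive_one_chunk() after this
    patch: which replies should feed the reconnect machinery.  For
    illustration only; the real code lives in block/nbd.c."""
    if ret < 0:
        # Transport-level failure: already triggers the reconnect path.
        return "channel-error"
    if reconnect_delay and request_ret == -errno.ESHUTDOWN:
        # New special case: the server is shutting down, so treat it
        # like an abrupt connection loss and let the retry loop kick in.
        return "treat-as-disconnect"
    # Everything else, including ordinary per-request errors like EIO,
    # is still reported as-is.
    return "normal-reply"

# Without reconnect-delay, behaviour is unchanged:
assert classify_reply(0, -errno.ESHUTDOWN, 0) == "normal-reply"
# With reconnect-delay set, ESHUTDOWN is escalated:
assert classify_reply(0, -errno.ESHUTDOWN, 60) == "treat-as-disconnect"
```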

Here's the test setup I used: in one terminal, kick off a sequence of
nbdkit commands that has a temporary window where the server is
offline; in another terminal (and within the first 5 seconds) kick off
a qemu-img convert with reconnect enabled.  If the qemu-img process
completes successfully, the reconnect worked.

$ #term1
$ MYSIG=    # or MYSIG='-s KILL'
$ timeout $MYSIG 5s ~/nbdkit/nbdkit -fv --filter=delay --filter=noextents \
  null 200M delay-read=1s; sleep 5; ~/nbdkit/nbdkit -fv --filter=exitlast \
  --filter=delay --filter=noextents null 200M delay-read=1s

$ #term2
$ MYCONN=server.type=inet,server.host=localhost,server.port=10809
$ qemu-img convert -p -O raw --image-opts \
  driver=nbd,$MYCONN,,reconnect-delay=60 out.img

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1819240#c8

Signed-off-by: Eric Blake <eblake@redhat.com>
---

This is not a regression, per se, as reconnect-delay has been unchanged
since 4.2; but I'd like to consider this as an interoperability bugfix
worth including in the next rc.

 block/nbd.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2906484390f9..576b95fb8753 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -863,6 +863,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
     if (ret < 0) {
         memset(reply, 0, sizeof(*reply));
         nbd_channel_error(s, ret);
+    } else if (s->reconnect_delay && *request_ret == -ESHUTDOWN) {
+        /*
+         * Special case: if we support reconnect and server is warning
+         * us that it wants to shut down, then treat this like an
+         * abrupt connection loss.
+         */
+        memset(reply, 0, sizeof(*reply));
+        *request_ret = 0;
+        nbd_channel_error(s, -EIO);
     } else {
         /* For assert at loop start in nbd_connection_entry */
         *reply = s->reply;
-- 
2.26.0.rc2




* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-01 22:38 [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN Eric Blake
@ 2020-04-02  6:41 ` Vladimir Sementsov-Ogievskiy
  2020-04-02 13:33   ` Eric Blake
  2020-04-02  8:38 ` Richard W.M. Jones
  1 sibling, 1 reply; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-02  6:41 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, rjones, qemu-block, Max Reitz

02.04.2020 1:38, Eric Blake wrote:
> I was trying to test qemu's reconnect-delay parameter by using nbdkit
> as a server that I could easily make disappear and resume.  A bit of
> experimenting shows that when nbdkit is abruptly killed (SIGKILL),
> qemu detects EOF on the socket and manages to reconnect just fine; but
> when nbdkit is gracefully killed (SIGTERM), it merely fails all
> further guest requests with NBD_ESHUTDOWN until the client disconnects
> first, and qemu was blindly failing the I/O request with ESHUTDOWN
> from the server instead of attempting to reconnect.
> [...]

Interesting. I see that, prior to this patch, we don't handle ESHUTDOWN at all in the NBD client.

What does the spec say?

 > On a server shutdown, the server SHOULD wait for inflight requests to
 > be serviced prior to initiating a hard disconnect. A server MAY speed
 > this process up by issuing error replies. The error value issued in
 > respect of these requests and any subsequently received requests
 > SHOULD be NBD_ESHUTDOWN.
 > If the client receives an NBD_ESHUTDOWN error it MUST initiate a soft
 > disconnect.
 > The client MAY issue a soft disconnect at any time, but SHOULD wait
 > until there are no inflight requests first.
 > The client and the server MUST NOT initiate any form of disconnect
 > other than in one of the above circumstances.

Hmm. So actually we MUST initiate a soft disconnect, which means that we must send NBD_CMD_DISC.

Then, what about "SHOULD wait until there are no inflight requests"? We don't do that either. Should we?
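
For reference, the soft disconnect the spec requires is just one more
request on the wire: NBD_CMD_DISC is a bare 28-byte request header with
no payload and no reply.  A sketch using the constants from the NBD
protocol spec (not qemu's implementation):

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513   # per the NBD protocol spec
NBD_CMD_DISC = 2                 # soft disconnect command type

def nbd_disc_request(cookie=0):
    """Build the wire form of NBD_CMD_DISC: magic (32 bits), command
    flags (16), type (16), cookie (64), offset (64), length (32), all
    big-endian, 28 bytes total.  Illustration only."""
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, 0, NBD_CMD_DISC,
                       cookie, 0, 0)

req = nbd_disc_request()
assert len(req) == 28
assert req[:4] == b"\x25\x60\x95\x13"
```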

-- 
Best regards,
Vladimir



* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-01 22:38 [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN Eric Blake
  2020-04-02  6:41 ` Vladimir Sementsov-Ogievskiy
@ 2020-04-02  8:38 ` Richard W.M. Jones
  2020-04-02 13:41   ` Eric Blake
  1 sibling, 1 reply; 7+ messages in thread
From: Richard W.M. Jones @ 2020-04-02  8:38 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, vsementsov, qemu-devel, qemu-block, Max Reitz


On Wed, Apr 01, 2020 at 05:38:41PM -0500, Eric Blake wrote:
> I was trying to test qemu's reconnect-delay parameter by using nbdkit
> as a server that I could easily make disappear and resume.  A bit of
> experimenting shows that when nbdkit is abruptly killed (SIGKILL),
> qemu detects EOF on the socket and manages to reconnect just fine; but
> when nbdkit is gracefully killed (SIGTERM), it merely fails all
> further guest requests with NBD_ESHUTDOWN until the client disconnects
> first, and qemu was blindly failing the I/O request with ESHUTDOWN
> from the server instead of attempting to reconnect.
> [...]

For the case I care about (long running virt-v2v conversions with an
intermittent network) we don't expect that nbdkit will be killed nor
gracefully shut down.  Instead what we expect is that nbdkit returns
an error such as NBD_EIO and keeps running.

Indeed if nbdkit actually dies then reconnecting will not help since
there will be no server to reconnect to.

So I'm neutral about this patch.  If you want it for qemu that's fine
but I don't think it will affect v2v.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-02  6:41 ` Vladimir Sementsov-Ogievskiy
@ 2020-04-02 13:33   ` Eric Blake
  2020-04-02 13:55     ` Eric Blake
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2020-04-02 13:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, rjones, qemu-block, Max Reitz

On 4/2/20 1:41 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.04.2020 1:38, Eric Blake wrote:
>> [...]

> 
> Interesting. I see, that prior to this patch we don't handle ESHUTDOWN 
> at all in nbd client..
> 
> What does spec say?
> 
>  > On a server shutdown, the server SHOULD wait for inflight requests to 
> be serviced prior to initiating a hard disconnect. A server MAY speed 
> this process up by issuing error replies. The error value issued in 
> respect of these requests and any subsequently received requests SHOULD 
> be NBD_ESHUTDOWN.
>  > If the client receives an NBD_ESHUTDOWN error it MUST initiate a soft 
> disconnect.

Perhaps the spec should be relaxed to state that a client SHOULD 
initiate soft disconnect (as there are existing clients that do not). 
If a server knows it wants to initiate hard disconnect soon, it 
shouldn't be forced to wait for a client to respond to NBD_ESHUTDOWN, 
since not all clients do.  Then again, it is indeed nicer if the client 
does initiate soft disconnect (as soft is always cleaner than hard).

>  > The client MAY issue a soft disconnect at any time, but SHOULD wait 
> until there are no inflight requests first.
>  > The client and the server MUST NOT initiate any form of disconnect 
> other than in one of the above circumstances.
> 
> Hmm. So, actually we MUST initiate a soft disconnect, which means that 
> we must send NBD_CMD_DISC..

With this patch as-is, qemu as client initiates hard disconnect in 
response to NBD_ESHUTDOWN (but only if it plans on trying to reconnect).

> 
> Then, what about "SHOULD wait until no inflight requests"? We don't do 
> it either.. Should we?

qemu as server doesn't send NBD_ESHUTDOWN.  It probably should (the way 
nbdkit does), but that's orthogonal to qemu as client responding to 
NBD_ESHUTDOWN.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-02  8:38 ` Richard W.M. Jones
@ 2020-04-02 13:41   ` Eric Blake
  2020-04-02 14:04     ` Richard W.M. Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2020-04-02 13:41 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Kevin Wolf, vsementsov, qemu-devel, qemu-block, Max Reitz

On 4/2/20 3:38 AM, Richard W.M. Jones wrote:
> 
> On Wed, Apr 01, 2020 at 05:38:41PM -0500, Eric Blake wrote:
>> [...]

> 
> For the case I care about (long running virt-v2v conversions with an
> intermittent network) we don't expect that nbdkit will be killed nor
> gracefully shut down.  Instead what we expect is that nbdkit returns
> an error such as NBD_EIO and keeps running.
> 
> Indeed if nbdkit actually dies then reconnecting will not help since
> there will be no server to reconnect to.

Hmm.  The idea of reconnect-delay in qemu is that if the connection to 
the server is dropped, we try to reconnect and then retry the I/O 
operation.  Maybe what we want is an nbdkit filter which turns EIO 
errors from the v2v plugin into forced server connection drops, but 
leaves nbdkit up and running to allow the next client to connect.  That's 
different from the existing --filter=retry behavior (where we try to 
keep the client connection alive and reopen the plugin), but has a 
similar effect (because we force the connection to the client to drop, 
the client would have to reconnect to get more data, and reconnecting 
triggers a retry on connecting to the plugin).  And it's different from 
--filter=exitlast (that says to quit nbdkit altogether, rather than just 
the current connection with a client).
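
A toy sketch of that idea (generic request-loop logic; deliberately not
nbdkit's real filter API, whose callbacks look different):

```python
import errno

def handle_request(backend_read, drop_connection, reply, offset, length):
    """Instead of returning EIO to the NBD client, drop the connection
    so that a reconnect-capable client retries the I/O after
    reconnecting.  Toy illustration only."""
    try:
        data = backend_read(offset, length)
    except OSError as e:
        if e.errno == errno.EIO:
            drop_connection()        # force the client to reconnect...
            return "dropped"         # ...which retries against the plugin
        raise
    reply(data)
    return "ok"

# Simulated backend that fails with EIO:
def flaky_read(off, ln):
    raise OSError(errno.EIO, "backend I/O error")

events = []
assert handle_request(flaky_read, lambda: events.append("closed"),
                      lambda d: None, 0, 512) == "dropped"
assert events == ["closed"]
```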

> 
> So I'm neutral about this patch.  If you want it for qemu that's fine
> but I don't think it will affect v2v.

Then this patch is no longer 5.0 material.  We may still want to improve 
shutdown handling in qemu (both in the client and in the server), but 
doing it correctly will be bigger than just one patch, based on 
Vladimir's response.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-02 13:33   ` Eric Blake
@ 2020-04-02 13:55     ` Eric Blake
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2020-04-02 13:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Dr. David Alan Gilbert, rjones, qemu-block, Max Reitz

On 4/2/20 8:33 AM, Eric Blake wrote:

>> Then, what about "SHOULD wait until no inflight requests"? We don't do 
>> it either.. Should we?
> 
> qemu as server doesn't send NBD_ESHUTDOWN.  It probably should (the way 
> nbdkit does), but that's orthogonal to qemu as client responding to 
> NBD_ESHUTDOWN.

Other things I want to document here based on an IRC conversation with Dave:

Our notion of reconnect-delay has a baked-in timeout, but selecting the 
right timeout can be difficult: how do you know it is long enough to 
catch all the cases you care about where recovery will work, yet not so 
long that waiting out the full timeout is painful when recovery is not 
possible?  And the qemu block layer already has the notion of pausing 
the guest on certain errors (whether just on ENOSPC, or on all errors), 
to give the management app all the time it needs to resolve the problem 
and then resume the guest.

There's also the issue of TCP timeouts - if the server manages to send 
shutdown(SHUT_WR) before the connection dies, the client gets an instant 
EOF and can be pretty responsive to the need to start the retry cycle. 
But if the connection dies without a clean shutdown, the client may be 
stuck waiting several seconds for a TCP timeout to occur before 
realizing that things are down (use of TCP keep-alive may or may not 
help here) - management apps may be able to figure out from other means 
when an NBD server is having issues long before qemu itself sees the TCP 
connection go down.  In that case, having a way for the client to 
trigger shutdown(SHUT_RD) in order to speed up disconnection, rather 
than waiting for a TCP timeout, can come in handy.
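
For concreteness, the two socket-level mechanisms mentioned above look
like this with plain Python sockets (assuming a Linux host; nothing
qemu-specific):

```python
import socket

# TCP keep-alive: have the kernel probe an idle connection so a dead
# peer is detected sooner than the default TCP timeout.  On a real TCP
# connection, Linux also offers TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT
# to tune the probing.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) == 1
s.close()

# shutdown(SHUT_RD): give up on reading from the peer right now rather
# than blocking until a TCP timeout; subsequent reads return EOF
# immediately, letting the client start its reconnect cycle at once.
a, b = socket.socketpair()
a.shutdown(socket.SHUT_RD)
assert a.recv(4096) == b""
a.close()
b.close()
```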

Or, if we have a multipath scenario, where we know that several IP 
addresses will serve the same underlying storage, we may just need a way 
to reopen an existing NBD blockdev but with a different IP address to 
the server.

All of this implies we may want to add a QMP command to force a given 
NBD blockdev to attempt a reconnect now, rather than having to wait for 
a TCP connection death to let us know that a reconnect is the only way 
forward, or even as a way to make sure that we can resume the guest 
after it was paused due to I/O error.  I don't know if the existing 
'x-blockdev-reopen' can be extended to cover our needs, or if we need 
something completely new.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
  2020-04-02 13:41   ` Eric Blake
@ 2020-04-02 14:04     ` Richard W.M. Jones
  0 siblings, 0 replies; 7+ messages in thread
From: Richard W.M. Jones @ 2020-04-02 14:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, vsementsov, qemu-devel, qemu-block, Max Reitz

On Thu, Apr 02, 2020 at 08:41:31AM -0500, Eric Blake wrote:
> On 4/2/20 3:38 AM, Richard W.M. Jones wrote:
> >For the case I care about (long running virt-v2v conversions with an
> >intermittent network) we don't expect that nbdkit will be killed nor
> >gracefully shut down.  Instead what we expect is that nbdkit returns
> >an error such as NBD_EIO and keeps running.
> >
> >Indeed if nbdkit actually dies then reconnecting will not help since
> >there will be no server to reconnect to.

To put this in context for other people reading, virt-v2v is used in
this sort of setup:

                          +---------- same machine ----------+
                          |                                  |
  +------------+          +----------+          +----------+
  | remote     |          | nbdkit   |          | qemu-img |
  | VMware     |--------->| + VDDK   |--------->| convert  |--> output
  | server     |          |          |          |          |
  +------------+          +----------+          +----------+
             VMware proprietary      NBD over Unix skt
             protocol over TCP

The problem being addressed is that the whole task can run for many
hours, and a single interruption in the network between virt-v2v and
the remote VMware server can cause the entire process to fail.
nbdkit-retry-filter[0] attempts to address the problem by allowing the
VMware side of the protocol to be restarted without qemu-img seeing
any interruption (nor any error) on the NBD connection.

[0] http://libguestfs.org/nbdkit-retry-filter.1.html

> Hmm.  The idea of reconnect-delay in qemu is that if the connection
> to the server is dropped, we try to reconnect and then retry the I/O
> operation.  Maybe what we want is an nbdkit filter which turns EIO
> errors from the v2v plugin into forced server connection drops, but
> leave nbdkit up and running to allow the next client to connect.

Note that of the three nbdkit plugins we currently use (vddk[1], curl
and ssh) at least two of them have the property that closing and
reopening the plugin handle (which is what nbdkit-retry-filter does)
reconnects to the remote server.  To take nbdkit-ssh-plugin as a
specific example[2], the .open callback calls ssh_connect() and the
.close callback calls ssh_disconnect().  VDDK works the same way.  I'm
a bit unclear on nbdkit-curl-plugin because IIRC underlying HTTPS
connections may be managed in a pool inside Curl.

[1] All in this file, starting here:
https://github.com/libguestfs/virt-v2v/blob/8cf4488d6bcde8dd0b84c199c96ff5763e6a08fa/v2v/nbdkit_sources.ml#L142

[2] https://github.com/libguestfs/nbdkit/blob/d085b87dcbe05c9c2d0049f0fc613455490c1032/plugins/ssh/ssh.c#L468

> That's different from the existing --filter=retry behavior (where we
> try to keep the client connection alive and reopen the plugin), but
> has a similar effect (because we force the connection to the client
> to drop, the client would have to reconnect to get more data, and
> reconnecting triggers a retry on connecting to the plugin).

I get that this is different from the retry filter, but isn't this
just working around behaviour in qemu's NBD client?  Couldn't qemu's
NBD client be changed to reconnect on EIO?  Or retry?  (Optionally of
course, and this would be orthogonal to the current patch.)

> And it's different from --filter=exitlast (that says to quit nbdkit
> altogether, rather than just the current connection with a client).

We'd certainly need a new nbdkit_* API, rather like the way we added
nbdkit_shutdown to make nbdkit-exitlast-filter possible.  However I'm
still unclear if the new filter's behaviour would be necessary.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



