qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, qemu-devel@nongnu.org,
	mreitz@redhat.com
Subject: Re: [PATCH 1/3] block/nbd: allow drain during reconnect attempt
Date: Thu, 23 Jul 2020 13:47:58 -0500	[thread overview]
Message-ID: <7211b25e-94b8-de52-a2da-66f480af9a2a@redhat.com> (raw)
In-Reply-To: <20200720090024.18186-2-vsementsov@virtuozzo.com>

On 7/20/20 4:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> It should be to reenter qio_channel_yield() on io/channel read/write
> path, so it's safe to reduce in_flight and allow attaching new aio
> context. And no problem to allow drain itself: connection attempt is
> not a guest request. Moreover, if remote server is down, we can hang
> in negotiation, blocking drain section and provoking a dead lock.
> 
> How to reproduce the dead lock:
> 

I tried to reproduce this; but in the several minutes it has taken me to 
write this email, it still has not hung.  Still, your stack trace is 
fairly good evidence of the problem, where adding a temporary sleep or 
running it under gdb with a breakpoint can probably make reproduction 
easier.

> 1. Create nbd-fault-injector.conf with the following contents:
> 
> [inject-error "mega1"]
> event=data
> io=readwrite
> when=before
> 
> 2. In one terminal run nbd-fault-injector in a loop, like this:
> 
> n=1; while true; do
>      echo $n; ((n++));

Bashism, but not a problem for the commit message.

>      ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
> done
> 
> 3. In another terminal run qemu-io in a loop, like this:
> 
> n=1; while true; do
>      echo $n; ((n++));
>      ./qemu-io -c 'read 0 512' nbd+tcp://127.0.0.1:10000;

I prefer the spelling nbd:// for TCP connections, but also inconsequential.

> Note, that the hang may be
> triggered by another bug, so the whole case is fixed only together with
> commit "block/nbd: on shutdown terminate connection attempt".
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/nbd.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 65a4f56924..49254f1c3c 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -280,7 +280,18 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>           s->ioc = NULL;
>       }
>   
> +    bdrv_dec_in_flight(s->bs);
>       s->connect_status = nbd_client_connect(s->bs, &local_err);
> +    s->wait_drained_end = true;
> +    while (s->drained) {
> +        /*
> +         * We may be entered once from nbd_client_attach_aio_context_bh
> +         * and then from nbd_client_co_drain_end. So here is a loop.
> +         */
> +        qemu_coroutine_yield();
> +    }
> +    bdrv_inc_in_flight(s->bs);
> +

This is very similar to the code in nbd_co_reconnect_loop.  Does that 
function still need to wait on drained, since it calls 
nbd_reconnect_attempt which is now doing the same loop?  But off-hand, 
I'm not seeing a problem with keeping both places.

Reviewed-by: Eric Blake <eblake@redhat.com>

As a bug fix, I'll be including this in my NBD pull request for the next 
-rc build.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



  parent reply	other threads:[~2020-07-23 18:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-20  9:00 [PATCH for-5.1? 0/3] Fix nbd reconnect dead-locks Vladimir Sementsov-Ogievskiy
2020-07-20  9:00 ` [PATCH 1/3] block/nbd: allow drain during reconnect attempt Vladimir Sementsov-Ogievskiy
2020-07-20  9:02   ` Vladimir Sementsov-Ogievskiy
2020-07-23 18:47   ` Eric Blake [this message]
2020-07-24 10:04     ` Vladimir Sementsov-Ogievskiy
2020-07-24 11:50     ` Vladimir Sementsov-Ogievskiy
2020-07-24  9:49   ` Vladimir Sementsov-Ogievskiy
2020-07-24 10:21   ` Vladimir Sementsov-Ogievskiy
2020-07-20  9:00 ` [PATCH 2/3] block/nbd: on shutdown terminate connection attempt Vladimir Sementsov-Ogievskiy
2020-07-23 18:52   ` Eric Blake
2020-07-20  9:00 ` [PATCH 3/3] block/nbd: nbd_co_reconnect_loop(): don't sleep if drained Vladimir Sementsov-Ogievskiy
2020-07-23 18:55   ` Eric Blake
2020-07-27 18:42     ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7211b25e-94b8-de52-a2da-66f480af9a2a@redhat.com \
    --to=eblake@redhat.com \
    --cc=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).