qemu-devel.nongnu.org archive mirror
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, mreitz@redhat.com, kwolf@redhat.com,
	eblake@redhat.com, vsementsov@virtuozzo.com, den@openvz.org
Subject: [PATCH 1/4] block/nbd: fix drain dead-lock because of nbd reconnect-delay
Date: Thu,  3 Sep 2020 22:02:58 +0300
Message-ID: <20200903190301.367620-2-vsementsov@virtuozzo.com>
In-Reply-To: <20200903190301.367620-1-vsementsov@virtuozzo.com>

We pause the reconnect process during a drained section. So, if some
requests are waiting for reconnect, we should cancel them; otherwise
they deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3. Start a VM with a second NBD disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the VM through VNC (or some other way) and check that the NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now kill the NBD server and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now QEMU is trying to reconnect, and the dd-generated requests are
waiting for the connection (they will wait up to 60 seconds, see the
reconnect-delay option above, and then fail). But at some point during
that wait the VM may hang completely in a deadlock. You may need to
increase the reconnect-delay period to catch the deadlock.

The VM doesn't respond because the drain deadlock happens in a CPU
thread with the global mutex taken. That is not good in itself and is
not fixed by this commit (the proper way is to use iothreads). Still,
this commit fixes the drain deadlock itself.
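
To illustrate the idea, here is a minimal stand-alone sketch in C. It is
a toy model and not QEMU code: the Toy* types and toy_* functions are
hypothetical names invented for this example, and the real driver uses
coroutines and a wait queue rather than counters. It only shows why
switching from the "wait" state to the "nowait" state and waking the
parked requests lets a drained section complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the client states named in the patch (hypothetical). */
typedef enum {
    TOY_CONNECTING_WAIT,   /* requests park, hoping for a reconnect */
    TOY_CONNECTING_NOWAIT, /* requests fail immediately */
} ToyState;

typedef struct {
    ToyState state;
    size_t waiting; /* requests parked until the state changes */
    size_t failed;  /* requests completed with an error */
} ToyClient;

/* Submit one request: it parks in WAIT and fails at once in NOWAIT. */
void toy_submit(ToyClient *c)
{
    assert(c != NULL);
    if (c->state == TOY_CONNECTING_WAIT) {
        c->waiting++;
    } else {
        c->failed++;
    }
}

/* Wake every parked request so that it re-checks the state. */
void toy_wake_all(ToyClient *c)
{
    assert(c != NULL);
    if (c->state == TOY_CONNECTING_NOWAIT) {
        c->failed += c->waiting;
        c->waiting = 0;
    }
    /* In WAIT the requests would simply park again. */
}

/* Model of drain begin; with_fix selects the behaviour this patch adds. */
void toy_drain_begin(ToyClient *c, bool with_fix)
{
    assert(c != NULL);
    if (with_fix && c->state == TOY_CONNECTING_WAIT) {
        c->state = TOY_CONNECTING_NOWAIT;
        toy_wake_all(c);
    }
}

/* A drained section can complete only when nothing is in flight. */
bool toy_drain_can_finish(const ToyClient *c)
{
    assert(c != NULL);
    return c->waiting == 0;
}
```

In this model, two requests submitted in the "wait" state keep
toy_drain_can_finish() false after toy_drain_begin(c, false), which
corresponds to the deadlock. With toy_drain_begin(c, true) both requests
are failed and the drain can finish.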

Note: we could probably continue to reconnect during a drained section
instead. To achieve this, we could move negotiation to the connect
thread to make it independent of the bs aio context. But extending the
drained section doesn't seem good anyway. So, for now, let's fix the bug
in the simplest way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 9daf003bea..912ea27be7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
     }
 
     nbd_co_establish_connection_cancel(bs, false);
+
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+        qemu_co_queue_restart_all(&s->free_sema);
+    }
 }
 
 static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
-- 
2.18.0



