* [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export
From: Kevin Wolf @ 2024-03-14 16:58 UTC
To: qemu-block
Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
When draining an NBD export, nbd_drained_begin() first sets
client->quiescing so that nbd_client_receive_next_request() won't start
any new request coroutines. Then nbd_drained_poll() tries to make sure
that we wait for any existing request coroutines by checking that
client->nb_requests has become 0.
However, there is a small window between creating a new request
coroutine and increasing client->nb_requests. If a coroutine is in this
state, it won't be waited for and drain returns too early.
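To illustrate, here is a condensed sketch of the ordering before this
patch (simplified from nbd/server.c, not the literal code):

    static void nbd_client_receive_next_request(NBDClient *client)
    {
        /* Runs with client->lock held */
        if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
            !client->quiescing) {
            nbd_client_get(client);
            client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
            aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
        }
        /*
         * client->nb_requests is only incremented once nbd_trip() actually
         * runs and calls nbd_request_get(). Until then, nbd_drained_poll()
         * sees nb_requests == 0 and lets the drain complete even though a
         * request coroutine is already scheduled.
         */
    }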
In the context of switching to a different AioContext, this means that
blk_aio_attached() will see client->recv_coroutine != NULL and fail its
assertion.
Fix this by increasing client->nb_requests immediately when starting the
coroutine. Doing this after checking whether we should create a new
coroutine is okay because client->lock is held.
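For context, the drain side checks the counter under the same lock,
roughly like this (a simplified sketch, not the exact code):

    static bool nbd_drained_poll(void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;

        QTAILQ_FOREACH(client, &exp->clients, next) {
            WITH_QEMU_LOCK_GUARD(&client->lock) {
                if (client->nb_requests != 0) {
                    return true;    /* still busy, keep polling */
                }
            }
        }
        return false;
    }

Because the increment now happens in the same critical section as the
quiescing and nb_requests checks, any drain that begins afterwards is
guaranteed to observe the new request.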
Cc: qemu-stable@nongnu.org
Fixes: fd6afc501a019682d1b8468b562355a2887087bd
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
nbd/server.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index 941832f178..c3484cc1eb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
/* Owns a reference to the NBDClient passed as opaque. */
static coroutine_fn void nbd_trip(void *opaque)
{
- NBDClient *client = opaque;
- NBDRequestData *req = NULL;
+ NBDRequestData *req = opaque;
+ NBDClient *client = req->client;
NBDRequest request = { 0 }; /* GCC thinks it can be used uninitialized */
int ret;
Error *local_err = NULL;
@@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
goto done;
}
- req = nbd_request_get(client);
-
/*
* nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
* set client->quiescing but by the time we get back nbd_drained_end() may
@@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
}
done:
- if (req) {
- nbd_request_put(req);
- }
+ nbd_request_put(req);
qemu_mutex_unlock(&client->lock);
@@ -3143,10 +3139,13 @@ disconnect:
*/
static void nbd_client_receive_next_request(NBDClient *client)
{
+ NBDRequestData *req;
+
if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
!client->quiescing) {
nbd_client_get(client);
- client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+ req = nbd_request_get(client);
+ client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
}
}
--
2.44.0
* [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports
From: Kevin Wolf @ 2024-03-14 16:58 UTC
To: qemu-block
Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
This replicates the scenario in which the bug was reported.
Unfortunately, this relies on actually executing a guest (so that the
firmware initialises the virtio-blk device and moves it to its
configured iothread), which means the test can't use the qtest
accelerator like most other test cases. I tried to find a simpler way
to trigger the bug, but couldn't find one.
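For reference, this can be run like any other named iotest, e.g. with
'./check -qcow2 tests/iothreads-nbd-export' from the build tree's
tests/qemu-iotests directory (the exact invocation depends on the local
setup).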
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
.../tests/iothreads-nbd-export.out | 19 ++++++
2 files changed, 85 insertions(+)
create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export b/tests/qemu-iotests/tests/iothreads-nbd-export
new file mode 100755
index 0000000000..63cac8fdbf
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export
@@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Copyright (C) 2024 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
+
+import asyncio
+import iotests
+import qemu
+import time
+
+iotests.script_initialize(supported_fmts=['qcow2'],
+ supported_platforms=['linux'])
+
+with iotests.FilePath('disk1.img') as path, \
+ iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock, \
+ qemu.machine.QEMUMachine(iotests.qemu_prog) as vm:
+
+ img_size = '10M'
+
+ iotests.log('Preparing disk...')
+ iotests.qemu_img_create('-f', iotests.imgfmt, path, img_size)
+ vm.add_args('-blockdev', f'file,node-name=disk-file,filename={path}')
+ vm.add_args('-blockdev', f'qcow2,node-name=disk,file=disk-file')
+ vm.add_args('-object', 'iothread,id=iothread0')
+ vm.add_args('-device', 'virtio-blk,drive=disk,iothread=iothread0,share-rw=on')
+
+ iotests.log('Launching VM...')
+ vm.add_args('-accel', 'kvm', '-accel', 'tcg')
+ #vm.add_args('-accel', 'qtest')
+ vm.launch()
+
+ iotests.log('Exporting to NBD...')
+ iotests.log(vm.qmp('nbd-server-start',
+ addr={'type': 'unix', 'data': {'path': nbd_sock}}))
+ iotests.log(vm.qmp('block-export-add', type='nbd', id='exp0',
+ node_name='disk', writable=True))
+
+ iotests.log('Connecting qemu-io...')
+ qemu_io = iotests.QemuIoInteractive('-f', 'raw',
+ f'nbd+unix:///disk?socket={nbd_sock}')
+
+ iotests.log('Moving the NBD export to a different iothread...')
+ for i in range(0, 10):
+ iotests.log(vm.qmp('system_reset'))
+ time.sleep(0.1)
+
+ iotests.log('Checking that it is still alive...')
+ iotests.log(vm.qmp('query-status'))
+
+ qemu_io.close()
+ vm.shutdown()
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export.out b/tests/qemu-iotests/tests/iothreads-nbd-export.out
new file mode 100644
index 0000000000..bc514e35e5
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export.out
@@ -0,0 +1,19 @@
+Preparing disk...
+Launching VM...
+Exporting to NBD...
+{"return": {}}
+{"return": {}}
+Connecting qemu-io...
+Moving the NBD export to a different iothread...
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+Checking that it is still alive...
+{"return": {"running": true, "status": "running"}}
--
2.44.0