* [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads
@ 2024-03-14 16:58 Kevin Wolf
2024-03-14 16:58 ` [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export Kevin Wolf
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Kevin Wolf @ 2024-03-14 16:58 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
Kevin Wolf (2):
nbd/server: Fix race in draining the export
iotests: Add test for reset/AioContext switches with NBD exports
nbd/server.c | 15 ++---
tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
.../tests/iothreads-nbd-export.out | 19 ++++++
3 files changed, 92 insertions(+), 8 deletions(-)
create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out
--
2.44.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export
2024-03-14 16:58 [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Kevin Wolf
@ 2024-03-14 16:58 ` Kevin Wolf
2024-03-19 16:49 ` Michael Tokarev
2024-03-14 16:58 ` [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports Kevin Wolf
2024-03-18 13:28 ` [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Stefan Hajnoczi
2 siblings, 1 reply; 5+ messages in thread
From: Kevin Wolf @ 2024-03-14 16:58 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
When draining an NBD export, nbd_drained_begin() first sets
client->quiescing so that nbd_client_receive_next_request() won't start
any new request coroutines. Then nbd_drained_poll() tries to makes sure
that we wait for any existing request coroutines by checking that
client->nb_requests has become 0.
However, there is a small window between creating a new request
coroutine and increasing client->nb_requests. If a coroutine is in this
state, it won't be waited for and drain returns too early.
In the context of switching to a different AioContext, this means that
blk_aio_attached() will see client->recv_coroutine != NULL and fail its
assertion.
Fix this by increasing client->nb_requests immediately when starting the
coroutine. Doing this after the checks if we should create a new
coroutine is okay because client->lock is held.
Cc: qemu-stable@nongnu.org
Fixes: fd6afc501a019682d1b8468b562355a2887087bd
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
nbd/server.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index 941832f178..c3484cc1eb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
/* Owns a reference to the NBDClient passed as opaque. */
static coroutine_fn void nbd_trip(void *opaque)
{
- NBDClient *client = opaque;
- NBDRequestData *req = NULL;
+ NBDRequestData *req = opaque;
+ NBDClient *client = req->client;
NBDRequest request = { 0 }; /* GCC thinks it can be used uninitialized */
int ret;
Error *local_err = NULL;
@@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
goto done;
}
- req = nbd_request_get(client);
-
/*
* nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
* set client->quiescing but by the time we get back nbd_drained_end() may
@@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
}
done:
- if (req) {
- nbd_request_put(req);
- }
+ nbd_request_put(req);
qemu_mutex_unlock(&client->lock);
@@ -3143,10 +3139,13 @@ disconnect:
*/
static void nbd_client_receive_next_request(NBDClient *client)
{
+ NBDRequestData *req;
+
if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
!client->quiescing) {
nbd_client_get(client);
- client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+ req = nbd_request_get(client);
+ client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
}
}
--
2.44.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports
2024-03-14 16:58 [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Kevin Wolf
2024-03-14 16:58 ` [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export Kevin Wolf
@ 2024-03-14 16:58 ` Kevin Wolf
2024-03-18 13:28 ` [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Stefan Hajnoczi
2 siblings, 0 replies; 5+ messages in thread
From: Kevin Wolf @ 2024-03-14 16:58 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
This replicates the scenario in which the bug was reported.
Unfortunately this relies on actually executing a guest (so that the
firmware initialises the virtio-blk device and moves it to its
configured iothread), so this can't make use of the qtest accelerator
like most other test cases. I tried to find a different easy way to
trigger the bug, but couldn't find one.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
.../tests/iothreads-nbd-export.out | 19 ++++++
2 files changed, 85 insertions(+)
create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export b/tests/qemu-iotests/tests/iothreads-nbd-export
new file mode 100755
index 0000000000..63cac8fdbf
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export
@@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Copyright (C) 2024 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf <kwolf@redhat.com>
+
+import asyncio
+import iotests
+import qemu
+import time
+
+iotests.script_initialize(supported_fmts=['qcow2'],
+ supported_platforms=['linux'])
+
+with iotests.FilePath('disk1.img') as path, \
+ iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock, \
+ qemu.machine.QEMUMachine(iotests.qemu_prog) as vm:
+
+ img_size = '10M'
+
+ iotests.log('Preparing disk...')
+ iotests.qemu_img_create('-f', iotests.imgfmt, path, img_size)
+ vm.add_args('-blockdev', f'file,node-name=disk-file,filename={path}')
+ vm.add_args('-blockdev', f'qcow2,node-name=disk,file=disk-file')
+ vm.add_args('-object', 'iothread,id=iothread0')
+ vm.add_args('-device', 'virtio-blk,drive=disk,iothread=iothread0,share-rw=on')
+
+ iotests.log('Launching VM...')
+ vm.add_args('-accel', 'kvm', '-accel', 'tcg')
+ #vm.add_args('-accel', 'qtest')
+ vm.launch()
+
+ iotests.log('Exporting to NBD...')
+ iotests.log(vm.qmp('nbd-server-start',
+ addr={'type': 'unix', 'data': {'path': nbd_sock}}))
+ iotests.log(vm.qmp('block-export-add', type='nbd', id='exp0',
+ node_name='disk', writable=True))
+
+ iotests.log('Connecting qemu-img...')
+ qemu_io = iotests.QemuIoInteractive('-f', 'raw',
+ f'nbd+unix:///disk?socket={nbd_sock}')
+
+ iotests.log('Moving the NBD export to a different iothread...')
+ for i in range(0, 10):
+ iotests.log(vm.qmp('system_reset'))
+ time.sleep(0.1)
+
+ iotests.log('Checking that it is still alive...')
+ iotests.log(vm.qmp('query-status'))
+
+ qemu_io.close()
+ vm.shutdown()
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export.out b/tests/qemu-iotests/tests/iothreads-nbd-export.out
new file mode 100644
index 0000000000..bc514e35e5
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export.out
@@ -0,0 +1,19 @@
+Preparing disk...
+Launching VM...
+Exporting to NBD...
+{"return": {}}
+{"return": {}}
+Connecting qemu-img...
+Moving the NBD export to a different iothread...
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+Checking that it is still alive...
+{"return": {"running": true, "status": "running"}}
--
2.44.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads
2024-03-14 16:58 [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Kevin Wolf
2024-03-14 16:58 ` [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export Kevin Wolf
2024-03-14 16:58 ` [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports Kevin Wolf
@ 2024-03-18 13:28 ` Stefan Hajnoczi
2 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2024-03-18 13:28 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, eblake, aliang, qemu-devel, qemu-stable
[-- Attachment #1: Type: text/plain, Size: 676 bytes --]
On Thu, Mar 14, 2024 at 05:58:23PM +0100, Kevin Wolf wrote:
> Kevin Wolf (2):
> nbd/server: Fix race in draining the export
> iotests: Add test for reset/AioContext switches with NBD exports
>
> nbd/server.c | 15 ++---
> tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++++++++++++++++++
> .../tests/iothreads-nbd-export.out | 19 ++++++
> 3 files changed, 92 insertions(+), 8 deletions(-)
> create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
> create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out
>
> --
> 2.44.0
>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export
2024-03-14 16:58 ` [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export Kevin Wolf
@ 2024-03-19 16:49 ` Michael Tokarev
0 siblings, 0 replies; 5+ messages in thread
From: Michael Tokarev @ 2024-03-19 16:49 UTC (permalink / raw)
To: Kevin Wolf, qemu-block
Cc: stefanha, hreitz, eblake, aliang, qemu-devel, qemu-stable
14.03.2024 19:58, Kevin Wolf wrote:
> When draining an NBD export, nbd_drained_begin() first sets
> client->quiescing so that nbd_client_receive_next_request() won't start
> any new request coroutines. Then nbd_drained_poll() tries to makes sure
> that we wait for any existing request coroutines by checking that
> client->nb_requests has become 0.
>
> However, there is a small window between creating a new request
> coroutine and increasing client->nb_requests. If a coroutine is in this
> state, it won't be waited for and drain returns too early.
>
> In the context of switching to a different AioContext, this means that
> blk_aio_attached() will see client->recv_coroutine != NULL and fail its
> assertion.
>
> Fix this by increasing client->nb_requests immediately when starting the
> coroutine. Doing this after the checks if we should create a new
> coroutine is okay because client->lock is held.
>
> Cc: qemu-stable@nongnu.org
> Fixes: fd6afc501a019682d1b8468b562355a2887087bd
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin, Stefan,
This change in master, which is Cc'ed stable, touches (refines) exactly the
same areas as f816310d0c32c "nbd/server: only traverse NBDExport->clients
from main loop thread", which is not (yet?) in stable, neither 7.2 nor 8.2.
Also, 7075d235114b4 "nbd/server: introduce NBDClient->lock to protect fields"
touches one of the places too.
I can try to construct something out of the two, but I think it is better
if either of you can do that, - if this seems a good thing to do anyway.
This way it is definitely much saner than my possible attempts.
Or we can just pick f816310d0c32c and 7075d235114b4 into stable too, - when
I evaluated f816310d0c32c for stable before I thought it isn't needed there
because AioContext lock isn't removed in 8.2 yet. And I haven't thought
about 7075d235114b4 at all. All 3 applies cleanly and the result passes
check-block, but it smells a bit too much for stable.
What do you think?
Thanks,
/mjt
> diff --git a/nbd/server.c b/nbd/server.c
> index 941832f178..c3484cc1eb 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
> /* Owns a reference to the NBDClient passed as opaque. */
> static coroutine_fn void nbd_trip(void *opaque)
> {
> - NBDClient *client = opaque;
> - NBDRequestData *req = NULL;
> + NBDRequestData *req = opaque;
> + NBDClient *client = req->client;
> NBDRequest request = { 0 }; /* GCC thinks it can be used uninitialized */
> int ret;
> Error *local_err = NULL;
> @@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
> goto done;
> }
>
> - req = nbd_request_get(client);
> -
> /*
> * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
> * set client->quiescing but by the time we get back nbd_drained_end() may
> @@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> }
>
> done:
> - if (req) {
> - nbd_request_put(req);
> - }
> + nbd_request_put(req);
>
> qemu_mutex_unlock(&client->lock);
>
> @@ -3143,10 +3139,13 @@ disconnect:
> */
> static void nbd_client_receive_next_request(NBDClient *client)
> {
> + NBDRequestData *req;
> +
> if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
> !client->quiescing) {
> nbd_client_get(client);
> - client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
> + req = nbd_request_get(client);
> + client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
> aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
> }
> }
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-03-19 16:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-14 16:58 [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Kevin Wolf
2024-03-14 16:58 ` [PATCH for-9.0 1/2] nbd/server: Fix race in draining the export Kevin Wolf
2024-03-19 16:49 ` Michael Tokarev
2024-03-14 16:58 ` [PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports Kevin Wolf
2024-03-18 13:28 ` [PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).