* Race condition in qemu-iotests nbd 205?
@ 2026-06-08 14:23 Stefan Hajnoczi
2026-06-08 14:48 ` Daniel P. Berrangé
2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
0 siblings, 2 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2026-06-08 14:23 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, Eric Blake, Kevin Wolf, qemu block
Hi Vladimir,
It looks like there is a race condition in qemu-iotests 205 when the
NBD server is shutting down:
+FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
in test_remove_during_connect_safe_hard
+ self.assertExportNotFound('exp')
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
assertExportNotFound
+ self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
1246, in assert_qmp
+ self.assertEqual(result, value,
+AssertionError: "Export 'exp' is already shutting down" != "Export
'exp' is not found"
+- Export 'exp' is already shutting down
++ Export 'exp' is not found
+ : "error/desc" is "Export 'exp' is already shutting down", expected
"Export 'exp' is not found"
https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328
I have not seen this CI failure before, so it might be rare and hard
to reproduce.
Stefan
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Race condition in qemu-iotests nbd 205? 2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi @ 2026-06-08 14:48 ` Daniel P. Berrangé 2026-06-09 17:46 ` Stefan Hajnoczi 2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 5+ messages in thread From: Daniel P. Berrangé @ 2026-06-08 14:48 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Eric Blake, Kevin Wolf, qemu block On Mon, Jun 08, 2026 at 10:23:18AM -0400, Stefan Hajnoczi wrote: > Hi Vladimir, > It looks like there is a race condition in qemu-iotests 205 when the > NBD server is shutting down: > > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove) > +---------------------------------------------------------------------- > +Traceback (most recent call last): > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149, > in test_remove_during_connect_safe_hard > + self.assertExportNotFound('exp') > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in > assertExportNotFound > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found") > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line > 1246, in assert_qmp > + self.assertEqual(result, value, > +AssertionError: "Export 'exp' is already shutting down" != "Export > 'exp' is not found" > +- Export 'exp' is already shutting down > ++ Export 'exp' is not found > + : "error/desc" is "Export 'exp' is already shutting down", expected > "Export 'exp' is not found" > > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328 > > I have not seen this CI failure before, so it might be rare and hard > to reproduce. FYi, If you don't have time to investigate & fix it quickly, just file a gitlab bug and then annotate the test cases / clases with: @iotests.skip_flaky("https://gitlab.com/qemu-project/qemu/-/work_items/NNN") which will make the test be skipped in CI / locally, until someone has time to look at it. With regards, Daniel -- |: https://berrange.com ~~ https://hachyderm.io/@berrange :| |: https://libvirt.org ~~ https://entangle-photo.org :| |: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :| ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition in qemu-iotests nbd 205? 2026-06-08 14:48 ` Daniel P. Berrangé @ 2026-06-09 17:46 ` Stefan Hajnoczi 0 siblings, 0 replies; 5+ messages in thread From: Stefan Hajnoczi @ 2026-06-09 17:46 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Eric Blake, Kevin Wolf, qemu block On Mon, Jun 8, 2026 at 10:48 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Jun 08, 2026 at 10:23:18AM -0400, Stefan Hajnoczi wrote: > > Hi Vladimir, > > It looks like there is a race condition in qemu-iotests 205 when the > > NBD server is shutting down: > > > > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove) > > +---------------------------------------------------------------------- > > +Traceback (most recent call last): > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149, > > in test_remove_during_connect_safe_hard > > + self.assertExportNotFound('exp') > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in > > assertExportNotFound > > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found") > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line > > 1246, in assert_qmp > > + self.assertEqual(result, value, > > +AssertionError: "Export 'exp' is already shutting down" != "Export > > 'exp' is not found" > > +- Export 'exp' is already shutting down > > ++ Export 'exp' is not found > > + : "error/desc" is "Export 'exp' is already shutting down", expected > > "Export 'exp' is not found" > > > > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328 > > > > I have not seen this CI failure before, so it might be rare and hard > > to reproduce. > > FYi, If you don't have time to investigate & fix it quickly, just > file a gitlab bug and then annotate the test cases / clases with: > > @iotests.skip_flaky("https://gitlab.com/qemu-project/qemu/-/work_items/NNN") > > which will make the test be skipped in CI / locally, until someone > has time to look at it. Thanks, I'll keep that in mind for other flaky tests. Stefan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition in qemu-iotests nbd 205? 2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi 2026-06-08 14:48 ` Daniel P. Berrangé @ 2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy 2026-06-09 15:32 ` Kevin Wolf 1 sibling, 1 reply; 5+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2026-06-09 12:10 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: qemu-devel, Eric Blake, Kevin Wolf, qemu block 08.06.26 17:23, Stefan Hajnoczi пишет: > Hi Vladimir, > It looks like there is a race condition in qemu-iotests 205 when the > NBD server is shutting down: > > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove) > +---------------------------------------------------------------------- > +Traceback (most recent call last): > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149, > in test_remove_during_connect_safe_hard > + self.assertExportNotFound('exp') > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in > assertExportNotFound > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found") > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line > 1246, in assert_qmp > + self.assertEqual(result, value, > +AssertionError: "Export 'exp' is already shutting down" != "Export > 'exp' is not found" > +- Export 'exp' is already shutting down > ++ Export 'exp' is not found > + : "error/desc" is "Export 'exp' is already shutting down", expected > "Export 'exp' is not found" > > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328 > > I have not seen this CI failure before, so it might be rare and hard > to reproduce. > > Stefan Hi! Looks like a degradation in commit commit 3c3bc462adeb561f5dfdcbb84ae691c95ccef916 Author: Kevin Wolf <kwolf@redhat.com> Date: Thu Sep 24 17:27:06 2020 +0200 block/export: Add block-export-del Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Before that commit we search for export by nbd_export_find(), and it fails with "Export 'exp' is not found" as expected, because prior qmp_nbd_server_remove(mode=hard) does nbd_export_remove() -> nbd_export_request_shutdown() -> QTAILQ_REMOVE(&exports, exp, next); when nbd_export_find() searches for that export exactly in this "exports" list (static global in nbd/server.c) Starting with 3c3bc462adeb5 qmp_nbd_server_remove() searches for export by blk_export_find(), which searches in block_exports list in block/export/export.c. Previous qmp_nbd_server_remove(mode=hard) does qmp_block_export_del() -> blk_exp_request_shutdown() -> nbd_export_request_shutdown(), which removes the export only from "exports" list in nbd/server.c. How the export is removed from block_exports list? It is done in blk_exp_delete_bh(), scheduled by void blk_exp_unref(BlockExport *exp) { assert(exp->refcount > 0); if (--exp->refcount == 0) { /* Touch the block_exports list only in the main thread */ aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh, exp); } } --- I think, all the references should be removed druing qmp nbd-server-remove(hard) call, but probably, we still have a (small) chance of doing this check in test self.assertExportNotFound('exp') and get unexpected answer _before_ scheduled removal actually done. --- blk_exp_unref() does scheduling to main thread since commit bc4ee65b8c309ed6a726e3ea1b73f7fa31b4bb95 Author: Kevin Wolf <kwolf@redhat.com> Date: Thu Sep 24 17:27:03 2020 +0200 block/export: Add blk_exp_close_all(_type) --- Not sure, how to properly fix it... Of course, better is to support old behavior, when action of nbd-server-remove(hard) was synchronous. -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition in qemu-iotests nbd 205? 2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy @ 2026-06-09 15:32 ` Kevin Wolf 0 siblings, 0 replies; 5+ messages in thread From: Kevin Wolf @ 2026-06-09 15:32 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: Stefan Hajnoczi, qemu-devel, Eric Blake, qemu block Am 09.06.2026 um 14:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > 08.06.26 17:23, Stefan Hajnoczi пишет: > > Hi Vladimir, > > It looks like there is a race condition in qemu-iotests 205 when the > > NBD server is shutting down: > > > > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove) > > +---------------------------------------------------------------------- > > +Traceback (most recent call last): > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149, > > in test_remove_during_connect_safe_hard > > + self.assertExportNotFound('exp') > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in > > assertExportNotFound > > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found") > > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line > > 1246, in assert_qmp > > + self.assertEqual(result, value, > > +AssertionError: "Export 'exp' is already shutting down" != "Export > > 'exp' is not found" > > +- Export 'exp' is already shutting down > > ++ Export 'exp' is not found > > + : "error/desc" is "Export 'exp' is already shutting down", expected > > "Export 'exp' is not found" > > > > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328 > > > > I have not seen this CI failure before, so it might be rare and hard > > to reproduce. > > > > Stefan > > Hi! Looks like a degradation in commit > > > commit 3c3bc462adeb561f5dfdcbb84ae691c95ccef916 > Author: Kevin Wolf <kwolf@redhat.com> > Date: Thu Sep 24 17:27:06 2020 +0200 > > block/export: Add block-export-del > > Implement a new QMP command block-export-del and make nbd-server-remove > a wrapper around it. > > > Before that commit we search for export by nbd_export_find(), and it fails > with "Export 'exp' is not found" as expected, because prior > qmp_nbd_server_remove(mode=hard) does > > nbd_export_remove() -> nbd_export_request_shutdown() -> QTAILQ_REMOVE(&exports, exp, next); > > when nbd_export_find() searches for that export exactly in this "exports" list > (static global in nbd/server.c) > > > Starting with 3c3bc462adeb5 > > qmp_nbd_server_remove() searches for export by blk_export_find(), which searches in > block_exports list in block/export/export.c. > > Previous qmp_nbd_server_remove(mode=hard) does > > qmp_block_export_del() -> blk_exp_request_shutdown() -> nbd_export_request_shutdown(), > > which removes the export only from "exports" list in nbd/server.c. > > > How the export is removed from block_exports list? It is done in blk_exp_delete_bh(), > scheduled by > > void blk_exp_unref(BlockExport *exp) > { > assert(exp->refcount > 0); > if (--exp->refcount == 0) { > /* Touch the block_exports list only in the main thread */ > aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh, > exp); > } > } > > --- > > I think, all the references should be removed druing qmp nbd-server-remove(hard) call, > but probably, we still have a (small) chance of doing this check in test > > self.assertExportNotFound('exp') > > and get unexpected answer _before_ scheduled removal actually done. > > --- > > blk_exp_unref() does scheduling to main thread since > > commit bc4ee65b8c309ed6a726e3ea1b73f7fa31b4bb95 > Author: Kevin Wolf <kwolf@redhat.com> > Date: Thu Sep 24 17:27:03 2020 +0200 > > block/export: Add blk_exp_close_all(_type) > > --- > > Not sure, how to properly fix it... Of course, better is to support old behavior, > when action of nbd-server-remove(hard) was synchronous. I guess you can always add one more AIO_WAIT_WHILE(), this time to nbd-server-remove, but blocking QMP commands aren't really nice, so I'd rather avoid that. In the case of nbd-server-remove, if you analysis is correct, the current behaviour has been there for almost six years. So my gut feeling is that it's better to fix the test case to wait for the BLOCK_EXPORT_DELETED event. And then I saw that nbd-server-add and nbd-server-remove have actually been deprecated since QEMU 5.2, so maybe the best way forward is really to just delete them? That will still require changing the test case, of course. Kevin ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-09 17:46 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi 2026-06-08 14:48 ` Daniel P. Berrangé 2026-06-09 17:46 ` Stefan Hajnoczi 2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy 2026-06-09 15:32 ` Kevin Wolf
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.