All of lore.kernel.org
 help / color / mirror / Atom feed
* Race condition in qemu-iotests nbd 205?
@ 2026-06-08 14:23 Stefan Hajnoczi
  2026-06-08 14:48 ` Daniel P. Berrangé
  2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2026-06-08 14:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Eric Blake, Kevin Wolf, qemu block

Hi Vladimir,
It looks like there is a race condition in qemu-iotests 205 when the
NBD server is shutting down:

+FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
in test_remove_during_connect_safe_hard
+ self.assertExportNotFound('exp')
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
assertExportNotFound
+ self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
+ File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
1246, in assert_qmp
+ self.assertEqual(result, value,
+AssertionError: "Export 'exp' is already shutting down" != "Export
'exp' is not found"
+- Export 'exp' is already shutting down
++ Export 'exp' is not found
+ : "error/desc" is "Export 'exp' is already shutting down", expected
"Export 'exp' is not found"

https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328

I have not seen this CI failure before, so it might be rare and hard
to reproduce.

Stefan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race condition in qemu-iotests nbd 205?
  2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi
@ 2026-06-08 14:48 ` Daniel P. Berrangé
  2026-06-09 17:46   ` Stefan Hajnoczi
  2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel P. Berrangé @ 2026-06-08 14:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Eric Blake, Kevin Wolf,
	qemu block

On Mon, Jun 08, 2026 at 10:23:18AM -0400, Stefan Hajnoczi wrote:
> Hi Vladimir,
> It looks like there is a race condition in qemu-iotests 205 when the
> NBD server is shutting down:
> 
> +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
> in test_remove_during_connect_safe_hard
> + self.assertExportNotFound('exp')
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
> assertExportNotFound
> + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
> 1246, in assert_qmp
> + self.assertEqual(result, value,
> +AssertionError: "Export 'exp' is already shutting down" != "Export
> 'exp' is not found"
> +- Export 'exp' is already shutting down
> ++ Export 'exp' is not found
> + : "error/desc" is "Export 'exp' is already shutting down", expected
> "Export 'exp' is not found"
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328
> 
> I have not seen this CI failure before, so it might be rare and hard
> to reproduce.

FYi, If you don't have time to investigate & fix it quickly, just
file a gitlab bug and then annotate the test cases / clases with:

 @iotests.skip_flaky("https://gitlab.com/qemu-project/qemu/-/work_items/NNN")

which will make the test be skipped in CI / locally, until someone
has time to look at it.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race condition in qemu-iotests nbd 205?
  2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi
  2026-06-08 14:48 ` Daniel P. Berrangé
@ 2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
  2026-06-09 15:32   ` Kevin Wolf
  1 sibling, 1 reply; 5+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-06-09 12:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Eric Blake, Kevin Wolf, qemu block

08.06.26 17:23, Stefan Hajnoczi пишет:
> Hi Vladimir,
> It looks like there is a race condition in qemu-iotests 205 when the
> NBD server is shutting down:
> 
> +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
> in test_remove_during_connect_safe_hard
> + self.assertExportNotFound('exp')
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
> assertExportNotFound
> + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
> + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
> 1246, in assert_qmp
> + self.assertEqual(result, value,
> +AssertionError: "Export 'exp' is already shutting down" != "Export
> 'exp' is not found"
> +- Export 'exp' is already shutting down
> ++ Export 'exp' is not found
> + : "error/desc" is "Export 'exp' is already shutting down", expected
> "Export 'exp' is not found"
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328
> 
> I have not seen this CI failure before, so it might be rare and hard
> to reproduce.
> 
> Stefan

Hi! Looks like a degradation in commit


     commit 3c3bc462adeb561f5dfdcbb84ae691c95ccef916
     Author: Kevin Wolf <kwolf@redhat.com>
     Date:   Thu Sep 24 17:27:06 2020 +0200

         block/export: Add block-export-del

         Implement a new QMP command block-export-del and make nbd-server-remove
         a wrapper around it.


Before that commit we search for export by nbd_export_find(), and it fails
with "Export 'exp' is not found" as expected, because prior
qmp_nbd_server_remove(mode=hard) does

   nbd_export_remove() -> nbd_export_request_shutdown() -> QTAILQ_REMOVE(&exports, exp, next);

when nbd_export_find() searches for that export exactly in this "exports" list
(static global in nbd/server.c)


Starting with 3c3bc462adeb5

qmp_nbd_server_remove() searches for export by blk_export_find(), which searches in
block_exports list in block/export/export.c.

Previous qmp_nbd_server_remove(mode=hard) does

   qmp_block_export_del() -> blk_exp_request_shutdown() -> nbd_export_request_shutdown(),

which removes the export only from "exports" list in nbd/server.c.


How the export is removed from block_exports list? It is done in blk_exp_delete_bh(),
scheduled by

void blk_exp_unref(BlockExport *exp)
{
     assert(exp->refcount > 0);
     if (--exp->refcount == 0) {
         /* Touch the block_exports list only in the main thread */
         aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh,
                                 exp);
     }
}

---

I think, all the references should be removed druing qmp nbd-server-remove(hard) call,
but probably, we still have a (small) chance of doing this check in test

    self.assertExportNotFound('exp')

and get unexpected answer _before_ scheduled removal actually done.

---

blk_exp_unref() does scheduling to main thread since

     commit bc4ee65b8c309ed6a726e3ea1b73f7fa31b4bb95
     Author: Kevin Wolf <kwolf@redhat.com>
     Date:   Thu Sep 24 17:27:03 2020 +0200

         block/export: Add blk_exp_close_all(_type)

---

Not sure, how to properly fix it... Of course, better is to support old behavior,
when action of nbd-server-remove(hard) was synchronous.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race condition in qemu-iotests nbd 205?
  2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
@ 2026-06-09 15:32   ` Kevin Wolf
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin Wolf @ 2026-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Stefan Hajnoczi, qemu-devel, Eric Blake, qemu block

Am 09.06.2026 um 14:10 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 08.06.26 17:23, Stefan Hajnoczi пишет:
> > Hi Vladimir,
> > It looks like there is a race condition in qemu-iotests 205 when the
> > NBD server is shutting down:
> > 
> > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
> > +----------------------------------------------------------------------
> > +Traceback (most recent call last):
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
> > in test_remove_during_connect_safe_hard
> > + self.assertExportNotFound('exp')
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
> > assertExportNotFound
> > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
> > 1246, in assert_qmp
> > + self.assertEqual(result, value,
> > +AssertionError: "Export 'exp' is already shutting down" != "Export
> > 'exp' is not found"
> > +- Export 'exp' is already shutting down
> > ++ Export 'exp' is not found
> > + : "error/desc" is "Export 'exp' is already shutting down", expected
> > "Export 'exp' is not found"
> > 
> > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328
> > 
> > I have not seen this CI failure before, so it might be rare and hard
> > to reproduce.
> > 
> > Stefan
> 
> Hi! Looks like a degradation in commit
> 
> 
>     commit 3c3bc462adeb561f5dfdcbb84ae691c95ccef916
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Thu Sep 24 17:27:06 2020 +0200
> 
>         block/export: Add block-export-del
> 
>         Implement a new QMP command block-export-del and make nbd-server-remove
>         a wrapper around it.
> 
> 
> Before that commit we search for export by nbd_export_find(), and it fails
> with "Export 'exp' is not found" as expected, because prior
> qmp_nbd_server_remove(mode=hard) does
> 
>   nbd_export_remove() -> nbd_export_request_shutdown() -> QTAILQ_REMOVE(&exports, exp, next);
> 
> when nbd_export_find() searches for that export exactly in this "exports" list
> (static global in nbd/server.c)
> 
> 
> Starting with 3c3bc462adeb5
> 
> qmp_nbd_server_remove() searches for export by blk_export_find(), which searches in
> block_exports list in block/export/export.c.
> 
> Previous qmp_nbd_server_remove(mode=hard) does
> 
>   qmp_block_export_del() -> blk_exp_request_shutdown() -> nbd_export_request_shutdown(),
> 
> which removes the export only from "exports" list in nbd/server.c.
> 
> 
> How the export is removed from block_exports list? It is done in blk_exp_delete_bh(),
> scheduled by
> 
> void blk_exp_unref(BlockExport *exp)
> {
>     assert(exp->refcount > 0);
>     if (--exp->refcount == 0) {
>         /* Touch the block_exports list only in the main thread */
>         aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh,
>                                 exp);
>     }
> }
> 
> ---
> 
> I think, all the references should be removed druing qmp nbd-server-remove(hard) call,
> but probably, we still have a (small) chance of doing this check in test
> 
>    self.assertExportNotFound('exp')
> 
> and get unexpected answer _before_ scheduled removal actually done.
> 
> ---
> 
> blk_exp_unref() does scheduling to main thread since
> 
>     commit bc4ee65b8c309ed6a726e3ea1b73f7fa31b4bb95
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Thu Sep 24 17:27:03 2020 +0200
> 
>         block/export: Add blk_exp_close_all(_type)
> 
> ---
> 
> Not sure, how to properly fix it... Of course, better is to support old behavior,
> when action of nbd-server-remove(hard) was synchronous.

I guess you can always add one more AIO_WAIT_WHILE(), this time to
nbd-server-remove, but blocking QMP commands aren't really nice, so I'd
rather avoid that. In the case of nbd-server-remove, if you analysis is
correct, the current behaviour has been there for almost six years.

So my gut feeling is that it's better to fix the test case to wait for
the BLOCK_EXPORT_DELETED event.

And then I saw that nbd-server-add and nbd-server-remove have actually
been deprecated since QEMU 5.2, so maybe the best way forward is really
to just delete them? That will still require changing the test case, of
course.

Kevin



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race condition in qemu-iotests nbd 205?
  2026-06-08 14:48 ` Daniel P. Berrangé
@ 2026-06-09 17:46   ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2026-06-09 17:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Eric Blake, Kevin Wolf,
	qemu block

On Mon, Jun 8, 2026 at 10:48 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Jun 08, 2026 at 10:23:18AM -0400, Stefan Hajnoczi wrote:
> > Hi Vladimir,
> > It looks like there is a race condition in qemu-iotests 205 when the
> > NBD server is shutting down:
> >
> > +FAIL: test_remove_during_connect_safe_hard (__main__.TestNbdServerRemove)
> > +----------------------------------------------------------------------
> > +Traceback (most recent call last):
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 149,
> > in test_remove_during_connect_safe_hard
> > + self.assertExportNotFound('exp')
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/205", line 63, in
> > assertExportNotFound
> > + self.assert_qmp(result, 'error/desc', "Export 'exp' is not found")
> > + File "/builds/qemu-project/qemu/tests/qemu-iotests/iotests.py", line
> > 1246, in assert_qmp
> > + self.assertEqual(result, value,
> > +AssertionError: "Export 'exp' is already shutting down" != "Export
> > 'exp' is not found"
> > +- Export 'exp' is already shutting down
> > ++ Export 'exp' is not found
> > + : "error/desc" is "Export 'exp' is already shutting down", expected
> > "Export 'exp' is not found"
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/14745043965#L328
> >
> > I have not seen this CI failure before, so it might be rare and hard
> > to reproduce.
>
> FYi, If you don't have time to investigate & fix it quickly, just
> file a gitlab bug and then annotate the test cases / clases with:
>
>  @iotests.skip_flaky("https://gitlab.com/qemu-project/qemu/-/work_items/NNN")
>
> which will make the test be skipped in CI / locally, until someone
> has time to look at it.

Thanks, I'll keep that in mind for other flaky tests.

Stefan


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-09 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 14:23 Race condition in qemu-iotests nbd 205? Stefan Hajnoczi
2026-06-08 14:48 ` Daniel P. Berrangé
2026-06-09 17:46   ` Stefan Hajnoczi
2026-06-09 12:10 ` Vladimir Sementsov-Ogievskiy
2026-06-09 15:32   ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.