* [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
Hi all!
As Peter recently noted, iotest 30 fails intermittently.
I found that QEMU crashes due to interleaving of the graph-update operations of parallel mirror and stream block jobs.
So, here is a "workaround" to discuss.
It is of course not a complete solution: if we decide to go this way, we should protect all graph-modifying operations with the mutex, not only the ones touched here, and move everything into coroutine context.
I send this mostly as a starting point for discussion; maybe someone can come up with a better solution.
The main patches are 04-05. Patches 01-02 only simplify debugging, and 03 is a preparation for 04.
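In a simplified sketch (not the exact code, see the patches themselves), the idea of 04-05 is:

    /* global CoMutex, zero-initialized, i.e. starts unlocked */
    CoMutex graph_modify_mutex;

    void coroutine_fn stream_co_clean(Job *job)
    {
        /* serialize with the other graph-modifying job handlers */
        QEMU_LOCK_GUARD(&graph_modify_mutex);

        /* ... original stream_clean() body unchanged ... */
    }

and the same for stream_prepare(), mirror_complete() and mirror_exit_common(), with non-coroutine callers going through wrappers generated by scripts/block-coroutine-wrapper.py.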
The original QEMU crash looks like this:
#0 0x00007f7029b23e35 in raise () at /lib64/libc.so.6
#1 0x00007f7029b0e895 in abort () at /lib64/libc.so.6
#2 0x00007f7029b0e769 in _nl_load_domain.cold () at /lib64/libc.so.6
#3 0x00007f7029b1c566 in annobin_assert.c_end () at /lib64/libc.so.6
#4 0x0000558f3d92f15a in bdrv_replace_child (child=0x558f3fa7c400, new_bs=0x0) at ../block.c:2648
#5 0x0000558f3d92f6e1 in bdrv_detach_child (child=0x558f3fa7c400) at ../block.c:2777
#6 0x0000558f3d92f723 in bdrv_root_unref_child (child=0x558f3fa7c400) at ../block.c:2789
#7 0x0000558f3d897f4c in block_job_remove_all_bdrv (job=0x558f3f626940) at ../blockjob.c:191
#8 0x0000558f3d897c73 in block_job_free (job=0x558f3f626940) at ../blockjob.c:88
#9 0x0000558f3d891456 in job_unref (job=0x558f3f626940) at ../job.c:380
#10 0x0000558f3d892602 in job_exit (opaque=0x558f3f626940) at ../job.c:894
#11 0x0000558f3d9ce2fb in aio_bh_call (bh=0x558f3f5dc480) at ../util/async.c:136
#12 0x0000558f3d9ce405 in aio_bh_poll (ctx=0x558f3e80c5f0) at ../util/async.c:164
#13 0x0000558f3d9f75ea in aio_dispatch (ctx=0x558f3e80c5f0) at ../util/aio-posix.c:381
#14 0x0000558f3d9ce836 in aio_ctx_dispatch (source=0x558f3e80c5f0, callback=0x0, user_data=0x0)
at ../util/async.c:306
#15 0x00007f702ae75ecd in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#16 0x0000558f3da09e33 in glib_pollfds_poll () at ../util/main-loop.c:221
#17 0x0000558f3da09ead in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:244
#18 0x0000558f3da09fb5 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:520
#19 0x0000558f3d7836b7 in qemu_main_loop () at ../softmmu/vl.c:1678
#20 0x0000558f3d317316 in main (argc=20, argv=0x7fffa94d35a8, envp=0x7fffa94d3650)
at ../softmmu/main.c:50
(gdb) fr 4
#4 0x0000558f3d92f15a in bdrv_replace_child (child=0x558f3fa7c400, new_bs=0x0) at ../block.c:2648
2648 assert(tighten_restrictions == false);
(gdb) list
2643 int ret;
2644
2645 bdrv_get_cumulative_perm(old_bs, &perm, &shared_perm);
2646 ret = bdrv_check_perm(old_bs, NULL, perm, shared_perm, NULL,
2647 &tighten_restrictions, NULL);
2648 assert(tighten_restrictions == false);
2649 if (ret < 0) {
2650 /* We only tried to loosen restrictions, so errors are not fatal */
2651 bdrv_abort_perm_update(old_bs);
2652 } else {
My exploration shows that this happens because the permission graph is already broken before this permission update. So we tighten restrictions not because we are removing the child, but because we recalculate the broken permission graph and it becomes correct again (and, unfortunately, more strict).
Please also look through my exploration of this topic in these threads:
"iotest 030 still occasionally intermittently failing"
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg04018.html
"question about bdrv_replace_node"
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg04478.html
Vladimir Sementsov-Ogievskiy (5):
abort-on-set-to-true
iotest-30-shorten: concentrate on failing test case
scripts/block-coroutine-wrapper.py: allow more function types
block: move some mirror and stream handlers to coroutine
block: protect some graph-modifyng things by mutex
block/coroutines.h | 11 +++++++
include/block/block.h | 2 ++
block.c | 36 +++++++++++++++------
block/mirror.c | 9 ++++--
block/stream.c | 9 ++++--
scripts/block-coroutine-wrapper.py | 36 +++++++++++++--------
tests/qemu-iotests/030 | 52 +++++++++++++++---------------
tests/qemu-iotests/030.out | 4 +--
8 files changed, 105 insertions(+), 54 deletions(-)
--
2.21.3
* [PATCH 1/5] abort-on-set-to-true
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/block.c b/block.c
index f1cedac362..5e8dd98cec 100644
--- a/block.c
+++ b/block.c
@@ -84,6 +84,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
/* If non-zero, use only whitelisted block drivers */
static int use_bdrv_whitelist;
+bool abort_on_set_to_true = false;
+
#ifdef _WIN32
static int is_windows_drive_prefix(const char *filename)
{
@@ -2002,6 +2004,9 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
added_perms = cumulative_perms & ~current_perms;
removed_shared_perms = current_shared & ~cumulative_shared_perms;
+ if ((added_perms || removed_shared_perms) && tighten_restrictions == &abort_on_set_to_true) {
+ abort();
+ }
*tighten_restrictions = added_perms || removed_shared_perms;
}
@@ -2066,12 +2071,17 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
bdrv_child_perm(bs, c->bs, c, c->role, q,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
- ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
- tighten_restrictions ? &child_tighten_restr
- : NULL,
- errp);
- if (tighten_restrictions) {
- *tighten_restrictions |= child_tighten_restr;
+ if (tighten_restrictions == &abort_on_set_to_true) {
+ ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
+ &abort_on_set_to_true, errp);
+ } else {
+ ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
+ tighten_restrictions ? &child_tighten_restr
+ : NULL,
+ errp);
+ if (tighten_restrictions) {
+ *tighten_restrictions |= child_tighten_restr;
+ }
}
if (ret < 0) {
return ret;
@@ -2227,6 +2237,9 @@ static int bdrv_check_update_perm(BlockDriverState *bs, BlockReopenQueue *q,
char *perm_names = bdrv_perm_names(new_used_perm & ~c->shared_perm);
if (tighten_restrictions) {
+ if (tighten_restrictions == &abort_on_set_to_true) {
+ abort();
+ }
*tighten_restrictions = true;
}
@@ -2243,6 +2256,9 @@ static int bdrv_check_update_perm(BlockDriverState *bs, BlockReopenQueue *q,
char *perm_names = bdrv_perm_names(c->perm & ~new_shared_perm);
if (tighten_restrictions) {
+ if (tighten_restrictions == &abort_on_set_to_true) {
+ abort();
+ }
*tighten_restrictions = true;
}
@@ -2639,13 +2655,13 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
/* Update permissions for old node. This is guaranteed to succeed
* because we're just taking a parent away, so we're loosening
* restrictions. */
- bool tighten_restrictions;
int ret;
+ assert(abort_on_set_to_true == false);
bdrv_get_cumulative_perm(old_bs, &perm, &shared_perm);
ret = bdrv_check_perm(old_bs, NULL, perm, shared_perm, NULL,
- &tighten_restrictions, NULL);
- assert(tighten_restrictions == false);
+ &abort_on_set_to_true, NULL);
+ assert(abort_on_set_to_true == false);
if (ret < 0) {
/* We only tried to loosen restrictions, so errors are not fatal */
bdrv_abort_perm_update(old_bs);
--
2.21.3
* [PATCH 2/5] iotest-30-shorten: concentrate on failing test case
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
tests/qemu-iotests/030 | 52 +++++++++++++++++++-------------------
tests/qemu-iotests/030.out | 4 +--
2 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index dcb4b5d6a6..9e92ec3dd7 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -52,7 +52,7 @@ class TestSingleDrive(iotests.QMPTestCase):
os.remove(mid_img)
os.remove(backing_img)
- def test_stream(self):
+ def ntest_stream(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0')
@@ -67,7 +67,7 @@ class TestSingleDrive(iotests.QMPTestCase):
qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
'image file map does not match backing file after streaming')
- def test_stream_intermediate(self):
+ def ntest_stream_intermediate(self):
self.assert_no_active_block_jobs()
self.assertNotEqual(qemu_io('-f', 'raw', '-rU', '-c', 'map', backing_img),
@@ -86,7 +86,7 @@ class TestSingleDrive(iotests.QMPTestCase):
qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img),
'image file map does not match backing file after streaming')
- def test_stream_pause(self):
+ def ntest_stream_pause(self):
self.assert_no_active_block_jobs()
self.vm.pause_drive('drive0')
@@ -116,7 +116,7 @@ class TestSingleDrive(iotests.QMPTestCase):
qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
'image file map does not match backing file after streaming')
- def test_stream_no_op(self):
+ def ntest_stream_no_op(self):
self.assert_no_active_block_jobs()
# The image map is empty before the operation
@@ -134,7 +134,7 @@ class TestSingleDrive(iotests.QMPTestCase):
self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
empty_map, 'image file map changed after a no-op')
- def test_stream_partial(self):
+ def ntest_stream_partial(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', base=backing_img)
@@ -149,16 +149,16 @@ class TestSingleDrive(iotests.QMPTestCase):
qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
'image file map does not match backing file after streaming')
- def test_device_not_found(self):
+ def ntest_device_not_found(self):
result = self.vm.qmp('block-stream', device='nonexistent')
self.assert_qmp(result, 'error/desc',
'Cannot find device=nonexistent nor node_name=nonexistent')
- def test_job_id_missing(self):
+ def ntest_job_id_missing(self):
result = self.vm.qmp('block-stream', device='mid')
self.assert_qmp(result, 'error/desc', "Invalid job ID ''")
- def test_read_only(self):
+ def ntest_read_only(self):
# Create a new file that we can attach (we need a read-only top)
with iotests.FilePath('ro-top.img') as ro_top_path:
qemu_img('create', '-f', iotests.imgfmt, ro_top_path,
@@ -230,7 +230,7 @@ class TestParallelOps(iotests.QMPTestCase):
# Test that it's possible to run several block-stream operations
# in parallel in the same snapshot chain
@unittest.skipIf(os.environ.get('QEMU_CHECK_BLOCK_AUTO'), 'disabled in CI')
- def test_stream_parallel(self):
+ def ntest_stream_parallel(self):
self.assert_no_active_block_jobs()
# Check that the maps don't match before the streaming operations
@@ -272,7 +272,7 @@ class TestParallelOps(iotests.QMPTestCase):
# Test that it's not possible to perform two block-stream
# operations if there are nodes involved in both.
- def test_overlapping_1(self):
+ def ntest_overlapping_1(self):
self.assert_no_active_block_jobs()
# Set a speed limit to make sure that this job blocks the rest
@@ -313,7 +313,7 @@ class TestParallelOps(iotests.QMPTestCase):
# Similar to test_overlapping_1, but with block-commit
# blocking the other jobs
- def test_overlapping_2(self):
+ def ntest_overlapping_2(self):
self.assertLessEqual(9, self.num_imgs)
self.assert_no_active_block_jobs()
@@ -349,7 +349,7 @@ class TestParallelOps(iotests.QMPTestCase):
# Similar to test_overlapping_2, but here block-commit doesn't use the 'top' parameter.
# Internally this uses a mirror block job, hence the separate test case.
- def test_overlapping_3(self):
+ def ntest_overlapping_3(self):
self.assertLessEqual(8, self.num_imgs)
self.assert_no_active_block_jobs()
@@ -377,7 +377,7 @@ class TestParallelOps(iotests.QMPTestCase):
# In this case the base node of the stream job is the same as the
# top node of commit job. Since this results in the commit filter
# node being part of the stream chain, this is not allowed.
- def test_overlapping_4(self):
+ def ntest_overlapping_4(self):
self.assert_no_active_block_jobs()
# Commit from node2 into node0
@@ -401,7 +401,7 @@ class TestParallelOps(iotests.QMPTestCase):
# filter node. stream does not have a real dependency on its base
# node, so even though commit removes it when it is done, there is
# no conflict.
- def test_overlapping_5(self):
+ def ntest_overlapping_5(self):
self.assert_no_active_block_jobs()
# Commit from node2 into node0
@@ -457,7 +457,7 @@ class TestParallelOps(iotests.QMPTestCase):
# This is similar to test_stream_commit_1 but both jobs are slowed
# down so they can run in parallel for a little while.
- def test_stream_commit_2(self):
+ def ntest_stream_commit_2(self):
self.assertLessEqual(8, self.num_imgs)
self.assert_no_active_block_jobs()
@@ -492,7 +492,7 @@ class TestParallelOps(iotests.QMPTestCase):
self.assert_no_active_block_jobs()
# Test the base_node parameter
- def test_stream_base_node_name(self):
+ def ntest_stream_base_node_name(self):
self.assert_no_active_block_jobs()
self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[4]),
@@ -568,7 +568,7 @@ class TestQuorum(iotests.QMPTestCase):
for img in self.backing:
os.remove(img)
- def test_stream_quorum(self):
+ def ntest_stream_quorum(self):
self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.children[0]),
qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.backing[0]),
'image file map matches backing file before streaming')
@@ -601,7 +601,7 @@ class TestSmallerBackingFile(iotests.QMPTestCase):
# If this hangs, then you are missing a fix to complete streaming when the
# end of the backing file is reached.
- def test_stream(self):
+ def ntest_stream(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0')
@@ -659,7 +659,7 @@ class TestEIO(TestErrors):
os.remove(backing_img)
os.remove(self.blkdebug_file)
- def test_report(self):
+ def ntest_report(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0')
@@ -687,7 +687,7 @@ class TestEIO(TestErrors):
self.assert_no_active_block_jobs()
self.vm.shutdown()
- def test_ignore(self):
+ def ntest_ignore(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', on_error='ignore')
@@ -720,7 +720,7 @@ class TestEIO(TestErrors):
self.assert_no_active_block_jobs()
self.vm.shutdown()
- def test_stop(self):
+ def ntest_stop(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', on_error='stop')
@@ -763,7 +763,7 @@ class TestEIO(TestErrors):
self.assert_no_active_block_jobs()
self.vm.shutdown()
- def test_enospc(self):
+ def ntest_enospc(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', on_error='enospc')
@@ -809,7 +809,7 @@ class TestENOSPC(TestErrors):
os.remove(backing_img)
os.remove(self.blkdebug_file)
- def test_enospc(self):
+ def ntest_enospc(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', on_error='enospc')
@@ -870,7 +870,7 @@ class TestStreamStop(iotests.QMPTestCase):
os.remove(test_img)
os.remove(backing_img)
- def test_stream_stop(self):
+ def ntest_stream_stop(self):
self.assert_no_active_block_jobs()
self.vm.pause_drive('drive0')
@@ -918,7 +918,7 @@ class TestSetSpeed(iotests.QMPTestCase):
self.assert_no_active_block_jobs()
- def test_set_speed(self):
+ def ntest_set_speed(self):
self.assert_no_active_block_jobs()
self.vm.pause_drive('drive0')
@@ -951,7 +951,7 @@ class TestSetSpeed(iotests.QMPTestCase):
self.cancel_and_wait(resume=True)
- def test_set_speed_invalid(self):
+ def ntest_set_speed_invalid(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-stream', device='drive0', speed=-1)
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6d9bee1a4b..ae1213e6f8 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...........................
+.
----------------------------------------------------------------------
-Ran 27 tests
+Ran 1 tests
OK
--
2.21.3
* [PATCH 3/5] scripts/block-coroutine-wrapper.py: allow more function types
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
Allow void functions, functions whose first argument has type "Job *",
and functions whose names do not start with "bdrv_".
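For example, with this change a declaration like

    void generated_co_wrapper stream_clean(Job *job);

(added in the next patch) produces a wrapper of roughly this form:

    void stream_clean(Job *job)
    {
        if (qemu_in_coroutine()) {
            stream_co_clean(job);
        } else {
            StreamCoClean s = {
                .poll_state.bs = blk_bs(container_of(job, BlockJob, job)->blk),
                .poll_state.in_progress = true,
                .job = job,
            };

            s.poll_state.co = qemu_coroutine_create(stream_co_clean_entry, &s);
            bdrv_poll_co(&s.poll_state);
        }
    }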
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
scripts/block-coroutine-wrapper.py | 36 +++++++++++++++++++-----------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index 0461fd1c45..68fad41f19 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -64,6 +64,7 @@ class ParamDecl:
class FuncDecl:
def __init__(self, return_type: str, name: str, args: str) -> None:
self.return_type = return_type.strip()
+ self.void = return_type == 'void'
self.name = name.strip()
self.args = [ParamDecl(arg.strip()) for arg in args.split(',')]
@@ -73,16 +74,19 @@ class FuncDecl:
def gen_block(self, format: str) -> str:
return '\n'.join(format.format_map(arg.__dict__) for arg in self.args)
+ def if_ret(self, value: str) -> str:
+ return '' if self.void else value
+
# Match wrappers declared with a generated_co_wrapper mark
-func_decl_re = re.compile(r'^int\s*generated_co_wrapper\s*'
+func_decl_re = re.compile(r'^(?P<return_type>(void|int))\s*generated_co_wrapper\s*'
r'(?P<wrapper_name>[a-z][a-z0-9_]*)'
r'\((?P<args>[^)]*)\);$', re.MULTILINE)
def func_decl_iter(text: str) -> Iterator:
for m in func_decl_re.finditer(text):
- yield FuncDecl(return_type='int',
+ yield FuncDecl(return_type=m.group('return_type'),
name=m.group('wrapper_name'),
args=m.group('args'))
@@ -98,13 +102,19 @@ def snake_to_camel(func_name: str) -> str:
def gen_wrapper(func: FuncDecl) -> str:
- assert func.name.startswith('bdrv_')
- assert not func.name.startswith('bdrv_co_')
- assert func.return_type == 'int'
- assert func.args[0].type in ['BlockDriverState *', 'BdrvChild *']
-
- name = 'bdrv_co_' + func.name[5:]
- bs = 'bs' if func.args[0].type == 'BlockDriverState *' else 'child->bs'
+ subsystem, rname = func.name.split('_', 1)
+ assert not rname.startswith('co_')
+ assert func.return_type in ('int', 'void')
+ assert func.args[0].type in ['BlockDriverState *', 'BdrvChild *', 'Job *']
+
+ name = f'{subsystem}_co_{rname}'
+ arg0_t = func.args[0].type
+ if arg0_t == 'BlockDriverState *':
+ bs = 'bs'
+ elif arg0_t == 'BdrvChild *':
+ bs = 'child->bs'
+ elif arg0_t == 'Job *':
+ bs = 'blk_bs(container_of(job, BlockJob, job)->blk)'
struct_name = snake_to_camel(name)
return f"""\
@@ -121,16 +131,16 @@ static void coroutine_fn {name}_entry(void *opaque)
{{
{struct_name} *s = opaque;
- s->poll_state.ret = {name}({ func.gen_list('s->{name}') });
+ {func.if_ret('s->poll_state.ret = ')}{name}({ func.gen_list('s->{name}') });
s->poll_state.in_progress = false;
aio_wait_kick();
}}
-int {func.name}({ func.gen_list('{decl}') })
+{func.return_type} {func.name}({ func.gen_list('{decl}') })
{{
if (qemu_in_coroutine()) {{
- return {name}({ func.gen_list('{name}') });
+ {func.if_ret('return ')}{name}({ func.gen_list('{name}') });
}} else {{
{struct_name} s = {{
.poll_state.bs = {bs},
@@ -141,7 +151,7 @@ int {func.name}({ func.gen_list('{decl}') })
s.poll_state.co = qemu_coroutine_create({name}_entry, &s);
- return bdrv_poll_co(&s.poll_state);
+ {func.if_ret('return ')}bdrv_poll_co(&s.poll_state);
}}
}}"""
--
2.21.3
* [PATCH 4/5] block: move some mirror and stream handlers to coroutine
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
We are going to use a coroutine mutex to protect these graph-modifying
operations from interleaving with each other. Move them into coroutine context now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block/coroutines.h | 11 +++++++++++
block/mirror.c | 5 +++--
block/stream.c | 5 +++--
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/block/coroutines.h b/block/coroutines.h
index 4cfb4946e6..aa60747b2f 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -26,6 +26,7 @@
#define BLOCK_COROUTINES_INT_H
#include "block/block_int.h"
+#include "sysemu/block-backend.h"
int coroutine_fn bdrv_co_check(BlockDriverState *bs,
BdrvCheckResult *res, BdrvCheckMode fix);
@@ -66,4 +67,14 @@ int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
QEMUIOVector *qiov, int64_t pos);
+void generated_co_wrapper stream_clean(Job *job);
+void coroutine_fn stream_co_clean(Job *job);
+int generated_co_wrapper stream_prepare(Job *job);
+int coroutine_fn stream_co_prepare(Job *job);
+
+void generated_co_wrapper mirror_complete(Job *job, Error **errp);
+void coroutine_fn mirror_co_complete(Job *job, Error **errp);
+int generated_co_wrapper mirror_exit_common(Job *job);
+int coroutine_fn mirror_co_exit_common(Job *job);
+
#endif /* BLOCK_COROUTINES_INT_H */
diff --git a/block/mirror.c b/block/mirror.c
index 8e1ad6eceb..91e98b2349 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -18,6 +18,7 @@
#include "trace.h"
#include "block/blockjob_int.h"
#include "block/block_int.h"
+#include "block/coroutines.h"
#include "sysemu/block-backend.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
@@ -625,7 +626,7 @@ static void coroutine_fn mirror_wait_for_all_io(MirrorBlockJob *s)
* for .prepare, returns 0 on success and -errno on failure.
* for .abort cases, denoted by abort = true, MUST return 0.
*/
-static int mirror_exit_common(Job *job)
+int coroutine_fn mirror_co_exit_common(Job *job)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
BlockJob *bjob = &s->common;
@@ -1103,7 +1104,7 @@ immediate_exit:
return ret;
}
-static void mirror_complete(Job *job, Error **errp)
+void coroutine_fn mirror_co_complete(Job *job, Error **errp)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
BlockDriverState *target;
diff --git a/block/stream.c b/block/stream.c
index 236384f2f7..8a4b88b223 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -15,6 +15,7 @@
#include "trace.h"
#include "block/block_int.h"
#include "block/blockjob_int.h"
+#include "block/coroutines.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
@@ -58,7 +59,7 @@ static void stream_abort(Job *job)
}
}
-static int stream_prepare(Job *job)
+int coroutine_fn stream_co_prepare(Job *job)
{
StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
BlockJob *bjob = &s->common;
@@ -90,7 +91,7 @@ static int stream_prepare(Job *job)
return ret;
}
-static void stream_clean(Job *job)
+void coroutine_fn stream_co_clean(Job *job)
{
StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
BlockJob *bjob = &s->common;
--
2.21.3
* [PATCH 5/5] block: protect some graph-modifyng things by mutex
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:16 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, vsementsov, den, eblake
Iotest 30 intermittently fails due to interleaving of the mirror and stream
graph-modifying procedures. Protect them with a global co-mutex.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
include/block/block.h | 2 ++
block.c | 2 ++
block/mirror.c | 4 ++++
block/stream.c | 4 ++++
4 files changed, 12 insertions(+)
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..a92756cbfc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -20,6 +20,8 @@
*/
#define generated_co_wrapper
+extern CoMutex graph_modify_mutex;
+
/* block.c */
typedef struct BlockDriver BlockDriver;
typedef struct BdrvChild BdrvChild;
diff --git a/block.c b/block.c
index 5e8dd98cec..eb82b1ca1e 100644
--- a/block.c
+++ b/block.c
@@ -86,6 +86,8 @@ static int use_bdrv_whitelist;
bool abort_on_set_to_true = false;
+CoMutex graph_modify_mutex;
+
#ifdef _WIN32
static int is_windows_drive_prefix(const char *filename)
{
diff --git a/block/mirror.c b/block/mirror.c
index 91e98b2349..16c3e0b0cb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -628,6 +628,8 @@ static void coroutine_fn mirror_wait_for_all_io(MirrorBlockJob *s)
*/
int coroutine_fn mirror_co_exit_common(Job *job)
{
+ QEMU_LOCK_GUARD(&graph_modify_mutex);
+
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
BlockJob *bjob = &s->common;
MirrorBDSOpaque *bs_opaque;
@@ -1106,6 +1108,8 @@ immediate_exit:
void coroutine_fn mirror_co_complete(Job *job, Error **errp)
{
+ QEMU_LOCK_GUARD(&graph_modify_mutex);
+
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
BlockDriverState *target;
diff --git a/block/stream.c b/block/stream.c
index 8a4b88b223..13eba00ce8 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -61,6 +61,8 @@ static void stream_abort(Job *job)
int coroutine_fn stream_co_prepare(Job *job)
{
+ QEMU_LOCK_GUARD(&graph_modify_mutex);
+
StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
BlockJob *bjob = &s->common;
BlockDriverState *bs = blk_bs(bjob->blk);
@@ -93,6 +95,8 @@ int coroutine_fn stream_co_prepare(Job *job)
void coroutine_fn stream_co_clean(Job *job)
{
+ QEMU_LOCK_GUARD(&graph_modify_mutex);
+
StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
BlockJob *bjob = &s->common;
BlockDriverState *bs = blk_bs(bjob->blk);
--
2.21.3
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: no-reply @ 2020-11-20 16:27 UTC (permalink / raw)
To: vsementsov
Cc: kwolf, peter.maydell, vsementsov, berto, qemu-block, jsnow,
qemu-devel, mreitz, stefanha, den, pbonzini, philmd
Patchew URL: https://patchew.org/QEMU/20201120161622.1537-1-vsementsov@virtuozzo.com/
Hi,
This series seems to have some coding style problems. See output below for
more information:
Type: series
Message-id: 20201120161622.1537-1-vsementsov@virtuozzo.com
Subject: [PATCH RFC 0/5] Fix accidental crash in iotest 30
=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
* [new tag] patchew/20201120161622.1537-1-vsementsov@virtuozzo.com -> patchew/20201120161622.1537-1-vsementsov@virtuozzo.com
Switched to a new branch 'test'
9aed580 block: protect some graph-modifyng things by mutex
74c4557 block: move some mirror and stream handlers to coroutine
fae70bd scripts/block-coroutine-wrapper.py: allow more function types
7bccf2c iotest-30-shorten: concentrate on failing test case
1c79f08 abort-on-set-to-true
=== OUTPUT BEGIN ===
1/5 Checking commit 1c79f086eae8 (abort-on-set-to-true)
ERROR: do not initialise globals to 0 or NULL
#18: FILE: block.c:87:
+bool abort_on_set_to_true = false;
ERROR: line over 90 characters
#27: FILE: block.c:2007:
+ if ((added_perms || removed_shared_perms) && tighten_restrictions == &abort_on_set_to_true) {
WARNING: line over 80 characters
#44: FILE: block.c:2075:
+ ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
WARNING: line over 80 characters
#47: FILE: block.c:2078:
+ ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
WARNING: line over 80 characters
#48: FILE: block.c:2079:
+ tighten_restrictions ? &child_tighten_restr
total: 2 errors, 3 warnings, 74 lines checked
Patch 1/5 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/5 Checking commit 7bccf2cf8bff (iotest-30-shorten: concentrate on failing test case)
3/5 Checking commit fae70bd3851a (scripts/block-coroutine-wrapper.py: allow more function types)
WARNING: line over 80 characters
#35: FILE: scripts/block-coroutine-wrapper.py:82:
+func_decl_re = re.compile(r'^(?P<return_type>(void|int))\s*generated_co_wrapper\s*'
total: 0 errors, 1 warnings, 80 lines checked
Patch 3/5 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/5 Checking commit 74c4557e6dea (block: move some mirror and stream handlers to coroutine)
5/5 Checking commit 9aed58093018 (block: protect some graph-modifyng things by mutex)
=== OUTPUT END ===
Test command exited with code: 1
The full log is available at
http://patchew.org/logs/20201120161622.1537-1-vsementsov@virtuozzo.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:35 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, jsnow, mreitz, kwolf, philmd, peter.maydell, berto,
stefanha, pbonzini, den, eblake
20.11.2020 19:27, no-reply@patchew.org wrote:
> Patchew URL:https://patchew.org/QEMU/20201120161622.1537-1-vsementsov@virtuozzo.com/
>
>
>
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
This is an RFC, so I didn't care.
--
Best regards,
Vladimir
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Kevin Wolf @ 2020-11-20 16:36 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: peter.maydell, berto, qemu-block, jsnow, qemu-devel, mreitz,
stefanha, den, pbonzini, philmd
Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
>
> As Peter recently noted, iotest 30 accidentally fails.
>
> I found that Qemu crashes due to interleaving of graph-update
> operations of parallel mirror and stream block-jobs.
I haven't found the time yet to properly look into this or your other
thread where you had a similar question, but there is one thing I'm
wondering: Why can the nested job even make progress and run its
completion handler?
When we modify the graph, we should have drained the subtree in
question, so in theory while one job finishes and modifies the graph,
there should be no way for the other job to make progress and get
interleaved - it shouldn't be able to start I/O requests and much less
to run its completion handler and modify the graph.
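The pattern I would expect around any graph change is roughly this (just a sketch):

    bdrv_drained_begin(bs);   /* or bdrv_subtree_drained_begin(bs) */

    /*
     * Jobs with nodes in the drained (sub)tree are paused here and no
     * new I/O can be started, so the graph change should not race with
     * another job's completion.
     */
    /* ... modify the graph, e.g. bdrv_replace_node(...) ... */

    bdrv_drained_end(bs);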
Are we missing drained sections somewhere or do they fail to achieve
what I think they should achieve?
Kevin
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 16:43 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, jsnow, mreitz, philmd, peter.maydell,
berto, stefanha, pbonzini, den, eblake
20.11.2020 19:36, Kevin Wolf wrote:
> Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> As Peter recently noted, iotest 30 accidentally fails.
>>
>> I found that Qemu crashes due to interleaving of graph-update
>> operations of parallel mirror and stream block-jobs.
>
> I haven't found the time yet to properly look into this or your other
> thread where you had a similar question, but there is one thing I'm
> wondering: Why can the nested job even make progress and run its
> completion handler?
>
> When we modify the graph, we should have drained the subtree in
> question, so in theory while one job finishes and modifies the graph,
> there should be no way for the other job to make progress and get
> interleaved - it shouldn't be able to start I/O requests and much less
> to run its completion handler and modify the graph.
>
> Are we missing drained sections somewhere or do they fail to achieve
> what I think they should achieve?
>
It looks like both jobs reach their finish simultaneously. So, all progress is done in both jobs, and they then go concurrently into their completion procedures, which interleave. So there is no more I/O through blk, which is what the drained sections restrict.
--
Best regards,
Vladimir
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Kevin Wolf @ 2020-11-20 17:22 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: peter.maydell, berto, qemu-block, jsnow, qemu-devel, mreitz,
stefanha, den, pbonzini, philmd
Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 20.11.2020 19:36, Kevin Wolf wrote:
> > Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Hi all!
> > >
> > > As Peter recently noted, iotest 30 accidentally fails.
> > >
> > > I found that Qemu crashes due to interleaving of graph-update
> > > operations of parallel mirror and stream block-jobs.
> >
> > I haven't found the time yet to properly look into this or your other
> > thread where you had a similar question, but there is one thing I'm
> > wondering: Why can the nested job even make progress and run its
> > completion handler?
> >
> > When we modify the graph, we should have drained the subtree in
> > question, so in theory while one job finishes and modifies the graph,
> > there should be no way for the other job to make progress and get
> > interleaved - it shouldn't be able to start I/O requests and much less
> > to run its completion handler and modify the graph.
> >
> > Are we missing drained sections somewhere or do they fail to achieve
> > what I think they should achieve?
> >
>
> It all looks like both jobs are reached their finish simultaneously.
> So, all progress is done in both jobs. And they go concurrently to
> completion procedures which interleaves. So, there no more io through
> blk, which is restricted by drained sections.
They can't be truly simultaneous because they run in the same thread.
During job completion, this is the main thread.
However as soon as job_is_completed() returns true, it seems we're not
pausing the job any more when one of its nodes gets drained.
Possibly also relevant: The job->busy = false in job_exit(). The comment
there says it's a lie, but we might deadlock otherwise.
This problem will probably affect other callers, too, which drain a
subtree and then resonably expect that nobody will modify the graph
until they end the drained section. So I think the problem that we need
to address is that jobs run their completion handlers even though they
are supposed to be paused by a drain.
I'm not saying that your graph modification locks are a bad idea, but
they are probably not a complete solution.
Kevin
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Vladimir Sementsov-Ogievskiy @ 2020-11-20 18:19 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, jsnow, mreitz, philmd, peter.maydell,
berto, stefanha, pbonzini, den, eblake
20.11.2020 20:22, Kevin Wolf wrote:
> Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 20.11.2020 19:36, Kevin Wolf wrote:
>>> Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Hi all!
>>>>
>>>> As Peter recently noted, iotest 30 accidentally fails.
>>>>
>>>> I found that Qemu crashes due to interleaving of graph-update
>>>> operations of parallel mirror and stream block-jobs.
>>>
>>> I haven't found the time yet to properly look into this or your other
>>> thread where you had a similar question, but there is one thing I'm
>>> wondering: Why can the nested job even make progress and run its
>>> completion handler?
>>>
>>> When we modify the graph, we should have drained the subtree in
>>> question, so in theory while one job finishes and modifies the graph,
>>> there should be no way for the other job to make progress and get
>>> interleaved - it shouldn't be able to start I/O requests and much less
>>> to run its completion handler and modify the graph.
>>>
>>> Are we missing drained sections somewhere or do they fail to achieve
>>> what I think they should achieve?
>>>
>>
>> It all looks like both jobs are reached their finish simultaneously.
>> So, all progress is done in both jobs. And they go concurrently to
>> completion procedures which interleaves. So, there no more io through
>> blk, which is restricted by drained sections.
>
> They can't be truly simultaneous because they run in the same thread.
> During job completion, this is the main thread.
No, they are not truly simultaneous, but the completions may interleave through nested aio_poll loops.
>
> However as soon as job_is_completed() returns true, it seems we're not
> pausing the job any more when one of its nodes gets drained.
>
> Possibly also relevant: The job->busy = false in job_exit(). The comment
> there says it's a lie, but we might deadlock otherwise.
>
> This problem will probably affect other callers, too, which drain a
> subtree and then resonably expect that nobody will modify the graph
> until they end the drained section. So I think the problem that we need
> to address is that jobs run their completion handlers even though they
> are supposed to be paused by a drain.
Hmm. I always thought about a drained section as a thing that stops I/O requests, not other operations. And we do graph modifications in a drained section to avoid in-flight I/O requests during the graph modification.
>
> I'm not saying that your graph modification locks are a bad idea, but
> they are probably not a complete solution.
>
Hmm. What do you mean? It is of course not complete, as I didn't protect every graph-modification procedure. But if we protect all such things and always do graph modifications under this mutex, it should work, I think.
--
Best regards,
Vladimir
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Kevin Wolf @ 2020-11-23 10:10 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: peter.maydell, berto, qemu-block, jsnow, qemu-devel, mreitz,
stefanha, den, pbonzini, philmd
Am 20.11.2020 um 19:19 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 20.11.2020 20:22, Kevin Wolf wrote:
> > Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 20.11.2020 19:36, Kevin Wolf wrote:
> > > > Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > Hi all!
> > > > >
> > > > > As Peter recently noted, iotest 30 accidentally fails.
> > > > >
> > > > > I found that Qemu crashes due to interleaving of graph-update
> > > > > operations of parallel mirror and stream block-jobs.
> > > >
> > > > I haven't found the time yet to properly look into this or your other
> > > > thread where you had a similar question, but there is one thing I'm
> > > > wondering: Why can the nested job even make progress and run its
> > > > completion handler?
> > > >
> > > > When we modify the graph, we should have drained the subtree in
> > > > question, so in theory while one job finishes and modifies the graph,
> > > > there should be no way for the other job to make progress and get
> > > > interleaved - it shouldn't be able to start I/O requests and much less
> > > > to run its completion handler and modify the graph.
> > > >
> > > > Are we missing drained sections somewhere or do they fail to achieve
> > > > what I think they should achieve?
> > > >
> > >
> > > It all looks like both jobs are reached their finish simultaneously.
> > > So, all progress is done in both jobs. And they go concurrently to
> > > completion procedures which interleaves. So, there no more io through
> > > blk, which is restricted by drained sections.
> >
> > They can't be truly simultaneous because they run in the same thread.
> > During job completion, this is the main thread.
>
> No, they not truly simultaneous, but completions may interleave
> through nested aio_poll loops.
>
> >
> > However as soon as job_is_completed() returns true, it seems we're not
> > pausing the job any more when one of its nodes gets drained.
> >
> > Possibly also relevant: The job->busy = false in job_exit(). The comment
> > there says it's a lie, but we might deadlock otherwise.
> >
> > This problem will probably affect other callers, too, which drain a
> > subtree and then resonably expect that nobody will modify the graph
> > until they end the drained section. So I think the problem that we need
> > to address is that jobs run their completion handlers even though they
> > are supposed to be paused by a drain.
>
> Hmm. I always thought about drained section as about thing that stops
> IO requests, not other operations.. And we do graph modifications in
> drained section to avoid in-flight IO requests during graph
> modification.
Is there any use for an operation that only stops I/O, but doesn't
prevent graph changes?
I always understood it as a request to have exclusive access to a
subtree, so that nobody else would touch it.
> > I'm not saying that your graph modification locks are a bad idea, but
> > they are probably not a complete solution.
> >
>
> Hmm. What do you mean? It's of course not complete, as I didn't
> protect every graph modification procedure.. But if we do protect all
> such things and do graph modifications always under this mutex, it
> should work I think.
What I mean is that not only graph modifications can conflict with each
other, but most callers of drain_begin/end will probably not be prepared
for the graph changing under their feet, even if they don't actively
change the graph themselves.
Kevin
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Vladimir Sementsov-Ogievskiy @ 2020-11-23 10:29 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, jsnow, mreitz, philmd, peter.maydell,
berto, stefanha, pbonzini, den, eblake
23.11.2020 13:10, Kevin Wolf wrote:
> Am 20.11.2020 um 19:19 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 20.11.2020 20:22, Kevin Wolf wrote:
>>> Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 20.11.2020 19:36, Kevin Wolf wrote:
>>>>> Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> Hi all!
>>>>>>
>>>>>> As Peter recently noted, iotest 30 accidentally fails.
>>>>>>
>>>>>> I found that Qemu crashes due to interleaving of graph-update
>>>>>> operations of parallel mirror and stream block-jobs.
>>>>>
>>>>> I haven't found the time yet to properly look into this or your other
>>>>> thread where you had a similar question, but there is one thing I'm
>>>>> wondering: Why can the nested job even make progress and run its
>>>>> completion handler?
>>>>>
>>>>> When we modify the graph, we should have drained the subtree in
>>>>> question, so in theory while one job finishes and modifies the graph,
>>>>> there should be no way for the other job to make progress and get
>>>>> interleaved - it shouldn't be able to start I/O requests and much less
>>>>> to run its completion handler and modify the graph.
>>>>>
>>>>> Are we missing drained sections somewhere or do they fail to achieve
>>>>> what I think they should achieve?
>>>>>
>>>>
>>>> It all looks like both jobs are reached their finish simultaneously.
>>>> So, all progress is done in both jobs. And they go concurrently to
>>>> completion procedures which interleaves. So, there no more io through
>>>> blk, which is restricted by drained sections.
>>>
>>> They can't be truly simultaneous because they run in the same thread.
>>> During job completion, this is the main thread.
>>
>> No, they not truly simultaneous, but completions may interleave
>> through nested aio_poll loops.
>>
>>>
>>> However as soon as job_is_completed() returns true, it seems we're not
>>> pausing the job any more when one of its nodes gets drained.
>>>
>>> Possibly also relevant: The job->busy = false in job_exit(). The comment
>>> there says it's a lie, but we might deadlock otherwise.
>>>
>>> This problem will probably affect other callers, too, which drain a
>>> subtree and then resonably expect that nobody will modify the graph
>>> until they end the drained section. So I think the problem that we need
>>> to address is that jobs run their completion handlers even though they
>>> are supposed to be paused by a drain.
>>
>> Hmm. I always thought about drained section as about thing that stops
>> IO requests, not other operations.. And we do graph modifications in
>> drained section to avoid in-flight IO requests during graph
>> modification.
>
> Is there any use for an operation that only stops I/O, but doesn't
> prevent graph changes?
>
> I always understood it as a request to have exclusive access to a
> subtree, so that nobody else would touch it.
>
>>> I'm not saying that your graph modification locks are a bad idea, but
>>> they are probably not a complete solution.
>>>
>>
>> Hmm. What do you mean? It's of course not complete, as I didn't
>> protect every graph modification procedure.. But if we do protect all
>> such things and do graph modifications always under this mutex, it
>> should work I think.
>
> What I mean is that not only graph modifications can conflict with each
> other, but most callers of drain_begin/end will probably not be prepared
> for the graph changing under their feet, even if they don't actively
> change the graph themselves.
>
I understand now. Right. Anyway, it looks like we need some kind of mutex, as the user of a drained section of course wants to do graph modifications and even I/O (for example, to update the backing link in metadata). The first thing that comes to mind is to protect all outermost drained sections with a global CoMutex and to assert in drain_begin/drain_end that the mutex is locked.
Hmm, it also looks like an RW lock: simple I/O is the "read" side and whatever runs under a drained section is the "write" side.
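Something like this, just a sketch (the lock and function names are invented):

    CoMutex outermost_drain_lock;   /* global, zero-initialized */

    static void coroutine_fn do_graph_update(BlockDriverState *bs)
    {
        QEMU_LOCK_GUARD(&outermost_drain_lock);

        bdrv_drained_begin(bs);
        /* graph modifications, backing-link updates in metadata, ... */
        bdrv_drained_end(bs);
    }

    /* and bdrv_drained_begin/end() would assert that the lock is held */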
--
Best regards,
Vladimir
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
From: Kevin Wolf @ 2020-11-23 11:10 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: peter.maydell, berto, qemu-block, jsnow, qemu-devel, mreitz,
stefanha, den, pbonzini, philmd
Am 23.11.2020 um 11:29 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 23.11.2020 13:10, Kevin Wolf wrote:
> > Am 20.11.2020 um 19:19 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 20.11.2020 20:22, Kevin Wolf wrote:
> > > > Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > 20.11.2020 19:36, Kevin Wolf wrote:
> > > > > > Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > > > Hi all!
> > > > > > >
> > > > > > > As Peter recently noted, iotest 30 accidentally fails.
> > > > > > >
> > > > > > > I found that Qemu crashes due to interleaving of graph-update
> > > > > > > operations of parallel mirror and stream block-jobs.
> > > > > >
> > > > > > I haven't found the time yet to properly look into this or your other
> > > > > > thread where you had a similar question, but there is one thing I'm
> > > > > > wondering: Why can the nested job even make progress and run its
> > > > > > completion handler?
> > > > > >
> > > > > > When we modify the graph, we should have drained the subtree in
> > > > > > question, so in theory while one job finishes and modifies the graph,
> > > > > > there should be no way for the other job to make progress and get
> > > > > > interleaved - it shouldn't be able to start I/O requests and much less
> > > > > > to run its completion handler and modify the graph.
> > > > > >
> > > > > > Are we missing drained sections somewhere or do they fail to achieve
> > > > > > what I think they should achieve?
> > > > > >
> > > > >
> > > > > It all looks like both jobs are reached their finish simultaneously.
> > > > > So, all progress is done in both jobs. And they go concurrently to
> > > > > completion procedures which interleaves. So, there no more io through
> > > > > blk, which is restricted by drained sections.
> > > >
> > > > They can't be truly simultaneous because they run in the same thread.
> > > > During job completion, this is the main thread.
> > >
> > > No, they not truly simultaneous, but completions may interleave
> > > through nested aio_poll loops.
> > >
> > > >
> > > > However as soon as job_is_completed() returns true, it seems we're not
> > > > pausing the job any more when one of its nodes gets drained.
> > > >
> > > > Possibly also relevant: The job->busy = false in job_exit(). The comment
> > > > there says it's a lie, but we might deadlock otherwise.
> > > >
> > > > This problem will probably affect other callers, too, which drain a
> > > > subtree and then resonably expect that nobody will modify the graph
> > > > until they end the drained section. So I think the problem that we need
> > > > to address is that jobs run their completion handlers even though they
> > > > are supposed to be paused by a drain.
> > >
> > > Hmm. I always thought about drained section as about thing that stops
> > > IO requests, not other operations.. And we do graph modifications in
> > > drained section to avoid in-flight IO requests during graph
> > > modification.
> >
> > Is there any use for an operation that only stops I/O, but doesn't
> > prevent graph changes?
> >
> > I always understood it as a request to have exclusive access to a
> > subtree, so that nobody else would touch it.
> >
> > > > I'm not saying that your graph modification locks are a bad idea, but
> > > > they are probably not a complete solution.
> > > >
> > >
> > > Hmm. What do you mean? It's of course not complete, as I didn't
> > > protect every graph modification procedure.. But if we do protect all
> > > such things and do graph modifications always under this mutex, it
> > > should work I think.
> >
> > What I mean is that not only graph modifications can conflict with each
> > other, but most callers of drain_begin/end will probably not be prepared
> > for the graph changing under their feet, even if they don't actively
> > change the graph themselves.
> >
>
> Understand now.. Right.. Anyway, it looks as we need some kind of
> mutex. As the user of drained section of course wants to do graph
> modifications and even IO (for example update backing-link in
> metadata). The first thing that comes to mind is to protect all
> outer-most drained sections by global CoMutex and assert in
> drain_begin/drain_end that the mutex is locked.
>
> Hmm, it also looks like RW-lock, and simple IO is "read" and something
> under drained section is "write".
In a way, drain _is_ the implementation of a lock. But as you've shown,
it's a buggy implementation.
What I was looking at was basically fixing the one instance of a bug
while leaving the design as it is.
My impression is that you're looking at this more radically and want to
rearchitect the whole drain mechanism so that such bugs would be much
less likely to start with. Maybe this is a good idea, but it's probably
also a lot more effort.
Basically, for making use of more traditional locks, the naive approach
would be changing blk/bdrv_inc/dec_in_flight() so that it takes/releases
an actual coroutine lock. As you suggest, probably a CoRwLock.
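In code, very roughly (the lock is invented, and I'm pretending for a moment that these helpers always run in coroutine context):

    static CoRwlock graph_lock;   /* qemu_co_rwlock_init() at startup */

    static void coroutine_fn bdrv_co_inc_in_flight(BlockDriverState *bs)
    {
        /* I/O takes the reader side */
        qemu_co_rwlock_rdlock(&graph_lock);
        qatomic_inc(&bs->in_flight);
    }

    static void coroutine_fn bdrv_co_dec_in_flight(BlockDriverState *bs)
    {
        qatomic_dec(&bs->in_flight);
        bdrv_wakeup(bs);
        qemu_co_rwlock_unlock(&graph_lock);
    }

    /*
     * while whoever owns the drained section would take and release the
     * writer side with qemu_co_rwlock_wrlock()/qemu_co_rwlock_unlock().
     */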
I see a few non-trivial questions already for this part:
* What about requests for which bs->in_flight is increased more than
once? Do we need some sort of recursive lock for them?
* How do you know whether you should take a reader or a writer lock? For
drains called from coroutine context, maybe you could store the caller
that "owns" the drain section in the BDS, but what about non-coroutine
drains?
What do you do if coroutine A drains and then (directly or indirectly)
spawns coroutine B to do some work?
* Multiple concurrent requests from the same origin (the drain section
owner) shouldn't be serialised, so the CoRwLock needs to be taken once
per origin, not once per request. Again, how do we identify origins
and where do we store the common lock?
* Is it desirable that requests hang in bdrv_inc_in_flight() waiting for
the lock to be released? This may be in the middle of another
operation that needs to complete before drain_begin can return.
I seem to remember that introducing queued_requests in BlockBackend
was already non-trivial because of potential deadlocks. We would have
to prepare for more of the same in BlockDriverState.
The BlockBackend code involves temporarily dropping the in_flight
counter change that the request made, but on the BDS level we don't
even know which counters we increased how often before reaching
bdrv_inc_in_flight().
Do you have a different approach for placing the locks or do you have
ideas for how we would find good answers for these problems?
Kevin
* Re: [PATCH RFC 0/5] Fix accidental crash in iotest 30
2020-11-23 11:10 ` Kevin Wolf
@ 2020-11-23 13:44 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-23 13:44 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, jsnow, mreitz, philmd, peter.maydell,
berto, stefanha, pbonzini, den, eblake
23.11.2020 14:10, Kevin Wolf wrote:
> Am 23.11.2020 um 11:29 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.11.2020 13:10, Kevin Wolf wrote:
>>> Am 20.11.2020 um 19:19 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 20.11.2020 20:22, Kevin Wolf wrote:
>>>>> Am 20.11.2020 um 17:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> 20.11.2020 19:36, Kevin Wolf wrote:
>>>>>>> Am 20.11.2020 um 17:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>>>> Hi all!
>>>>>>>>
>>>>>>>> As Peter recently noted, iotest 30 accidentally fails.
>>>>>>>>
>>>>>>>> I found that Qemu crashes due to interleaving of graph-update
>>>>>>>> operations of parallel mirror and stream block-jobs.
>>>>>>>
>>>>>>> I haven't found the time yet to properly look into this or your other
>>>>>>> thread where you had a similar question, but there is one thing I'm
>>>>>>> wondering: Why can the nested job even make progress and run its
>>>>>>> completion handler?
>>>>>>>
>>>>>>> When we modify the graph, we should have drained the subtree in
>>>>>>> question, so in theory while one job finishes and modifies the graph,
>>>>>>> there should be no way for the other job to make progress and get
>>>>>>> interleaved - it shouldn't be able to start I/O requests and much less
>>>>>>> to run its completion handler and modify the graph.
>>>>>>>
>>>>>>> Are we missing drained sections somewhere or do they fail to achieve
>>>>>>> what I think they should achieve?
>>>>>>>
>>>>>>
>>>>>> It looks like both jobs reached their finish simultaneously.
>>>>>> So, all progress is done in both jobs. And they go concurrently into
>>>>>> their completion procedures, which interleave. So, there is no more
>>>>>> I/O through blk, which is what the drained sections restrict.
>>>>>
>>>>> They can't be truly simultaneous because they run in the same thread.
>>>>> During job completion, this is the main thread.
>>>>
>>>> No, they are not truly simultaneous, but their completions may
>>>> interleave through nested aio_poll loops.
>>>>
>>>>>
>>>>> However, as soon as job_is_completed() returns true, it seems we're not
>>>>> pausing the job any more when one of its nodes gets drained.
>>>>>
>>>>> Possibly also relevant: The job->busy = false in job_exit(). The comment
>>>>> there says it's a lie, but we might deadlock otherwise.
>>>>>
>>>>> This problem will probably affect other callers, too, which drain a
>>>>> subtree and then reasonably expect that nobody will modify the graph
>>>>> until they end the drained section. So I think the problem that we need
>>>>> to address is that jobs run their completion handlers even though they
>>>>> are supposed to be paused by a drain.
>>>>
>>>> Hmm. I always thought of a drained section as something that stops
>>>> I/O requests, not other operations.. And we do graph modifications in
>>>> a drained section to avoid in-flight I/O requests during the graph
>>>> modification.
>>>
>>> Is there any use for an operation that only stops I/O, but doesn't
>>> prevent graph changes?
>>>
>>> I always understood it as a request to have exclusive access to a
>>> subtree, so that nobody else would touch it.
>>>
>>>>> I'm not saying that your graph modification locks are a bad idea, but
>>>>> they are probably not a complete solution.
>>>>>
>>>>
>>>> Hmm. What do you mean? It's of course not complete, as I didn't
>>>> protect every graph modification procedure.. But if we protect all
>>>> such things and always do graph modifications under this mutex, it
>>>> should work, I think.
>>>
>>> What I mean is that not only graph modifications can conflict with each
>>> other, but most callers of drain_begin/end will probably not be prepared
>>> for the graph changing under their feet, even if they don't actively
>>> change the graph themselves.
>>>
>>
>> I understand now.. Right. Anyway, it looks like we need some kind of
>> mutex, as the user of a drained section of course wants to do graph
>> modifications and even I/O (for example, updating the backing link in
>> the metadata). The first thing that comes to mind is to protect all
>> outer-most drained sections with a global CoMutex and assert in
>> drain_begin/drain_end that the mutex is locked.
>>
>> Hmm, it also looks like an RW lock, where simple I/O is "read" and
>> anything under a drained section is "write".
>
> In a way, drain _is_ the implementation of a lock. But as you've shown,
> it's a buggy implementation.
>
> What I was looking at was basically fixing the one instance of a bug
> while leaving the design as it is.
>
> My impression is that you're looking at this more radically and want to
> rearchitect the whole drain mechanism so that such bugs would be much
> less likely to start with. Maybe this is a good idea, but it's probably
> also a lot more effort.
>
> Basically, for making use of more traditional locks, the naive approach
> would be changing blk/bdrv_inc/dec_in_flight() so that it takes/releases
> an actual coroutine lock. As you suggest, probably a CoRwLock.
>
> I see a few non-trivial questions already for this part:
>
> * What about requests for which bs->in_flight is increased more than
> once? Do we need some sort of recursive lock for them?
Is there a reasonable example? I'd avoid recursive locks if possible; they don't make things simpler.
>
> * How do you know whether you should take a reader or a writer lock? For
> drains called from coroutine context, maybe you could store the caller
> that "owns" the drain section in the BDS, but what about non-coroutine
> drains?
Intuitively, readers are all I/O requests and writers are drained sections.
Why should we store the drained-section owner anywhere?
Ah, we need to do "read" while holding the write lock.. So we need a way for readers to proceed without taking the lock when the write lock is held by the current[*] coroutine.
And I don't see a way to make this synchronization work without moving everything to coroutines: in non-coroutine context we would deadlock in a nested aio_poll loop, which would do a nested drain of another job. In a coroutine we can wait on the mutex more efficiently.
>
> What do you do if coroutine A drains and then (directly or indirectly)
> spawns coroutine B to do some work?
And that means that [*] is not necessarily the current coroutine, but may be some other coroutine started, directly or indirectly, by the lock owner..
So, a plain RW lock is not enough.. We need something like an RW lock, but with the possibility to perform read operations while the write lock is held (by the owner of the write lock)..
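Something like this, purely as a sketch (all names here are invented, and it still doesn't handle the case of a coroutine B spawned by the owner):

#include "qemu/coroutine.h"

typedef struct BdrvGraphLock {
    CoRwLock lock;
    Coroutine *write_owner;   /* coroutine holding the write lock, or NULL */
} BdrvGraphLock;

static void graph_lock_init(BdrvGraphLock *gl)
{
    qemu_co_rwlock_init(&gl->lock);
    gl->write_owner = NULL;
}

/* Readers (I/O requests) skip the lock when running in the coroutine that
 * owns the write lock, so the drain-section owner can still do I/O. */
static void coroutine_fn graph_lock_rdlock(BdrvGraphLock *gl)
{
    if (gl->write_owner != qemu_coroutine_self()) {
        qemu_co_rwlock_rdlock(&gl->lock);
    }
}

static void coroutine_fn graph_lock_rdunlock(BdrvGraphLock *gl)
{
    if (gl->write_owner != qemu_coroutine_self()) {
        qemu_co_rwlock_unlock(&gl->lock);
    }
}

/* Writers: outer-most drained section / graph modification. */
static void coroutine_fn graph_lock_wrlock(BdrvGraphLock *gl)
{
    qemu_co_rwlock_wrlock(&gl->lock);
    gl->write_owner = qemu_coroutine_self();
}

static void coroutine_fn graph_lock_wrunlock(BdrvGraphLock *gl)
{
    gl->write_owner = NULL;
    qemu_co_rwlock_unlock(&gl->lock);
}

And the missing piece is exactly your question: if the owner spawns another coroutine to do some work, write_owner doesn't match and that coroutine blocks, so we would need to track an "origin" rather than a single coroutine.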
>
> * Multiple concurrent requests from the same origin (the drain section
> owner) shouldn't be serialised, so the CoRwLock needs to be taken once
> per origin, not once per request. Again, how do we identify origins
> and where do we store the common lock?
This makes things complex. Why not take the lock per request? I/O requests will each take the read lock and still run in parallel.
>
> * Is it desirable that requests hang in bdrv_inc_in_flight() waiting for
> the lock to be released? This may be in the middle of another
> operation that needs to complete before drain_begin can return.
drain_begin should take the write lock. This automatically means that all read locks have been released..
>
> I seem to remember that introducing queued_requests in BlockBackend
> was already non-trivial because of potential deadlocks. We would have
> to prepare for more of the same in BlockDriverState.
>
> The BlockBackend code involves temporarily dropping the in_flight
> counter change that the request made, but on the BDS level we don't
> even know which counters we increased how often before reaching
> bdrv_inc_in_flight().
>
> Do you have a different approach for placing the locks or do you have
> ideas for how we would find good answers for these problems?
>
No, I don't have a good, complete architecture in mind.. I was interested in this because I'm preparing a series to refactor permission updates, and it's quite close to this area, but still a lot simpler than all these problems. So, I'd start from my already prepared series anyway.
Do you think it is worth protecting some job handlers with a global lock, like I propose in patch 05? It would of course be better to move the generic job_*() functions to coroutines, so we don't have to do the same thing in each job. It will not solve the whole problem, but it at least fixes iotest 030.. I don't remember real bugs around this, and I don't think that users really run parallel stream and commit jobs. On the other hand, the fix should not hurt.
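Roughly, the shape I mean is the following (just an illustration with made-up names here, not the actual code from patches 04/05):

/* One global mutex serializing the graph-modifying completion code of the
 * block jobs; qemu_co_mutex_init() once at startup. */
static CoMutex graph_update_mutex;

/* Hypothetical coroutine versions of the handlers (patch 04 moves such
 * handlers to coroutine context so that they can wait on the mutex). */
static void coroutine_fn stream_co_clean(Job *job)
{
    qemu_co_mutex_lock(&graph_update_mutex);
    /* ... existing graph-modifying cleanup of the stream job ... */
    qemu_co_mutex_unlock(&graph_update_mutex);
}

static void coroutine_fn mirror_co_exit_common(Job *job)
{
    qemu_co_mutex_lock(&graph_update_mutex);
    /* ... existing graph and permission updates of the mirror job ... */
    qemu_co_mutex_unlock(&graph_update_mutex);
}

With both completion paths under the same mutex, their graph updates can no longer interleave through nested aio_poll(): the second job simply yields on the mutex until the first one has finished its graph modification.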
--
Best regards,
Vladimir
Thread overview: 16+ messages
2020-11-20 16:16 [PATCH RFC 0/5] Fix accidental crash in iotest 30 Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 1/5] abort-on-set-to-true Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 2/5] iotest-30-shorten: concentrate on failing test case Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 3/5] scripts/block-coroutine-wrapper.py: allow more function types Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 4/5] block: move some mirror and stream handlers to coroutine Vladimir Sementsov-Ogievskiy
2020-11-20 16:16 ` [PATCH 5/5] block: protect some graph-modifyng things by mutex Vladimir Sementsov-Ogievskiy
2020-11-20 16:27 ` [PATCH RFC 0/5] Fix accidental crash in iotest 30 no-reply
2020-11-20 16:35 ` Vladimir Sementsov-Ogievskiy
2020-11-20 16:36 ` Kevin Wolf
2020-11-20 16:43 ` Vladimir Sementsov-Ogievskiy
2020-11-20 17:22 ` Kevin Wolf
2020-11-20 18:19 ` Vladimir Sementsov-Ogievskiy
2020-11-23 10:10 ` Kevin Wolf
2020-11-23 10:29 ` Vladimir Sementsov-Ogievskiy
2020-11-23 11:10 ` Kevin Wolf
2020-11-23 13:44 ` Vladimir Sementsov-Ogievskiy