* [PATCH 00/11] mirror: cancel nbd reconnect
@ 2020-11-18 18:04 Vladimir Sementsov-Ogievskiy
  2020-11-18 18:04 ` [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Hi all!
The problem
Assume we have mirror job with nbd target node with enabled reconnect.
Connection failed. So, all current requests to nbd node are waiting for
nbd driver to reconnect. And they will wait for reconnect-delay time
specified in nbd blockdev options. This timeout may be long enough, for
example, we in Virtuozzo use 300 seconds by default.
So, if at this moment user tries to cancel the job, job will wait for
its in-flight requests to finish up to 300 seconds. From the user point
of view, cancelling the job takes a long time. Bad.
Solution
Let's just cancel "waiting for reconnect in in-flight request coroutines"
on mirror (and backup) cancel. Welcome the series below.
Vladimir Sementsov-Ogievskiy (11):
  block: add new BlockDriver handler: bdrv_cancel_in_flight
  block/nbd: implement .bdrv_cancel_in_flight
  block/raw-format: implement .bdrv_cancel_in_flight handler
  job: add .cancel handler for the driver
  block/mirror: implement .cancel job handler
  iotests/264: fix style
  iotests/264: move to python unittest
  iotests.py: qemu_nbd_popen: remove pid file after use
  iotests/264: add mirror-cancel test-case
  block/backup: implement .cancel job handler
  iotests/264: add backup-cancel test-case
 include/block/block.h         |   3 +
 include/block/block_int.h     |   9 +++
 include/qemu/job.h            |   5 ++
 block/backup.c                |  10 +++
 block/io.c                    |  11 +++
 block/mirror.c                |   9 +++
 block/nbd.c                   |  15 ++++
 block/raw-format.c            |   6 ++
 job.c                         |   3 +
 tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
 tests/qemu-iotests/264.out    |  20 ++---
 tests/qemu-iotests/iotests.py |   6 +-
 12 files changed, 173 insertions(+), 67 deletions(-)
-- 
2.21.3
^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-20 22:27   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
It will be used to stop retrying NBD requests on mirror cancel.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h     |  3 +++
 include/block/block_int.h |  9 +++++++++
 block/io.c                | 11 +++++++++++
 3 files changed, 23 insertions(+)
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..3990ee3677 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -836,4 +836,7 @@ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
                                     BdrvChild *dst, uint64_t dst_offset,
                                     uint64_t bytes, BdrvRequestFlags read_flags,
                                     BdrvRequestFlags write_flags);
+
+void bdrv_cancel_in_flight(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95d9333be1..07a87ce5e7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -344,6 +344,15 @@ struct BlockDriver {
         bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
         int64_t *map, BlockDriverState **file);
 
+    /*
+     * This informs the driver that we are not more interested in in-flight
+     * requests results, so don't waste the time if possible.
+     *
+     * The example usage is to not wait for nbd target nodedreconnect timeout on
+     * job-cancel.
+     */
+    void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
+
     /*
      * Invalidate any cached meta-data.
      */
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..5dcb6433f9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3288,3 +3288,14 @@ out:
 
     return ret;
 }
+
+void bdrv_cancel_in_flight(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        return;
+    }
+
+    if (bs->drv->bdrv_cancel_in_flight) {
+        bs->drv->bdrv_cancel_in_flight(bs);
+    }
+}
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
  2020-11-18 18:04 ` [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-20 23:13   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Just stop waiting for connection in existing requests.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index 42536702b6..887410b106 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2419,6 +2419,18 @@ static const char *const nbd_strong_runtime_opts[] = {
     NULL
 };
 
+static void nbd_cancel_in_flight(BlockDriverState *bs)
+{
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    reconnect_delay_timer_del(s);
+
+    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
+        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+        qemu_co_queue_restart_all(&s->free_sema);
+    }
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -2445,6 +2457,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
+    .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -2473,6 +2486,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
+    .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -2501,6 +2515,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_co_block_status       = nbd_client_co_block_status,
     .bdrv_dirname               = nbd_dirname,
     .strong_runtime_opts        = nbd_strong_runtime_opts,
+    .bdrv_cancel_in_flight      = nbd_cancel_in_flight,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
  2020-11-18 18:04 ` [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
  2020-11-18 18:04 ` [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-20 23:15   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 04/11] job: add .cancel handler for the driver Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
We are going to cancel in-flight requests on mirror nbd target on job
cancel. Still nbd is often used not directly but as raw-format child.
So, add pass-through handler here.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/raw-format.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/block/raw-format.c b/block/raw-format.c
index 42ec50802b..7717578ed6 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -575,6 +575,11 @@ static const char *const raw_strong_runtime_opts[] = {
     NULL
 };
 
+static void raw_cancel_in_flight(BlockDriverState *bs)
+{
+    bdrv_cancel_in_flight(bs->file->bs);
+}
+
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .instance_size        = sizeof(BDRVRawState),
@@ -608,6 +613,7 @@ BlockDriver bdrv_raw = {
     .bdrv_has_zero_init   = &raw_has_zero_init,
     .strong_runtime_opts  = raw_strong_runtime_opts,
     .mutable_opts         = mutable_opts,
+    .bdrv_cancel_in_flight = raw_cancel_in_flight,
 };
 
 static void bdrv_raw_init(void)
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 04/11] job: add .cancel handler for the driver
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-20 23:17   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 05/11] block/mirror: implement .cancel job handler Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
To be used in mirror in the following commit to cancel in-flight io on
target to not waste the time.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/qemu/job.h | 5 +++++
 job.c              | 3 +++
 2 files changed, 8 insertions(+)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 32aabb1c60..efc6fa7544 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -251,6 +251,11 @@ struct JobDriver {
      */
     void (*clean)(Job *job);
 
+    /**
+     * If the callback is not NULL, it will be invoked in job_cancel_async
+     */
+    void (*cancel)(Job *job);
+
 
     /** Called when the job is freed */
     void (*free)(Job *job);
diff --git a/job.c b/job.c
index 8fecf38960..65012dbd03 100644
--- a/job.c
+++ b/job.c
@@ -712,6 +712,9 @@ static int job_finalize_single(Job *job)
 
 static void job_cancel_async(Job *job, bool force)
 {
+    if (job->driver->cancel) {
+        job->driver->cancel(job);
+    }
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 05/11] block/mirror: implement .cancel job handler
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 04/11] job: add .cancel handler for the driver Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:05   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 06/11] iotests/264: fix style Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Cancel in-flight io on target to not waste the time.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/block/mirror.c b/block/mirror.c
index 8e1ad6eceb..9faffe4707 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1179,6 +1179,14 @@ static bool mirror_drained_poll(BlockJob *job)
     return !!s->in_flight;
 }
 
+static void mirror_cancel(Job *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
+    BlockDriverState *target = blk_bs(s->target);
+
+    bdrv_cancel_in_flight(target);
+}
+
 static const BlockJobDriver mirror_job_driver = {
     .job_driver = {
         .instance_size          = sizeof(MirrorBlockJob),
@@ -1190,6 +1198,7 @@ static const BlockJobDriver mirror_job_driver = {
         .abort                  = mirror_abort,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
+        .cancel                 = mirror_cancel,
     },
     .drained_poll           = mirror_drained_poll,
 };
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 06/11] iotests/264: fix style
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 05/11] block/mirror: implement .cancel job handler Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:13   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 07/11] iotests/264: move to python unittest Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Fix long line, extra import and one mypy complaint about incompatible
int and float.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/264 | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index 666f164ed8..0e8b213657 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -21,8 +21,7 @@
 import time
 
 import iotests
-from iotests import qemu_img_create, qemu_io_silent_check, file_path, \
-        qemu_nbd_popen, log
+from iotests import qemu_img_create, file_path, qemu_nbd_popen, log
 
 iotests.script_initialize(
     supported_fmts=['qcow2'],
@@ -31,7 +30,7 @@ iotests.script_initialize(
 disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
 nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
 size = 5 * 1024 * 1024
-wait_limit = 3
+wait_limit = 3.0
 wait_step = 0.2
 
 qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
@@ -48,11 +47,11 @@ with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
                   'file': {'driver': 'nbd',
                            'server': {'type': 'unix', 'path': nbd_sock},
                            'reconnect-delay': 10}})
-    vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
-               speed=(1 * 1024 * 1024))
+    vm.qmp_log('blockdev-backup', device='drive0', sync='full',
+               target='backup0', speed=(1 * 1024 * 1024))
 
     # Wait for some progress
-    t = 0
+    t = 0.0
     while t < wait_limit:
         jobs = vm.qmp('query-block-jobs')['return']
         if jobs and jobs[0]['offset'] > 0:
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 07/11] iotests/264: move to python unittest
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 06/11] iotests/264: fix style Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:17   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
We are going to add more test cases, so use the library supporting test
cases.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/264     | 93 ++++++++++++++++++++++----------------
 tests/qemu-iotests/264.out | 20 ++------
 2 files changed, 58 insertions(+), 55 deletions(-)
diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index 0e8b213657..8c61628921 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -19,13 +19,10 @@
 #
 
 import time
+import os
 
 import iotests
-from iotests import qemu_img_create, file_path, qemu_nbd_popen, log
-
-iotests.script_initialize(
-    supported_fmts=['qcow2'],
-)
+from iotests import qemu_img_create, file_path, qemu_nbd_popen
 
 disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
 nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
@@ -33,46 +30,62 @@ size = 5 * 1024 * 1024
 wait_limit = 3.0
 wait_step = 0.2
 
-qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
-qemu_img_create('-f', iotests.imgfmt, disk_b, str(size))
 
-with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
-    vm = iotests.VM().add_drive(disk_a)
-    vm.launch()
-    vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
+class TestNbdReconnect(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
+        qemu_img_create('-f', iotests.imgfmt, disk_b, str(size))
+        self.vm = iotests.VM().add_drive(disk_a)
+        self.vm.launch()
+        self.vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(disk_a)
+        os.remove(disk_b)
+
+    def test(self):
+        with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
+            result = self.vm.qmp('blockdev-add',
+                                 **{'node_name': 'backup0',
+                                    'driver': 'raw',
+                                    'file': {'driver': 'nbd',
+                                             'server': {'type': 'unix',
+                                                        'path': nbd_sock},
+                                             'reconnect-delay': 10}})
+            self.assert_qmp(result, 'return', {})
+            result = self.vm.qmp('blockdev-backup', device='drive0',
+                                 sync='full', target='backup0',
+                                 speed=(1 * 1024 * 1024))
+            self.assert_qmp(result, 'return', {})
+
+            # Wait for some progress
+            t = 0.0
+            while t < wait_limit:
+                jobs = self.vm.qmp('query-block-jobs')['return']
+                if jobs and jobs[0]['offset'] > 0:
+                    break
+                time.sleep(wait_step)
+                t += wait_step
 
-    vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
-               **{'node_name': 'backup0',
-                  'driver': 'raw',
-                  'file': {'driver': 'nbd',
-                           'server': {'type': 'unix', 'path': nbd_sock},
-                           'reconnect-delay': 10}})
-    vm.qmp_log('blockdev-backup', device='drive0', sync='full',
-               target='backup0', speed=(1 * 1024 * 1024))
+            self.assertTrue(jobs and jobs[0]['offset'] > 0)  # job started
 
-    # Wait for some progress
-    t = 0.0
-    while t < wait_limit:
-        jobs = vm.qmp('query-block-jobs')['return']
-        if jobs and jobs[0]['offset'] > 0:
-            break
-        time.sleep(wait_step)
-        t += wait_step
+        jobs = self.vm.qmp('query-block-jobs')['return']
+        # Check that job is still in progress
+        self.assertTrue(jobs and jobs[0]['offset'] < jobs[0]['len'])
 
-    if jobs and jobs[0]['offset'] > 0:
-        log('Backup job is started')
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
+        self.assert_qmp(result, 'return', {})
 
-jobs = vm.qmp('query-block-jobs')['return']
-if jobs and jobs[0]['offset'] < jobs[0]['len']:
-    log('Backup job is still in progress')
+        # Emulate server down time for 1 second
+        time.sleep(1)
 
-vm.qmp_log('block-job-set-speed', device='drive0', speed=0)
+        with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
+            e = self.vm.event_wait('BLOCK_JOB_COMPLETED')
+            self.assertEqual(e['data']['offset'], size)
+            result = self.vm.qmp('blockdev-del', node_name='backup0')
+            self.assert_qmp(result, 'return', {})
 
-# Emulate server down time for 1 second
-time.sleep(1)
 
-with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
-    e = vm.event_wait('BLOCK_JOB_COMPLETED')
-    log('Backup completed: {}'.format(e['data']['offset']))
-    vm.qmp_log('blockdev-del', node_name='backup0')
-    vm.shutdown()
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
index c45b1e81ef..ae1213e6f8 100644
--- a/tests/qemu-iotests/264.out
+++ b/tests/qemu-iotests/264.out
@@ -1,15 +1,5 @@
-Start NBD server
-{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
-{"return": {}}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
-{"return": {}}
-Backup job is started
-Kill NBD server
-Backup job is still in progress
-{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
-{"return": {}}
-Start NBD server
-Backup completed: 5242880
-{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
-{"return": {}}
-Kill NBD server
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 07/11] iotests/264: move to python unittest Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:19   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 09/11] iotests/264: add mirror-cancel test-case Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
To note interfere with other qemu_nbd_popen() calls in same test.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index bcd4fe5b6f..6df280a97e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -291,7 +291,9 @@ def qemu_nbd_list_log(*args: str) -> str:
 @contextmanager
 def qemu_nbd_popen(*args):
     '''Context manager running qemu-nbd within the context'''
-    pid_file = file_path("pid")
+    pid_file = file_path("qemu_nbd_popen-nbd-pid-file")
+
+    assert not os.path.exists(pid_file)
 
     cmd = list(qemu_nbd_args)
     cmd.extend(('--persistent', '--pid-file', pid_file))
@@ -309,6 +311,8 @@ def qemu_nbd_popen(*args):
             time.sleep(0.01)
         yield
     finally:
+        if os.path.exists(pid_file):
+            os.remove(pid_file)
         log('Kill NBD server')
         p.kill()
         p.wait()
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 09/11] iotests/264: add mirror-cancel test-case
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:26   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 10/11] block/backup: implement .cancel job handler Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Check that cancel doesn't wait for 10s of nbd reconnect timeout.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/264     | 38 ++++++++++++++++++++++++++++++--------
 tests/qemu-iotests/264.out |  4 ++--
 2 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index 8c61628921..3c6f29317f 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -26,25 +26,26 @@ from iotests import qemu_img_create, file_path, qemu_nbd_popen
 
 disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
 nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
-size = 5 * 1024 * 1024
 wait_limit = 3.0
 wait_step = 0.2
 
 
 class TestNbdReconnect(iotests.QMPTestCase):
-    def setUp(self):
-        qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
-        qemu_img_create('-f', iotests.imgfmt, disk_b, str(size))
+    def init_vm(self, disk_size):
+        qemu_img_create('-f', iotests.imgfmt, disk_a, str(disk_size))
+        qemu_img_create('-f', iotests.imgfmt, disk_b, str(disk_size))
         self.vm = iotests.VM().add_drive(disk_a)
         self.vm.launch()
-        self.vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
+        self.vm.hmp_qemu_io('drive0', 'write 0 {}'.format(disk_size))
 
     def tearDown(self):
         self.vm.shutdown()
         os.remove(disk_a)
         os.remove(disk_b)
 
-    def test(self):
+    def start_job(self, job):
+        """Stat job with nbd target and kill the server"""
+        assert job in ('blockdev-backup', 'blockdev-mirror')
         with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
             result = self.vm.qmp('blockdev-add',
                                  **{'node_name': 'backup0',
@@ -54,7 +55,7 @@ class TestNbdReconnect(iotests.QMPTestCase):
                                                         'path': nbd_sock},
                                              'reconnect-delay': 10}})
             self.assert_qmp(result, 'return', {})
-            result = self.vm.qmp('blockdev-backup', device='drive0',
+            result = self.vm.qmp(job, device='drive0',
                                  sync='full', target='backup0',
                                  speed=(1 * 1024 * 1024))
             self.assert_qmp(result, 'return', {})
@@ -72,7 +73,8 @@ class TestNbdReconnect(iotests.QMPTestCase):
 
         jobs = self.vm.qmp('query-block-jobs')['return']
         # Check that job is still in progress
-        self.assertTrue(jobs and jobs[0]['offset'] < jobs[0]['len'])
+        self.assertTrue(jobs)
+        self.assertTrue(jobs[0]['offset'] < jobs[0]['len'])
 
         result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
         self.assert_qmp(result, 'return', {})
@@ -80,12 +82,32 @@ class TestNbdReconnect(iotests.QMPTestCase):
         # Emulate server down time for 1 second
         time.sleep(1)
 
+    def test_backup(self):
+        size = 5 * 1024 * 1024
+        self.init_vm(size)
+        self.start_job('blockdev-backup')
+
         with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
             e = self.vm.event_wait('BLOCK_JOB_COMPLETED')
             self.assertEqual(e['data']['offset'], size)
             result = self.vm.qmp('blockdev-del', node_name='backup0')
             self.assert_qmp(result, 'return', {})
 
+    def test_mirror_cancel(self):
+        # Mirror speed limit doesn't work well enough, it seems that mirror
+        # will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
+        # MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
+        self.init_vm(20 * 1024 * 1024)
+        self.start_job('blockdev-mirror')
+
+        result = self.vm.qmp('block-job-cancel', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        start_t = time.time()
+        self.vm.event_wait('BLOCK_JOB_CANCELLED')
+        delta_t = time.time() - start_t
+        self.assertTrue(delta_t < 2.0)
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/264.out
+++ b/tests/qemu-iotests/264.out
@@ -1,5 +1,5 @@
-.
+..
 ----------------------------------------------------------------------
-Ran 1 tests
+Ran 2 tests
 
 OK
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 10/11] block/backup: implement .cancel job handler
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 09/11] iotests/264: add mirror-cancel test-case Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:27   ` Eric Blake
  2020-11-18 18:04 ` [PATCH 11/11] iotests/264: add backup-cancel test-case Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Cancel in-flight io on target to not waste the time.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/block/backup.c b/block/backup.c
index 9afa0bf3b4..9878594c83 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -36,6 +36,7 @@ typedef struct BackupBlockJob {
     BlockJob common;
     BlockDriverState *backup_top;
     BlockDriverState *source_bs;
+    BlockDriverState *target_bs;
 
     BdrvDirtyBitmap *sync_bitmap;
 
@@ -279,6 +280,13 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
     return ret;
 }
 
+static void backup_cancel(Job *job)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
+
+    bdrv_cancel_in_flight(s->target_bs);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .job_driver = {
         .instance_size          = sizeof(BackupBlockJob),
@@ -289,6 +297,7 @@ static const BlockJobDriver backup_job_driver = {
         .commit                 = backup_commit,
         .abort                  = backup_abort,
         .clean                  = backup_clean,
+        .cancel                 = backup_cancel,
     }
 };
 
@@ -456,6 +465,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
 
     job->backup_top = backup_top;
     job->source_bs = bs;
+    job->target_bs = target;
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->sync_mode = sync_mode;
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 11/11] iotests/264: add backup-cancel test-case
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 10/11] block/backup: implement .cancel job handler Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:04 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  1:28   ` Eric Blake
  2020-11-18 18:19 ` [PATCH 00/11] mirror: cancel nbd reconnect Eric Blake
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:04 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, eblake, fam, stefanha, mreitz, kwolf, jsnow,
	vsementsov, den
Check that cancel doesn't wait for 10s of nbd reconnect timeout.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/264 | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index 3c6f29317f..b830078834 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -93,13 +93,7 @@ class TestNbdReconnect(iotests.QMPTestCase):
             result = self.vm.qmp('blockdev-del', node_name='backup0')
             self.assert_qmp(result, 'return', {})
 
-    def test_mirror_cancel(self):
-        # Mirror speed limit doesn't work well enough, it seems that mirror
-        # will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
-        # MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
-        self.init_vm(20 * 1024 * 1024)
-        self.start_job('blockdev-mirror')
-
+    def cancel_job(self):
         result = self.vm.qmp('block-job-cancel', device='drive0')
         self.assert_qmp(result, 'return', {})
 
@@ -108,6 +102,19 @@ class TestNbdReconnect(iotests.QMPTestCase):
         delta_t = time.time() - start_t
         self.assertTrue(delta_t < 2.0)
 
+    def test_mirror_cancel(self):
+        # Mirror speed limit doesn't work well enough, it seems that mirror
+        # will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
+        # MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
+        self.init_vm(20 * 1024 * 1024)
+        self.start_job('blockdev-mirror')
+        self.cancel_job()
+
+    def test_backup_cancel(self):
+        self.init_vm(5 * 1024 * 1024)
+        self.start_job('blockdev-backup')
+        self.cancel_job()
+
 
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'])
-- 
2.21.3
^ permalink raw reply related	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2020-11-18 18:04 ` [PATCH 11/11] iotests/264: add backup-cancel test-case Vladimir Sementsov-Ogievskiy
@ 2020-11-18 18:19 ` Eric Blake
  2020-11-18 18:36   ` Vladimir Sementsov-Ogievskiy
  2020-12-18 11:05 ` Vladimir Sementsov-Ogievskiy
  2021-01-21  2:14 ` Eric Blake
  13 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2020-11-18 18:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.
Given that we're past -rc2, I think this is enough on the 'new feature'
side to defer into 6.0 rather than trying to claim as a bug-fix needed
for 5.2-rc3.
That said, the summary does make it sound like a worthwhile thing to add.
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2020-11-18 18:19 ` [PATCH 00/11] mirror: cancel nbd reconnect Eric Blake
@ 2020-11-18 18:36   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-11-18 18:36 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: qemu-devel, fam, stefanha, mreitz, kwolf, jsnow, den
18.11.2020 21:19, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
> 
> Given that we're past -rc2, I think this is enough on the 'new feature'
> side to defer into 6.0 rather than trying to claim as a bug-fix needed
> for 5.2-rc3.
Of course.
> 
> That said, the summary does make it sound like a worthwhile thing to add.
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2020-11-18 18:19 ` [PATCH 00/11] mirror: cancel nbd reconnect Eric Blake
@ 2020-12-18 11:05 ` Vladimir Sementsov-Ogievskiy
  2021-01-09 10:11   ` Vladimir Sementsov-Ogievskiy
  2021-01-21  2:14 ` Eric Blake
  13 siblings, 1 reply; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-12-18 11:05 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, jsnow, qemu-devel, mreitz, stefanha, den
ping
18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.
> 
> Vladimir Sementsov-Ogievskiy (11):
>    block: add new BlockDriver handler: bdrv_cancel_in_flight
>    block/nbd: implement .bdrv_cancel_in_flight
>    block/raw-format: implement .bdrv_cancel_in_flight handler
>    job: add .cancel handler for the driver
>    block/mirror: implement .cancel job handler
>    iotests/264: fix style
>    iotests/264: move to python unittest
>    iotests.py: qemu_nbd_popen: remove pid file after use
>    iotests/264: add mirror-cancel test-case
>    block/backup: implement .cancel job handler
>    iotests/264: add backup-cancel test-case
> 
>   include/block/block.h         |   3 +
>   include/block/block_int.h     |   9 +++
>   include/qemu/job.h            |   5 ++
>   block/backup.c                |  10 +++
>   block/io.c                    |  11 +++
>   block/mirror.c                |   9 +++
>   block/nbd.c                   |  15 ++++
>   block/raw-format.c            |   6 ++
>   job.c                         |   3 +
>   tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
>   tests/qemu-iotests/264.out    |  20 ++---
>   tests/qemu-iotests/iotests.py |   6 +-
>   12 files changed, 173 insertions(+), 67 deletions(-)
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2020-12-18 11:05 ` Vladimir Sementsov-Ogievskiy
@ 2021-01-09 10:11   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-09 10:11 UTC (permalink / raw)
  To: qemu-block; +Cc: fam, kwolf, jsnow, qemu-devel, mreitz, stefanha, den
ping ping
18.12.2020 14:05, Vladimir Sementsov-Ogievskiy wrote:
> ping
> 
> 18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
>>
>> Vladimir Sementsov-Ogievskiy (11):
>>    block: add new BlockDriver handler: bdrv_cancel_in_flight
>>    block/nbd: implement .bdrv_cancel_in_flight
>>    block/raw-format: implement .bdrv_cancel_in_flight handler
>>    job: add .cancel handler for the driver
>>    block/mirror: implement .cancel job handler
>>    iotests/264: fix style
>>    iotests/264: move to python unittest
>>    iotests.py: qemu_nbd_popen: remove pid file after use
>>    iotests/264: add mirror-cancel test-case
>>    block/backup: implement .cancel job handler
>>    iotests/264: add backup-cancel test-case
>>
>>   include/block/block.h         |   3 +
>>   include/block/block_int.h     |   9 +++
>>   include/qemu/job.h            |   5 ++
>>   block/backup.c                |  10 +++
>>   block/io.c                    |  11 +++
>>   block/mirror.c                |   9 +++
>>   block/nbd.c                   |  15 ++++
>>   block/raw-format.c            |   6 ++
>>   job.c                         |   3 +
>>   tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
>>   tests/qemu-iotests/264.out    |  20 ++---
>>   tests/qemu-iotests/iotests.py |   6 +-
>>   12 files changed, 173 insertions(+), 67 deletions(-)
>>
> 
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight
  2020-11-18 18:04 ` [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
@ 2021-01-20 22:27   ` Eric Blake
  2021-01-21 18:50     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-20 22:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> It will be used to stop retrying NBD requests on mirror cancel.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/block.h     |  3 +++
>  include/block/block_int.h |  9 +++++++++
>  block/io.c                | 11 +++++++++++
>  3 files changed, 23 insertions(+)
> 
How does this relate to the recent addition of the QMP yank command?
> +++ b/include/block/block_int.h
> @@ -344,6 +344,15 @@ struct BlockDriver {
>          bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
>          int64_t *map, BlockDriverState **file);
>  
> +    /*
> +     * This informs the driver that we are not more interested in in-flight
that we are no longer interested in the result of in-flight requests, so
> +     * requests results, so don't waste the time if possible.
> +     *
> +     * The example usage is to not wait for nbd target nodedreconnect timeout on
> +     * job-cancel.
One example usage is to avoid waiting for an nbd target node reconnect
timeout during job-cancel.
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight
  2020-11-18 18:04 ` [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
@ 2021-01-20 23:13   ` Eric Blake
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2021-01-20 23:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Just stop waiting for connection in existing requests.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
That turned out to be rather easy ;)
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler
  2020-11-18 18:04 ` [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler Vladimir Sementsov-Ogievskiy
@ 2021-01-20 23:15   ` Eric Blake
  2021-01-21 19:08     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-20 23:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are going to cancel in-flight requests on mirror nbd target on job
> cancel. Still nbd is often used not directly but as raw-format child.
> So, add pass-through handler here.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/raw-format.c | 6 ++++++
>  1 file changed, 6 insertions(+)
Should all filters do this automatically (or rather, should the block
layer do this automatically for all filters)?  But I understand that it
does NOT make sense for format drivers in general (cancelling a guest
request may still require metadata updates), where raw-format is the
exception, so doing it in raw-format instead of the block layer makes sense.
>  BlockDriver bdrv_raw = {
>      .format_name          = "raw",
>      .instance_size        = sizeof(BDRVRawState),
> @@ -608,6 +613,7 @@ BlockDriver bdrv_raw = {
>      .bdrv_has_zero_init   = &raw_has_zero_init,
>      .strong_runtime_opts  = raw_strong_runtime_opts,
>      .mutable_opts         = mutable_opts,
> +    .bdrv_cancel_in_flight = raw_cancel_in_flight,
A demonstration of why I don't like aligning the =.  But it's merely
cosmetic, so doesn't affect:
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 04/11] job: add .cancel handler for the driver
  2020-11-18 18:04 ` [PATCH 04/11] job: add .cancel handler for the driver Vladimir Sementsov-Ogievskiy
@ 2021-01-20 23:17   ` Eric Blake
  2021-01-21  9:32     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-20 23:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> To be used in mirror in the following commit to cancel in-flight io on
> target to not waste the time.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/qemu/job.h | 5 +++++
>  job.c              | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 32aabb1c60..efc6fa7544 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -251,6 +251,11 @@ struct JobDriver {
>       */
>      void (*clean)(Job *job);
>  
> +    /**
> +     * If the callback is not NULL, it will be invoked in job_cancel_async
> +     */
> +    void (*cancel)(Job *job);
> +
Does the call need to be re-entrant or even worry about being invoked
more than once on the same BDS?  Or worded differently,
> +++ b/job.c
> @@ -712,6 +712,9 @@ static int job_finalize_single(Job *job)
>  
>  static void job_cancel_async(Job *job, bool force)
>  {
> +    if (job->driver->cancel) {
> +        job->driver->cancel(job);
> +    }
>      if (job->user_paused) {
can job_cancel_async be reached more than once on the same BDS?
>          /* Do not call job_enter here, the caller will handle it.  */
>          if (job->driver->user_resume) {
> 
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 05/11] block/mirror: implement .cancel job handler
  2020-11-18 18:04 ` [PATCH 05/11] block/mirror: implement .cancel job handler Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:05   ` Eric Blake
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Cancel in-flight io on target to not waste the time.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/mirror.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 06/11] iotests/264: fix style
  2020-11-18 18:04 ` [PATCH 06/11] iotests/264: fix style Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:13   ` Eric Blake
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Fix long line, extra import and one mypy complaint about incompatible
> int and float.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/264 | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 07/11] iotests/264: move to python unittest
  2020-11-18 18:04 ` [PATCH 07/11] iotests/264: move to python unittest Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:17   ` Eric Blake
  2021-01-21 19:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are going to add more test cases, so use the library supporting test
> cases.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/264     | 93 ++++++++++++++++++++++----------------
>  tests/qemu-iotests/264.out | 20 ++------
>  2 files changed, 58 insertions(+), 55 deletions(-)
> 
> +++ b/tests/qemu-iotests/264.out
> @@ -1,15 +1,5 @@
> -Start NBD server
> -{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
> -{"return": {}}
> -{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
> -{"return": {}}
> -Backup job is started
> -Kill NBD server
> -Backup job is still in progress
> -{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
> -{"return": {}}
> -Start NBD server
> -Backup completed: 5242880
> -{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
> -{"return": {}}
> -Kill NBD server
> +.
> +----------------------------------------------------------------------
> +Ran 1 tests
> +
> +OK
I find it a shame that the expected output no longer shows what was
executed.  But the test still passes, and if it makes it easier for you
to extend the test in a later patch, I won't stand in the way (this is
more an indication that by stripping the useful output, I'm no longer in
as decent a position to help debug if the test starts failing).
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use
  2020-11-18 18:04 ` [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:19   ` Eric Blake
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> To note interfere with other qemu_nbd_popen() calls in same test.
not
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/iotests.py | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 09/11] iotests/264: add mirror-cancel test-case
  2020-11-18 18:04 ` [PATCH 09/11] iotests/264: add mirror-cancel test-case Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:26   ` Eric Blake
  2021-01-21 19:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Check that cancel doesn't wait for 10s of nbd reconnect timeout.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/264     | 38 ++++++++++++++++++++++++++++++--------
>  tests/qemu-iotests/264.out |  4 ++--
>  2 files changed, 32 insertions(+), 10 deletions(-)
>  
> +    def test_mirror_cancel(self):
> +        # Mirror speed limit doesn't work well enough, it seems that mirror
> +        # will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
> +        # MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
> +        self.init_vm(20 * 1024 * 1024)
> +        self.start_job('blockdev-mirror')
Is this comment still accurate given recent work on the mirror filter?
I'm fine taking the patch as-is and tweaking it with followups, though,
in order to make progress.
> +
> +        result = self.vm.qmp('block-job-cancel', device='drive0')
> +        self.assert_qmp(result, 'return', {})
> +
> +        start_t = time.time()
> +        self.vm.event_wait('BLOCK_JOB_CANCELLED')
> +        delta_t = time.time() - start_t
> +        self.assertTrue(delta_t < 2.0)
I hope this doesn't fail on CI platforms under heavy load.  It didn't
fail for me locally, but I hope we don't have to revisit it.  Is there
any way we can test this in a manner that is not as fragile?
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 10/11] block/backup: implement .cancel job handler
  2020-11-18 18:04 ` [PATCH 10/11] block/backup: implement .cancel job handler Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:27   ` Eric Blake
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Cancel in-flight io on target to not waste the time.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 11/11] iotests/264: add backup-cancel test-case
  2020-11-18 18:04 ` [PATCH 11/11] iotests/264: add backup-cancel test-case Vladimir Sementsov-Ogievskiy
@ 2021-01-21  1:28   ` Eric Blake
  2021-01-21  2:21     ` Eric Blake
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-21  1:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Check that cancel doesn't wait for 10s of nbd reconnect timeout.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/264 | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2020-12-18 11:05 ` Vladimir Sementsov-Ogievskiy
@ 2021-01-21  2:14 ` Eric Blake
  2021-01-21 19:44   ` Vladimir Sementsov-Ogievskiy
  13 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-21  2:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The problem
> 
> Assume we have mirror job with nbd target node with enabled reconnect.
> Connection failed. So, all current requests to nbd node are waiting for
> nbd driver to reconnect. And they will wait for reconnect-delay time
> specified in nbd blockdev options. This timeout may be long enough, for
> example, we in Virtuozzo use 300 seconds by default.
> 
> So, if at this moment user tries to cancel the job, job will wait for
> its in-flight requests to finish up to 300 seconds. From the user point
> of view, cancelling the job takes a long time. Bad.
> 
> Solution
> 
> Let's just cancel "waiting for reconnect in in-flight request coroutines"
> on mirror (and backup) cancel. Welcome the series below.
Because of my question on 4/11, I did not queue the entire series yet.
But 6/11 was trivial enough to queue now.
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 11/11] iotests/264: add backup-cancel test-case
  2021-01-21  1:28   ` Eric Blake
@ 2021-01-21  2:21     ` Eric Blake
  2021-01-21 19:43       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2021-01-21  2:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
On 1/20/21 7:28 PM, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Check that cancel doesn't wait for 10s of nbd reconnect timeout.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  tests/qemu-iotests/264 | 21 ++++++++++++++-------
>>  1 file changed, 14 insertions(+), 7 deletions(-)
>>
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
With this patch applied as-is, the test fails for me:
--- /home/eblake/qemu/tests/qemu-iotests/264.out	2021-01-20
20:18:54.741366521 -0600
+++ /home/eblake/qemu/build/tests/qemu-iotests/264.out.bad	2021-01-20
20:20:37.242451172 -0600
@@ -1,5 +1,5 @@
-..
+...
 ----------------------------------------------------------------------
-Ran 2 tests
+Ran 3 tests
but with no obvious visibility into why.  Did you forget to check in
changes to 264.out?
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 04/11] job: add .cancel handler for the driver
  2021-01-20 23:17   ` Eric Blake
@ 2021-01-21  9:32     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21  9:32 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 02:17, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> To be used in mirror in the following commit to cancel in-flight io on
>> target to not waste the time.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/qemu/job.h | 5 +++++
>>   job.c              | 3 +++
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 32aabb1c60..efc6fa7544 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -251,6 +251,11 @@ struct JobDriver {
>>        */
>>       void (*clean)(Job *job);
>>   
>> +    /**
>> +     * If the callback is not NULL, it will be invoked in job_cancel_async
>> +     */
>> +    void (*cancel)(Job *job);
>> +
> 
> Does the call need to be re-entrant or even worry about being invoked
> more than once on the same BDS?  Or worded differently,
> 
>> +++ b/job.c
>> @@ -712,6 +712,9 @@ static int job_finalize_single(Job *job)
>>   
>>   static void job_cancel_async(Job *job, bool force)
>>   {
>> +    if (job->driver->cancel) {
>> +        job->driver->cancel(job);
>> +    }
>>       if (job->user_paused) {
> 
> can job_cancel_async be reached more than once on the same BDS?
what do you mean by same BDS? On same job?
I don't think that job_cancel_async should be called several times during one "cancel" operation. But if it is, it's still safe enough with only .cancel implementation in [05], which just calls bdrv_cancel_in_flight() (which of course should be safe to call several times).
> 
>>           /* Do not call job_enter here, the caller will handle it.  */
>>           if (job->driver->user_resume) {
>>
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight
  2021-01-20 22:27   ` Eric Blake
@ 2021-01-21 18:50     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 18:50 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 01:27, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> It will be used to stop retrying NBD requests on mirror cancel.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/block.h     |  3 +++
>>   include/block/block_int.h |  9 +++++++++
>>   block/io.c                | 11 +++++++++++
>>   3 files changed, 23 insertions(+)
>>
> 
> How does this relate to the recent addition of the QMP yank command?
Hmm. Don't know.. Looking in spec:
# Currently implemented yank instances:
#  - nbd block device:
#    Yanking it will shut down the connection to the nbd server without
#    attempting to reconnect.
Looks like a close thing. But actually, I don't want to stop reconnecting process, but only cancel some requests which we don't want to wait for. After that, the nbd node may successfully reconnect and continue to work.
> 
> 
>> +++ b/include/block/block_int.h
>> @@ -344,6 +344,15 @@ struct BlockDriver {
>>           bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
>>           int64_t *map, BlockDriverState **file);
>>   
>> +    /*
>> +     * This informs the driver that we are not more interested in in-flight
> 
> that we are no longer interested in the result of in-flight requests, so
> 
>> +     * requests results, so don't waste the time if possible.
>> +     *
>> +     * The example usage is to not wait for nbd target nodedreconnect timeout on
>> +     * job-cancel.
> 
> One example usage is to avoid waiting for an nbd target node reconnect
> timeout during job-cancel.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler
  2021-01-20 23:15   ` Eric Blake
@ 2021-01-21 19:08     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 19:08 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 02:15, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> We are going to cancel in-flight requests on mirror nbd target on job
>> cancel. Still nbd is often used not directly but as raw-format child.
>> So, add pass-through handler here.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/raw-format.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
> 
> Should all filters do this automatically (or rather, should the block
> layer do this automatically for all filters)?  But I understand that it
> does NOT make sense for format drivers in general (cancelling a guest
> request may still require metadata updates), where raw-format is the
> exception, so doing it in raw-format instead of the block layer makes sense.
> 
Hmm.. probably not for backup_top filter. But of course OK for throttling
> 
>>   BlockDriver bdrv_raw = {
>>       .format_name          = "raw",
>>       .instance_size        = sizeof(BDRVRawState),
>> @@ -608,6 +613,7 @@ BlockDriver bdrv_raw = {
>>       .bdrv_has_zero_init   = &raw_has_zero_init,
>>       .strong_runtime_opts  = raw_strong_runtime_opts,
>>       .mutable_opts         = mutable_opts,
>> +    .bdrv_cancel_in_flight = raw_cancel_in_flight,
> 
> A demonstration of why I don't like aligning the =.  But it's merely
> cosmetic, so doesn't affect:
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 07/11] iotests/264: move to python unittest
  2021-01-21  1:17   ` Eric Blake
@ 2021-01-21 19:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 19:29 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 04:17, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> We are going to add more test cases, so use the library supporting test
>> cases.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/264     | 93 ++++++++++++++++++++++----------------
>>   tests/qemu-iotests/264.out | 20 ++------
>>   2 files changed, 58 insertions(+), 55 deletions(-)
>>
> 
>> +++ b/tests/qemu-iotests/264.out
>> @@ -1,15 +1,5 @@
>> -Start NBD server
>> -{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
>> -{"return": {}}
>> -{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", "target": "backup0"}}
>> -{"return": {}}
>> -Backup job is started
>> -Kill NBD server
>> -Backup job is still in progress
>> -{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 0}}
>> -{"return": {}}
>> -Start NBD server
>> -Backup completed: 5242880
>> -{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
>> -{"return": {}}
>> -Kill NBD server
>> +.
>> +----------------------------------------------------------------------
>> +Ran 1 tests
>> +
>> +OK
> 
> I find it a shame that the expected output no longer shows what was
> executed.  But the test still passes, and if it makes it easier for you
> to extend the test in a later patch, I won't stand in the way (this is
> more an indication that by stripping the useful output, I'm no longer in
> as decent a position to help debug if the test starts failing).
> 
Still, what is executed is understandable from the test itself.. And IMHO, debugging python unittests is simpler: you get the stack, and immediately see what happens. When with output-checking tests, you should first understand, what is the statement corresponding to the wrong output. It's not saying about the fact that with unittests I can simply test only one test-case (that's the reason, why I think that tests with several testcases must be written as unittest tests). And debugging output-checking tests with a lot of test-cases inside is always a pain for me.
Another benefit of unittest: on failure test immediately finishes. With output-checking tests, test continue to execute, and may produce unnecessary not-matching log, or hang, or anything else.
Another drawback of output-cheking tests: they often test too much unrelated things. Sometimes it's good: you can catch some unrelated bug :) But often it's a pain: you have to modify test outputs when creating new features or changing the behaviour actually unrelated to what the test actually want to test.
Python unittests are more difficult to write, as you should understand what exactly you want/should to check.. When with output-checking tests you can just log everything. But in general I'm for python unittests.
Still I think sometimes about supporting output for python-unitests based tests (not loosing the ability to execute test-cases in separate, may be .out file per test-case?), it may be a good compromise.
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 09/11] iotests/264: add mirror-cancel test-case
  2021-01-21  1:26   ` Eric Blake
@ 2021-01-21 19:42     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 19:42 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 04:26, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Check that cancel doesn't wait for 10s of nbd reconnect timeout.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/264     | 38 ++++++++++++++++++++++++++++++--------
>>   tests/qemu-iotests/264.out |  4 ++--
>>   2 files changed, 32 insertions(+), 10 deletions(-)
> 
>>   
>> +    def test_mirror_cancel(self):
>> +        # Mirror speed limit doesn't work well enough, it seems that mirror
>> +        # will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
>> +        # MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
>> +        self.init_vm(20 * 1024 * 1024)
>> +        self.start_job('blockdev-mirror')
> 
> Is this comment still accurate given recent work on the mirror filter?
Hmm, what do you mean? I missed it..
> I'm fine taking the patch as-is and tweaking it with followups, though,
> in order to make progress.
Good for me, of course
> 
>> +
>> +        result = self.vm.qmp('block-job-cancel', device='drive0')
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        start_t = time.time()
>> +        self.vm.event_wait('BLOCK_JOB_CANCELLED')
>> +        delta_t = time.time() - start_t
>> +        self.assertTrue(delta_t < 2.0)
> 
> I hope this doesn't fail on CI platforms under heavy load.  It didn't
> fail for me locally, but I hope we don't have to revisit it.  Is there
> any way we can test this in a manner that is not as fragile?
Hmm, I don't know. We want to check that cancel is not as long as reconnect timeout.. If it fails, we'll adjust the constants :) And we have no limit in it, we can use 1hour for reconnect-timeout and 10min for mirror to cancel for example (but probably something other may fail with such big timeouts)
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 11/11] iotests/264: add backup-cancel test-case
  2021-01-21  2:21     ` Eric Blake
@ 2021-01-21 19:43       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 19:43 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 05:21, Eric Blake wrote:
> On 1/20/21 7:28 PM, Eric Blake wrote:
>> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> Check that cancel doesn't wait for 10s of nbd reconnect timeout.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   tests/qemu-iotests/264 | 21 ++++++++++++++-------
>>>   1 file changed, 14 insertions(+), 7 deletions(-)
>>>
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
> 
> With this patch applied as-is, the test fails for me:
> 
> --- /home/eblake/qemu/tests/qemu-iotests/264.out	2021-01-20
> 20:18:54.741366521 -0600
> +++ /home/eblake/qemu/build/tests/qemu-iotests/264.out.bad	2021-01-20
> 20:20:37.242451172 -0600
> @@ -1,5 +1,5 @@
> -..
> +...
>   ----------------------------------------------------------------------
> -Ran 2 tests
> +Ran 3 tests
> 
> but with no obvious visibility into why.  Did you forget to check in
> changes to 264.out?
> 
Oops, definitely, I forget to add new test-case to .out file.
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH 00/11] mirror: cancel nbd reconnect
  2021-01-21  2:14 ` Eric Blake
@ 2021-01-21 19:44   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 36+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-01-21 19:44 UTC (permalink / raw)
  To: Eric Blake, qemu-block
  Cc: fam, kwolf, qemu-devel, mreitz, stefanha, den, jsnow
21.01.2021 05:14, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> The problem
>>
>> Assume we have mirror job with nbd target node with enabled reconnect.
>> Connection failed. So, all current requests to nbd node are waiting for
>> nbd driver to reconnect. And they will wait for reconnect-delay time
>> specified in nbd blockdev options. This timeout may be long enough, for
>> example, we in Virtuozzo use 300 seconds by default.
>>
>> So, if at this moment user tries to cancel the job, job will wait for
>> its in-flight requests to finish up to 300 seconds. From the user point
>> of view, cancelling the job takes a long time. Bad.
>>
>> Solution
>>
>> Let's just cancel "waiting for reconnect in in-flight request coroutines"
>> on mirror (and backup) cancel. Welcome the series below.
> 
> Because of my question on 4/11, I did not queue the entire series yet.
> But 6/11 was trivial enough to queue now.
> 
Thanks!
-- 
Best regards,
Vladimir
^ permalink raw reply	[flat|nested] 36+ messages in thread
end of thread, other threads:[~2021-01-21 19:52 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-18 18:04 [PATCH 00/11] mirror: cancel nbd reconnect Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
2021-01-20 22:27   ` Eric Blake
2021-01-21 18:50     ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 02/11] block/nbd: implement .bdrv_cancel_in_flight Vladimir Sementsov-Ogievskiy
2021-01-20 23:13   ` Eric Blake
2020-11-18 18:04 ` [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler Vladimir Sementsov-Ogievskiy
2021-01-20 23:15   ` Eric Blake
2021-01-21 19:08     ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 04/11] job: add .cancel handler for the driver Vladimir Sementsov-Ogievskiy
2021-01-20 23:17   ` Eric Blake
2021-01-21  9:32     ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 05/11] block/mirror: implement .cancel job handler Vladimir Sementsov-Ogievskiy
2021-01-21  1:05   ` Eric Blake
2020-11-18 18:04 ` [PATCH 06/11] iotests/264: fix style Vladimir Sementsov-Ogievskiy
2021-01-21  1:13   ` Eric Blake
2020-11-18 18:04 ` [PATCH 07/11] iotests/264: move to python unittest Vladimir Sementsov-Ogievskiy
2021-01-21  1:17   ` Eric Blake
2021-01-21 19:29     ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 08/11] iotests.py: qemu_nbd_popen: remove pid file after use Vladimir Sementsov-Ogievskiy
2021-01-21  1:19   ` Eric Blake
2020-11-18 18:04 ` [PATCH 09/11] iotests/264: add mirror-cancel test-case Vladimir Sementsov-Ogievskiy
2021-01-21  1:26   ` Eric Blake
2021-01-21 19:42     ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:04 ` [PATCH 10/11] block/backup: implement .cancel job handler Vladimir Sementsov-Ogievskiy
2021-01-21  1:27   ` Eric Blake
2020-11-18 18:04 ` [PATCH 11/11] iotests/264: add backup-cancel test-case Vladimir Sementsov-Ogievskiy
2021-01-21  1:28   ` Eric Blake
2021-01-21  2:21     ` Eric Blake
2021-01-21 19:43       ` Vladimir Sementsov-Ogievskiy
2020-11-18 18:19 ` [PATCH 00/11] mirror: cancel nbd reconnect Eric Blake
2020-11-18 18:36   ` Vladimir Sementsov-Ogievskiy
2020-12-18 11:05 ` Vladimir Sementsov-Ogievskiy
2021-01-09 10:11   ` Vladimir Sementsov-Ogievskiy
2021-01-21  2:14 ` Eric Blake
2021-01-21 19:44   ` Vladimir Sementsov-Ogievskiy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).