* [PULL v2 01/15] block/backup: avoid integer overflow of `max-workers`
2021-10-07 15:39 [PULL v2 00/15] jobs: mirror: Handle errors after READY cancel Vladimir Sementsov-Ogievskiy
@ 2021-10-07 15:39 ` Vladimir Sementsov-Ogievskiy
2021-10-07 15:39 ` [PULL v2 02/15] block/aio_task: assert `max_busy_tasks` is greater than 0 Vladimir Sementsov-Ogievskiy
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-07 15:39 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, peter.maydell, kwolf, hreitz, vsementsov, jsnow,
richard.henderson, Stefano Garzarella
From: Stefano Garzarella <sgarzare@redhat.com>
QAPI generates `struct BackupPerf` where `max-workers` value is stored
in an `int64_t` variable.
But block_copy_async(), and the underlying code, uses an `int` parameter.
At the end that variable is used to initialize `max_busy_tasks` in
block/aio_task.c causing the following assertion failure if a value
greater than INT_MAX(2147483647) is used:
../block/aio_task.c:63: aio_task_pool_wait_one: Assertion `pool->busy_tasks > 0' failed.
Let's check that `max-workers` doesn't exceed INT_MAX and print an
error in that case.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2009310
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211005161157.282396-2-sgarzare@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block/backup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 687d2882bc..8b072db5d9 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -407,8 +407,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
return NULL;
}
- if (perf->max_workers < 1) {
- error_setg(errp, "max-workers must be greater than zero");
+ if (perf->max_workers < 1 || perf->max_workers > INT_MAX) {
+ error_setg(errp, "max-workers must be between 1 and %d", INT_MAX);
return NULL;
}
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PULL v2 09/15] job: Do not soft-cancel after a job is done
2021-10-07 15:39 [PULL v2 00/15] jobs: mirror: Handle errors after READY cancel Vladimir Sementsov-Ogievskiy
2021-10-07 15:39 ` [PULL v2 01/15] block/backup: avoid integer overflow of `max-workers` Vladimir Sementsov-Ogievskiy
2021-10-07 15:39 ` [PULL v2 02/15] block/aio_task: assert `max_busy_tasks` is greater than 0 Vladimir Sementsov-Ogievskiy
@ 2021-10-07 15:39 ` Vladimir Sementsov-Ogievskiy
2021-10-07 15:39 ` [PULL v2 10/15] job: Add job_cancel_requested() Vladimir Sementsov-Ogievskiy
2021-10-07 19:06 ` [PULL v2 00/15] jobs: mirror: Handle errors after READY cancel Richard Henderson
4 siblings, 0 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-07 15:39 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, peter.maydell, kwolf, hreitz, vsementsov, jsnow,
richard.henderson, Eric Blake
From: Hanna Reitz <hreitz@redhat.com>
The only job that supports a soft cancel mode is the mirror job, and in
such a case it resets its .cancelled field before it leaves its .run()
function, so it does not really count as cancelled.
However, it is possible to cancel the job after .run() returns and
before job_exit() (which is run in the main loop) is executed. Then,
.cancelled would still be true and the job would count as cancelled.
This does not seem to be in the interest of the mirror job, so adjust
job_cancel_async() to not set .cancelled in such a case, and
job_cancel() to not invoke job_completed_txn_abort().
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-8-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
job.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/job.c b/job.c
index 81c016eb10..44e741ebd4 100644
--- a/job.c
+++ b/job.c
@@ -734,9 +734,19 @@ static void job_cancel_async(Job *job, bool force)
assert(job->pause_count > 0);
job->pause_count--;
}
- job->cancelled = true;
- /* To prevent 'force == false' overriding a previous 'force == true' */
- job->force_cancel |= force;
+
+ /*
+ * Ignore soft cancel requests after the job is already done
+ * (We will still invoke job->driver->cancel() above, but if the
+ * job driver supports soft cancelling and the job is done, that
+ * should be a no-op, too. We still call it so it can override
+ * @force.)
+ */
+ if (force || !job->deferred_to_main_loop) {
+ job->cancelled = true;
+ /* To prevent 'force == false' overriding a previous 'force == true' */
+ job->force_cancel |= force;
+ }
}
static void job_completed_txn_abort(Job *job)
@@ -963,7 +973,14 @@ void job_cancel(Job *job, bool force)
if (!job_started(job)) {
job_completed(job);
} else if (job->deferred_to_main_loop) {
- job_completed_txn_abort(job);
+ /*
+ * job_cancel_async() ignores soft-cancel requests for jobs
+ * that are already done (i.e. deferred to the main loop). We
+ * have to check again whether the job is really cancelled.
+ */
+ if (job_is_cancelled(job)) {
+ job_completed_txn_abort(job);
+ }
} else {
job_enter(job);
}
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PULL v2 10/15] job: Add job_cancel_requested()
2021-10-07 15:39 [PULL v2 00/15] jobs: mirror: Handle errors after READY cancel Vladimir Sementsov-Ogievskiy
` (2 preceding siblings ...)
2021-10-07 15:39 ` [PULL v2 09/15] job: Do not soft-cancel after a job is done Vladimir Sementsov-Ogievskiy
@ 2021-10-07 15:39 ` Vladimir Sementsov-Ogievskiy
2021-10-07 19:06 ` [PULL v2 00/15] jobs: mirror: Handle errors after READY cancel Richard Henderson
4 siblings, 0 replies; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-10-07 15:39 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, peter.maydell, kwolf, hreitz, vsementsov, jsnow,
richard.henderson, Eric Blake
From: Hanna Reitz <hreitz@redhat.com>
Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination. For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.
A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause. This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation. For example, with
on-target-error=stop, the job should stop on write errors. (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)
Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_requested() as the general variant, which returns true for
any jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.
Finally, here is a justification for how different job_is_cancelled()
invocations are treated by this patch:
- block/mirror.c (mirror_run()):
- The first invocation is a while loop that should loop until the job
has been cancelled or scheduled for completion. What kind of cancel
does not matter, only the fact that the job is supposed to end.
- The second invocation wants to know whether the job has been
soft-cancelled. Calling job_cancel_requested() is a bit too broad,
but if the job were force-cancelled, we should leave the main loop
as soon as possible anyway, so this should not matter here.
- The last two invocations already check force_cancel, so they should
continue to use job_is_cancelled().
- block/backup.c, block/commit.c, block/stream.c, anything in tests/:
These jobs know only force-cancel, so there is no difference between
job_is_cancelled() and job_cancel_requested(). We can continue using
job_is_cancelled().
- job.c:
- job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled
jobs should be prevented from being paused. Continue using job_is_cancelled().
- job_update_rc(), job_finalize_single(), job_finish_sync(): These
functions are all called after the job has left its main loop. The
mirror job (the only job that can be soft-cancelled) will clear
.cancelled before leaving the main loop if it has been
soft-cancelled. Therefore, these functions will observe .cancelled
to be true only if the job has been force-cancelled. We can
continue to use job_is_cancelled().
(Furthermore, conceptually, a soft-cancelled mirror job should not
report to have been cancelled. It should report completion (see
also the block-job-cancel QAPI documentation). Therefore, it makes
sense for these functions not to distinguish between a
soft-cancelled mirror job and a job that has completed as normal.)
- job_completed_txn_abort(): All jobs other than @job have been
force-cancelled. job_is_cancelled() must be true for them.
Regarding @job itself: job_completed_txn_abort() is mostly called
when the job's return value is not 0. A soft-cancelled mirror has a
return value of 0, and so will not end up here then.
However, job_cancel() invokes job_completed_txn_abort() if the job
has been deferred to the main loop, which is mostly the case for
completed jobs (which skip the assertion), but not for sure.
To be safe, use job_cancel_requested() in this assertion.
- job_complete(): This is function eventually invoked by the user
(through qmp_block_job_complete() or qmp_job_complete(), or
job_complete_sync(), which comes from qemu-img). The intention here
is to prevent a user from invoking job-complete after the job has
been cancelled. This should also apply to soft cancelling: After a
mirror job has been soft-cancelled, the user should not be able to
decide otherwise and have it complete as normal (i.e. pivoting to
the target).
- job_cancel(): Both functions are equivalent (see comment there), but
we want to use job_is_cancelled(), because this shows that we call
job_completed_txn_abort() only for force-cancelled jobs. (As
explained for job_update_rc(), soft-cancelled jobs should be treated
as if they have completed as normal.)
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-9-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
include/qemu/job.h | 8 +++++++-
block/mirror.c | 10 ++++------
job.c | 14 ++++++++++++--
3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 90f6abbd6a..6e67b6977f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
/** Returns true if the job should not be visible to the management layer. */
bool job_is_internal(Job *job);
-/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
bool job_is_cancelled(Job *job);
+/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
/** Returns whether the job is in a completed state. */
bool job_is_completed(Job *job);
diff --git a/block/mirror.c b/block/mirror.c
index 010b9e1672..2d9642cb00 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -943,7 +943,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
/* Transition to the READY state and wait for complete. */
job_transition_to_ready(&s->common.job);
s->actively_synced = true;
- while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
+ while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
job_yield(&s->common.job);
}
s->common.job.cancelled = false;
@@ -1050,7 +1050,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
}
should_complete = s->should_complete ||
- job_is_cancelled(&s->common.job);
+ job_cancel_requested(&s->common.job);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
}
@@ -1094,7 +1094,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
delay_ns);
job_sleep_ns(&s->common.job, delay_ns);
- if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
+ if (job_is_cancelled(&s->common.job)) {
break;
}
s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1106,9 +1106,7 @@ immediate_exit:
* or it was cancelled prematurely so that we do not guarantee that
* the target is a copy of the source.
*/
- assert(ret < 0 ||
- (s->common.job.force_cancel &&
- job_is_cancelled(&s->common.job)));
+ assert(ret < 0 || job_is_cancelled(&s->common.job));
assert(need_drain);
mirror_wait_for_all_io(s);
}
diff --git a/job.c b/job.c
index 44e741ebd4..b0cf2d8374 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
}
bool job_is_cancelled(Job *job)
+{
+ return job->cancelled && job->force_cancel;
+}
+
+bool job_cancel_requested(Job *job)
{
return job->cancelled;
}
@@ -798,7 +803,7 @@ static void job_completed_txn_abort(Job *job)
ctx = other_job->aio_context;
aio_context_acquire(ctx);
if (!job_is_completed(other_job)) {
- assert(job_is_cancelled(other_job));
+ assert(job_cancel_requested(other_job));
job_finish_sync(other_job, NULL, NULL);
}
job_finalize_single(other_job);
@@ -977,6 +982,11 @@ void job_cancel(Job *job, bool force)
* job_cancel_async() ignores soft-cancel requests for jobs
* that are already done (i.e. deferred to the main loop). We
* have to check again whether the job is really cancelled.
+ * (job_cancel_requested() and job_is_cancelled() are equivalent
+ * here, because job_cancel_async() will make soft-cancel
+ * requests no-ops when deferred_to_main_loop is true. We
+ * choose to call job_is_cancelled() to show that we invoke
+ * job_completed_txn_abort() only for force-cancelled jobs.)
*/
if (job_is_cancelled(job)) {
job_completed_txn_abort(job);
@@ -1044,7 +1054,7 @@ void job_complete(Job *job, Error **errp)
if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
return;
}
- if (job_is_cancelled(job) || !job->driver->complete) {
+ if (job_cancel_requested(job) || !job->driver->complete) {
error_setg(errp, "The active block job '%s' cannot be completed",
job->id);
return;
--
2.31.1
^ permalink raw reply related [flat|nested] 6+ messages in thread