From: Stefan Hajnoczi <stefanha@gmail.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, kwolf@redhat.com,
famz@redhat.com, mreitz@redhat.com, stefanha@redhat.com,
pbonzini@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH 18/21] backup: new async architecture
Date: Wed, 1 Feb 2017 16:13:21 +0000
Message-ID: <20170201161321.GA12283@stefanha-x1.localdomain>
In-Reply-To: <1482503344-6424-19-git-send-email-vsementsov@virtuozzo.com>
On Fri, Dec 23, 2016 at 05:29:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> @@ -120,11 +257,42 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
> uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
> job->sectors_read);
> job->sectors_read = 0;
> + job->delayed = true;
> + trace_backup_sleep_delay(block_job_is_cancelled(&job->common),
> + block_job_should_pause(&job->common));
> block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
> + job->delayed = false;
> } else {
> - block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
> + trace_backup_sleep_zero(block_job_is_cancelled(&job->common),
> + block_job_should_pause(&job->common));
> + block_job_pause_point(&job->common);
> + }
> +
> + trace_backup_sleep_finish();
> +}
> +
> +/* backup_busy_sleep
> + * Just yield, without setting busy=false
How does this interact with block_job_detach_aio_context() and
block_job_drain()? Jobs must reach pause points regularly so that
cancellation and detaching AioContexts work.
> @@ -132,69 +300,246 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
> return false;
> }
>
> -static int coroutine_fn backup_do_read(BackupBlockJob *job,
> - int64_t offset, unsigned int bytes,
> - QEMUIOVector *qiov)
> +static void backup_job_wait_workers(BackupBlockJob *job)
Missing coroutine_fn. This function yields and will crash unless called
from coroutine context. Please use coroutine_fn throughout the code so
it's clear when a function is only allowed to be called from coroutine
context.
> +static void backup_worker_pause(BackupBlockJob *job)
> +{
> + job->nb_busy_workers--;
> +
> + trace_backup_worker_pause(qemu_coroutine_self(), job->nb_busy_workers,
> + job->waiting_for_workers);
> +
> + if (job->nb_busy_workers == 0 && job->waiting_for_workers) {
> + qemu_coroutine_add_next(job->common.co);
> + }
> +
> + qemu_co_queue_wait(&job->paused_workers);
> +
> + trace_backup_worker_unpause(qemu_coroutine_self());
> +
> + job->nb_busy_workers++;
This is similar to rwlock. The main coroutine would use wrlock and the
workers would use rdlock.
rwlock avoids situations where the main blockjob re-enters a worker and
vice versa. qemu_coroutine_add_next() is a lower-level primitive and
does not prevent this type of bug.
Please use rwlock.
> +static inline bool check_delay(BackupBlockJob *job)
> +{
> + uint64_t delay_ns;
> +
> + if (!job->common.speed) {
> + return false;
> + }
> +
> + delay_ns = ratelimit_calculate_delay(&job->limit, job->sectors_read);
> + job->sectors_read = 0;
> +
> + if (delay_ns == 0) {
> + if (job->delayed) {
> + job->delayed = false;
> + qemu_co_queue_restart_all(&job->paused_workers);
> }
> + return false;
> + }
>
> - return ret;
> + return job->delayed = true;
This looks like a "== vs =" bug. Please reformat it so readers don't
have to puzzle out what you meant:
  job->delayed = true;
  return true;
> +static void coroutine_fn backup_worker_co(void *opaque)
> +{
> + BackupBlockJob *job = opaque;
> +
> + job->running_workers++;
> + job->nb_busy_workers++;
> +
> + while (true) {
> + int64_t cluster = backup_get_work(job);
> + trace_backup_worker_got_work(job, qemu_coroutine_self(), cluster);
> +
> + switch (cluster) {
> + case BACKUP_WORKER_STOP:
> + job->nb_busy_workers--;
> + job->running_workers--;
> + if (job->nb_busy_workers == 0 && job->waiting_for_workers) {
> + qemu_coroutine_add_next(job->common.co);
Is there a reason for using qemu_coroutine_add_next() instead of
qemu_coroutine_enter()?
I think neither function prevents a crash if backup_get_work() returns
BACKUP_WORKER_STOP before this coroutine has yielded for the first time.
We would try to re-enter the main blockjob coroutine.
> static int coroutine_fn backup_before_write_notify(
> NotifierWithReturn *notifier,
> void *opaque)
> {
> - BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
> - BdrvTrackedRequest *req = opaque;
> - int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
> - int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
> + BdrvTrackedRequest *tr = opaque;
> + NotifierRequest *nr;
> + BackupBlockJob *job = (BackupBlockJob *)tr->bs->job;
> + int64_t start = tr->offset / job->cluster_size;
> + int64_t end = DIV_ROUND_UP(tr->offset + tr->bytes, job->cluster_size);
> + int ret = 0;
> +
> + assert((tr->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> + assert((tr->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
Are these assertions still necessary? req->bytes >> BDRV_SECTOR_BITS
rounds down so we needed a multiple of BDRV_SECTOR_SIZE. Now
DIV_ROUND_UP() is used so it doesn't matter.
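The difference between the two calculations can be sketched as below. BDRV_SECTOR_BITS and DIV_ROUND_UP() mirror the QEMU definitions; the helper names and the 64 KiB cluster size used in the usage note are assumptions made for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old style: the shift truncates, so an unaligned byte count silently
 * loses its tail unless the caller asserts sector alignment. */
static int64_t bytes_to_sectors_trunc(int64_t bytes)
{
    return bytes >> BDRV_SECTOR_BITS;
}

/* New style: half-open cluster range [*start, *end) that covers the
 * whole request, aligned or not. Hypothetical helper. */
static void cluster_range(int64_t offset, int64_t bytes,
                          int64_t cluster_size,
                          int64_t *start, int64_t *end)
{
    assert(cluster_size > 0 && offset >= 0 && bytes >= 0);
    *start = offset / cluster_size;
    *end = DIV_ROUND_UP(offset + bytes, cluster_size);
}
```

With a 64 KiB cluster, a 2-byte request at offset 65535 straddles a cluster boundary and yields the range [0, 2), whereas bytes_to_sectors_trunc(1000) yields 1 and drops the trailing 488 bytes.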
> +
> + nr = add_notif_req(job, start, end, qemu_coroutine_self());
>
> - assert(req->bs == blk_bs(job->common.blk));
> - assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> - assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> + if (nr == NULL) {
> + trace_backup_before_write_notify_skip(job, tr->offset, tr->bytes);
> + } else {
> + trace_backup_before_write_notify_start(job, tr->offset, tr->bytes, nr,
> + nr->start, nr->end, nr->nb_wait);
> +
> + if (!job->has_errors) {
> + qemu_co_queue_restart_all(&job->paused_workers);
> + }
> + co_aio_sleep_ns(blk_get_aio_context(job->common.blk),
> + QEMU_CLOCK_REALTIME, WRITE_NOTIFY_TIMEOUT_NS);
> + if (nr->nb_wait > 0) {
> + /* timer expired and read request not finished */
> + ret = -EINVAL;
#define EINVAL 22 /* Invalid argument */
Why did you choose this errno?
EIO is the errno that the kernel uses when I/O fails (e.g. a hardware timeout).
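If EIO turns out to be the right choice, the timeout branch could look roughly like this. The helper name is hypothetical and nb_wait is passed in directly, so this is only a sketch of the errno mapping, not of the notifier itself:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: nb_wait is the number of cluster reads still
 * outstanding when the write-notify timer fires.
 */
static int notify_timeout_result(int nb_wait)
{
    assert(nb_wait >= 0);

    if (nb_wait > 0) {
        /* The timer expired before the reads completed: report an I/O
         * failure rather than an invalid argument. */
        return -EIO;
    }
    return 0;
}
```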
> @@ -586,50 +923,62 @@ static void coroutine_fn backup_run(void *opaque)
> BackupCompleteData *data;
> BlockDriverState *bs = blk_bs(job->common.blk);
> int64_t end;
> - int ret = 0;
> + int i;
> + bool is_top = job->sync_mode == MIRROR_SYNC_MODE_TOP;
> + bool is_full = job->sync_mode == MIRROR_SYNC_MODE_FULL;
> +
> + trace_backup_run();
This trace event isn't useful if the guest has multiple drives. Please
include arguments that correlate the event with specific objects like
the BlockBackend/BlockDriverState/BlockJob instances, I/O request sector
and length, etc.
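As a rough illustration of the kind of arguments meant here, the sketch below formats a trace line that identifies the job and BlockDriverState by pointer. The function name, argument list, and format string are assumptions made for this example; in QEMU the event would be declared in a trace-events file rather than written by hand:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical formatter standing in for a generated trace function.
 * Including the object pointers lets a reader tell apart events from
 * different drives and jobs.
 */
static int format_backup_run_trace(char *buf, size_t len,
                                   void *job, void *bs, int sync_mode)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "backup_run job %p bs %p sync_mode %d",
                    job, bs, sync_mode);
}
```

Two concurrent backup jobs then produce lines with different job and bs values, which an argument-less trace_backup_run() cannot provide.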
> BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
> int is_read, int error)
> {
block_job_get_error_action() was copy-pasted from
block_job_error_action(). Now there are two functions with similar
names that duplicate code.
Why can't you use the existing block_job_error_action() function?