From: Stefan Hajnoczi <stefanha@gmail.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, kwolf@redhat.com,
	famz@redhat.com, mreitz@redhat.com, stefanha@redhat.com,
	pbonzini@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH 18/21] backup: new async architecture
Date: Wed, 1 Feb 2017 16:13:21 +0000
Message-ID: <20170201161321.GA12283@stefanha-x1.localdomain>
In-Reply-To: <1482503344-6424-19-git-send-email-vsementsov@virtuozzo.com>

On Fri, Dec 23, 2016 at 05:29:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> @@ -120,11 +257,42 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
>          uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
>                                                        job->sectors_read);
>          job->sectors_read = 0;
> +        job->delayed = true;
> +        trace_backup_sleep_delay(block_job_is_cancelled(&job->common),
> +                                 block_job_should_pause(&job->common));
>          block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
> +        job->delayed = false;
>      } else {
> -        block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
> +        trace_backup_sleep_zero(block_job_is_cancelled(&job->common),
> +                                block_job_should_pause(&job->common));
> +        block_job_pause_point(&job->common);
> +    }
> +
> +    trace_backup_sleep_finish();
> +}
> +
> +/* backup_busy_sleep
> + * Just yield, without setting busy=false

How does this interact with block_job_detach_aio_context() and
block_job_drain()?  Jobs must reach pause points regularly so that
cancellation and detaching AioContexts work.

> @@ -132,69 +300,246 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
>      return false;
>  }
>  
> -static int coroutine_fn backup_do_read(BackupBlockJob *job,
> -                                       int64_t offset, unsigned int bytes,
> -                                       QEMUIOVector *qiov)
> +static void backup_job_wait_workers(BackupBlockJob *job)

Missing coroutine_fn.  This function yields and will crash unless called
from coroutine context.  Please use coroutine_fn throughout the code so
it's clear when a function is only allowed to be called from coroutine
context.

> +static void backup_worker_pause(BackupBlockJob *job)
> +{
> +    job->nb_busy_workers--;
> +
> +    trace_backup_worker_pause(qemu_coroutine_self(), job->nb_busy_workers,
> +                              job->waiting_for_workers);
> +
> +    if (job->nb_busy_workers == 0 && job->waiting_for_workers) {
> +        qemu_coroutine_add_next(job->common.co);
> +    }
> +
> +    qemu_co_queue_wait(&job->paused_workers);
> +
> +    trace_backup_worker_unpause(qemu_coroutine_self());
> +
> +    job->nb_busy_workers++;

This is similar to rwlock.  The main coroutine would use wrlock and the
workers would use rdlock.

rwlock avoids situations where the main blockjob re-enters a worker and
vice versa.  qemu_coroutine_add_next() is a lower-level primitive and
does not prevent this type of bug.

Please use rwlock.

> +static inline bool check_delay(BackupBlockJob *job)
> +{
> +    uint64_t delay_ns;
> +
> +    if (!job->common.speed) {
> +        return false;
> +    }
> +
> +    delay_ns = ratelimit_calculate_delay(&job->limit, job->sectors_read);
> +    job->sectors_read = 0;
> +
> +    if (delay_ns == 0) {
> +        if (job->delayed) {
> +            job->delayed = false;
> +            qemu_co_queue_restart_all(&job->paused_workers);
>          }
> +        return false;
> +    }
>  
> -        return ret;
> +    return job->delayed = true;

This looks like a "== vs =" bug.  Please reformat it so readers don't
have to puzzle out what you meant:

job->delayed = true;
return true;

> +static void coroutine_fn backup_worker_co(void *opaque)
> +{
> +    BackupBlockJob *job = opaque;
> +
> +    job->running_workers++;
> +    job->nb_busy_workers++;
> +
> +    while (true) {
> +        int64_t cluster = backup_get_work(job);
> +        trace_backup_worker_got_work(job, qemu_coroutine_self(), cluster);
> +
> +        switch (cluster) {
> +        case BACKUP_WORKER_STOP:
> +            job->nb_busy_workers--;
> +            job->running_workers--;
> +            if (job->nb_busy_workers == 0 && job->waiting_for_workers) {
> +                qemu_coroutine_add_next(job->common.co);

Is there a reason for using qemu_coroutine_add_next() instead of
qemu_coroutine_enter()?

I think neither function prevents a crash if backup_get_work() returns
BACKUP_WORKER_STOP and this coroutine has not yielded yet.  We would
try to re-enter the main blockjob coroutine.

>  static int coroutine_fn backup_before_write_notify(
>          NotifierWithReturn *notifier,
>          void *opaque)
>  {
> -    BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
> -    BdrvTrackedRequest *req = opaque;
> -    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
> -    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
> +    BdrvTrackedRequest *tr = opaque;
> +    NotifierRequest *nr;
> +    BackupBlockJob *job = (BackupBlockJob *)tr->bs->job;
> +    int64_t start = tr->offset / job->cluster_size;
> +    int64_t end = DIV_ROUND_UP(tr->offset + tr->bytes, job->cluster_size);
> +    int ret = 0;
> +
> +    assert((tr->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> +    assert((tr->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);

Are these assertions still necessary?  req->bytes >> BDRV_SECTOR_BITS
rounds down so we needed a multiple of BDRV_SECTOR_SIZE.  Now
DIV_ROUND_UP() is used so it doesn't matter.
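
A small standalone illustration of the difference, in case it helps.
The macro and constant below are local copies of the usual idioms, and
the helper names are made up for this example; it is only a sketch of
the arithmetic, not code from the patch.

```c
#include <stdint.h>

/* Local copies of the usual idioms, defined here to stay standalone. */
#define DEMO_SECTOR_BITS 9
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old style: shifting rounds down, so unaligned bytes are silently lost. */
static int64_t sectors_round_down(int64_t bytes)
{
    return bytes >> DEMO_SECTOR_BITS;
}

/* First cluster touched by a request starting at 'offset'. */
static int64_t first_cluster(int64_t offset, int64_t cluster_size)
{
    return offset / cluster_size;
}

/* One past the last cluster touched by [offset, offset + bytes). */
static int64_t end_cluster(int64_t offset, int64_t bytes,
                           int64_t cluster_size)
{
    return DIV_ROUND_UP(offset + bytes, cluster_size);
}
```

For example, a 1000-byte request shifts down to a single 512-byte
sector, while the cluster range computed with DIV_ROUND_UP() still
covers every byte of an unaligned request.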

> +
> +    nr = add_notif_req(job, start, end, qemu_coroutine_self());
>  
> -    assert(req->bs == blk_bs(job->common.blk));
> -    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> -    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> +    if (nr == NULL) {
> +        trace_backup_before_write_notify_skip(job, tr->offset, tr->bytes);
> +    } else {
> +        trace_backup_before_write_notify_start(job, tr->offset, tr->bytes, nr,
> +                                               nr->start, nr->end, nr->nb_wait);
> +
> +        if (!job->has_errors) {
> +            qemu_co_queue_restart_all(&job->paused_workers);
> +        }
> +        co_aio_sleep_ns(blk_get_aio_context(job->common.blk),
> +                        QEMU_CLOCK_REALTIME, WRITE_NOTIFY_TIMEOUT_NS);
> +        if (nr->nb_wait > 0) {
> +            /* timer expired and read request not finished */
> +            ret = -EINVAL;

#define	EINVAL		22	/* Invalid argument */

Why did you choose this errno?

EIO is the errno that the kernel uses when I/O fails (e.g. hardware timeout).
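
If EIO is what you want here, the timeout check could look roughly like
the sketch below.  notify_wait_result() is a hypothetical helper written
only for illustration; nb_wait mirrors nr->nb_wait from the hunk above.

```c
#include <errno.h>

/*
 * Hypothetical helper: nb_wait is the number of reads still outstanding
 * when the timer expires.  A timed-out read is reported as an I/O error.
 */
static int notify_wait_result(int nb_wait)
{
    return nb_wait > 0 ? -EIO : 0;
}
```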

> @@ -586,50 +923,62 @@ static void coroutine_fn backup_run(void *opaque)
>      BackupCompleteData *data;
>      BlockDriverState *bs = blk_bs(job->common.blk);
>      int64_t end;
> -    int ret = 0;
> +    int i;
> +    bool is_top = job->sync_mode == MIRROR_SYNC_MODE_TOP;
> +    bool is_full = job->sync_mode == MIRROR_SYNC_MODE_FULL;
> +
> +    trace_backup_run();

This trace event isn't useful if the guest has multiple drives.  Please
include arguments that correlate the event with specific objects like
the BlockBackend/BlockDriverState/BlockJob instances, I/O request sector
and length, etc.

>  BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>                                          int is_read, int error)
>  {

block_job_get_error_action() was copy-pasted from
block_job_error_action().  Now there are two functions with similar
names that duplicate code.

Why can't you use the existing block_job_error_action() function?
