qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-stable@nongnu.org" <qemu-stable@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block/backup: install notifier during creation
Date: Wed, 21 Aug 2019 16:01:52 -0400	[thread overview]
Message-ID: <154bc276-d782-443f-3db6-38d87992d609@redhat.com> (raw)
In-Reply-To: <b85698e6-cd79-a9c5-554c-c92487060280@virtuozzo.com>



On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>> Backup jobs may yield prior to installing their handler, because of the
>> job_co_entry shim which guarantees that a job won't begin work until
>> we are ready to start an entire transaction.
>>
>> Unfortunately, this makes proving correctness about transactional
>> points-in-time for backup hard to reason about. Make it explicitly clear
>> by moving the handler registration to creation time, and changing the
>> write notifier to a no-op until the job is started.
>>
>> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   block/backup.c     | 32 +++++++++++++++++++++++---------
>>   include/qemu/job.h |  5 +++++
>>   job.c              |  2 +-
>>   3 files changed, 29 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 07d751aea4..4df5b95415 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>       assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>       assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>   
>> +    /* The handler is installed at creation time; the actual point-in-time
>> +     * starts at job_start(). Transactions guarantee those two points are
>> +     * the same point in time. */
>> +    if (!job_started(&job->common.job)) {
>> +        return 0;
>> +    }
> 
> Hmm, sorry if it is a stupid question, I'm not good in multiprocessing and in
> Qemu iothreads..
> 
> job_started just reads job->co. If bs runs in iothread, and therefore write-notifier
> is in iothread, when job_start is called from main thread.. Is it guaranteed that
> write-notifier will see job->co variable change early enough to not miss guest write?
> Should not job->co be volatile for example or something like this?
> 
> If not think about this patch looks good for me.
> 

You know, it's a really good question.
So good, in fact, that I have no idea.

¯\_(ツ)_/¯

I'm fairly certain that IO will not come in until the .clean phase of a
qmp_transaction, because bdrv_drained_begin(bs) is called during
.prepare, and we activate the handler (by starting the job) in .commit.
We do not end the drained section until .clean.

I'm not fully clear on what threading guarantees we have otherwise,
though; is it possible that "Thread A" would somehow lift the bdrv_drain
on an IO thread ("Thread B") and, after that, "Thread B" would somehow
still be able to see an outdated version of job->co that was set by
"Thread A"?

I doubt it; but I can't prove it.

Paolo, may I please ask you for a consult here as our resident
volatility expert?

--js

>> +
>>       return backup_do_cow(job, req->offset, req->bytes, NULL, true);
>>   }
>>   
>> @@ -398,6 +405,12 @@ static void backup_clean(Job *job)
>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>>       BlockDriverState *bs = blk_bs(s->common.blk);
>>   
>> +    /* cancelled before job_start: remove write_notifier */
>> +    if (s->before_write.notify) {
>> +        notifier_with_return_remove(&s->before_write);
>> +        s->before_write.notify = NULL;
>> +    }
>> +
>>       if (s->copy_bitmap) {
>>           bdrv_release_dirty_bitmap(bs, s->copy_bitmap);
>>           s->copy_bitmap = NULL;
>> @@ -527,17 +540,8 @@ static void backup_init_copy_bitmap(BackupBlockJob *job)
>>   static int coroutine_fn backup_run(Job *job, Error **errp)
>>   {
>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>> -    BlockDriverState *bs = blk_bs(s->common.blk);
>>       int ret = 0;
>>   
>> -    QLIST_INIT(&s->inflight_reqs);
>> -    qemu_co_rwlock_init(&s->flush_rwlock);
>> -
>> -    backup_init_copy_bitmap(s);
>> -
>> -    s->before_write.notify = backup_before_write_notify;
>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
>> -
>>       if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
>>           int64_t offset = 0;
>>           int64_t count;
>> @@ -572,6 +576,7 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>   
>>    out:
>>       notifier_with_return_remove(&s->before_write);
>> +    s->before_write.notify = NULL;
>>   
>>       /* wait until pending backup_do_cow() calls have completed */
>>       qemu_co_rwlock_wrlock(&s->flush_rwlock);
>> @@ -767,6 +772,15 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>                          &error_abort);
>>       job->len = len;
>>   
>> +    /* Finally, install a write notifier that takes effect after job_start() */
>> +    backup_init_copy_bitmap(job);
>> +
>> +    QLIST_INIT(&job->inflight_reqs);
>> +    qemu_co_rwlock_init(&job->flush_rwlock);
>> +
>> +    job->before_write.notify = backup_before_write_notify;
>> +    bdrv_add_before_write_notifier(bs, &job->before_write);
>> +
>>       return &job->common;
>>   
>>    error:
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 9e7cd1e4a0..733afb696b 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -394,6 +394,11 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job));
>>    */
>>   void job_start(Job *job);
>>   
>> +/**
>> + * job_started returns true if the @job has started.
>> + */
>> +bool job_started(Job *job);
>> +
>>   /**
>>    * @job: The job to enter.
>>    *
>> diff --git a/job.c b/job.c
>> index 28dd48f8a5..745af659ff 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -243,7 +243,7 @@ bool job_is_completed(Job *job)
>>       return false;
>>   }
>>   
>> -static bool job_started(Job *job)
>> +bool job_started(Job *job)
>>   {
>>       return job->co;
>>   }
>>
> 
> 


  reply	other threads:[~2019-08-21 20:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09 20:13 [Qemu-devel] [PATCH] block/backup: install notifier during creation John Snow
2019-08-20 22:42 ` John Snow
2019-08-21 14:41 ` Vladimir Sementsov-Ogievskiy
2019-08-21 20:01   ` John Snow [this message]
2019-09-10  8:19     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-09-10 13:23       ` John Snow
2019-09-18 20:31         ` John Snow
2019-09-19  7:11           ` Vladimir Sementsov-Ogievskiy
2019-09-19 19:11             ` [Qemu-block] [Qemu-devel] " John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=154bc276-d782-443f-3db6-38d87992d609@redhat.com \
    --to=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).